Quick Start
This guide introduces the basics of querying data and building a semantic model with Accio. By the end of this tutorial, you will understand how to use Accio to model to build data models, build re-usable metric and do analysis on your data intuitively.
It’s easy to run Accio locally on your laptop. Follow the setup instructions to install Accio.
Start a Accio Demo Project
Start the Accio demo project using Accio CLI tool with the fellowing command:
accio hello
This command will:
- Initialize Accio in the current directory with an example template.
- Build the project and generate a manifest JSON file for Accio server.
- Start a demo Postgres database for you to play with.
Once the build process completes successfully, Accio will automatically serve the generated JSON output file. In the demo database, Accio has prepared a tiny TPCH dataset.
Now, we have a demo project to exploring the power of Accio. We can use the following command to start a command line interface to query the demo data.
accio cli
Try to solve some simple questions by Accio SQL
The top 10 popular supplier in 1996
This question solved by Accio SQL will be:
SELECT
partsupp.supplier.name,
count(*)
FROM Lineitem
WHERE orders.orderdate BETWEEN DATE '1996/01/01' AND DATE '1996/12/31'
GROUP BY 1
ORDER BY 2 desc
LIMIT 10
- Result
- Traditional SQL
- The Accio-transformed SQL
name | count
---------------------+-------
Supplier#000000075 | 120
Supplier#000000036 | 118
Supplier#000000021 | 115
Supplier#000000023 | 113
Supplier#000000032 | 113
Supplier#000000096 | 112
Supplier#000000090 | 110
Supplier#000000061 | 110
Supplier#000000030 | 107
Supplier#000000076 | 107
(10 rows)
SELECT
s_name,
count(*)
FROM
lineitem, partsupp, supplier, orders
WHERE
l_orderkey = o_orderkey
AND l_partkey = ps_partkey
AND l_suppkey = ps_suppkey
AND ps_suppkey = s_suppkey
GROUP BY 1
ORDER BY 2 desc
LIMIT 10
WITH
Lineitem AS (
SELECT
l_orderkey "orderkey"
, l_linenumber "linenumber"
, concat(l_orderkey, l_linenumber) "orderkey_linenumber"
, l_partkey "partkey"
, l_suppkey "suppkey"
, concat(l_partkey, l_suppkey) "partkey_suppkey"
, l_quantity "quantity"
, l_extendedprice "extendedprice"
, l_discount "discount"
, l_tax "tax"
, l_returnflag "returnflag"
, l_linestatus "linestatus"
, l_shipdate "shipdate"
, l_commitdate "commitdate"
, l_receiptdate "receiptdate"
, l_shipinstruct "shipinstruct"
, l_shipmode "shipmode"
, l_comment "comment"
, 'relationship<OrdersLineitem>' "orders"
, 'relationship<PartSuppLineitem>' "partsupp"
FROM
(
SELECT *
FROM
lineitem
) t
)
, Orders AS (
SELECT
o_orderkey "orderkey"
, o_custkey "custkey"
, o_orderstatus "orderstatus"
, o_totalprice "totalprice"
, o_orderdate "orderdate"
, o_orderpriority "orderpriority"
, o_clerk "clerk"
, o_shippriority "shippriority"
, o_comment "comment"
, 'relationship<OrdersCustomer>' "customer"
, 'relationship<OrdersLineitem>' "lineitems"
FROM
(
SELECT *
FROM
orders
) t
)
, PartSupp AS (
SELECT
ps_partkey "partkey"
, ps_suppkey "suppkey"
, concat(ps_partkey, ps_suppkey) "partkey_suppkey"
, ps_availqty "availqty"
, ps_supplycost "supplycost"
, ps_comment "comment"
, 'relationship<PartSuppPart>' "part"
, 'relationship<PartSuppSupplier>' "supplier"
FROM
(
SELECT *
FROM
partsupp
) t
)
, Supplier AS (
SELECT
s_suppkey "suppkey"
, s_name "name"
, s_address "address"
, s_nationkey "nationkey"
, s_phone "phone"
, s_acctbal "acctbal"
, s_comment "comment"
, 'relationship<PartSuppSupplier>' "partsupps"
, 'relationship<NationSupplier>' "nation"
FROM
(
SELECT *
FROM
supplier
) t
)
, rs_1usi104oox (partkey_suppkey, partkey, suppkey, availqty, supplycost, comment, part, supplier, bk) AS (
SELECT DISTINCT
t.partkey_suppkey
, t.partkey
, t.suppkey
, t.availqty
, t.supplycost
, t.comment
, t.part
, t.supplier
, s.orderkey_linenumber bk
FROM
(Lineitem s
LEFT JOIN PartSupp t ON (s.partkey_suppkey = t.partkey_suppkey))
)
, rs_1n0rqppeyi (suppkey, name, address, nationkey, phone, acctbal, comment, partsupps, nation, bk) AS (
SELECT DISTINCT
t.suppkey
, t.name
, t.address
, t.nationkey
, t.phone
, t.acctbal
, t.comment
, t.partsupps
, t.nation
, s.bk bk
FROM
(rs_1usi104oox s
LEFT JOIN Supplier t ON (s.suppkey = t.suppkey))
)
, rs_1en9jip1hj (orderkey, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment, customer, lineitems, bk) AS (
SELECT DISTINCT
t.orderkey
, t.custkey
, t.orderstatus
, t.totalprice
, t.orderdate
, t.orderpriority
, t.clerk
, t.shippriority
, t.comment
, t.customer
, t.lineitems
, s.orderkey_linenumber bk
FROM
(Lineitem s
LEFT JOIN Orders t ON (s.orderkey = t.orderkey))
)
SELECT
rs_1n0rqppeyi.name
, count(*) count
FROM
((Lineitem
LEFT JOIN rs_1n0rqppeyi ON (Lineitem.orderkey_linenumber = rs_1n0rqppeyi.bk))
LEFT JOIN rs_1en9jip1hj ON (Lineitem.orderkey_linenumber = rs_1en9jip1hj.bk))
WHERE (rs_1en9jip1hj.orderdate BETWEEN DATE '1996/01/01' AND DATE '1996/12/31')
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
This code is very similar to traditional SQL, but there are some differences in partsupp.supplier.name
and orders.orderdate
.
partsupp
is a relation column inLineitem
model andsupplier
is a relation column in modelPartsupp
. These relation columns allow us to access data from the associated tables. Specifically, the relations between modelLineitem
andPartsupp
is aMANY_TO_ONE
relation, indicating that each row inLineitem
corresponds to only one record inPartsupp
. Columnsupplier
also establishes the connection between modelPartsupp
andSupplier
.orders
is also a relation column inLineitem
.
Though the relation columns, you just grab all the related records using the relation column from other models and then do the usual aggregation and group-by stuff.
How about the consumption amount of each customer in the Asia region?
This question solved by Accio SQL will be:
SELECT
name,
array_sum(transform(orders, o -> o.totalprice)) as consumption
FROM Customer
WHERE nation.region.name = 'ASIA'
- Result
- Traditional SQL
- The Accio-transformed SQL
name | consumption
---------------------+--------------------
Customer#000000007 | 3922020.98
Customer#000000019 | 3686492.93
Customer#000000025 | 2845869.53
Customer#000000028 | 3075314.4600000004
Customer#000000037 | 2428169.69
Customer#000000038 | 1336731.1199999999
SELECT
c_name,
sum(o_totalprice) as consumption
FROM
customer, orders, nation, region
WHERE
c_custkey = o_custkey,
AND c_nationkey = n_nationkey,
AND n_regionkey = r_regionkey
AND r_name = 'ASIA'
GROUP 1
WITH
Customer AS (
SELECT
c_custkey "custkey"
, c_name "name"
, c_address "address"
, c_nationkey "nationkey"
, c_phone "phone"
, c_acctbal "acctbal"
, c_mktsegment "mktsegment"
, c_comment "comment"
, 'relationship<OrdersCustomer>' "orders"
, 'relationship<NationCustomer>' "nation"
FROM
(
SELECT *
FROM
customer
) t
)
, Nation AS (
SELECT
n_nationkey "nationkey"
, n_name "name"
, n_regionkey "regionkey"
, n_comment "comment"
, 'relationship<NationRegion>' "region"
, 'relationship<NationSupplier>' "supplier"
FROM
(
SELECT *
FROM
nation
) t
)
, Orders AS (
SELECT
o_orderkey "orderkey"
, o_custkey "custkey"
, o_orderstatus "orderstatus"
, o_totalprice "totalprice"
, o_orderdate "orderdate"
, o_orderpriority "orderpriority"
, o_clerk "clerk"
, o_shippriority "shippriority"
, o_comment "comment"
, 'relationship<OrdersCustomer>' "customer"
, 'relationship<OrdersLineitem>' "lineitems"
FROM
(
SELECT *
FROM
orders
) t
)
, Region AS (
SELECT
r_regionkey "regionkey"
, r_name "name"
, r_comment "comment"
, 'relationship<NationRegion>' "nations"
FROM
(
SELECT *
FROM
region
) t
)
, rs_z7prc988jf (custkey, bk, orders) AS (
SELECT
o.custkey custkey
, o.custkey bk
, array_agg(m.orderkey ORDER BY m.orderkey ASC) FILTER (WHERE (m.orderkey IS NOT NULL)) orders
FROM
(Customer o
LEFT JOIN Orders m ON (o.custkey = m.custkey))
GROUP BY 1, 2
)
, rs_15etkcvhdv (custkey, bk, f1) AS (
SELECT
s.custkey custkey
, s.bk bk
, array_agg(t.totalprice ORDER BY t.orderkey ASC) FILTER (WHERE (t.totalprice IS NOT NULL)) f1
FROM
((rs_z7prc988jf s
CROSS JOIN UNNEST(s.orders) u (uc))
LEFT JOIN Orders t ON (u.uc = t.orderkey))
GROUP BY 1, 2
)
, rs_o1dvjhv3zt (custkey, bk, f1) AS (
SELECT
s.custkey custkey
, s.bk bk
, sum(u.uc) f1
FROM
(rs_15etkcvhdv s
CROSS JOIN UNNEST(s.f1) u (uc))
GROUP BY 1, 2
)
, rs_1rng05f2ev (nationkey, name, regionkey, comment, region, supplier, bk) AS (
SELECT DISTINCT
t.nationkey
, t.name
, t.regionkey
, t.comment
, t.region
, t.supplier
, s.custkey bk
FROM
(Customer s
LEFT JOIN Nation t ON (s.nationkey = t.nationkey))
)
, rs_1nrbmd32nb (regionkey, name, comment, nations, bk) AS (
SELECT DISTINCT
t.regionkey
, t.name
, t.comment
, t.nations
, s.bk bk
FROM
(rs_1rng05f2ev s
LEFT JOIN Region t ON (s.regionkey = t.regionkey))
)
SELECT
Customer.name name
, rs_o1dvjhv3zt.f1 consumption
FROM
((Customer
LEFT JOIN rs_o1dvjhv3zt ON (Customer.custkey = rs_o1dvjhv3zt.bk))
LEFT JOIN rs_1nrbmd32nb ON (Customer.custkey = rs_1nrbmd32nb.bk))
WHERE (rs_1nrbmd32nb.name = 'ASIA')
Let’s break down each part of this query.
array_sum(transform(orders, o -> o.totalprice)) as consumption
- This is a function chain designed to calculate the sum of
totalprice
from relatedorders
records. - The column
orders
establishes a relation, allowing us to access the associatedorders
records based on the predefinedrelation
betweenCustomer
andOrders
. - The
transform
function acts as a lambda function to process the results of theTO_MANY
relation. As per theOrdersCustomer
relation, eachCustomer
record corresponds to multipleOrders
records. - We utilize
array_sum
to add up the results of thetransform
function. By extracting all relatedtotalprice
values fromOrders
, we can then sum them up to determine the consumption of eachCustomer
.
- This is a function chain designed to calculate the sum of
nation.region.name
- The column
nation
in theCustomer
table is a relation column. Similarly, the columnregion
in theNation
table is also a relation column. Leveraging the relations amongCustomer
,Nation
, andRegion
, we can retrieve all records of customers from the Asia region.
- The column
How Accio describe Models and Relations among them
After executing hello
command, you may find a file called quick-strat.mdl
in your current folder. It’s a file to tell Accio how to access data. Let’s check the part of Lineitem
model.
Model Lineitem @sql('select * from lineitem') {
orderkey: INTEGER @expr("l_orderkey")
linenumber: INTEGER @expr("l_linenumber")
orderkey_linenumber: VARCHAR @expr("concat(l_orderkey, l_linenumber)") @primaryKey
partkey: INTEGER @expr("l_partkey")
suppkey: INTEGER @expr("l_suppkey")
partkey_suppkey: VARCHAR @expr("concat(l_partkey, l_suppkey)")
quantity: REAL @expr("l_quantity")
...
shipmode: VARCHAR @expr("l_shipmode")
comment: VARCHAR @expr("l_comment")
orders: Orders @relation(OrdersLineitem)
partsupp: PartSupp @relation(PartSuppLineitem)
}
Let’s break down each part in the file.
model
is like a virtual table created by the SQL code specified in the@sql
annotation. It can be any SQL query supported by your database, but it's advisable to keep it as simple as possible. Using a simpler SQL query can greatly contribute to the maintainability of your semantic layer. In this case, we define a model calledLineitem
and base on the SQLselect * from lineitem
.- In
model
block, we define the columns it contains. For instance, in theLineitem
model, we can clearly see that it has a column calledorderkey
with the data typeINTEGER
. The@expr
annotation is used to display the original name from the data source, andorderkey
can be thought of as an alias for this column. - It's worth noting that the column
orderkey_linenumber
is not an original column in the data source. Its@expr
annotation presents a SQL expression, specifically,concat(l_orderkey, l_linenumber)
. This implies that the expression can be a formula, composite, or any other valid expression supported by your data source. - The column
orderkey_linenumber
is marked as a@primaryKey
, which means it serves as a unique key for each record. Accio utilizes this primary key to trace back the record when processing relation access. - The column
orders
is of typeOrders
, indicating that it is a relation column. When a user attempts to accessorders
, Accio will construct the join condition based on the@relation
annotation. In this scenario,OrdersLineitem
instructs Accio on how to establish the connection betweenOrders
andLineitem
.
With the model definition, we can query this model like
SELECT orderkey_linenumber, quantity, shipmode FROM Lineitem
- Result
- The Accio-transformed SQL
orderkey_linenumber | quantity | shipmode
----------------------+----------+----------
65601 | 32 | SHIP
230092 | 26 | REG AIR
1344 | 47 | REG AIR
238771 | 47 | AIR
282257 | 11 | REG AIR
161662 | 43 | REG AIR
506634 | 9 | SHIP
273925 | 16 | AIR
442274 | 30 | SHIP
WITH
Lineitem AS (
SELECT
l_orderkey "orderkey"
, l_linenumber "linenumber"
, concat(l_orderkey, l_linenumber) "orderkey_linenumber"
, l_partkey "partkey"
, l_suppkey "suppkey"
, concat(l_partkey, l_suppkey) "partkey_suppkey"
, l_quantity "quantity"
, l_extendedprice "extendedprice"
, l_discount "discount"
, l_tax "tax"
, l_returnflag "returnflag"
, l_linestatus "linestatus"
, l_shipdate "shipdate"
, l_commitdate "commitdate"
, l_receiptdate "receiptdate"
, l_shipinstruct "shipinstruct"
, l_shipmode "shipmode"
, l_comment "comment"
, 'relationship<OrdersLineitem>' "orders"
, 'relationship<PartSuppLineitem>' "partsupp"
FROM
(
SELECT *
FROM
lineitem
) t
)
SELECT
Lineitem.orderkey_linenumber orderkey_linenumber
, Lineitem.quantity quantity
, Lineitem.shipmode shipmode
FROM
Lineitem
Next, let’s check the part of Relation
.
Relation OrdersLineitem @condition(Orders.orderkey = Lineitem.orderkey) @desc("Orders of Lineitem") {
models: [Orders, Lineitem]
type: "ONE_TO_MANY"
}
In this part, we defined a relation
called OrdersLineitem
which describes how to connect
Orders
and Lineitem
.
- The annotation
@condition(Orders.orderkey = Lineitem.orderkey)
mentions the join condition between them. @desc
serves as a comprehensible description for the data consumer or maintainer. When employing the Accio Chatgpt plugin, it additionally functions as a contextual guide for the AI to generate SQL statements effectively.- The attribute
models
specifies the models to which this relation is connected. - The attribute
type
specifies the relation is aONE_TO_MANY
relation. It means eachOrders
record corresponds to multipleLineitem
records.
With this relation, we can access Lineitem
data from Orders
model like
SELECT orderkey, array_sum(transform(lineitems, l -> l.extendedprice)) FROM Orders
- Result
- The Accio-transformed SQL
orderkey | f1
-----------+-----------
1 | 180734.63
2 | 36596.28
3 | 218430.61
4 | 53456.4
5 | 108296.94
6 | 48040.43
7 | 281463.65
32 | 198563.34
33 | 152399.90
34 | 72299.16
35 | 196320.93
36 | 46165.98
WITH
Lineitem AS (
SELECT
l_orderkey "orderkey"
, l_linenumber "linenumber"
, concat(l_orderkey, l_linenumber) "orderkey_linenumber"
, l_partkey "partkey"
, l_suppkey "suppkey"
, concat(l_partkey, l_suppkey) "partkey_suppkey"
, l_quantity "quantity"
, l_extendedprice "extendedprice"
, l_discount "discount"
, l_tax "tax"
, l_returnflag "returnflag"
, l_linestatus "linestatus"
, l_shipdate "shipdate"
, l_commitdate "commitdate"
, l_receiptdate "receiptdate"
, l_shipinstruct "shipinstruct"
, l_shipmode "shipmode"
, l_comment "comment"
, 'relationship<OrdersLineitem>' "orders"
, 'relationship<PartSuppLineitem>' "partsupp"
FROM
(
SELECT *
FROM
lineitem
) t
)
, Orders AS (
SELECT
o_orderkey "orderkey"
, o_custkey "custkey"
, o_orderstatus "orderstatus"
, o_totalprice "totalprice"
, o_orderdate "orderdate"
, o_orderpriority "orderpriority"
, o_clerk "clerk"
, o_shippriority "shippriority"
, o_comment "comment"
, 'relationship<OrdersCustomer>' "customer"
, 'relationship<OrdersLineitem>' "lineitems"
FROM
(
SELECT *
FROM
orders
) t
)
, rs_1fo2ofymnd (orderkey, bk, lineitems) AS (
SELECT
o.orderkey orderkey
, o.orderkey bk
, array_agg(m.orderkey_linenumber ORDER BY m.orderkey_linenumber ASC) FILTER (WHERE (m.orderkey_linenumber IS NOT NULL)) lineitems
FROM
(Orders o
LEFT JOIN Lineitem m ON (o.orderkey = m.orderkey))
GROUP BY 1, 2
)
, rs_18n3yudcqn (orderkey, bk, f1) AS (
SELECT
s.orderkey orderkey
, s.bk bk
, array_agg(t.extendedprice ORDER BY t.orderkey_linenumber ASC) FILTER (WHERE (t.extendedprice IS NOT NULL)) f1
FROM
((rs_1fo2ofymnd s
CROSS JOIN UNNEST(s.lineitems) u (uc))
LEFT JOIN Lineitem t ON (u.uc = t.orderkey_linenumber))
GROUP BY 1, 2
)
, rs_21cz2prh3z (orderkey, bk, f1) AS (
SELECT
s.orderkey orderkey
, s.bk bk
, sum(u.uc) f1
FROM
(rs_18n3yudcqn s
CROSS JOIN UNNEST(s.f1) u (uc))
GROUP BY 1, 2
)
SELECT
Orders.orderkey orderkey
, rs_21cz2prh3z.f1
FROM
(Orders
LEFT JOIN rs_21cz2prh3z ON (Orders.orderkey = rs_21cz2prh3z.bk))
Do some change for your Accio MDL
After trying the power of Accio, maybe you want to do some customization for your MDL.
Model Lineitem @sql('select * from lineitem') {
...
extendedprice: REAL @expr("l_extendedprice")
discount: REAL @expr("l_discount")
}
If you examine the TPCH queries, you'll come across a frequently used concept called revenue
, which is calculated as l_extendedprice * (1 - l_discount)
. For such a frequently used expression, we can add a column to the Lineitem
model.
Model Lineitem @sql('select * from lineitem') {
...
extendedprice: REAL @expr("l_extendedprice")
discount: REAL @expr("l_discount")
revenue: REAL @expr("l_extendedprice * (1 - l_discount)")
}
After modified your MDL, we can use the following commands:
stop
: Stop the Accio server and the demo database (If started).accio stop
serve
: Transform your MDL to the manifest JSON required by Accio server, then move the manifest json file to Accio server and start server.accio serve -i accio.mdl -o accio.json
(Optional)
demo
: Start the demo Postgres database.accio demo
Now, you can query Lineitem
through CLI:
Start CLI
accio cli
Access
revenue
fromLineitem
SELECT orderkey, linenumber, revenue FROM Lineitem
- Result
- The Accio-transformed SQL
orderkey | linenumber | revenue
----------+------------+----------
6560 | 1 | 33091.26
23009 | 2 | 34694.18
134 | 4 | 63093.73
23877 | 1 | 44603.88
28225 | 7 | 15371.49
16166 | 2 | 49631.75
50663 | 4 | 9180.09
27392 | 5 | 25161.72
44227 | 4 | 56608.4
27137 | 2 | 67930.30
WITH
Lineitem AS (
SELECT
l_orderkey "orderkey"
, l_linenumber "linenumber"
, concat(l_orderkey, l_linenumber) "orderkey_linenumber"
, l_partkey "partkey"
, l_suppkey "suppkey"
, concat(l_partkey, l_suppkey) "partkey_suppkey"
, l_quantity "quantity"
, l_extendedprice "extendedprice"
, l_discount "discount"
, (l_extendedprice - (1 - l_discount)) "revenue"
, l_tax "tax"
, l_returnflag "returnflag"
, l_linestatus "linestatus"
, l_shipdate "shipdate"
, l_commitdate "commitdate"
, l_receiptdate "receiptdate"
, l_shipinstruct "shipinstruct"
, l_shipmode "shipmode"
, l_comment "comment"
, 'relationship<OrdersLineitem>' "orders"
, 'relationship<PartSuppLineitem>' "partsupp"
FROM
(
SELECT *
FROM
lineitem
) t
)
SELECT
Lineitem.orderkey orderkey
, Lineitem.linenumber linenumber
, Lineitem.revenue revenue
FROM
Lineitem
If you're interested in applying Accio to more complex scenarios, take a look at the TPCH example. It demonstrates how to define the TPCH schema and effectively address TPCH business questions using Accio.