Skip to main content

Quick Start

This guide introduces the basics of querying data and building a semantic model with Accio. By the end of this tutorial, you will understand how to use Accio to model to build data models, build re-usable metric and do analysis on your data intuitively.

It’s easy to run Accio locally on your laptop. Follow the setup instructions to install Accio.

Start a Accio Demo Project

Start the Accio demo project using Accio CLI tool with the fellowing command:

accio hello

This command will:

  • Initialize Accio in the current directory with an example template.
  • Build the project and generate a manifest JSON file for Accio server.
  • Start a demo Postgres database for you to play with.

Once the build process completes successfully, Accio will automatically serve the generated JSON output file. In the demo database, Accio has prepared a tiny TPCH dataset.

Now, we have a demo project to exploring the power of Accio. We can use the following command to start a command line interface to query the demo data.

accio cli

Try to solve some simple questions by Accio SQL

This question solved by Accio SQL will be:

SELECT
partsupp.supplier.name,
count(*)
FROM Lineitem
WHERE orders.orderdate BETWEEN DATE '1996/01/01' AND DATE '1996/12/31'
GROUP BY 1
ORDER BY 2 desc
LIMIT 10
name                 | count 
---------------------+-------
Supplier#000000075 | 120
Supplier#000000036 | 118
Supplier#000000021 | 115
Supplier#000000023 | 113
Supplier#000000032 | 113
Supplier#000000096 | 112
Supplier#000000090 | 110
Supplier#000000061 | 110
Supplier#000000030 | 107
Supplier#000000076 | 107
(10 rows)

This code is very similar to traditional SQL, but there are some differences in partsupp.supplier.name and orders.orderdate.

  • partsupp is a relation column in Lineitem model and supplier is a relation column in model Partsupp . These relation columns allow us to access data from the associated tables. Specifically, the relations between model Lineitem and Partsupp is a MANY_TO_ONE relation, indicating that each row in Lineitem corresponds to only one record in Partsupp. Column supplier also establishes the connection between model Partsupp and Supplier.
  • orders is also a relation column in Lineitem.

Though the relation columns, you just grab all the related records using the relation column from other models and then do the usual aggregation and group-by stuff.

How about the consumption amount of each customer in the Asia region?

This question solved by Accio SQL will be:

SELECT
name,
array_sum(transform(orders, o -> o.totalprice)) as consumption
FROM Customer
WHERE nation.region.name = 'ASIA'
name                 |    consumption     
---------------------+--------------------
Customer#000000007 | 3922020.98
Customer#000000019 | 3686492.93
Customer#000000025 | 2845869.53
Customer#000000028 | 3075314.4600000004
Customer#000000037 | 2428169.69
Customer#000000038 | 1336731.1199999999

Let’s break down each part of this query.

  • array_sum(transform(orders, o -> o.totalprice)) as consumption
    • This is a function chain designed to calculate the sum of totalprice from related orders records.
    • The column orders establishes a relation, allowing us to access the associated orders records based on the predefined relation between Customer and Orders.
    • The transform function acts as a lambda function to process the results of the TO_MANY relation. As per the OrdersCustomer relation, each Customer record corresponds to multiple Orders records.
    • We utilize array_sum to add up the results of the transform function. By extracting all related totalprice values from Orders, we can then sum them up to determine the consumption of each Customer.
  • nation.region.name
    • The column nation in the Customer table is a relation column. Similarly, the column region in the Nation table is also a relation column. Leveraging the relations among Customer, Nation, and Region, we can retrieve all records of customers from the Asia region.

How Accio describe Models and Relations among them

After executing hello command, you may find a file called quick-strat.mdl in your current folder. It’s a file to tell Accio how to access data. Let’s check the part of Lineitem model.

Model Lineitem @sql('select * from lineitem') {
orderkey: INTEGER @expr("l_orderkey")
linenumber: INTEGER @expr("l_linenumber")
orderkey_linenumber: VARCHAR @expr("concat(l_orderkey, l_linenumber)") @primaryKey
partkey: INTEGER @expr("l_partkey")
suppkey: INTEGER @expr("l_suppkey")
partkey_suppkey: VARCHAR @expr("concat(l_partkey, l_suppkey)")
quantity: REAL @expr("l_quantity")

...

shipmode: VARCHAR @expr("l_shipmode")
comment: VARCHAR @expr("l_comment")
orders: Orders @relation(OrdersLineitem)
partsupp: PartSupp @relation(PartSuppLineitem)
}

Let’s break down each part in the file.

  • model is like a virtual table created by the SQL code specified in the @sql annotation. It can be any SQL query supported by your database, but it's advisable to keep it as simple as possible. Using a simpler SQL query can greatly contribute to the maintainability of your semantic layer. In this case, we define a model called Lineitem and base on the SQL select * from lineitem.
  • In model block, we define the columns it contains. For instance, in the Lineitem model, we can clearly see that it has a column called orderkey with the data type INTEGER. The @expr annotation is used to display the original name from the data source, and orderkey can be thought of as an alias for this column.
  • It's worth noting that the column orderkey_linenumber is not an original column in the data source. Its @expr annotation presents a SQL expression, specifically, concat(l_orderkey, l_linenumber). This implies that the expression can be a formula, composite, or any other valid expression supported by your data source.
  • The column orderkey_linenumber is marked as a @primaryKey, which means it serves as a unique key for each record. Accio utilizes this primary key to trace back the record when processing relation access.
  • The column orders is of type Orders, indicating that it is a relation column. When a user attempts to access orders, Accio will construct the join condition based on the @relation annotation. In this scenario, OrdersLineitem instructs Accio on how to establish the connection between Orders and Lineitem.

With the model definition, we can query this model like

SELECT orderkey_linenumber, quantity, shipmode FROM Lineitem
orderkey_linenumber   | quantity | shipmode 
----------------------+----------+----------
65601 | 32 | SHIP
230092 | 26 | REG AIR
1344 | 47 | REG AIR
238771 | 47 | AIR
282257 | 11 | REG AIR
161662 | 43 | REG AIR
506634 | 9 | SHIP
273925 | 16 | AIR
442274 | 30 | SHIP

Next, let’s check the part of Relation.

Relation OrdersLineitem @condition(Orders.orderkey = Lineitem.orderkey) @desc("Orders of Lineitem") {
models: [Orders, Lineitem]
type: "ONE_TO_MANY"
}

In this part, we defined a relation called OrdersLineitem which describes how to connect

Orders and Lineitem.

  • The annotation @condition(Orders.orderkey = Lineitem.orderkey) mentions the join condition between them.
  • @desc serves as a comprehensible description for the data consumer or maintainer. When employing the Accio Chatgpt plugin, it additionally functions as a contextual guide for the AI to generate SQL statements effectively.
  • The attribute models specifies the models to which this relation is connected.
  • The attribute type specifies the relation is a ONE_TO_MANY relation. It means each Orders record corresponds to multiple Lineitem records.

With this relation, we can access Lineitem data from Orders model like

SELECT orderkey, array_sum(transform(lineitems, l -> l.extendedprice)) FROM Orders
  orderkey |    f1     
-----------+-----------
1 | 180734.63
2 | 36596.28
3 | 218430.61
4 | 53456.4
5 | 108296.94
6 | 48040.43
7 | 281463.65
32 | 198563.34
33 | 152399.90
34 | 72299.16
35 | 196320.93
36 | 46165.98

Do some change for your Accio MDL

After trying the power of Accio, maybe you want to do some customization for your MDL.

Model Lineitem @sql('select * from lineitem') {
...
extendedprice: REAL @expr("l_extendedprice")
discount: REAL @expr("l_discount")

}

If you examine the TPCH queries, you'll come across a frequently used concept called revenue, which is calculated as l_extendedprice * (1 - l_discount). For such a frequently used expression, we can add a column to the Lineitem model.

Model Lineitem @sql('select * from lineitem') {
...
extendedprice: REAL @expr("l_extendedprice")
discount: REAL @expr("l_discount")
revenue: REAL @expr("l_extendedprice * (1 - l_discount)")
}

After modified your MDL, we can use the following commands:

  1. stop: Stop the Accio server and the demo database (If started).

    accio stop
  2. serve : Transform your MDL to the manifest JSON required by Accio server, then move the manifest json file to Accio server and start server.

    accio serve -i accio.mdl -o accio.json
  3. (Optional) demo: Start the demo Postgres database.

    accio demo

Now, you can query Lineitem through CLI:

  1. Start CLI

    accio cli
  2. Access revenue from Lineitem

    SELECT orderkey, linenumber, revenue FROM Lineitem
orderkey  | linenumber | revenue  
----------+------------+----------
6560 | 1 | 33091.26
23009 | 2 | 34694.18
134 | 4 | 63093.73
23877 | 1 | 44603.88
28225 | 7 | 15371.49
16166 | 2 | 49631.75
50663 | 4 | 9180.09
27392 | 5 | 25161.72
44227 | 4 | 56608.4
27137 | 2 | 67930.30

If you're interested in applying Accio to more complex scenarios, take a look at the TPCH example. It demonstrates how to define the TPCH schema and effectively address TPCH business questions using Accio.