Skip to main content

How to do Pre-Aggregation for a Metric


An Metric can be marked @preAgg as a pre-aggregated metric like

Metric Revenue @model(Orders) @preAgg {
total_revenue: INTEGER @measure(sum(totalprice))
orderstatus: OrderStatus @dim
custkey: INTEGER @dim
orderdate_grain: date @time_grain(orderdate, [DAY, MONTH])

A pre-aggregated metric will be generated when the Accio server is started. The result of a metric will be cached in DuckDB.

Query pre-aggregation tables

Mark Revenue as a pre-aggregated metric and we can query Revenue as the following:

Query Revenue and the results from DuckDB

SELECT * FROM Revenue;

custkey | orderstatus | totalrevenue
251 | F | 1369253.1199999999
1202 | F | 1699875.2499999998
1384 | F | 1795731.6300000001

Find orderstatus is O and custkey is 100 in Revenue

SELECT * FROM Revenue 
WHERE custkey = 100
AND orderstatus = 'O';

custkey | orderstatus | totalrevenue
100 | O | 1630118.7299999997

Note: Query pre-aggregation withroll_up is not executed in DuckDB

Refresh pre-aggregation

The pre-aggregation tables are refreshed periodically by refreshTime.

  • refreshTime: Refreshing pre-aggregation tables period; default is 30m; Refreshing the pre-aggregation frequently is unrecommended.
    • acceptable units: (combined units are unacceptable; 3m5s is unacceptable)
      • d: days; 1d
      • h: hours; 2h
      • m: minutes; 3m
      • s: seconds; 5s

BigQuery configurations:

  • bigquery.bucket-name: The bucket name in GCS. pre-agg-bucket-name-gcs
  • Need to create HMAC keys and declare them:

Sample steps:

  • Step 1: step1
  • Step 2: step2

Unsupported BigQuery type (Others are supported)

  • BIGNUMERICBigQuery type max precision is 76, but DuckDB only support to 38.
  • INTERVAL:BigQuery INTERVAL type is not currently supported for parquet exports
  • JSON:BigQuery can't group by JSON type
  • TIME