The database tables use a partitioning schema to divide the data into separate partition. These partitions MUST be used in SQL queries to indicate which data files should be used when executing the SQL query, this process is called partition pruning
. The partition schema contains 4 columns.
Column | Description |
---|---|
year | The capture year |
month | The capture month |
day | The capture day |
server | The destination server |
The following example SQL query only analyses data for server “ns1.dns.nl” that was captured on 05-12-2015. All other data in the table is skipped. Partitioning functions as an index to enable fast data lookups.
Using partition pruning the query engine can ignore most of the database data and this will save a lot of processing time and money in the case of Athena ( for Athena you pay per bytes scanned)
select qname
from entrada.dns
where year=2015 and month=12 and day=5 and server="ns1.dns.nl"
limit 10