Data model

ENTRADA creates a database named entrada that contains two data tables: dns, which holds the DNS data, and icmp, which holds the ICMP data.

Schema version 2.x is not compatible with 0.x

The ENTRADA 2.x database schema is not compatible with the Parquet files generated by ENTRADA 0.x. By default, Apache Impala resolves Parquet columns by position (index). Because ENTRADA 2.x added and removed columns, the column positions in old and new files no longer match, so position-based resolution breaks.
To fix this, make Impala resolve columns by name instead. This is enabled with the PARQUET_FALLBACK_SCHEMA_RESOLUTION query option, which applies to the current session:

set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
select count(1) from entrada.dns;

Removed tables

The staging table has been removed; by default, all data is now inserted directly into entrada.dns.

Removed columns

Removed column     Replaced by
unixtime (secs)    time (millis)
len                req_len, res_len
dns_len            req_len, res_len
udp_sum            (no replacement)
is_google          pub_resolver
is_open_dns        pub_resolver
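As a rough guide to porting 0.x queries, the sketch below shows how the removed columns map onto their 2.x replacements. It assumes that time is exposed by Impala as a BIGINT of milliseconds since the Unix epoch, as the table above suggests. If your Impala version exposes it as a TIMESTAMP instead, use unix_timestamp(time) rather than integer division. The values stored in pub_resolver are not documented here, so the last query lists them rather than guessing resolver names.

```sql
-- Only needed when entrada.dns still contains Parquet files written by ENTRADA 0.x.
set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;

-- 0.x unixtime (seconds) -> 2.x time (milliseconds).
-- Assumption: time is a BIGINT of epoch milliseconds, so integer
-- division by 1000 gives back the old seconds value.
-- 0.x len / dns_len -> 2.x separate request and response lengths.
select time div 1000 as unixtime_secs,
       req_len,
       res_len
from entrada.dns
limit 10;

-- 0.x is_google / is_open_dns flags -> 2.x single pub_resolver column.
-- Listing the distinct values shows how resolvers are actually named.
select pub_resolver, count(1) as queries
from entrada.dns
group by pub_resolver;
```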

Added columns

Column        Description
pub_resolver  Name of the public resolver
req_len       Length of the DNS request
res_len       Length of the DNS response
tcp_hs_rtt    RTT (ms) of the TCP handshake
tcp_pk_rtt    RTT (ms) of the TCP server response
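A minimal sketch of how the added columns might be combined in a single Impala query: per public resolver, it reports the query count, average request and response lengths, and average TCP round-trip times. It assumes that rows without a TCP measurement (for example, UDP traffic) store NULL in tcp_hs_rtt and tcp_pk_rtt, which avg() ignores. If a sentinel value such as 0 or -1 is used instead, add a WHERE filter.

```sql
-- Sketch: summarise the columns added in ENTRADA 2.x per public resolver.
-- Assumption: tcp_hs_rtt / tcp_pk_rtt are NULL when no TCP RTT was measured;
-- avg() skips NULLs, so UDP rows do not drag the averages down.
select pub_resolver,
       count(1)        as queries,
       avg(req_len)    as avg_req_len,
       avg(res_len)    as avg_res_len,
       avg(tcp_hs_rtt) as avg_tcp_hs_rtt_ms,
       avg(tcp_pk_rtt) as avg_tcp_pk_rtt_ms
from entrada.dns
group by pub_resolver
order by queries desc;
```

Rows that were not sent to a public resolver will most likely appear together in a single group (possibly NULL), which gives a baseline to compare the public resolvers against.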