F.A.Q.

# What is ENTRADA?

ENTRADA is an open-source system for storing and analyzing large volumes of network data. It is built on top of Apache Hadoop, Apache Parquet and Apache Impala.

Why use Apache Parquet?

Parquet is a columnar storage format that allows very efficient encoding and compression of data. DNS data is highly structured, and each column often contains repeating values. Because Parquet stores the values of each column sequentially on disk, it can apply run-length encoding: if a column contains a long run of zeroes, Parquet only needs to store a single 0 and a count of the number of zeroes. Compared to writing every zero to disk, this saves a lot of bytes.
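To make the idea concrete, here is a minimal run-length encoder in Python. It is a simplified illustration only: Parquet's actual on-disk encoding is a hybrid of run-length encoding and bit-packing, and the helpers `rle_encode` and `rle_decode` are hypothetical, not part of any Parquet library.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs.

    Simplified run-length encoding for illustration; not Parquet's on-disk format.
    """
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


if __name__ == "__main__":
    # A column where almost every row holds the same value, e.g. a flag that is usually 0.
    column = [0] * 1000 + [1] + [0] * 500
    encoded = rle_encode(column)
    print(encoded)  # [(0, 1000), (1, 1), (0, 500)]
    assert rle_decode(encoded) == column
```

The 1,501 values of the example column are reduced to three pairs, which is why columns with long runs of identical values compress so well.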

Why use Apache Impala?

Impala provides high-performance, low-latency SQL queries on large volumes of data stored in Apache Hadoop. Fast query responses enable interactive exploration and fine-tuning of analytic queries, instead of the long-running batch jobs traditionally associated with SQL-on-Hadoop technologies.

Why AWS S3 and Athena?

Although ENTRADA on Hadoop is a good solution, deploying and maintaining it requires knowledge of Hadoop. In addition, building an in-house Hadoop cluster can be expensive.
For these reasons we decided to also support Amazon S3 and Athena, essentially creating a serverless DNS analytics tool. S3 and Athena are managed by Amazon and require minimal knowledge to get started with.
Running ENTRADA on AWS is also inexpensive: S3 storage costs little, and Athena charges only for the amount of data each query scans.
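As a rough illustration of why a columnar format keeps Athena cheap, the sketch below estimates query cost from the number of bytes scanned. The rate of USD 5 per TB is an assumption based on Athena's commonly listed price; check current AWS pricing for your region. The column names and sizes are made-up example numbers, and `estimate_cost_usd` is a hypothetical helper.

```python
# Assumed Athena price: USD 5 per TB scanned (verify against current AWS pricing).
PRICE_PER_TB_USD = 5.0
TB = 10**12  # assumption: decimal terabytes


def estimate_cost_usd(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost of a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / TB * price_per_tb


if __name__ == "__main__":
    # Hypothetical compressed column sizes (bytes) for one month of data.
    column_sizes = {
        "qname": 400 * 10**9,
        "src": 150 * 10**9,
        "rcode": 4 * 10**9,
        "time": 46 * 10**9,
    }
    # With a columnar format, a query only reads the columns it references.
    full_scan = sum(column_sizes.values())
    rcode_only = column_sizes["rcode"]
    print(f"all columns: ${estimate_cost_usd(full_scan):.2f}")
    print(f"rcode only:  ${estimate_cost_usd(rcode_only):.2f}")
```

A query that only touches a small column scans a fraction of the data, so it costs a fraction of a full scan.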

How fast is ENTRADA?

This depends on the type of SQL query and the volume of data to be analyzed.

Can I also use other query engines?

Yes, you can also use Apache Spark to query the generated Parquet files.

What network protocols can ENTRADA handle?

Currently IP, TCP, UDP, DNS and ICMP protocols are supported.

Who developed ENTRADA?

ENTRADA was started by SIDN Labs, the research team of SIDN, the domain name registry for the .nl ccTLD.

What license is ENTRADA released under?

ENTRADA is released under the GNU General Public License version 3 (GPLv3).

Is ENTRADA version 2.x compatible with version 0.x?

The 2.x database schema is not compatible with the Parquet files generated by ENTRADA 0.x. By default, Apache Impala resolves Parquet columns by their position in the file, and this breaks with the new schema because ENTRADA 2.x added and removed columns.

To fix this, make Impala resolve columns by name instead of by position. This is controlled by the PARQUET_FALLBACK_SCHEMA_RESOLUTION query option, for example by running `SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;` in impala-shell.
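The difference between the two resolution modes can be illustrated with a small Python simulation. It only models the idea and does not use Impala. The column names and the `resolve` helper are hypothetical: an old file written with one set of columns is read through a newer table schema in which one column was removed and another was added.

```python
def resolve(file_columns, file_row, table_columns, mode):
    """Map a row from a Parquet file onto a table schema.

    mode="position": the i-th table column reads the i-th file column
    (columns beyond the end of the file come back as None).
    mode="name": each table column reads the file column with the same name;
    columns missing from the file come back as None (NULL).
    """
    if mode == "position":
        return {
            col: (file_row[i] if i < len(file_row) else None)
            for i, col in enumerate(table_columns)
        }
    if mode == "name":
        by_name = dict(zip(file_columns, file_row))
        return {col: by_name.get(col) for col in table_columns}
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Hypothetical schemas: the old file has "ttl", the new table dropped it and added "asn".
    old_file_columns = ["qname", "ttl", "rcode"]
    old_row = ["example.nl.", 3600, 0]
    new_table_columns = ["qname", "rcode", "asn"]

    print(resolve(old_file_columns, old_row, new_table_columns, "position"))
    # {'qname': 'example.nl.', 'rcode': 3600, 'asn': 0}  <- wrong values
    print(resolve(old_file_columns, old_row, new_table_columns, "name"))
    # {'qname': 'example.nl.', 'rcode': 0, 'asn': None}
```

With position-based resolution the old TTL value silently shows up as the rcode; with name-based resolution each value lands in the right column and the new column is simply empty.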

Attribution

When building a product or service using ENTRADA, we kindly request that you include the following attribution text in all advertising and documentation.

This product includes ENTRADA created by <a href="https://www.sidnlabs.nl">SIDN Labs</a>, available from
<a href="http://entrada.sidnlabs.nl">http://entrada.sidnlabs.nl</a>.

Is support available?

Yes, please contact us for questions about support.

Is there any monitoring available?

Yes. ENTRADA gathers metrics while processing data; these metrics can be stored in Graphite and analysed with Grafana.
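Graphite accepts metrics over its plaintext protocol: one line per data point in the form `<metric path> <value> <timestamp>`, by default on TCP port 2003. The sketch below shows what such a line looks like. It is not ENTRADA's own code; the metric name and the `format_metric`/`send_metric` helpers are hypothetical.

```python
import socket
import time


def format_metric(path: str, value: float, timestamp: int) -> str:
    """Build one line of Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    return f"{path} {value} {timestamp}\n"


def send_metric(path: str, value: float, host: str = "localhost", port: int = 2003) -> None:
    """Send a single data point to a Graphite (carbon) plaintext listener.

    Assumes a carbon daemon is listening on host:port (2003 is Graphite's default).
    """
    line = format_metric(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name; this only prints the line and does not contact a server.
    print(format_metric("entrada.dns.queries", 12345, 1700000000), end="")
```

Grafana can then use Graphite as a data source to chart these series.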

Can I change the timezone? Why is everything in UTC?

To avoid any kind of timezone confusion, especially when daylight saving time is involved, we decided to exclusively use UTC for time-related columns and display purposes in all components of ENTRADA.
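A small example of the ambiguity that UTC avoids: at the end of daylight saving time, local clocks repeat an hour. The sketch below uses fixed UTC offsets for Amsterdam summer time (CEST, UTC+2) and winter time (CET, UTC+1) on 29 October 2023, when clocks went back at 01:00 UTC. Two different UTC instants then show the same local wall-clock time. The `local_wall_clock` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Amsterdam around the end of DST on 2023-10-29 (clocks go back at 01:00 UTC).
CEST = timezone(timedelta(hours=2), "CEST")
CET = timezone(timedelta(hours=1), "CET")


def local_wall_clock(ts_utc: datetime, tz: timezone) -> str:
    """Render a UTC timestamp as a local wall-clock string without the offset."""
    return ts_utc.astimezone(tz).strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    before = datetime(2023, 10, 29, 0, 30, tzinfo=timezone.utc)  # still summer time
    after = datetime(2023, 10, 29, 1, 30, tzinfo=timezone.utc)   # already winter time

    print(local_wall_clock(before, CEST))  # 2023-10-29 02:30
    print(local_wall_clock(after, CET))    # 2023-10-29 02:30
    # The local strings are identical, but the UTC values are one hour apart.
    print(after - before)                  # 1:00:00
```

Stored as local time, these two events would be indistinguishable; stored in UTC, their order and distance stay exact.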

What Geolocation service is used by ENTRADA?

The free MaxMind database files are downloaded automatically from the MaxMind download site. The paid MaxMind version is also supported: configure your license key using the corresponding ENTRADA configuration option.