Frequently Asked Questions

What is ENTRADA?

ENTRADA is an open source system for storing and analyzing large volumes of network data. It is built on top of Apache Hadoop, Apache Parquet and Apache Impala.

Why use Apache Parquet?

Parquet is a columnar storage format that allows very efficient encoding and compression of the data. DNS data is highly structured, and each column often contains many repeated values. Because Parquet stores the values of each column sequentially on disk, it can use run-length encoding: for a column in which a long run of rows all contain 0, Parquet only needs to store a single 0 and a count of how many zeroes there are. Compared to writing every zero to disk, this saves a lot of bytes.
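To illustrate the idea, here is a minimal run-length encoding sketch in Python. It is a toy model of the concept only, not Parquet's actual encoder: Parquet's own RLE/bit-packing hybrid encoding writes runs in a compact binary layout, whereas this sketch produces (value, count) pairs. The helper names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs.

    Toy illustration only: Parquet's own RLE/bit-packing hybrid encoding
    writes runs in a compact binary layout, not as Python tuples.
    """
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded

# Hypothetical column where every one of a million rows holds 0:
# the whole column shrinks to a single value plus a count.
zeros = [0] * 1_000_000
print(rle_encode(zeros))  # [(0, 1000000)]

# Runs only help when equal values sit next to each other.
mixed = [0, 0, 0, 3, 0, 0]
print(rle_encode(mixed))  # [(0, 3), (3, 1), (0, 2)]
print(rle_decode(rle_encode(mixed)) == mixed)  # True
```

Run-length encoding only pays off for consecutive repeats, which is why storing each column contiguously matters: in a row-oriented layout, values from different columns would interleave and break up the runs.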

Why use Apache Impala?

Impala provides high-performance, low-latency SQL queries on large volumes of data stored in Apache Hadoop. Fast query response times enable interactive exploration and fine-tuning of analytic queries, rather than the long batch jobs traditionally associated with SQL-on-Hadoop technologies.

How fast is ENTRADA?

This all depends on the type of SQL query and the volume of data to be analyzed. For relatively simple queries over a couple of billion rows, expect a result within a few seconds to a couple of minutes (using a small cluster of 4 data nodes). For more detailed questions about performance, please contact us.

Can ENTRADA be scaled out?

Yes. Because ENTRADA is built on top of Hadoop, it is very easy to scale out by adding more Hadoop nodes to the cluster, which increases both compute and storage capacity. If storage is the bottleneck, you can also add more hard drives to existing nodes.

Can I also use other query engines?

Yes, you can also use Apache Spark to query the generated Parquet files.

What network protocols can ENTRADA handle?

Currently only the IP, TCP, UDP, DNS and ICMP protocols are supported.

Can ENTRADA be made highly available?

Yes. ENTRADA is built on top of Hadoop, which has high availability features.

What language is ENTRADA written in?

The ENTRADA components used for converting network data are written in Java. The workflow that ties everything together is implemented with Bash shell scripts.

Who developed ENTRADA?

ENTRADA was initially started by SIDN Labs, the R&D team of SIDN, the domain name registry for the .nl ccTLD. At SIDN Labs, ENTRADA is used to analyze DNS network data; the SIDN Labs DNS database currently holds over 100 billion rows.

What license is ENTRADA released under?

Most ENTRADA components are released under the GNU General Public License version 3 (GPLv3). The pcaplib4java project is based on code originally developed at RIPE NCC as a Hadoop PCAP library. That library is released under the GNU Lesser General Public License version 3 (LGPLv3). Under the LGPL, a derived work such as the pcaplib4java project inherits the LGPL license.

Attribution

When building a product or service using ENTRADA, we kindly request that you include the following attribution text in all advertising and documentation.

This product includes ENTRADA created by <a href="https://www.sidnlabs.nl">SIDN Labs</a>, available from
<a href="http://entrada.sidnlabs.nl">http://entrada.sidnlabs.nl</a>.

Is support available?

Yes, see the support page.

Is there any monitoring?

Yes. Metrics are gathered while network data is processed, and these metrics can be analyzed with Graphite and Grafana.
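For reference, Graphite's plaintext protocol is simple: each metric is one line of the form `<metric path> <value> <timestamp>`, where the timestamp is Unix time in seconds, usually sent over TCP to a Carbon listener on port 2003. The Python sketch below formats and sends such lines. It assumes a reachable Carbon host; the metric name `entrada.example.dns.packets`, the host `graphite.example.org` and the helper names are hypothetical, and this is not necessarily how ENTRADA itself reports its metrics.

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol:
    '<metric path> <value> <unix timestamp>' terminated by a newline."""
    if " " in path:
        raise ValueError("metric path must not contain spaces")
    if timestamp is None:
        timestamp = time.time()  # current Unix time in seconds
    return f"{path} {value} {int(timestamp)}\n"

def send_to_graphite(lines, host, port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))

# Hypothetical metric name; timestamp fixed so the output is predictable.
line = graphite_line("entrada.example.dns.packets", 1520, timestamp=1500000000)
print(repr(line))  # 'entrada.example.dns.packets 1520 1500000000\n'

# Requires a running Carbon server, so it is left commented out here:
# send_to_graphite([line], "graphite.example.org")
```

Once metrics land in Graphite, Grafana can use Graphite as a data source to chart them.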

Can I change the timezone? Why is everything in UTC?

To avoid any kind of timezone confusion, especially when daylight saving time is involved, we decided to exclusively use Unix time internally and UTC for display purposes in all components of ENTRADA.
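A minimal Python sketch of why this matters, using only the standard library: it renders Unix timestamps as UTC and shows two instants one hour apart that read as the same local wall-clock time when daylight saving time ended in the Netherlands on 2017-10-29. The helper `to_utc_display` is hypothetical, and fixed UTC offsets stand in for a full time zone database.

```python
from datetime import datetime, timedelta, timezone

def to_utc_display(unix_seconds):
    """Render Unix time (seconds since 1970-01-01 00:00:00 UTC) as a UTC string."""
    dt = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S UTC")

# Two instants one hour apart, around the end of daylight saving time in
# the Netherlands on 2017-10-29 (clocks went from UTC+2 back to UTC+1
# at 01:00 UTC).
first = int(datetime(2017, 10, 29, 0, 30, tzinfo=timezone.utc).timestamp())
second = first + 3600

cest = timezone(timedelta(hours=2))  # fixed offset in effect before the switch
cet = timezone(timedelta(hours=1))   # fixed offset in effect after the switch

# Local wall-clock time is ambiguous: both instants read 02:30 ...
print(datetime.fromtimestamp(first, tz=cest).strftime("%H:%M"))  # 02:30
print(datetime.fromtimestamp(second, tz=cet).strftime("%H:%M"))  # 02:30

# ... whereas UTC keeps them distinct.
print(to_utc_display(first))   # 2017-10-29 00:30:00 UTC
print(to_utc_display(second))  # 2017-10-29 01:30:00 UTC
```

Storing plain Unix time and converting to UTC only for display means every stored value maps to exactly one instant, with no ambiguous or skipped local hours.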

What Geolocation service is used by ENTRADA?

ENTRADA uses the MaxMind legacy database files, which are downloaded automatically from the MaxMind download site.

Why does this website look so familiar?

We built this site using the website generator of the excellent Prometheus project, which is released under an open source license. Many thanks to the Prometheus developers.