Parquet is a columnar storage format that allows very efficient encoding and compression of data. DNS data is highly structured, and each column often contains repeating values. Because Parquet stores the values of each column sequentially on disk, it can apply run-length encoding: for a column that holds a long run of zeroes, for example, Parquet only needs to store a single 0 and a count of the zeroes. Compared to writing every zero to disk, this saves a lot of bytes.
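The run-length idea can be illustrated with a minimal Python sketch. This is not Parquet's actual on-disk encoding (Parquet uses a hybrid of run-length encoding and bit-packing); the function names and the sample column are hypothetical and only show why repeated values compress well.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# Hypothetical column: a flag that is 0 for almost every DNS query.
column = [0] * 1000 + [1] + [0] * 500
encoded = rle_encode(column)
print(encoded)       # [(0, 1000), (1, 1), (0, 500)]
print(len(column))   # 1501 values stored as 3 pairs
```

The encoding is lossless: decoding the three pairs reproduces all 1501 original values.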
Impala provides high-performance, low-latency SQL queries on large volumes of data stored in Apache Hadoop. The fast response to queries enables interactive exploration and fine-tuning of analytic queries, rather than the long batch jobs traditionally associated with SQL-on-Hadoop technologies.
This depends on the type of SQL query and the volume of data to be analyzed. For relatively simple queries over a couple of billion rows, expect a result within a few seconds to a couple of minutes (using a small cluster of 4 data nodes). For more detailed questions about performance, please contact us.
Yes. Because ENTRADA is built on top of Hadoop, it is easy to scale out by adding more Hadoop nodes to the cluster, which increases both compute and storage capacity. Adding more hard drives to existing nodes is also possible if storage is the bottleneck.
Yes, you can also use Apache Spark to query the generated Parquet files.
Currently only the IP, TCP, UDP, DNS and ICMP protocols are supported.
Yes, ENTRADA is built on top of Hadoop, which has high availability features.
The ENTRADA components used for converting network data are written in Java. The workflow that ties everything together is implemented with Bash shell scripts.
ENTRADA was started by SIDN Labs, the R&D team of SIDN, the domain name registry for the .nl ccTLD. At SIDN Labs, ENTRADA is used to analyze DNS network data; the SIDN Labs DNS database currently holds over 100 billion rows.
Most ENTRADA components are released under the GNU General Public License version 3. The pcaplib4java project is based on code that was originally developed at RIPE NCC as a Hadoop PCAP library. This library uses the GNU Lesser General Public License version 3. Under the LGPL, a derived work such as the pcaplib4java project inherits the LGPL license.
When building a product or service using ENTRADA, we kindly request that you include the following attribution text in all advertising and documentation:
This product includes ENTRADA created by <a href="https://www.sidnlabs.nl">SIDN Labs</a>, available from <a href="http://entrada.sidnlabs.nl">http://entrada.sidnlabs.nl</a>.
Yes, see the support page.
To avoid any kind of timezone confusion, especially when daylight saving time is involved, we decided to exclusively use Unix time internally and UTC for display purposes in all components of ENTRADA.
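As an illustration of this convention, the following Python sketch stores a timestamp as Unix time and formats it in UTC only when displaying it. ENTRADA itself is written in Java; the function names and the display format here are assumptions made for the example, not ENTRADA's actual code.

```python
from datetime import datetime, timedelta, timezone


def to_unix_seconds(dt):
    """Convert a timezone-aware datetime to Unix time (whole seconds since the epoch)."""
    if dt.tzinfo is None:
        # A naive datetime has no defined offset, so refuse to guess one.
        raise ValueError("naive datetime: timezone is ambiguous")
    return int(dt.timestamp())


def format_utc(unix_seconds):
    """Render a Unix timestamp as a UTC string for display."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# The same instant expressed with two different offsets maps to one Unix timestamp.
in_utc = datetime(2015, 3, 29, 1, 0, 0, tzinfo=timezone.utc)
in_plus_two = datetime(2015, 3, 29, 3, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_unix_seconds(in_utc) == to_unix_seconds(in_plus_two))  # True
print(format_utc(to_unix_seconds(in_plus_two)))                 # 2015-03-29 01:00:00 UTC
```

Because the stored value carries no offset, a daylight saving time transition in the capture location does not change how rows are ordered or compared.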
The MaxMind legacy database files are automatically downloaded from: MaxMind download
We built this site using the website generator of the excellent Prometheus project which is released under an open source license. Many thanks to the Prometheus guys.