ENTRADA is an open source system for storing and analyzing large volumes of network data. It is built on top of Apache Hadoop, Apache Parquet and Apache Impala.
Parquet is a columnar storage format which allows for very efficient encoding and compression of the data. DNS data is highly structured and columns often contain repeating values. Parquet stores the data for each column sequentially on disk and can apply run-length encoding: for a column containing a long run of zeroes, it only needs to store a single 0 and a count of the number of zeroes. Compared to writing all the zeroes to disk, this saves a lot of bytes.
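As a rough illustration of the run-length idea, the following Python sketch collapses a column into (value, count) pairs. It is a simplified model and not Parquet's actual encoder, which combines run-length encoding with bit-packing and dictionary encoding. The function name `run_length_encode` and the sample column are hypothetical.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


# Hypothetical column: a long run of zeroes followed by a few other values.
column = [0] * 1000 + [1, 1, 3]
encoded = run_length_encode(column)
print(encoded)  # [(0, 1000), (1, 2), (3, 1)]
```

In this toy example, 1003 stored values shrink to three pairs, which is the kind of saving a columnar layout makes possible when equal values sit next to each other.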
Impala provides high-performance, low-latency SQL queries on large volumes of data stored on Apache Hadoop. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than long batch jobs traditionally associated with SQL-on-Hadoop technologies.
Although ENTRADA on Hadoop is a good solution, it does require knowledge of Hadoop to be able to deploy and maintain it.
Creating an in-house Hadoop cluster can also be expensive.
For these reasons we decided to also include support for Amazon S3 and Athena, essentially creating a serverless DNS analytics tool. S3 and Athena are managed by Amazon and require minimal knowledge to get started with.
Running ENTRADA on AWS is also an inexpensive solution: S3 storage costs little and Athena charges only for the amount of data queried.
This all depends on the type of SQL query and the volume of data that is to be analyzed.
Yes, you can also use Apache Spark to query the generated Parquet files.
Currently IP, TCP, UDP, DNS and ICMP protocols are supported.
ENTRADA has built-in support for privacy. The options below allow you to run ENTRADA without saving client IP addresses, or to delete client IP addresses after a configurable number of days.
Option | Default | Required | Description |
---|---|---|---|
ENTRADA_PRIVACY_ENABLED | false | N | Privacy mode, does not write client IP addresses to file. |
ENTRADA_PRIVACY_PURGE_AGE | 0 | N | Delete client IP addresses from partitions older than this number of days, 0=disabled |
ENTRADA_PRIVACY_PURGE_INTERVAL | 10 | N | Interval (# of minutes) between checks if IP addresses must be removed from older partitions. |
ENTRADA was initially started by SIDN Labs, the research team of SIDN, the domain name registry for the .nl ccTLD.
ENTRADA is released under the GNU General Public License, version 3.
The new 2.x database schema is not compatible with the Parquet files generated by the 0.x version of ENTRADA. By default, Apache Impala resolves Parquet columns by their position (index) in the file, and this breaks with the new schema because ENTRADA 2.x added and removed columns.
The fix for this is to make sure Impala uses name-based column resolution. This can be enabled by setting the PARQUET_FALLBACK_SCHEMA_RESOLUTION query option to name.
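To make the difference concrete, here is a minimal Python sketch of position-based versus name-based column resolution. It only models the idea and is not Impala code. The column names, the sample row and both helper functions are hypothetical.

```python
def resolve_by_position(table_columns, row):
    """Map the i-th table column to the i-th value stored in the file."""
    return dict(zip(table_columns, row))


def resolve_by_name(table_columns, file_columns, row):
    """Map each table column to the file value with the same column name."""
    by_name = dict(zip(file_columns, row))
    # Columns missing from the older file are read as None (NULL).
    return {name: by_name.get(name) for name in table_columns}


# Hypothetical schemas: the old file has a column the new table dropped,
# and the new table has a column the old file never had.
old_file_columns = ["qname", "ttl", "src"]
new_table_columns = ["qname", "src", "country"]
old_row = ["example.nl.", 3600, "192.0.2.1"]

print(resolve_by_position(new_table_columns, old_row))
# {'qname': 'example.nl.', 'src': 3600, 'country': '192.0.2.1'}
print(resolve_by_name(new_table_columns, old_file_columns, old_row))
# {'qname': 'example.nl.', 'src': '192.0.2.1', 'country': None}
```

With position-based resolution the old values end up under the wrong column names, while name-based resolution keeps them aligned and leaves the missing column empty. In an Impala session, name-based resolution is typically switched on with `SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;` before querying the older files.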
When building a product or service using ENTRADA, we kindly request that you include the following attribution text in all advertising and documentation.
This product includes ENTRADA created by <a href="https://www.sidnlabs.nl">SIDN Labs</a>, available from
<a href="http://entrada.sidnlabs.nl">http://entrada.sidnlabs.nl</a>.
Yes, please contact us for questions about support.
Yes, metrics are gathered when data is processed. These metrics can be analysed using Graphite and Grafana.
To avoid any kind of timezone confusion, especially when daylight saving time is involved, we decided to use UTC exclusively for time-related columns and display purposes in all components of ENTRADA.
The free MaxMind database files are automatically downloaded from the MaxMind download site. The paid MaxMind version is also supported; just configure your license key using the corresponding ENTRADA configuration option.