ENTRADA (ENhanced Top-level domain Resilience through Advanced Data Analysis) is a tool for analysing very large volumes of DNS data.
This is achieved by converting the DNS data (which arrives in PCAP-formatted files) to a more efficient columnar data format (Apache Parquet).
The Parquet data is analysed with an analytical query engine such as Hadoop + Impala or Amazon Athena. ENTRADA supports both SQL-engines.
All the workflow steps required to get from raw DNS data to Parquet data available for querying in a database are automated.
DNS data is converted, enriched and streamed into the database automatically. This means the data is ready to be analysed within minutes of being processed on the name server.
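As an illustration of the kind of analysis this enables, the following is a minimal sketch of an aggregate SQL query. It uses Python's built-in `sqlite3` module as a stand-in for Impala or Athena, and the table name `queries` and the columns `domainname`, `rcode` and `country` are assumptions made for this example, not the documented ENTRADA schema.

```python
import sqlite3

# In-memory stand-in for the Impala/Athena table.
# Table and column names are hypothetical, chosen for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queries (domainname TEXT, rcode INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO queries VALUES (?, ?, ?)",
    [
        ("example.nl", 0, "NL"),
        ("example.nl", 0, "DE"),
        ("missing.nl", 3, "NL"),  # rcode 3 is NXDOMAIN
    ],
)


def top_domains(connection, limit=10):
    """Return (domainname, query_count) pairs, busiest domains first."""
    sql = """
        SELECT domainname, COUNT(*) AS query_count
        FROM queries
        GROUP BY domainname
        ORDER BY query_count DESC, domainname
        LIMIT ?
    """
    return connection.execute(sql, (limit,)).fetchall()


for domain, count in top_domains(conn):
    print(domain, count)
```

Against a real deployment the same `SELECT` statement would be sent to the SQL-engine instead, with the table and column names adjusted to the actual schema.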
ENTRADA can be deployed on premise or in the cloud. The table below displays the possible options.
Deployment | Storage | SQL-engine | On premise | Cloud |
---|---|---|---|---|
Hadoop | HDFS | Impala | yes | yes |
AWS | S3 | Athena | no | yes |
Local | local disk | - | yes | yes |
The DNS request and its corresponding response are combined into a single database row and enriched.
These steps are performed during the data import process to help speed up later SQL queries.
The required resources, such as the IP geolocation database, are downloaded automatically by ENTRADA.
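The combining and enrichment step described above can be sketched as follows. This is a simplified illustration, not ENTRADA's implementation: the `Packet` class, the matching key (client IP, client port, DNS message ID) and the `geo` dictionary standing in for an IP geolocation database are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    """Hypothetical, heavily simplified view of a decoded DNS packet."""
    client_ip: str
    client_port: int
    dns_id: int
    is_response: bool
    qname: str
    rcode: Optional[int] = None  # only set on responses


def join_packets(packets, geo):
    """Combine each request with its response into one enriched row.

    `geo` maps a client IP to a country code and stands in for a real
    geolocation lookup (assumption for this sketch).
    """
    pending = {}  # requests still waiting for a response
    rows = []
    for p in packets:
        key = (p.client_ip, p.client_port, p.dns_id)
        if not p.is_response:
            pending[key] = p
            continue
        request = pending.pop(key, None)
        rows.append({
            "qname": (request or p).qname,
            "src": p.client_ip,
            "rcode": p.rcode,
            "country": geo.get(p.client_ip),  # enrichment done at import time
        })
    # Requests that never received a response still become a row.
    for request in pending.values():
        rows.append({
            "qname": request.qname,
            "src": request.client_ip,
            "rcode": None,
            "country": geo.get(request.client_ip),
        })
    return rows


packets = [
    Packet("192.0.2.1", 5353, 7, False, "example.nl."),
    Packet("192.0.2.1", 5353, 7, True, "example.nl.", rcode=0),
    Packet("198.51.100.9", 4000, 8, False, "lost.nl."),
]
rows = join_packets(packets, {"192.0.2.1": "NL"})
```

Doing the join and the lookup once at import time means later queries can filter on fields such as the country directly, without a join or a lookup per query.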
The following details are added to each DNS query and response tuple.
Apache Impala, AWS Athena or Apache Spark can be used to analyse the generated Parquet data.
ENTRADA will handle all the required workflow actions such as:
The work on making ENTRADA more efficient is supported by JProfiler, the leading Java profiler.