Installation

ENTRADA can be deployed using Docker Compose. Download one of the example Docker Compose scripts, save it as docker-compose.yml, and edit the script to configure the variables to fit your requirements. Then start the container using the docker-compose command:

   docker-compose up
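
To keep the container running in the background, use Docker Compose's detached mode:

   docker-compose up -d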

There is no web interface; a limited HTTP-based API is available, mostly to cleanly stop the ENTRADA process inside the container before the container itself is stopped. When the container is started, ENTRADA monitors the input directory for new pcap files. When new files are detected, the data is converted to Parquet format and uploaded to HDFS/AWS S3, or saved on a local disk.

Deployment steps

H2 and AWS

Follow these steps to deploy ENTRADA using an H2 database and AWS Athena. The H2 database is only recommended for testing and evaluating ENTRADA.

  1. Create an ENTRADA home directory ($ENTRADA_HOME), e.g. /data/entrada.
  2. Download the H2 and AWS Docker Compose script to $ENTRADA_HOME.
  3. Rename docker-compose-h2-aws.yml to docker-compose.yml.
  4. Open docker-compose.yml in an editor and set the variables described in the following steps (see the sketch after this list).
  5. The input directory $ENTRADA_HOME/input must have a sub-directory for each name server data-feed. Add a comma-separated list of name server names to ENTRADA_NAMESERVERS. For this test, use the value test-ns.
  6. Create a new ENTRADA IAM user in the AWS console and create an access key for this user. Use the access key ID and secret key value for the AWS_ACCESS_KEY_ID and AWS_SECRET_KEY options.
  7. Set AWS_BUCKET to the name of the AWS S3 bucket to use. This bucket will be created by ENTRADA.
  8. Set ENTRADA_LOCATION_OUTPUT to the S3 location (bucket name + path prefix) where ENTRADA will upload the Parquet files.
  9. Start ENTRADA using docker-compose up and watch the log files in $ENTRADA_HOME/log/*.log for errors.
  10. ENTRADA will have created a database schema on AWS; use AWS Glue to verify that the entrada database and tables exist.
  11. ENTRADA will also have created the required subdirectories in $ENTRADA_HOME.
  12. Create a new subdirectory test-ns in $ENTRADA_HOME/input.
  13. Copy the sample pcap file to $ENTRADA_HOME/input/test-ns.
  14. ENTRADA should automatically pick up the pcap file; check the log file to see whether the sample pcap file is found and processed. After it has been processed, use Athena to check whether data has been added to the entrada.dns table; a single row should have been added.
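
To illustrate how the variables from steps 5-8 fit together, below is a minimal sketch of the environment section of the Compose file; all values shown are placeholders, and the surrounding service definition (image, volumes) follows the downloaded example script.

   environment:
     # comma-separated name server feeds (step 5)
     - ENTRADA_NAMESERVERS=test-ns
     # placeholder credentials of the ENTRADA IAM user (step 6)
     - AWS_ACCESS_KEY_ID=<access-key-id>
     - AWS_SECRET_KEY=<secret-key>
     # illustrative bucket name; ENTRADA will create this bucket (step 7)
     - AWS_BUCKET=my-entrada-bucket
     # bucket name + path prefix for the Parquet output (step 8)
     - ENTRADA_LOCATION_OUTPUT=my-entrada-bucket/entrada-output

For the check in step 14, a simple Athena query such as the following should return the single row:

   SELECT * FROM entrada.dns LIMIT 10;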

H2 and Local

Follow these steps to deploy ENTRADA using an H2 database without a query engine (Impala or Athena). The generated Parquet data will be saved in the configured location on local disk. The H2 database is only recommended for testing and evaluating ENTRADA.

  1. Create an ENTRADA home directory ($ENTRADA_HOME), e.g. /data/entrada.
  2. Download the H2 and Local Docker Compose script to $ENTRADA_HOME.
  3. Rename docker-compose-h2-local.yml to docker-compose.yml.
  4. Open docker-compose.yml in an editor.
  5. The input directory $ENTRADA_HOME/input must have a sub-directory for each name server data-feed. Add a comma-separated list of name server names to the ENTRADA_NAMESERVERS variable in the Docker Compose file. For this test, use the value test-ns.
  6. Start ENTRADA using docker-compose up and watch the log files in $ENTRADA_HOME/log/*.log for errors.
  7. ENTRADA will have created the required subdirectories in $ENTRADA_HOME.
  8. Create a new subdirectory test-ns in $ENTRADA_HOME/input.
  9. Copy the sample pcap file to $ENTRADA_HOME/input/test-ns.
  10. ENTRADA should automatically pick up the pcap file; check the log file to see whether the sample pcap file is found and processed. The $ENTRADA_HOME/output directory should contain a newly generated Parquet file.
  11. Optionally, inspect the Parquet file using parquet-tools (see the example after this list).
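
For example, assuming the Apache parquet-tools CLI is available (command names vary between parquet-tools variants; schema and head below are from the Java tool, and the file name is illustrative), the schema and first records of the generated file can be printed with:

   parquet-tools schema $ENTRADA_HOME/output/<generated-file>.parquet
   parquet-tools head $ENTRADA_HOME/output/<generated-file>.parquet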

ENTRADA expects the input directory to contain a sub-directory for each name server. Each name server sub-directory should use the format <ns>_<anycast_site>. The ns and anycast_site parts are extracted; the ns part is used to partition the Parquet data on the name server name, and the anycast_site part is saved in the server_location column of the dns table.
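
For example, a hypothetical sub-directory for a name server ns1 with anycast site ams:

   $ENTRADA_HOME/input/ns1_ams/    (ns = ns1, server_location = ams)

The Parquet data for this feed is then partitioned on the name server value ns1, and rows from this feed carry ams in the server_location column of the dns table.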

Other

There are also example scripts for the following combinations.

Script                     Database     Mode     Description
H2+AWS                     H2           AWS      Use only for testing and evaluation
H2+Local                   H2           Local    Use only for testing and evaluation
PostgreSQL+AWS             PostgreSQL   AWS      Can be used for production
PostgreSQL+Hadoop          PostgreSQL   Hadoop   Standard Hadoop, can be used for production
PostgreSQL+Secure Hadoop   PostgreSQL   Hadoop   Secure Hadoop (Kerberos), can be used for production
PostgreSQL+Local           PostgreSQL   Local    Local storage, can be used for production

Supported modes

ENTRADA supports multiple modes of operation:

Local

In local mode, all input and output directories must be local and no SQL engine is used.
Use this mode if you want to save the created Parquet files on the local system.

AWS (S3 and Athena)

In AWS mode, the output directory must be on S3, and Athena is used as the SQL engine.
The input and archive directories may be on S3 but may also be on the local filesystem.
ENTRADA can create a bucket and configure the correct security settings (access control and encryption).
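
For example (the bucket name and prefix are placeholders), the output location must point at S3, while the input directory may remain on the local filesystem, e.g. $ENTRADA_HOME/input:

   ENTRADA_LOCATION_OUTPUT=my-entrada-bucket/entrada-output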

Hadoop (HDFS and Impala)

In Hadoop mode, the output directory must be on HDFS, and Impala is used as the SQL engine.
The input and archive directories may be on HDFS but may also be on the local filesystem.

Supported databases

ENTRADA requires a database to persist information about processed files and created database partitions. Both PostgreSQL and H2 are supported.

H2

H2 is a fast database that is useful for testing scenarios; use it only when evaluating or testing ENTRADA functionality.

PostgreSQL

PostgreSQL should be used when ENTRADA is deployed in a production environment.
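
As a rough sketch, a PostgreSQL service can be added to the Docker Compose file as shown below. The image tag, credentials, and volume path are illustrative, and the variables ENTRADA uses to connect to this database are defined in the PostgreSQL example scripts rather than shown here.

   services:
     postgres:
       image: postgres:15
       environment:
         # illustrative credentials; align these with the ENTRADA
         # database settings from the example script
         - POSTGRES_USER=entrada
         - POSTGRES_PASSWORD=changeme
         - POSTGRES_DB=entrada
       volumes:
         # persist the database outside the container
         - /data/entrada/pgdata:/var/lib/postgresql/data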