Installation

ENTRADA can be deployed using Docker Compose. Download one of the example Docker Compose scripts, save it as docker-compose.yml, and then edit the environment variables in the script to fit your requirements.
Start the container using the docker-compose command:

   docker-compose up

Deployment steps

H2 and AWS

Follow these steps to deploy ENTRADA using an H2 database and AWS Athena. The H2 database is only recommended for testing and evaluating ENTRADA.

  1. Create an ENTRADA home directory ($ENTRADA_HOME), e.g. /data/entrada.
  2. Download the H2 and AWS Docker Compose script to $ENTRADA_HOME.
  3. Rename docker-compose-h2-aws.yml to docker-compose.yml.
  4. Open docker-compose.yml in an editor.
  5. The input directory $ENTRADA_HOME/input can have a sub-directory for each name server data feed. Set ENTRADA_NAMESERVERS to a comma-separated list of name server names. For this test, use the value test-ns.
  6. Create a new IAM user for ENTRADA in the AWS console and create an access key for this user. Use the access key ID and the secret access key as the values of the AWS_ACCESS_KEY_ID and AWS_SECRET_KEY options.
  7. Set AWS_BUCKET to the name of the S3 bucket that ENTRADA should create.
  8. Set ENTRADA_LOCATION_OUTPUT to the S3 location (bucket name + path prefix) where ENTRADA will upload the Parquet files.
  9. Start ENTRADA using docker-compose up and watch the log file $ENTRADA_HOME/log/spring.log for errors.
  10. ENTRADA has now created a database schema on AWS; use AWS Glue to verify that the entrada database and its tables exist.
  11. ENTRADA has also created the required subdirectories in $ENTRADA_HOME.
  12. Create a new subdirectory test-ns in $ENTRADA_HOME/input.
  13. Copy the sample pcap file to $ENTRADA_HOME/input/test-ns.
  14. Watch the log file to see whether the sample pcap file is found and processed. After it has been processed, use Athena to check whether data has been added to the entrada.dns table; a single row should have been added.
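The manual filesystem steps above (1, 12 and 13) can be sketched as a few POSIX shell functions. This is a minimal sketch, not part of ENTRADA: the function names are hypothetical, /data/entrada is only the example path from step 1, and the path of the sample pcap file must be supplied by the caller. create_ns_dirs accepts a comma-separated list in the same form as the ENTRADA_NAMESERVERS value, so it can create several name server directories at once.

```shell
#!/bin/sh
# Minimal sketch of the manual filesystem steps 1, 12 and 13 (not part of
# ENTRADA). Function names are hypothetical.

# Step 1: create the ENTRADA home directory; defaults to the example path.
init_entrada_home() {
  mkdir -p "${1:-/data/entrada}"
}

# Step 12: create one input sub-directory per name server.
# $1 = ENTRADA home, $2 = comma-separated list such as the ENTRADA_NAMESERVERS
# value. The body runs in a subshell, so the IFS and set -f changes stay local.
create_ns_dirs() (
  IFS=','
  set -f                                          # no globbing while splitting
  for ns in $2; do
    ns=$(printf '%s' "$ns" | tr -d '[:space:]')   # drop spaces around commas
    if [ -n "$ns" ]; then
      mkdir -p "$1/input/$ns" || exit 1
    fi
  done
)

# Step 13: copy a pcap file ($3) into the input directory of name server $2.
add_pcap() {
  cp "$3" "$1/input/$2/"
}
```

Steps 12 and 13 then correspond to create_ns_dirs "$ENTRADA_HOME" test-ns and add_pcap "$ENTRADA_HOME" test-ns <path to the sample pcap>, run after ENTRADA has been started in step 9.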

ENTRADA expects the input directory to contain a sub-directory for each name server. Each name server sub-directory should be named using the format <ns>_<anycast_site>. ENTRADA extracts the ns and anycast_site parts: the ns part is used to partition the Parquet data by name server name, and the anycast_site part is saved in the server_location column of the dns table.
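To make the naming rule concrete, here is a minimal POSIX shell sketch that splits a sub-directory name into its two parts. It is an illustration only, not ENTRADA's own parser: the function names are hypothetical, and it assumes that the name is split at the first underscore and that a name without an underscore (such as test-ns in the walkthrough) has an empty anycast site. The exact rules ENTRADA applies are not described here.

```shell
#!/bin/sh
# Illustrative parser for <ns>_<anycast_site> directory names; not ENTRADA's
# own code. Assumptions: split at the first underscore; a name without an
# underscore has an empty anycast site.

# Print the ns part (everything before the first underscore).
ns_part() {
  printf '%s\n' "${1%%_*}"
}

# Print the anycast_site part (everything after the first underscore).
site_part() {
  case $1 in
    *_*) printf '%s\n' "${1#*_}" ;;
    *)   printf '\n' ;;
  esac
}

# Print "<ns><TAB><anycast_site>" for every sub-directory of the input dir $1.
# The body runs in a subshell, so the loop variables do not leak.
list_feeds() (
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue                     # no sub-directories found
    name=$(basename "$dir")
    printf '%s\t%s\n' "$(ns_part "$name")" "$(site_part "$name")"
  done
)
```

For example, list_feeds "$ENTRADA_HOME/input" prints one line per name server directory, with the ns and anycast_site values separated by a tab.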

Other

There are also example scripts for the following combinations:

Script                     Database    Mode    Description
------------------------   ----------  ------  ---------------------------------------------
H2+AWS                     H2          AWS     Use only for testing and evaluation
H2+Local                   H2          Local   Use only for testing and evaluation
PostgreSQL+AWS             PostgreSQL  AWS     Can be used for production
PostgreSQL+Hadoop          PostgreSQL  Hadoop  Non-secure Hadoop, can be used for production
PostgreSQL+Secure Hadoop   PostgreSQL  Hadoop  Secure Hadoop, can be used for production
PostgreSQL+Local           PostgreSQL  Local   Local storage, can be used for production

Supported modes

ENTRADA supports three modes of operation:

Local

In local mode, all input and output directories must be on the local filesystem and no SQL engine is used.
Use this mode if you want to save the created Parquet files on the local filesystem.

AWS (S3 and Athena)

In AWS mode, the output directory must be on S3, and Athena is used as the SQL engine.
The input and archive directories may be on S3 or on the local filesystem.
ENTRADA can create the bucket and configure the correct security settings (access control and encryption).

Hadoop (HDFS and Impala)

In Hadoop mode, the output directory must be on HDFS, and Impala is used as the SQL engine.
The input and archive directories may be on HDFS or on the local filesystem.
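The location rules of the three modes can be summarised as a small pre-flight check. This is a hypothetical helper, not part of ENTRADA: the mode names local, aws and hadoop, the s3:// and hdfs:// prefixes for remote locations, and the requirement that local paths be absolute are all assumptions made for illustration. It treats the archive directory like the input directory, as the mode descriptions do.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of ENTRADA) for the location rules
# of each mode. Assumed: mode names local/aws/hadoop, s3:// and hdfs://
# prefixes for remote storage, absolute paths for the local filesystem.
#
# location_allowed MODE KIND LOCATION
#   KIND is output, input or archive. Exit status: 0 allowed, 1 not allowed,
#   2 unknown mode. The body runs in a subshell, so exit only ends the function.
location_allowed() (
  case $1 in
    local)  prefix= ;;
    aws)    prefix='s3://' ;;
    hadoop) prefix='hdfs://' ;;
    *) echo "unknown mode: $1" >&2; exit 2 ;;
  esac
  if [ -n "$prefix" ]; then
    # AWS and Hadoop: any location on the mode's remote storage is allowed ...
    case $3 in "$prefix"*) exit 0 ;; esac
    # ... and the output directory must be there.
    [ "$2" = output ] && exit 1
  fi
  # Local mode, or input/archive in AWS/Hadoop mode: an absolute local path.
  case $3 in /*) exit 0 ;; esac
  exit 1
)
```

For example, location_allowed aws output /data/entrada/output fails, because AWS mode requires the output directory to be on S3, while location_allowed aws input /data/entrada/input succeeds.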

Supported databases

ENTRADA requires a database to persist information about processed files and created database partitions. Both PostgreSQL and H2 are supported.

H2

H2 is a fast database that is useful for testing scenarios; use it only when evaluating or testing ENTRADA functionality.

PostgreSQL

PostgreSQL should be used when ENTRADA is deployed in a production environment.