ENTRADA can be deployed using Docker Compose. Download one of the example Docker Compose scripts, save it as `docker-compose.yml`, and edit it to configure the variables to fit your requirements. Then start the container using the `docker-compose` command:

docker-compose up
There is no web interface; a limited HTTP-based API is available, mostly to cleanly stop the ENTRADA process inside the container before stopping the container itself. When the container is started, ENTRADA monitors the input directory for new pcap files. When new files are detected, the data is converted to Parquet format and uploaded to HDFS/AWS or saved on a local disk.
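The monitoring behaviour described above can be sketched as a one-shot directory scan. This is an illustrative sketch only, not ENTRADA's actual implementation (the real monitor is part of the Java service and runs continuously); the directory and file names are made-up examples:

```shell
# Illustrative sketch of the monitoring behaviour described above.
INPUT_DIR="./input"                      # stands in for $ENTRADA_HOME/input
mkdir -p "$INPUT_DIR/test-ns"
touch "$INPUT_DIR/test-ns/example.pcap"  # stand-in for a real capture file

found=""
for f in "$INPUT_DIR"/*/*.pcap*; do
  [ -e "$f" ] || continue                # skip if the glob matched nothing
  found="$found $f"
  echo "detected: $f"                    # ENTRADA would now convert to Parquet
done
```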
Follow these steps to deploy ENTRADA using an H2 database and AWS Athena. The H2 database is only recommended for testing and evaluating ENTRADA.

1. Download `docker-compose-h2-aws.yml` and save it as `docker-compose.yml`.
2. Open `docker-compose.yml` in an editor.
3. Set the `ENTRADA_NAMESERVERS` variable. For this test use the value `test-ns`.
4. Set the `AWS_BUCKET` variable. This bucket will be created by ENTRADA.
5. Run `docker-compose up` and watch the log file `$ENTRADA_HOME/log/*.log` for errors. ENTRADA will create the `entrada` database and tables.
6. Copy a pcap file to the `test-ns` sub-directory in `$ENTRADA_HOME/input`.
7. Check the `entrada.dns` table; a single row should have been added to the table.

Follow these steps to deploy ENTRADA using an H2 database without a query engine (Impala or Athena). The generated Parquet data will be saved in the configured location on local disk. The H2 database is only recommended for testing and evaluating ENTRADA.
1. Download `docker-compose-h2-local.yml` and save it as `docker-compose.yml`.
2. Open `docker-compose.yml` in an editor.
3. Set the `ENTRADA_NAMESERVERS` variable in the Docker Compose file. For this test use the value `test-ns`.
4. Run `docker-compose up` and watch the log file `$ENTRADA_HOME/log/*.log` for errors.
5. Copy a pcap file to the `test-ns` sub-directory in `$ENTRADA_HOME/input`.

ENTRADA expects the input directory to contain a sub-directory for each name server.
Each name server sub-directory should use the format `<ns>_<anycast_site>`. The `ns` and `anycast_site` parts are extracted: the `ns` part is used to partition the Parquet data on the name server name, and the `anycast_site` part is saved in the `server_location` column of the `dns` table.
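As a concrete illustration of this naming convention (the server name `ns1` and site `ams` are made-up examples), the split on the underscore can be sketched with shell parameter expansion, assuming a single underscore in the directory name:

```shell
# Hypothetical name server "ns1" answering from anycast site "ams".
mkdir -p input/ns1_ams   # sub-directory under $ENTRADA_HOME/input

dir="ns1_ams"
ns="${dir%%_*}"          # "ns1" -> used to partition the Parquet data
site="${dir#*_}"         # "ams" -> stored in the server_location column
echo "$ns $site"         # prints "ns1 ams"
```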
There are also example scripts for the following combinations:

Script | Database | Mode | Description
---|---|---|---
H2+AWS | H2 | AWS | Use only for testing and evaluation |
H2+Local | H2 | Local | Use only for testing and evaluation |
PostgreSQL+AWS | PostgreSQL | AWS | Can be used for production |
PostgreSQL+Hadoop | PostgreSQL | Hadoop | Standard Hadoop, can be used for production |
PostgreSQL+Secure Hadoop | PostgreSQL | Hadoop | Secure Hadoop (Kerberos), can be used for production |
PostgreSQL+Local | PostgreSQL | Local | Local storage, can be used for production |
ENTRADA supports multiple modes of operation.

In `local` mode all input and output directories must be on the local filesystem and no SQL engine is used. Use this mode if you want to keep the created Parquet files on the local system.

In `AWS` mode the output directory must be on S3, and Athena is used as the SQL engine. The input and archive directories may be on S3 or on the local filesystem. ENTRADA can create a bucket and configure the correct security settings (access control and encryption).

In `Hadoop` mode the output directory must be on HDFS, and Impala is used as the SQL engine. The input and archive directories may be on HDFS or on the local filesystem.
ENTRADA requires a database to persist information about processed files and created database partitions. Both PostgreSQL and H2 are supported. H2 is a fast database that is useful for testing scenarios; use it only when evaluating or testing ENTRADA functionality. PostgreSQL should be used when ENTRADA is deployed in a production environment.