ENTRADA can be deployed using Docker Compose. Download one of the example Docker Compose scripts, save it as
docker-compose.yml, and then edit the script to configure the environment variables to fit your requirements.
Start the container using the docker-compose up command.
Follow these steps to deploy ENTRADA using an H2 database and AWS Athena. The H2 database is recommended only for testing and evaluating ENTRADA.
Open docker-compose.yml in an editor.
ENTRADA_NAMESERVERS. For this test use the value
AWS_BUCKET. This bucket will be created by ENTRADA.
Run docker-compose up and watch the log file $ENTRADA_HOME/log/spring.log for errors.
entrada database and tables.
entrada.dns table. A single row should have been added to the table.
ENTRADA expects the input directory to contain a sub-directory for each name server.
Each name server sub-directory should use the following format
The ns and anycast_site parts will be extracted, and the
ns part will be used to partition the Parquet data on name server name. The
anycast_site part will be saved in the
server_location column in the dns table.
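The split described above can be illustrated with a minimal Python sketch. The exact directory name format is not shown here, so the sketch assumes a hypothetical `<ns>_<anycast_site>` layout with an underscore separator; the function and type names are illustrative and are not part of ENTRADA.

```python
from typing import NamedTuple


class NameServerDir(NamedTuple):
    ns: str            # name server name, used to partition the Parquet data
    anycast_site: str  # anycast site, saved in the server_location column


def parse_nameserver_dir(dirname: str, sep: str = "_") -> NameServerDir:
    """Split a sub-directory name into its ns and anycast_site parts.

    Assumption: the name looks like "<ns><sep><anycast_site>". The last
    separator is used, so the ns part may itself contain the separator.
    """
    ns, found, site = dirname.rpartition(sep)
    if not found or not ns or not site:
        raise ValueError(f"expected '<ns>{sep}<anycast_site>', got {dirname!r}")
    return NameServerDir(ns=ns, anycast_site=site)
```

For example, under this assumed layout a directory named `ns1.example.nl_ams` would yield the ns part `ns1.example.nl` and the anycast_site part `ams`.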
There are also example scripts for the following combinations.
|Combination|Database|Mode|Notes|
|---|---|---|---|
|H2+AWS|H2|AWS|Use only for testing and evaluation|
|H2+Local|H2|Local|Use only for testing and evaluation|
|PostgreSQL+AWS|PostgreSQL|AWS|Can be used for production|
|PostgreSQL+Hadoop|PostgreSQL|Hadoop|Standard Hadoop, can be used for production|
|PostgreSQL+Secure Hadoop|PostgreSQL|Hadoop|Secure Hadoop (Kerberos), can be used for production|
|PostgreSQL+Local|PostgreSQL|Local|Local storage, can be used for production|
ENTRADA supports multiple modes of operation.
In local mode, all input and output directories must be local and no SQL engine is used.
Use this mode if you want to save the created Parquet files on the local system.
In AWS mode, the output directory must be on S3, and Athena is used as the SQL engine.
The input and archive directories may be on S3 but may also be on the local filesystem.
ENTRADA can create a bucket and configure the correct security settings (access control and encryption).
In Hadoop mode, the output directory must be on HDFS, and Impala is used as the SQL engine.
The input and archive directories may be on HDFS but may also be on the local filesystem.
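The directory rules for the three modes can be summarised as a small check. The following Python sketch is not ENTRADA code: the mode keys (`local`, `aws`, `hadoop`), the function names, and the use of `s3://` and `hdfs://` URI schemes to recognise remote locations are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Rules paraphrased from the mode descriptions. The scheme names used to
# recognise S3 and HDFS locations are assumptions made for this sketch.
MODE_RULES = {
    "local": {"output_scheme": "", "sql_engine": None},
    "aws": {"output_scheme": "s3", "sql_engine": "Athena"},
    "hadoop": {"output_scheme": "hdfs", "sql_engine": "Impala"},
}


def location_scheme(location: str) -> str:
    """Return the URI scheme of a location, or '' for a plain local path."""
    scheme = urlparse(location).scheme
    return "" if scheme == "file" else scheme


def check_mode(mode: str, input_dir: str, output_dir: str, archive_dir: str) -> list:
    """Return a list of problems; an empty list means the layout fits the mode."""
    remote = MODE_RULES[mode]["output_scheme"]
    problems = []
    # The output directory must live on the storage that belongs to the mode.
    if location_scheme(output_dir) != remote:
        problems.append(f"output directory {output_dir!r} does not match mode {mode!r}")
    # Input and archive directories may be local or on the mode's own storage.
    for label, loc in (("input", input_dir), ("archive", archive_dir)):
        if location_scheme(loc) not in ("", remote):
            problems.append(f"{label} directory {loc!r} does not match mode {mode!r}")
    return problems
```

Calling `check_mode("aws", "/data/input", "s3://my-bucket/output", "/data/archive")` returns an empty list under these assumptions, while a local output directory in AWS mode is reported as a problem.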
ENTRADA requires a database to persist information about processed files and created database partitions. Both PostgreSQL and H2 are supported.
H2 is a fast database that is useful for testing scenarios; use it only when evaluating or testing ENTRADA functionality.
PostgreSQL should be used when ENTRADA is deployed in a production environment.