Configuration

All configuration options can be set using environment variables in the Docker Compose file.
Most options have sensible defaults and do not need to be changed.
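
For example, an option can be set in the environment section of the docker-compose file; a minimal sketch (the service name is hypothetical):

    services:
      entrada:
        environment:
          - ENTRADA_ENGINE=local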

Filesystem

When running with Docker, these paths refer to locations inside the container. Use the docker-compose volumes section to map an internal container location to a location on the host running the container; see the example after the table below.
When a path is allowed to be on S3 or HDFS, it must start with the scheme prefix, e.g. s3://

| Option | Default | HDFS | S3 | Local | Description |
|---|---|---|---|---|---|
| ENTRADA_LOCATION_CONF | /entrada/data/conf | N | N | Y | Contains Hadoop config files |
| ENTRADA_LOCATION_LOG | /entrada/data/log | N | N | Y | Contains log files |
| ENTRADA_LOCATION_WORK | /entrada/data/work | N | N | Y | Contains temporary files |
| ENTRADA_LOCATION_INPUT | /entrada/data/input | Y | Y | Y | Contains pcap input data |
| ENTRADA_LOCATION_OUTPUT | /entrada/data/output | Y | Y | Y | Parquet files are written here |
| ENTRADA_LOCATION_ARCHIVE | /entrada/data/archive | Y | Y | Y | Archive location for pcap files |
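
A minimal docker-compose volumes sketch, assuming hypothetical host paths under /data/entrada:

    services:
      entrada:
        volumes:
          # map hypothetical host paths to the container's default locations
          - /data/entrada/input:/entrada/data/input
          - /data/entrada/output:/entrada/data/output
          - /data/entrada/archive:/entrada/data/archive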

Database

Use either an H2 or a PostgreSQL database. If no database is explicitly configured, the default H2 database is used and a database file is created in the work directory. When using H2, none of the options below are needed; only use these options when you want to use a PostgreSQL database.

| Option | Default | Required | Description |
|---|---|---|---|
| SPRING_DATASOURCE_HIKARI_JDBCURL | - | N | JDBC connection URL |
| SPRING_DATASOURCE_HIKARI_USERNAME | - | N | Username |
| SPRING_DATASOURCE_HIKARI_PASSWORD | - | N | Password |

If PostgreSQL is installed on the host running the container and ENTRADA, running inside the container, must connect to the PostgreSQL server, then the hostname used in SPRING_DATASOURCE_HIKARI_JDBCURL must not be ‘localhost’: inside the container, localhost refers to the container itself, not to the host running it.
Also make sure the PostgreSQL configuration allows connections from non-localhost clients; configure this in the PostgreSQL pg_hba.conf file. For example, add the following line to allow IP address 192.68.1.1 to connect to PostgreSQL:

    host all all 192.68.1.1/32 md5

For more information about PostgreSQL authentication, see the PostgreSQL docs.

ENTRADA uses the Hikari database connection pool library; see the Hikari GitHub page for more available configuration options.
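
A minimal PostgreSQL sketch for docker-compose, assuming a hypothetical database host db.example.com and placeholder credentials:

    environment:
      # use the host's address, not 'localhost' (which resolves to the container itself)
      - SPRING_DATASOURCE_HIKARI_JDBCURL=jdbc:postgresql://db.example.com:5432/entrada
      - SPRING_DATASOURCE_HIKARI_USERNAME=entrada
      - SPRING_DATASOURCE_HIKARI_PASSWORD=changeme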

ENTRADA

| Option | Default | Required | Description |
|---|---|---|---|
| ENTRADA_NODE_MASTER | true | Y | Set 1 node/container as the master node |
| ENTRADA_ENGINE | local | N | Operating mode to enable: local, aws or hadoop |
| ENTRADA_NAMESERVERS | - | N | Comma-separated list of name servers to use |
| ENTRADA_DATABASE_NAME | entrada | N | Name of the ENTRADA database |
| ENTRADA_DATABASE_TABLE_DNS | dns | N | Name of the DNS table in the ENTRADA database |
| ENTRADA_DATABASE_TABLE_ICMP | icmp | N | Name of the ICMP table in the ENTRADA database |
| ENTRADA_INPUT_FILE_SKIPFIRST | false | N | Skip the newest input file |
| ENTRADA_EXECUTION_DELAY | 60 | N | Process new pcap data every x seconds |
| ENTRADA_PARQUET_MAX | 3000000 | N | Max number of rows per Parquet file |
| ENTRADA_CACHE_TIMEOUT | 2 | N | Timeout (seconds) for cached DNS queries |
| ENTRADA_CACHE_TIMEOUT_TCP_FLOWS | 3 | N | Timeout (seconds) for cached TCP flows |
| ENTRADA_CACHE_TIMEOUT_IP_FRAGMENTED | 2 | N | Timeout (seconds) for cached IP fragments |
| ENTRADA_INPUTSTREAM_BUFFER | 64 | N | Read buffer size in KB |
| ENTRADA_ICMP_ENABLE | true | N | Enable ICMP processing |
| ENTRADA_PCAP_ARCHIVE_MODE | archive | N | Archive mode to use: archive, delete or none |
| ENTRADA_PARQUET_COMPACTION_ENABLED | true | N | Enable Parquet file compaction |
| ENTRADA_PARQUET_COMPACTION_INTERVAL | 5 | N | Interval (minutes) between compaction checks |
| ENTRADA_PARQUET_COMPACTION_AGE | 120 | N | Minimum time (minutes) a partition must not have been written to before it can be compacted |
| ENTRADA_MAINTENANCE_INTERVAL | 3600 | N | Interval (minutes) between maintenance job executions |
| ENTRADA_DATABASE_FILES_MAX_AGE | 10 | N | Max age (days) for files in the file archive database table |
| ENTRADA_ARCHIVE_FILES_MAX_AGE | 3 | N | Max age (days) to keep archived pcap files |
| ENTRADA_PRIVACY_ENABLED | false | N | Privacy mode: do not write client IP addresses to file |
| ENTRADA_PRIVACY_PURGE_AGE | 0 | N | When privacy mode is false, delete client IP addresses from partitions older than x days (0 = disabled) |
| ENTRADA_PRIVACY_PURGE_INTERVAL | 10 | N | Interval (minutes) between checks whether IP addresses must be removed from older partitions |
| ENTRADA_PARQUET_UPLOAD_BATCH | false | N | Upload data only after all input data has been processed |
| ENTRADA_TCP_ENABLE | false | N | Enable/disable TCP decoding |
| ENTRADA_PARQUET_FILESIZE_MAX | 128 | N | Max size (MB) for a generated Parquet file |
| ENTRADA_PARQUET_ROWGROUP_SIZE | 128 | N | Row group size (MB) for generated Parquet files |
| ENTRADA_PARQUET_PAGE-ROW_LIMIT | 20000 | N | Max rows per row group for generated Parquet files |

If you are running multiple ENTRADA Docker instances simultaneously and data is written to the same database, make sure that only 1 Docker instance (the master) has compaction and privacy purge enabled; otherwise multiple Docker instances could end up writing to the same partition at the same time. Set ENTRADA_NODE_MASTER to true for the master instance and to false for all other instances.

For the master instance, use the following docker-compose lines:

 - ENTRADA_NODE_MASTER=true
 - ENTRADA_PARQUET_COMPACTION_ENABLED=true
 - ENTRADA_PRIVACY_PURGE_AGE=365

For all other instances, disable compaction and privacy purge using the following docker-compose lines:

 - ENTRADA_NODE_MASTER=false
 - ENTRADA_PARQUET_COMPACTION_ENABLED=false
 - ENTRADA_PRIVACY_PURGE_AGE=0

In this example, ENTRADA_PRIVACY_PURGE_AGE=365 causes ENTRADA to delete client IP addresses from data older than 365 days, but only from partitions that are found in the PostgreSQL entrada_partition table.

The ENTRADA_NAMESERVERS option must contain the name server sub-directories in the input directory.
ENTRADA expects the input directory to contain a sub-directory for each name server. Each name server sub-directory should use the format “<ns>_<anycast_site>”. The “ns” and “anycast_site” parts are extracted: the “ns” part is used to partition the Parquet data by name server name, and the “anycast_site” part is saved in the “server_location” column of the dns table.
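
A hypothetical example, assuming a name server ns1.dns.nl with anycast sites ams and lax:

    # input directory layout (hypothetical):
    #   /entrada/data/input/ns1.dns.nl_ams/
    #   /entrada/data/input/ns1.dns.nl_lax/
    environment:
      - ENTRADA_NAMESERVERS=ns1.dns.nl_ams,ns1.dns.nl_lax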

ENTRADA_PCAP_ARCHIVE_MODE can have any of the following values:
- archive: move the pcap file to the archive location
- delete: delete the pcap file
- none: take no action
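
For example, to delete pcap files after processing instead of archiving them:

    environment:
      - ENTRADA_PCAP_ARCHIVE_MODE=delete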

The database partition for the current day is not compacted until the next day plus ENTRADA_PARQUET_COMPACTION_AGE (minutes).

To remove old processed pcap files, use ENTRADA_ARCHIVE_FILES_MAX_AGE. When AWS is used, an S3 lifecycle policy is created; when Hadoop is used, ENTRADA scans for old files and deletes them.

Reactive Streams

ENTRADA uses Akka Streams to distribute data processing across multiple CPU cores by building a flow graph of processing stages. The following options can be used to configure the runtime performance and resource usage of this flow graph. The optimal value for each of these settings depends on the resources available on the system running ENTRADA; start with the defaults and then tune them for optimal performance.

| Option | Default | Required | Description |
|---|---|---|---|
| ENTRADA_ROW_DECODER_COUNT | 10 | N | Number of parallel IP packet decoders |
| ENTRADA_WRITER_DNS_COUNT | 1 | N | Number of parallel Parquet writers for DNS output |
| ENTRADA_WRITER_ICMP_COUNT | 1 | N | Number of parallel Parquet writers for ICMP output |
| ENTRADA_ROW_BUILDER_DNS_COUNT | 10 | N | Number of parallel DNS row builders to use |
| ENTRADA_ROW_BUILDER_ICMP_COUNT | 1 | N | Number of parallel ICMP row builders to use |
| ENTRADA_STREAM_BUFFER | 200 | N | Size of the buffer placed before each async operation |
| ENTRADA_STREAM_THREAD_THROUGPUT | 10 | N | Number of messages each thread may process before switching to another thread |
| ENTRADA_STREAM_THREAD_COUNT | 3 | N | Size of the thread pool used by Akka Streams to execute the flow |

Keep the ENTRADA_STREAM_THREAD_COUNT value as low as possible to avoid excessive thread context switches; having too many threads will negatively impact performance. Use a profiler such as VisualVM to check whether the number of threads is correct: the threads should be in the ‘running’ state most of the time.
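
A tuning sketch with hypothetical values for a host with many CPU cores; the right numbers depend on your system:

    environment:
      # hypothetical values, tune while profiling
      - ENTRADA_ROW_DECODER_COUNT=16
      - ENTRADA_ROW_BUILDER_DNS_COUNT=16
      - ENTRADA_STREAM_THREAD_COUNT=4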

AWS

| Option | Default | Required | Description |
|---|---|---|---|
| AWS_ACCESS_KEY_ID | - | Y | AWS access key id |
| AWS_SECRET_KEY | - | Y | Secret AWS access key |
| AWS_BUCKET | - | Y | S3 bucket that should be created by ENTRADA |
| AWS_ENCRYPTION | true | N | Use S3 encryption |
| CLOUD_AWS_STACK_AUTO | false | N | Disable Spring Boot CloudFormation support in Spring Cloud AWS |
| CLOUD_AWS_REGION_STATIC | eu-west-1 | N | AWS region to use |
| CLOUD_AWS_CREDENTIALS_USEDEFAULTAWSCREDENTIALSCHAIN | true | N | AWS authentication config |
| AWS_UPLOAD_MULTIPART_MB_SIZE | 5 | N | Size (MB) of the parts when uploading to S3 |
| AWS_UPLOAD_PARALLELISM | 10 | N | Number of threads to use when uploading to S3 |
| AWS_UPLOAD_UPLOAD_STORAGE_CLASS | STANDARD_IA | N | S3 storage class for generated Parquet files |
| AWS_UPLOAD_ARCHIVE_STORAGE_CLASS | STANDARD_IA | N | S3 storage class for archived pcap files |
| ATHENA_WORKGROUP | primary | N | Athena workgroup to use |
| ATHENA_DRIVER_NAME | com.simba.athena.jdbc.Driver | N | Driver class name |
| ATHENA_URL | jdbc:awsathena://AwsRegion=${cloud.aws.region.static} | N | JDBC connection URL |
| ATHENA_OUTPUT_LOCATION | s3://${aws.bucket}/entrada-athena-output/ | N | Location for Athena results |
| ATHENA_OUTPUT_EXPIRATION | 2 | N | How many days to keep Athena query results on S3 |
| ATHENA_LOG_LEVEL | 4 | N | Athena log level |
| ATHENA_LOG_PATH | /entrada/data/work/athena_logs | N | Location of Athena log files |

See Managing Access Keys for IAM Users for how to create AWS Access keys for IAM users.

For more information about the CLOUD_AWS_* options, see the Spring Cloud docs.

Athena logging will only be enabled when LOGGING_LEVEL_NL_SIDNLABS is set to debug level.
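
A minimal AWS sketch for docker-compose; the credentials and bucket name are hypothetical placeholders:

    environment:
      - ENTRADA_ENGINE=aws
      # hypothetical placeholder credentials and bucket
      - AWS_ACCESS_KEY_ID=my-access-key-id
      - AWS_SECRET_KEY=my-secret-key
      - AWS_BUCKET=my-entrada-bucket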

Hadoop

When using Hadoop, make sure core-site.xml and hdfs-site.xml are available in the conf directory.
If Kerberos authentication is used, make sure krb5.conf and jaas.conf are also in the conf directory.

| Option | Default | Required | Description |
|---|---|---|---|
| HDFS_NAMESERVICE_HOST | - | Y | HDFS name node |
| IMPALA_DAEMON_HOST | - | Y | Impala daemon host |
| HDFS_USERNAME | hdfs | Y | HDFS username for upload |
| HDFS_DATA_OWNER | impala | Y | HDFS user that owns the Parquet files |
| HDFS_DATA_GROUP | hive | Y | HDFS group that has access to the Parquet files |
| KERBEROS_REALM | - | N | Kerberos realm (when Kerberos is used) |
| KERBEROS_KEYTAB | - | N | Kerberos keytab file (when Kerberos is used) |
| IMPALA_SSL | 0 | Y | Use SSL to connect to Impala (1 = SSL on) |

The Impala JDBC connection URL is created automatically based on the above option values.

In case of a timeout when connecting to Impala, check the SSL configuration; when using Impala with SSL, make sure that the configuration option IMPALA_SSL has a value of 1.
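
A minimal Hadoop sketch for docker-compose, with hypothetical hostnames:

    environment:
      # hypothetical hostnames
      - ENTRADA_ENGINE=hadoop
      - HDFS_NAMESERVICE_HOST=namenode.example.com
      - IMPALA_DAEMON_HOST=impala.example.com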

IP database

ENTRADA will look up the AS number and geographical location (country) of each source IP address. This information is supplied by MaxMind; ENTRADA will automatically download the free or paid version of the MaxMind databases during startup.

Provide the license key using the GEOIP_MAXMIND_LICENSE_FREE or GEOIP_MAXMIND_LICENSE_PAID option. ENTRADA will then download and use the correct version.
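
For example, with a hypothetical placeholder license key for the free GeoLite2 databases:

    environment:
      - GEOIP_MAXMIND_LICENSE_FREE=my-license-key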

| Option | Default | Required | Description |
|---|---|---|---|
| GEOIP_MAXMIND_AGE_MAX | 30 | N | If the number of days since the last database update > x, then update |
| GEOIP_MAXMIND_URL_COUNTRY | URL for the free GeoLite2 country database | N | |
| GEOIP_MAXMIND_URL_ASN | URL for the free GeoLite2 ASN database | N | |
| GEOIP_MAXMIND_URL_COUNTRY_PAID | URL for the paid GeoIP2 country database | N | |
| GEOIP_MAXMIND_URL_ASN_PAID | URL for the paid GeoIP2-ISP ASN database | N | |
| GEOIP_MAXMIND_LICENSE_FREE | - | N | License/API key for the free version |
| GEOIP_MAXMIND_LICENSE_PAID | - | N | License/API key for the paid version |

Metrics

Metrics are disabled by default; use MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED to enable them.

| Option | Default | Required | Description |
|---|---|---|---|
| MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED | false | N | Enable metrics |
| MANAGEMENT_METRICS_EXPORT_GRAPHITE_HOST | - | Y | Graphite hostname |
| MANAGEMENT_METRICS_EXPORT_GRAPHITE_PREFIX | entrada | N | Prefix for all metrics |
| MANAGEMENT_METRICS_ENABLE_JVM | true | N | Enable JVM metrics |
| MANAGEMENT_METRICS_ENABLE_PROCESS | true | N | Enable process metrics |
| MANAGEMENT_METRICS_ENABLE_SYSTEM | true | N | Enable system metrics |
| MANAGEMENT_METRICS_ENABLE_TOMCAT | false | N | Enable Tomcat metrics |
| MANAGEMENT_METRICS_ENABLE_HIKARICP | false | N | Enable Hikari metrics |
| MANAGEMENT_METRICS_ENABLE_JDBC | false | N | Enable JDBC metrics |
| MANAGEMENT_METRICS_ENABLE_LOGBACK | false | N | Enable Logback metrics |
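
For example, to export metrics to a hypothetical Graphite server:

    environment:
      - MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED=true
      - MANAGEMENT_METRICS_EXPORT_GRAPHITE_HOST=graphite.example.com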

Logging

| Option | Default | Required | Description |
|---|---|---|---|
| LOGGING_LEVEL_NL_SIDNLABS | info | N | Log level of the ENTRADA app |
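
For example, to enable debug logging (which also enables Athena logging, as noted in the AWS section):

    environment:
      - LOGGING_LEVEL_NL_SIDNLABS=debug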

Web server

| Option | Default | Required | Description |
|---|---|---|---|
| SERVER_PORT | 8080 | N | Port the ENTRADA app should listen on |