Configuration

All configuration options can be set using environment variables in the Docker Compose file.
Most options have sensible defaults and do not need to be changed.
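
For example, a minimal docker-compose service definition could look like the following sketch (the image tag and option values are illustrative, not prescribed):

```yaml
version: "3"
services:
  entrada:
    image: sidnlabs/entrada:latest  # illustrative image tag
    environment:
      # operating mode and name server list, see the tables below
      - ENTRADA_ENGINE=local
      - ENTRADA_NAMESERVERS=ns1_ams
```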

Filesystem

When running with Docker, these paths refer to locations inside the container. Use the docker-compose volumes section to map an internal container location to a location on the host running the container.
When a path is allowed to be on S3 or HDFS it must start with the scheme prefix, e.g. s3://

| Option | Default | HDFS | S3 | Local | Description |
|--------|---------|------|----|-------|-------------|
| ENTRADA_LOCATION_CONF | /entrada/data/conf | N | N | Y | Contains Hadoop config files |
| ENTRADA_LOCATION_LOG | /entrada/data/log | N | N | Y | Contains log files |
| ENTRADA_LOCATION_WORK | /entrada/data/work | N | N | Y | Contains temporary files |
| ENTRADA_LOCATION_INPUT | /entrada/data/input | Y | Y | Y | Contains pcap input data |
| ENTRADA_LOCATION_OUTPUT | /entrada/data/output | Y | Y | Y | Parquet files are written here |
| ENTRADA_LOCATION_ARCHIVE | /entrada/data/archive | Y | Y | Y | Archive location for pcap files |
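
For example, when running in local mode, the volumes section could map the default container paths to the host like this sketch (the host paths are illustrative):

```yaml
services:
  entrada:
    volumes:
      # host location (left) mapped to container location (right); host paths are examples
      - /data/entrada/conf:/entrada/data/conf
      - /data/entrada/input:/entrada/data/input
      - /data/entrada/output:/entrada/data/output
      - /data/entrada/archive:/entrada/data/archive
```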

Database

Use either an H2 or a PostgreSQL database. If no database is explicitly configured, the default H2 database is used and a database file is created in the work directory. When using H2, none of the options below are required; use them only when you want to use a PostgreSQL database.

| Option | Default | Required | Description |
|--------|---------|----------|-------------|
| SPRING_DATASOURCE_URL | - | N | JDBC connection URL |
| SPRING_DATASOURCE_USERNAME | - | N | Username |
| SPRING_DATASOURCE_PASSWORD | - | N | Password |
| SPRING_JPA_DATABASE_PLATFORM | org.hibernate.dialect.H2Dialect | N | Database dialect |
| SPRING_DATASOURCE_DRIVER_CLASS_NAME | org.h2.Driver | N | Driver class name |
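
A sketch of a PostgreSQL configuration (the host, database name and credentials are placeholders):

```yaml
environment:
  - SPRING_DATASOURCE_URL=jdbc:postgresql://db.example.com:5432/entrada  # placeholder host/db
  - SPRING_DATASOURCE_USERNAME=entrada   # placeholder
  - SPRING_DATASOURCE_PASSWORD=changeme  # placeholder
  - SPRING_JPA_DATABASE_PLATFORM=org.hibernate.dialect.PostgreSQLDialect
  - SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.postgresql.Driver
```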

ENTRADA

| Option | Default | Required | Description |
|--------|---------|----------|-------------|
| ENTRADA_ENGINE | local | N | Operating mode to enable: local, aws or hadoop |
| ENTRADA_NAMESERVERS | - | N | Comma-separated list of name servers to use |
| ENTRADA_DATABASE_NAME | entrada | N | Name of the ENTRADA database |
| ENTRADA_DATABASE_TABLE_DNS | dns | N | Name of the DNS table in the ENTRADA database |
| ENTRADA_DATABASE_TABLE_ICMP | icmp | N | Name of the ICMP table in the ENTRADA database |
| ENTRADA_INPUT_FILE_SKIPFIRST | false | N | Skip the newest input file |
| ENTRADA_EXECUTION_DELAY | 60 | N | Execute pcap processing every x minutes |
| ENTRADA_PARQUET_MAX | 3000000 | N | Max number of rows per Parquet file |
| ENTRADA_CACHE_TIMEOUT | 10 | N | Timeout (minutes) for cached DNS queries |
| ENTRADA_CACHE_TIMEOUT_TCP_FLOWS | 10 | N | Timeout (minutes) for cached TCP flows |
| ENTRADA_CACHE_TIMEOUT_IP_FRAGMENTED | 10 | N | Timeout (minutes) for cached IP fragments |
| ENTRADA_INPUTSTREAM_BUFFER | 64 | N | Read buffer size in KB |
| ENTRADA_ICMP_ENABLE | true | N | Enable ICMP processing |
| ENTRADA_PCAP_ARCHIVE_MODE | archive | N | Archive mode to use: archive, delete or none |
| ENTRADA_PARQUET_COMPACTION_ENABLED | true | N | Enable Parquet file compaction |
| ENTRADA_PARQUET_COMPACTION_INTERVAL | 5 | N | Interval (minutes) between compaction checks |
| ENTRADA_PARQUET_COMPACTION_AGE | 120 | N | Minimum time (minutes) a partition must not have been written to before it can be compacted |
| ENTRADA_MAINTENANCE_INTERVAL | 3600 | N | Interval (minutes) between maintenance job executions |
| ENTRADA_DATABASE_FILES_MAX_AGE | 10 | N | Max age (days) for files in the file archive database table |
| ENTRADA_ARCHIVE_FILES_MAX_AGE | 3 | N | Max age (days) to keep archived pcap files |

The ENTRADA_NAMESERVERS option must list the name server sub-directories found in the input directory.
ENTRADA expects the input directory to contain a sub-directory for each name server. Each name server sub-directory should use the format “<ns>_<anycast_site>”. The “ns” and “anycast_site” parts are extracted; the “ns” part is used to partition the Parquet data on name server name, and the “anycast_site” part is saved in the “server_location” column of the dns table. See the example below.
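
For example, with two name servers that both have an Amsterdam anycast site, the input directory and matching option could look like this sketch (the names are illustrative):

```yaml
# input directory layout (illustrative names):
#   /entrada/data/input/ns1.dns.nl_ams/
#   /entrada/data/input/ns2.dns.nl_ams/
environment:
  - ENTRADA_NAMESERVERS=ns1.dns.nl_ams,ns2.dns.nl_ams
```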

ENTRADA_PCAP_ARCHIVE_MODE can have any of the following values:

- archive: move the pcap file to the archive location
- delete: delete the pcap file
- none: no action taken

The database partition for the current day is not compacted until the next day plus ENTRADA_PARQUET_COMPACTION_AGE minutes. For example, with the default of 120 minutes, Monday's partition is not compacted before Tuesday 02:00.

To remove old processed pcap files, use the ENTRADA_ARCHIVE_FILES_MAX_AGE option. When AWS is used, an S3 lifecycle policy is created. When Hadoop is used, ENTRADA scans for old files and deletes them.

AWS

| Option | Default | Required | Description |
|--------|---------|----------|-------------|
| AWS_ACCESS_KEY_ID | - | Y | AWS access key id |
| AWS_SECRET_KEY | - | Y | Secret AWS access key |
| AWS_BUCKET | - | Y | S3 bucket that should be created by ENTRADA |
| AWS_ENCRYPTION | true | N | Use S3 encryption |
| CLOUD_AWS_STACK_AUTO | false | N | Spring Cloud AWS automatic CloudFormation stack detection |
| CLOUD_AWS_REGION_STATIC | eu-west-1 | N | AWS region to use |
| CLOUD_AWS_CREDENTIALS_USEDEFAULTAWSCREDENTIALSCHAIN | true | N | AWS authentication config |
| AWS_UPLOAD_MULTIPART_MB_SIZE | 5 | N | Part size (MB) for multipart uploads to S3 |
| AWS_UPLOAD_PARALLELISM | 10 | N | Number of threads to use when uploading to S3 |
| AWS_UPLOAD_UPLOAD_STORAGE_CLASS | STANDARD_IA | N | S3 storage class for generated Parquet files |
| AWS_UPLOAD_ARCHIVE_STORAGE_CLASS | STANDARD_IA | N | S3 storage class for archived pcap files |
| ATHENA_WORKGROUP | primary | N | Athena workgroup to use |
| ATHENA_DRIVER_NAME | com.simba.athena.jdbc.Driver | N | Driver class name |
| ATHENA_URL | jdbc:awsathena://AwsRegion=${cloud.aws.region.static} | N | JDBC connection URL |
| ATHENA_OUTPUT_LOCATION | s3://${aws.bucket}/entrada-athena-output/ | N | Location for Athena query results |
| ATHENA_OUTPUT_EXPIRATION | 2 | N | How many days to keep Athena query results on S3 |
| ATHENA_LOG_LEVEL | 4 | N | Athena log level |
| ATHENA_LOG_PATH | /entrada/data/work/athena_logs | N | Location of Athena log files |

See Managing Access Keys for IAM Users for how to create AWS access keys.

For more information about the CLOUD_AWS_* options, see the Spring Cloud AWS documentation.

Athena logging is only enabled when LOGGING_LEVEL_NL_SIDNLABS is set to debug level.
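
A sketch of a minimal AWS configuration (the key values and bucket name are placeholders):

```yaml
environment:
  - ENTRADA_ENGINE=aws
  - AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX  # placeholder
  - AWS_SECRET_KEY=xxxxxxxx                 # placeholder
  - AWS_BUCKET=my-entrada-bucket            # placeholder bucket name
  - CLOUD_AWS_REGION_STATIC=eu-west-1
```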

Hadoop

When using Hadoop, make sure core-site.xml and hdfs-site.xml are available in the conf directory.
If Kerberos authentication is used, krb5.conf and jaas.conf must also be in the conf directory.

| Option | Default | Required | Description |
|--------|---------|----------|-------------|
| HDFS_NAMESERVICE_HOST | - | Y | HDFS name node |
| IMPALA_DAEMON_HOST | - | Y | Impala daemon host |
| HDFS_USERNAME | hdfs | Y | HDFS username for upload |
| HDFS_DATA_OWNER | impala | Y | HDFS user that owns the Parquet files |
| HDFS_DATA_GROUP | hive | Y | HDFS group that has access to the Parquet files |
| KERBEROS_REALM | - | N | Kerberos realm (when Kerberos is used) |
| KERBEROS_KEYTAB | - | N | Kerberos keytab file (when Kerberos is used) |

The Impala JDBC connection URL is created automatically based on the above option values.
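
A sketch of a minimal Hadoop configuration (the hostnames are placeholders):

```yaml
environment:
  - ENTRADA_ENGINE=hadoop
  - HDFS_NAMESERVICE_HOST=namenode.example.com  # placeholder
  - IMPALA_DAEMON_HOST=impala.example.com       # placeholder
  - HDFS_USERNAME=hdfs
  - HDFS_DATA_OWNER=impala
  - HDFS_DATA_GROUP=hive
```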

IP database

ENTRADA looks up the AS number and geographical location (country) for each source IP address. This information is supplied by MaxMind; ENTRADA automatically downloads the free version of the MaxMind databases during startup.

If you have a paid MaxMind subscription you can provide the license key using the GEOIP_MAXMIND_LICENSE_KEY option. ENTRADA will then download and use the paid version.

| Option | Default | Required | Description |
|--------|---------|----------|-------------|
| GEOIP_MAXMIND_AGE_MAX | 30 | Y | Update the database when it is more than this many days old |
| GEOIP_MAXMIND_URL_COUNTRY | free URL | Y | URL of the country database |
| GEOIP_MAXMIND_URL_ASN | free URL | Y | URL of the ASN database |
| GEOIP_MAXMIND_URL_COUNTRY_PAID | subscription URL | Y | URL of the paid country database |
| GEOIP_MAXMIND_URL_ASN_PAID | subscription URL | Y | URL of the paid ASN database |
| GEOIP_MAXMIND_LICENSE_KEY | - | N | License key |
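
For a paid subscription only the license key has to be set; the paid database URLs already have defaults (the key value is a placeholder):

```yaml
environment:
  - GEOIP_MAXMIND_LICENSE_KEY=your-license-key  # placeholder
```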

Metrics

Metrics are disabled by default; use MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED to enable them.

| Option | Default | Required | Description |
|--------|---------|----------|-------------|
| MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED | false | N | Enable metrics |
| MANAGEMENT_METRICS_EXPORT_GRAPHITE_HOST | - | Y | Graphite hostname |
| MANAGEMENT_METRICS_EXPORT_GRAPHITE_PREFIX | entrada | N | Prefix for all metrics |
| MANAGEMENT_METRICS_ENABLE_JVM | true | N | Enable JVM metrics |
| MANAGEMENT_METRICS_ENABLE_PROCESS | true | N | Enable process metrics |
| MANAGEMENT_METRICS_ENABLE_SYSTEM | true | N | Enable system metrics |
| MANAGEMENT_METRICS_ENABLE_TOMCAT | false | N | Enable Tomcat metrics |
| MANAGEMENT_METRICS_ENABLE_HIKARICP | false | N | Enable HikariCP metrics |
| MANAGEMENT_METRICS_ENABLE_JDBC | false | N | Enable JDBC metrics |
| MANAGEMENT_METRICS_ENABLE_LOGBACK | false | N | Enable Logback metrics |
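
A sketch that enables Graphite metrics (the Graphite hostname is a placeholder):

```yaml
environment:
  - MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED=true
  - MANAGEMENT_METRICS_EXPORT_GRAPHITE_HOST=graphite.example.com  # placeholder
  - MANAGEMENT_METRICS_EXPORT_GRAPHITE_PREFIX=entrada
```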

Logging

| Option | Default | Required | Description |
|--------|---------|----------|-------------|
| LOGGING_LEVEL_NL_SIDNLABS | info | N | Log level of the ENTRADA app |
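
For example, to set the log level to debug (also required for Athena logging, see the AWS section):

```yaml
environment:
  - LOGGING_LEVEL_NL_SIDNLABS=debug
```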

Web server

| Option | Default | Required | Description |
|--------|---------|----------|-------------|
| SERVER_PORT | 8080 | N | Port the ENTRADA app should listen on |