All configuration options can be set using environment variables in the Docker Compose file.
Most of the options have sensible defaults and do not need to be changed.
When running with Docker, these paths refer to locations inside the container. Use the docker-compose volumes
section to map a location inside the container to a location on the host running the container; see the example after the table below.
When a path is allowed to be on S3 or HDFS it must start with the scheme prefix, e.g. s3://
Option | Default | HDFS | S3 | Local | Description |
---|---|---|---|---|---|
ENTRADA_LOCATION_CONF | /entrada/data/conf | N | N | Y | Contains Hadoop config files |
ENTRADA_LOCATION_LOG | /entrada/data/log | N | N | Y | Contains log files |
ENTRADA_LOCATION_WORK | /entrada/data/work | N | N | Y | Contains temporary files |
ENTRADA_LOCATION_INPUT | /entrada/data/input | Y | Y | Y | Contains pcap input data |
ENTRADA_LOCATION_OUTPUT | /entrada/data/output | Y | Y | Y | Parquet files are written to here |
ENTRADA_LOCATION_ARCHIVE | /entrada/data/archive | Y | Y | Y | Archive location for pcap files |
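For example, a minimal docker-compose volumes section mapping host directories to the container locations above might look like this (the host paths are placeholders, replace them with your own):

```yaml
services:
  entrada:
    volumes:
      # host path (left) : container path (right, matching ENTRADA_LOCATION_*)
      - /home/entrada/conf:/entrada/data/conf
      - /home/entrada/input:/entrada/data/input
      - /home/entrada/output:/entrada/data/output
      - /home/entrada/archive:/entrada/data/archive
```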
Use either an H2 or PostgreSQL database. If no database is explicitly configured, the default H2 database is enabled and a database file is created in the work directory. When using H2, none of the options below are required; only use these options when you want to use a PostgreSQL database.
Option | Default | Required | Description |
---|---|---|---|
SPRING_DATASOURCE_HIKARI_JDBCURL | - | N | JDBC connection url |
SPRING_DATASOURCE_HIKARI_USERNAME | - | N | username |
SPRING_DATASOURCE_HIKARI_PASSWORD | - | N | password |
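For example, a sketch of the docker-compose environment lines for PostgreSQL (hostname, database name and credentials are placeholders):

```yaml
environment:
  # Replace host, database name and credentials with your own values.
  # Do not use "localhost" when PostgreSQL runs on the Docker host, see the note below.
  - SPRING_DATASOURCE_HIKARI_JDBCURL=jdbc:postgresql://db.example.com:5432/entrada
  - SPRING_DATASOURCE_HIKARI_USERNAME=entrada
  - SPRING_DATASOURCE_HIKARI_PASSWORD=changeme
```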
If PostgreSQL is installed on the host running the container and a connection must be made from ENTRADA inside the container to the PostgreSQL server, then the hostname used in SPRING_DATASOURCE_HIKARI_JDBCURL must not be 'localhost':
inside the container, localhost refers to the container itself, not to the host running the container.
Also make sure the PostgreSQL configuration allows connections from non-localhost clients; configure this in the PostgreSQL pg_hba.conf file. For example, add the following line to allow IP address 192.68.1.1 to connect to PostgreSQL:
host all all 192.68.1.1/32 md5
For more info about PostgreSQL authentication, see the PostgreSQL docs.
ENTRADA uses the HikariCP database connection pool library; see the HikariCP GitHub page for more available configuration options.
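For example, the pool size can be limited with the standard HikariCP maximumPoolSize option, shown here in Spring's environment-variable form (the value is illustrative):

```yaml
environment:
  # Limit the Hikari connection pool to 5 connections (example value).
  - SPRING_DATASOURCE_HIKARI_MAXIMUMPOOLSIZE=5
```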
Option | Default | Required | Description |
---|---|---|---|
ENTRADA_NODE_MASTER | true | Y | Set 1 node/container as the master node |
ENTRADA_ENGINE | local | N | Operating mode: local, aws or hadoop |
ENTRADA_NAMESERVERS | - | N | Comma-separated list of name servers to use |
ENTRADA_DATABASE_NAME | entrada | N | Name of the ENTRADA database |
ENTRADA_DATABASE_TABLE_DNS | dns | N | Name of the DNS table in the ENTRADA database |
ENTRADA_DATABASE_TABLE_ICMP | icmp | N | Name of the ICMP table in the ENTRADA database |
ENTRADA_INPUT_FILE_SKIPFIRST | false | N | Skip newest input file |
ENTRADA_EXECUTION_DELAY | 60 | N | Process new pcap data every x seconds |
ENTRADA_PARQUET_MAX | 3000000 | N | Max number of rows per Parquet files |
ENTRADA_CACHE_TIMEOUT | 2 | N | Timeout (seconds) for cached DNS queries |
ENTRADA_CACHE_TIMEOUT_TCP_FLOWS | 3 | N | Timeout (seconds) for cached TCP flows |
ENTRADA_CACHE_TIMEOUT_IP_FRAGMENTED | 2 | N | Timeout (seconds) for cached IP fragments |
ENTRADA_INPUTSTREAM_BUFFER | 64 | N | Read buffer size in KB |
ENTRADA_ICMP_ENABLE | true | N | Enable ICMP processing |
ENTRADA_PCAP_ARCHIVE_MODE | archive | N | Archive mode to use: archive, delete or none |
ENTRADA_PARQUET_COMPACTION_ENABLED | true | N | Enable Parquet file compaction |
ENTRADA_PARQUET_COMPACTION_INTERVAL | 5 | N | Interval (minutes) between compaction checks |
ENTRADA_PARQUET_COMPACTION_AGE | 120 | N | Minimum time (minutes) a partition must not have been written to before it can be compacted |
ENTRADA_MAINTENANCE_INTERVAL | 3600 | N | Interval (minutes) between maintenance job executions |
ENTRADA_DATABASE_FILES_MAX_AGE | 10 | N | Max age (days) for files in file archive database table |
ENTRADA_ARCHIVE_FILES_MAX_AGE | 3 | N | Max age (days) to keep archived pcap-files |
ENTRADA_PRIVACY_ENABLED | false | N | Privacy mode; do not write client IP addresses to file |
ENTRADA_PRIVACY_PURGE_AGE | 0 | N | When privacy mode is disabled, delete client IP addresses from partitions older than x days (0 = disabled) |
ENTRADA_PRIVACY_PURGE_INTERVAL | 10 | N | Interval (minutes) between checks whether IP addresses must be removed from older partitions |
ENTRADA_PARQUET_UPLOAD_BATCH | false | N | Upload data only after all input data has been processed. |
ENTRADA_TCP_ENABLE | false | N | Enable/Disable TCP-decoding |
ENTRADA_PARQUET_FILESIZE_MAX | 128 | N | Max size (MB) for a generated Parquet file |
ENTRADA_PARQUET_ROWGROUP_SIZE | 128 | N | Row group size (MB) for generated Parquet file |
ENTRADA_PARQUET_PAGE-ROW_LIMIT | 20000 | N | Max rows per page within a row group of a generated Parquet file |
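As an illustration, a minimal environment section for a single local-mode instance might look as follows (the name server directory names are hypothetical):

```yaml
environment:
  - ENTRADA_NODE_MASTER=true
  - ENTRADA_ENGINE=local
  # Name server sub-directories expected in the input directory (see below).
  - ENTRADA_NAMESERVERS=ns1_ams,ns1_fra
```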
If you are running multiple ENTRADA Docker instances simultaneously and data is written to the same database, make sure only 1 Docker instance (the master) has compaction and privacy purge enabled; otherwise multiple Docker instances could end up writing to the same partition at the same time.
For the master instance, use the following docker-compose lines:
- ENTRADA_NODE_MASTER=true
- ENTRADA_PARQUET_COMPACTION_ENABLED=true
- ENTRADA_PRIVACY_PURGE_AGE=365
For all other instances, disable compaction and privacy purge:
- ENTRADA_NODE_MASTER=false
- ENTRADA_PARQUET_COMPACTION_ENABLED=false
- ENTRADA_PRIVACY_PURGE_AGE=0
For example, setting ENTRADA_PRIVACY_PURGE_AGE=100 on the master causes ENTRADA to delete IP addresses from data older than 100 days, but only from partitions found in the PostgreSQL entrada_partition table.
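A sketch of what this master/worker split could look like in docker-compose.yml, assuming two services writing to the same PostgreSQL database (service and image names are assumptions):

```yaml
services:
  entrada-master:
    image: sidnlabs/entrada   # image name is an assumption
    environment:
      - ENTRADA_NODE_MASTER=true
      - ENTRADA_PARQUET_COMPACTION_ENABLED=true
      - ENTRADA_PRIVACY_PURGE_AGE=365
  entrada-worker:
    image: sidnlabs/entrada
    environment:
      - ENTRADA_NODE_MASTER=false
      - ENTRADA_PARQUET_COMPACTION_ENABLED=false
      - ENTRADA_PRIVACY_PURGE_AGE=0
```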
The ENTRADA_NAMESERVERS option must list the name server sub-directories in the input directory.
ENTRADA expects the input directory to contain a sub-directory for each name server.
Each name server sub-directory should use the format "<ns>_<anycast_site>". The "ns" and "anycast_site" parts are extracted; the "ns" part is used to partition the Parquet data by name server name, and the "anycast_site" part is saved in the "server_location" column of the dns table.
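For example, with a hypothetical name server "ns1" operating at anycast sites "ams" and "fra", the input directory would look like this:

```
/entrada/data/input/
    ns1_ams/    # name server "ns1", anycast site "ams"
    ns1_fra/    # name server "ns1", anycast site "fra"
```

with the matching docker-compose line:

```yaml
environment:
  - ENTRADA_NAMESERVERS=ns1_ams,ns1_fra
```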
ENTRADA_PCAP_ARCHIVE_MODE can have any of the following values:
- archive: move the pcap file to the archive location
- delete: delete the pcap file
- none: no action taken
The database partition for the current day is not compacted until the next day plus ENTRADA_PARQUET_COMPACTION_AGE minutes.
To remove old processed pcap files, use the ENTRADA_ARCHIVE_FILES_MAX_AGE option. When AWS is used, an S3 lifecycle policy is created. When Hadoop is used, ENTRADA scans for old files and deletes them.
ENTRADA uses Akka Streams to distribute data processing across multiple CPU cores; see below for an overview of the flow graph that is created. The following options can be used to configure the runtime performance and resource usage of this flow graph. The optimal value for each of these settings depends on the resources available on the system running ENTRADA. Start with the defaults and then tune these to reach optimal performance.
Option | Default | Required | Description |
---|---|---|---|
ENTRADA_ROW_DECODER_COUNT | 10 | N | Number of parallel IP-packet decoders |
ENTRADA_WRITER_DNS_COUNT | 1 | N | Number of parallel Parquet writers for DNS output |
ENTRADA_WRITER_ICMP_COUNT | 1 | N | Number of parallel Parquet writers for ICMP output |
ENTRADA_ROW_BUILDER_DNS_COUNT | 10 | N | Number of parallel DNS row builders to use |
ENTRADA_ROW_BUILDER_ICMP_COUNT | 1 | N | Number of parallel ICMP row builders to use |
ENTRADA_STREAM_BUFFER | 200 | N | Size of buffer placed before each async operation |
ENTRADA_STREAM_THREAD_THROUGPUT | 10 | N | Number of messages each thread may process before switching to another thread |
ENTRADA_STREAM_THREAD_COUNT | 3 | N | Size of thread pool used by Akka Streaming to execute the flow |
Keep the ENTRADA_STREAM_THREAD_COUNT value as low as possible to avoid too many thread context switches.
Having too many threads will negatively impact performance. Use a profiler such as VisualVM to check whether the number of threads is correct; the threads should be in the 'running' state most of the time.
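For example, a tuning sketch for a machine with many cores might look like this (values are illustrative starting points, not recommendations):

```yaml
environment:
  # More parallel decoders and builders, still a small thread pool.
  - ENTRADA_ROW_DECODER_COUNT=16
  - ENTRADA_ROW_BUILDER_DNS_COUNT=16
  - ENTRADA_WRITER_DNS_COUNT=2
  - ENTRADA_STREAM_THREAD_COUNT=4
```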
Option | Default | Required | Description |
---|---|---|---|
AWS_ACCESS_KEY_ID | - | Y | AWS access key id |
AWS_SECRET_KEY | - | Y | Secret AWS access key |
AWS_BUCKET | - | Y | S3 bucket that should be created by ENTRADA |
AWS_ENCRYPTION | true | N | Use S3 encryption |
CLOUD_AWS_STACK_AUTO | false | N | Enable automatic CloudFormation stack detection in Spring Cloud AWS |
CLOUD_AWS_REGION_STATIC | eu-west-1 | N | AWS region to use |
CLOUD_AWS_CREDENTIALS_USEDEFAULTAWSCREDENTIALSCHAIN | true | N | AWS authentication config |
AWS_UPLOAD_MULTIPART_MB_SIZE | 5 | N | Part size (MB) for multipart uploads to S3 |
AWS_UPLOAD_PARALLELISM | 10 | N | Number of threads to use when uploading to S3 |
AWS_UPLOAD_UPLOAD_STORAGE_CLASS | STANDARD_IA | N | S3 storage class for generated Parquet files |
AWS_UPLOAD_ARCHIVE_STORAGE_CLASS | STANDARD_IA | N | S3 storage class for archived pcap files |
ATHENA_WORKGROUP | primary | N | Athena workgroup to use |
ATHENA_DRIVER_NAME | com.simba.athena.jdbc.Driver | N | Driver class name |
ATHENA_URL | jdbc:awsathena://AwsRegion=${cloud.aws.region.static} | N | JDBC connection url |
ATHENA_OUTPUT_LOCATION | s3://${aws.bucket}/entrada-athena-output/ | N | Location for Athena results |
ATHENA_OUTPUT_EXPIRATION | 2 | N | How many days to keep Athena query results on S3 |
ATHENA_LOG_LEVEL | 4 | N | Athena log level |
ATHENA_LOG_PATH | /entrada/data/work/athena_logs | N | Location of log files |
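For example, a minimal AWS configuration sketch (keys, bucket and region are placeholders):

```yaml
environment:
  # Replace the keys and bucket with your own values.
  - ENTRADA_ENGINE=aws
  - AWS_ACCESS_KEY_ID=<your-access-key-id>
  - AWS_SECRET_KEY=<your-secret-key>
  - AWS_BUCKET=my-entrada-bucket
  - CLOUD_AWS_REGION_STATIC=eu-west-1
```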
See Managing Access Keys for IAM Users for how to create AWS Access keys for IAM users.
For more information about the CLOUD_AWS_* options, see the Spring Cloud docs.
Athena logging will only be enabled when LOGGING_LEVEL_NL_SIDNLABS is set to debug level.
When using Hadoop, make sure core-site.xml and hdfs-site.xml are available in the conf directory.
If Kerberos authentication is used, make sure krb5.conf and jaas.conf are also in the conf directory.
Option | Default | Required | Description |
---|---|---|---|
HDFS_NAMESERVICE_HOST | - | Y | HDFS name node |
IMPALA_DAEMON_HOST | - | Y | Impala daemon host |
HDFS_USERNAME | hdfs | Y | HDFS username for upload |
HDFS_DATA_OWNER | impala | Y | HDFS user that owns the Parquet files |
HDFS_DATA_GROUP | hive | Y | HDFS group that has access to the Parquet files |
KERBEROS_REALM | - | N | Kerberos REALM (when KRB is used) |
KERBEROS_KEYTAB | - | N | Kerberos KEYTAB file (when KRB is used) |
IMPALA_SSL | 0 | Y | Use SSL to connect to Impala (1 = SSL on) |
The Impala JDBC connection URL is created automatically based on the above option values.
In case of a timeout when connecting to Impala, check the SSL configuration; when Impala uses SSL, make sure the IMPALA_SSL option is set to 1.
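For example, a minimal Hadoop configuration sketch (hostnames are placeholders):

```yaml
environment:
  # Replace the hostnames with your own cluster hosts.
  - ENTRADA_ENGINE=hadoop
  - HDFS_NAMESERVICE_HOST=namenode.example.com
  - IMPALA_DAEMON_HOST=impala.example.com
  - HDFS_USERNAME=hdfs
  - IMPALA_SSL=1
```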
ENTRADA will look up the AS number and geographical location (country) for each source IP address. This information is supplied by MaxMind; ENTRADA will automatically download the free or paid version of the MaxMind databases during startup.
Provide the license key using the GEOIP_MAXMIND_LICENSE_FREE or GEOIP_MAXMIND_LICENSE_PAID option; ENTRADA will then download and use the correct version. See the example after the table below.
Option | Default | Required | Description |
---|---|---|---|
GEOIP_MAXMIND_AGE_MAX | 30 | N | Update the database when it is more than x days since the last update |
GEOIP_MAXMIND_URL_COUNTRY | built-in | N | URL for the free GeoLite2 country database |
GEOIP_MAXMIND_URL_ASN | built-in | N | URL for the free GeoLite2 ASN database |
GEOIP_MAXMIND_URL_COUNTRY_PAID | built-in | N | URL for the paid GeoIP2 country database |
GEOIP_MAXMIND_URL_ASN_PAID | built-in | N | URL for the paid GeoIP2-ISP ASN database |
GEOIP_MAXMIND_LICENSE_FREE | - | N | License/API key for the free version |
GEOIP_MAXMIND_LICENSE_PAID | - | N | License/API key for the paid version |
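For example, to use the free GeoLite2 databases (the license key is a placeholder):

```yaml
environment:
  # Use GEOIP_MAXMIND_LICENSE_PAID instead when you have a paid subscription.
  - GEOIP_MAXMIND_LICENSE_FREE=<your-license-key>
```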
Metrics are disabled by default; use MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED to enable them.
Option | Default | Required | Description |
---|---|---|---|
MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED | false | N | Enable metrics |
MANAGEMENT_METRICS_EXPORT_GRAPHITE_HOST | - | Y | Graphite hostname |
MANAGEMENT_METRICS_EXPORT_GRAPHITE_PREFIX | entrada | N | Prefix for all metrics |
MANAGEMENT_METRICS_ENABLE_JVM | true | N | Enable JVM metrics |
MANAGEMENT_METRICS_ENABLE_PROCESS | true | N | Enable process metrics |
MANAGEMENT_METRICS_ENABLE_SYSTEM | true | N | Enable system metrics |
MANAGEMENT_METRICS_ENABLE_TOMCAT | false | N | Enable Tomcat metrics |
MANAGEMENT_METRICS_ENABLE_HIKARICP | false | N | Enable Hikari metrics |
MANAGEMENT_METRICS_ENABLE_JDBC | false | N | Enable JDBC metrics |
MANAGEMENT_METRICS_ENABLE_LOGBACK | false | N | Enable Logback metrics |
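For example, a sketch enabling Graphite metrics (the hostname is a placeholder):

```yaml
environment:
  - MANAGEMENT_METRICS_EXPORT_GRAPHITE_ENABLED=true
  - MANAGEMENT_METRICS_EXPORT_GRAPHITE_HOST=graphite.example.com
  - MANAGEMENT_METRICS_EXPORT_GRAPHITE_PREFIX=entrada
```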
Option | Default | Required | Description |
---|---|---|---|
LOGGING_LEVEL_NL_SIDNLABS | info | N | Log level of the ENTRADA app |
Option | Default | Required | Description |
---|---|---|---|
SERVER_PORT | 8080 | N | Port the ENTRADA app listens on |