Schema compatibility

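The new 2.x database schema is not compatible with the Parquet files generated by ENTRADA 0.x. By default, Apache Impala resolves Parquet columns by position: the nth column in the table definition is read from the nth column in the file. Because ENTRADA 2.x both added and removed columns, the column positions in the old files no longer line up with the new schema.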

The fix is to make Impala resolve Parquet columns by name instead of by position. This is enabled by setting the Impala query option PARQUET_FALLBACK_SCHEMA_RESOLUTION to name.
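The difference between the two resolution modes can be shown with a minimal Python simulation for a single row. The column names and values below are hypothetical and do not reflect the real ENTRADA 0.x or 2.x schemas. The sketch models only the matching rule, not Impala itself.

```python
"""Illustrative sketch: position-based vs name-based Parquet column resolution.

All column names and values are hypothetical placeholders.
"""

# Column order as written in an old (0.x-style) Parquet file (hypothetical).
FILE_COLUMNS = ["qname", "src", "ttl", "rcode"]
FILE_ROW = ["example.nl.", "192.0.2.1", 3600, 0]

# Column order of the new (2.x-style) table definition (hypothetical):
# "ttl" was removed and "server" was added.
TABLE_COLUMNS = ["qname", "src", "rcode", "server"]


def resolve_by_position(file_cols, row, table_cols):
    """Read table column i from file column i, ignoring names (NULL if absent)."""
    return {
        col: (row[i] if i < len(file_cols) else None)
        for i, col in enumerate(table_cols)
    }


def resolve_by_name(file_cols, row, table_cols):
    """Read each table column from the file column with the same name (NULL if absent)."""
    values = dict(zip(file_cols, row))
    return {col: values.get(col) for col in table_cols}


if __name__ == "__main__":
    print("position:", resolve_by_position(FILE_COLUMNS, FILE_ROW, TABLE_COLUMNS))
    print("name:    ", resolve_by_name(FILE_COLUMNS, FILE_ROW, TABLE_COLUMNS))
```

With position-based matching, rcode silently receives the old ttl value and server receives the old rcode value. With name-based matching, each column is read from the file column of the same name, and columns that do not exist in the old file come back as NULL (None here).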

Upgrade steps

ENTRADA automatically creates the database schema if it does not yet exist. To upgrade an existing ENTRADA 0.x installation:

  • Drop the old schema and tables. Because these are external tables, the data files will not be deleted from HDFS.
  • Configure the HDFS data location. To make sure ENTRADA uses the correct HDFS location when it recreates the database schema, set the ENTRADA_LOCATION_OUTPUT option, for example: ENTRADA_LOCATION_OUTPUT=hdfs://hadoop-example.com:8020/user/hive/queries
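The two upgrade steps can be partly scripted. The sketch below is hypothetical and not part of ENTRADA. It builds the DROP statements for the old database and its tables, where the database and table names are placeholders you supply. It also runs a basic sanity check that ENTRADA_LOCATION_OUTPUT holds an hdfs:// URI before ENTRADA is started.

```python
"""Hypothetical pre-upgrade helper; not part of ENTRADA.

Assumptions: the operator supplies the old database and table names
(the ones in the __main__ block are placeholders), and
ENTRADA_LOCATION_OUTPUT is read from the environment.
"""
import os
from urllib.parse import urlparse


def drop_statements(database, tables):
    """Build SQL that drops the old tables first, then the (now empty) database.

    Dropping an EXTERNAL table removes only the metastore entry; the
    Parquet data files remain in HDFS.
    """
    stmts = [f"DROP TABLE IF EXISTS {database}.{table};" for table in tables]
    stmts.append(f"DROP DATABASE IF EXISTS {database};")
    return stmts


def check_output_location(value):
    """Return value if it looks like an hdfs://host[:port]/path URI, else raise ValueError."""
    parsed = urlparse(value)
    if parsed.scheme != "hdfs" or not parsed.hostname or not parsed.path:
        raise ValueError(f"not an hdfs:// location: {value!r}")
    return value


if __name__ == "__main__":
    # "old_db" and "old_table" are placeholders, not real ENTRADA names.
    for stmt in drop_statements("old_db", ["old_table"]):
        print(stmt)
    location = os.environ.get("ENTRADA_LOCATION_OUTPUT", "")
    try:
        print("ENTRADA_LOCATION_OUTPUT OK:", check_output_location(location))
    except ValueError as err:
        print("ENTRADA_LOCATION_OUTPUT problem:", err)
```

The tables are dropped before the database so that the database drop does not fail on a non-empty database. The location check is only a syntactic guard. It does not verify that the HDFS path exists or is writable.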