Because new data arrives in relatively small batches (5-10 minute pcap files), it is not possible to create optimally sized Parquet files at write time. New data is therefore written into many small Parquet files. When a table partition is no longer active (no new data has been added for a configurable period), the small files are combined into a smaller number of larger files. This process is called compaction.
During compaction the query_ts column is also updated; before compaction this column has a null value.
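
The compaction steps described above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the function names, the in-memory row representation, the `time` field, and the way `query_ts` is derived are all assumptions made for illustration. It shows three pieces: deciding whether a partition is inactive, grouping small files into larger batches, and filling the null query_ts column while merging.

```python
from datetime import datetime, timedelta, timezone


def is_inactive(last_write, now, idle_period):
    """Return True once no new data has arrived for at least idle_period."""
    return now - last_write >= idle_period


def plan_compaction(file_sizes, target_size):
    """Group (name, size) pairs, in order, into batches of about target_size bytes.

    Each batch would become one larger output file. A single file that is
    already larger than target_size ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for name, size in file_sizes:
        if current and current_size + size > target_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


def compact_rows(small_files, derive_query_ts):
    """Merge the rows of several small files and fill the query_ts column.

    derive_query_ts is a hypothetical callback; how the real value is
    computed is not stated in this document.
    """
    merged = []
    for rows in small_files:
        for row in rows:
            merged.append({**row, "query_ts": derive_query_ts(row)})
    return merged


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    last_write = now - timedelta(hours=2)
    if is_inactive(last_write, now, idle_period=timedelta(hours=1)):
        sizes = [("a.parquet", 40), ("b.parquet", 40), ("c.parquet", 40), ("d.parquet", 10)]
        print(plan_compaction(sizes, target_size=100))
```

In this sketch files are grouped in arrival order rather than packed optimally, which keeps the plan simple and preserves the time ordering of the data; a real compaction job would read and rewrite the Parquet files with a Parquet library instead of operating on in-memory rows.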