Indexing Settings

The Indexing settings tab contains all configuration related to event and property catalog creation.

Options

  • Lookback window: Mitzu only indexes the most recent events in your data warehouse. The lookback window setting defines how far back in time the indexing process looks.

  • Sample size: Mitzu only picks a small sample of events for indexing. Increasing the sample size improves indexing precision but makes the indexing process longer.

  • Bucketed column table indexing: by default, Mitzu efficiently indexes any table using a single SQL query. However, this process takes longer if your data warehouse contains wide tables (tables with many columns). Processing the table in buckets of columns can improve performance, especially in data lakes with Parquet or ORC files.

  • Bucketed event indexing: by default, Mitzu efficiently indexes any table using a single SQL query. However, indexing every event at once may hit the data warehouse's limits if your event tables contain too many events (1,000+). Indexing the events in buckets can avoid these limits. This setting controls the bucket size used for indexing.

  • Multi-step indexing: by default, Mitzu efficiently indexes any table using a single SQL query that samples the table. However, this query can run into limitations if the selected sample has too many rows. This setting splits sample selection across multiple executions. The number of executions equals the value in the input box.

  • Dimension table sample size: Mitzu only picks a small sample of rows from each dimension table for indexing. Increasing the sample size improves indexing precision but makes the indexing process longer.

  • Data scrambling: by default, Mitzu indexing reads the data warehouse in its default order (no ordering), which may result in skewed data reads. Data scrambling randomizes the reading order, which reduces this skew but makes indexing slower.
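
To make the bucketing and multi-step options above more concrete, here is a minimal Python sketch of the general idea: splitting a list of columns (or events) into fixed-size buckets, and dividing a sample across several executions. This is an illustration only, not Mitzu's implementation. The function names `make_buckets` and `split_sample`, the column names, and all sizes are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_buckets(items: Sequence[T], bucket_size: int) -> List[List[T]]:
    """Split items into consecutive buckets of at most bucket_size elements."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    return [list(items[i:i + bucket_size]) for i in range(0, len(items), bucket_size)]


def split_sample(sample_size: int, steps: int) -> List[int]:
    """Divide a total sample size across `steps` executions as evenly as possible."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    base, remainder = divmod(sample_size, steps)
    # The first `remainder` executions take one extra row so the parts sum to sample_size.
    return [base + 1 if i < remainder else base for i in range(steps)]


# Hypothetical wide table: 7 columns processed in buckets of 3 columns.
columns = ["event_name", "event_time", "user_id", "country", "device", "plan", "referrer"]
column_buckets = make_buckets(columns, 3)

# Hypothetical sample of 10,000 rows selected over 3 executions.
sample_parts = split_sample(10_000, 3)

print(column_buckets)
print(sample_parts)
```

In this sketch, each bucket would correspond to one smaller indexing query instead of a single large one. The same `make_buckets` helper applies equally to a list of events when bucketing by event rather than by column.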

Changes are saved automatically.