Indexing
Before creating the first insight, Mitzu must collect information about how you structured the events in your data warehouse. We call this process indexing.
Stored data
To generate the insights and Catalog pages, Mitzu collects the following information during the indexing process:
- events: Lists all events Mitzu finds in the configured event tables. Mitzu will use it to generate the insights and catalog pages.
- event properties: Lists all event properties assigned to each found event.
- event filter values: Lists the possible values of each found event property.
- dimension properties: Lists all dimension properties that Mitzu finds in the configured dimension tables.
- dimension filter values: Lists the possible values of each found dimension property.
Mitzu will never copy data from your data warehouse or fetch data not required to generate the forms in the Mitzu web app.
For performance reasons, the number of event filter values and dimension filter values is limited to 500. For example, Mitzu won't store the possible values of the event time property due to their high cardinality. If you need a higher limit, contact Mitzu support and we will increase it.
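The effect of the limit can be illustrated with a minimal sketch. It assumes that a property whose distinct values exceed the limit is skipped entirely, which is one reading of the event time example above; the function name `collect_filter_values` is hypothetical and not part of Mitzu.

```python
from typing import Hashable, Iterable, List, Optional

MAX_FILTER_VALUES = 500  # the limit stated in the text above


def collect_filter_values(
    values: Iterable[Hashable], limit: int = MAX_FILTER_VALUES
) -> Optional[List[Hashable]]:
    """Return the distinct values in first-seen order.

    Returns None as soon as more than `limit` distinct values are seen
    (assumption: high-cardinality properties are not stored at all).
    """
    seen = {}  # dict keeps insertion order, so values stay in first-seen order
    for value in values:
        if value not in seen:
            seen[value] = None
            if len(seen) > limit:
                return None  # too many distinct values, store nothing
    return list(seen)
```

With this sketch, a low-cardinality property such as a country code yields a short list of values, while a timestamp column returns `None` after the 501st distinct value.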
Event table indexing
When you add a new event table or re-index an existing one, Mitzu fetches a sample from the event table and stores the events found in this sample. You can configure the sample in several ways:
- Mitzu samples a time window that ends on the Default End Date Config and begins N days earlier, where N is the value you set as the Lookback Days.
- You can configure the sample size on the indexing settings page.
- You can enable Data Scrambling to randomize the order of the queried records.
You should configure the sample so that it contains as many of your distinct events as possible; otherwise, Mitzu will not recognize the events that are missing from the sample. If you need help with this configuration, please contact support at support@mitzu.io.
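The time window described above is simple date arithmetic. The following is a sketch only: the function name `sample_window` and its parameters are hypothetical, and whether Mitzu treats the boundaries as inclusive is not stated in this document.

```python
from datetime import date, timedelta
from typing import Tuple


def sample_window(end_date: date, lookback_days: int) -> Tuple[date, date]:
    """Return (start, end) of a window ending on `end_date`
    and beginning `lookback_days` days earlier."""
    if lookback_days < 0:
        raise ValueError("lookback_days must not be negative")
    return end_date - timedelta(days=lookback_days), end_date
```

For example, an end date of 2024-03-31 with 30 lookback days gives a window that starts on 2024-03-01. Events that did not occur inside this window would not appear in the sample, which is why a longer lookback can help when some events are rare.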
Once Mitzu has identified your events, it will index each event's available properties. As with event indexing, Mitzu will fetch a sample of the events containing all event properties. This sampling uses the same configuration as event sampling. If your event tables contain many columns, consider enabling Bucketed table indexing. Then, you can configure how many columns Mitzu should index simultaneously, decreasing the indexing time of huge tables.
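To picture what indexing a configurable number of columns at a time means, here is a minimal sketch that splits a column list into fixed-size groups. The helper `column_buckets` is hypothetical and only illustrates the idea; it does not describe how Mitzu forms its buckets internally.

```python
from typing import List, Sequence


def column_buckets(columns: Sequence[str], bucket_size: int) -> List[List[str]]:
    """Split `columns` into consecutive groups of at most `bucket_size` columns."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    return [
        list(columns[start:start + bucket_size])
        for start in range(0, len(columns), bucket_size)
    ]
```

With five columns and a bucket size of two, this produces three groups, the last of which holds the single remaining column. Each group could then be indexed with its own query instead of one query over every column.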
Mitzu can generate the insights page from the recognized events and event properties. Mitzu indexes the event filter values only when needed. For example, when you add a new event filter on the insight page, Mitzu will index the possible values of that specific event property using the same sampling mechanism. If indexing is slow when triggered from the insight page, you can index all event filter values by clicking the Re-index all events with filter values button on the catalog page.
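The trade-off between indexing filter values on demand and indexing them all in advance can be sketched as a small cache. Everything here is hypothetical: the class `LazyFilterValueIndex`, its methods, and the `fetch_values` callback stand in for whatever query Mitzu runs against the warehouse.

```python
from typing import Callable, Dict, Iterable, List, Tuple

PropertyKey = Tuple[str, str]  # (event name, property name)


class LazyFilterValueIndex:
    """Fetch filter values only on first use, then reuse the stored result."""

    def __init__(self, fetch_values: Callable[[str, str], List[str]]) -> None:
        self._fetch_values = fetch_values  # hypothetical warehouse query
        self._cache: Dict[PropertyKey, List[str]] = {}

    def values_for(self, event: str, prop: str) -> List[str]:
        """On-demand path: runs the query only if the values are not stored yet."""
        key = (event, prop)
        if key not in self._cache:
            self._cache[key] = self._fetch_values(event, prop)
        return self._cache[key]

    def index_all(self, keys: Iterable[PropertyKey]) -> None:
        """Up-front path: fill the cache for every given property in one pass."""
        for event, prop in keys:
            self.values_for(event, prop)
```

In this sketch the first call to `values_for` pays the query cost, which corresponds to the wait on the insight page, while `index_all` moves that cost to a single earlier step.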
Dimension table indexing
When you add a new dimension table, Mitzu will fetch a sample from the dimension table and store the dimension properties found in this sample. You can configure the sample in several ways:
- You can configure the dimension sample size on the indexing settings page.
- You can enable Data Scrambling to randomize the order of the queried records.
- You can enable Bucketed table indexing. This way, the sample will contain a randomized selection of rows from the source table.
You should configure the sample so that it contains as many of your distinct dimension properties as possible; otherwise, Mitzu will not recognize the properties that are missing from the sample. If you need help with this configuration, please contact support at support@mitzu.io.
Mitzu can generate the insights page from the recognized dimension properties. Mitzu indexes the dimension filter values only when needed. For example, when you add a new dimension filter on the insight page, Mitzu will index the possible values of that specific dimension property using the same sampling mechanism.