Indexing

Before creating the first insight, Mitzu must collect information about how you structured the events in your data warehouse. We call this process indexing. Indexing populates Mitzu's semantic layer with the events, properties, and filter values discovered in your warehouse, and keeps it in sync as the underlying data changes.

Stored data

To generate the insights and Catalog pages, Mitzu collects the following information during the indexing process:

  • events: Lists all events Mitzu finds in the configured event tables. Mitzu uses this list to generate the insights and Catalog pages.
  • event properties: Lists all event properties assigned to each event found.
  • event filter values: Lists the possible values of each event property found.
  • dimension properties: Lists all dimension properties that Mitzu finds in the configured dimension tables.
  • dimension filter values: Lists the possible values of each dimension property found.

Together, these elements form the runtime semantic layer the analytics engine uses to answer questions and construct queries.
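To make the relationship between these five elements concrete, here is a minimal Python sketch of how such a semantic layer could be modeled. The class and field names (SemanticLayer, EventIndex, PropertyIndex) are hypothetical and chosen for illustration only; they do not describe Mitzu's internal storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PropertyIndex:
    """One event or dimension property and the filter values seen for it."""
    name: str
    filter_values: List[str] = field(default_factory=list)


@dataclass
class EventIndex:
    """One discovered event and the event properties assigned to it."""
    name: str
    properties: Dict[str, PropertyIndex] = field(default_factory=dict)


@dataclass
class SemanticLayer:
    """Hypothetical container for everything the indexing process collects."""
    events: Dict[str, EventIndex] = field(default_factory=dict)
    dimension_properties: Dict[str, PropertyIndex] = field(default_factory=dict)


# Example: one event with one property, plus one dimension property.
layer = SemanticLayer()
layer.events["page_view"] = EventIndex(
    name="page_view",
    properties={"browser": PropertyIndex("browser", ["chrome", "firefox"])},
)
layer.dimension_properties["country"] = PropertyIndex("country", ["HU", "US"])
```

Note that the sketch holds only names and distinct values, not the underlying rows.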

success

Mitzu will never copy data from your data warehouse or fetch data not required to generate the forms in the Mitzu web app.

info

For performance reasons, Mitzu limits the number of event filter values and dimension filter values to 500. For example, Mitzu won't store the possible values of the event time property because its cardinality is too high. To raise this limit, contact Mitzu support.
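The effect of the limit can be illustrated with a short Python sketch. It assumes that a property whose distinct values exceed the limit is skipped entirely rather than truncated, which matches the event time example but is not confirmed here. The function name collect_filter_values and the constant MAX_FILTER_VALUES are hypothetical.

```python
MAX_FILTER_VALUES = 500  # default limit mentioned in the note above


def collect_filter_values(values, limit=MAX_FILTER_VALUES):
    """Return the sorted distinct values, or None when there are more than `limit`.

    Assumption: a high-cardinality property is not stored at all.
    """
    distinct = set()
    for value in values:
        distinct.add(value)
        if len(distinct) > limit:
            # Stop early: this property has too many distinct values.
            return None
    return sorted(distinct)


# A low-cardinality property is kept.
browsers = collect_filter_values(["chrome", "firefox", "chrome"])

# A high-cardinality property, such as a timestamp, is skipped.
timestamps = collect_filter_values(f"2024-01-01T00:00:{i:04d}" for i in range(1000))
```

In this sketch, browsers holds two values and timestamps is None.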

Event table indexing

When you add a new event table or re-index an existing table, Mitzu fetches a sample from the event table and stores the events found in this sample. You can configure the sample size in several ways:

info

Configure the sample size so that the sample covers as many of your distinct events as possible; otherwise, Mitzu will not recognize the events that are missing from the sample. If you need help with this configuration, please contact support at support@mitzu.io.
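The following Python sketch shows why the sample size matters for event discovery. It is a simplified illustration, not Mitzu's sampling logic: the column name event_name is hypothetical, and taking the first rows of a list stands in for whatever sample the warehouse query returns.

```python
def discover_events(rows, sample_size):
    """Return the distinct event names found in the first `sample_size` rows."""
    sample = rows[:sample_size]
    return {row["event_name"] for row in sample}


# 95 frequent events followed by 5 rare ones.
rows = [{"event_name": "page_view"}] * 95 + [{"event_name": "purchase"}] * 5

small_sample = discover_events(rows, sample_size=50)
large_sample = discover_events(rows, sample_size=100)
```

With the small sample, only page_view is discovered; the rare purchase event appears only once the sample is large enough to include it.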

Once Mitzu has identified your events, it indexes each event's available properties. As with event indexing, Mitzu fetches a sample of the events containing all event properties. If your event tables contain many columns, consider enabling Bucketed table indexing. You can then configure how many columns Mitzu should index simultaneously, decreasing the indexing time of large tables.
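One way to picture indexing a limited number of columns at a time is to split the column list into fixed-size groups and process each group separately. The Python sketch below assumes this interpretation; the function name bucket_columns and the bucket_size parameter are hypothetical and do not correspond to a documented Mitzu setting name.

```python
def bucket_columns(columns, bucket_size):
    """Split `columns` into consecutive groups of at most `bucket_size` items."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    return [
        columns[start:start + bucket_size]
        for start in range(0, len(columns), bucket_size)
    ]


columns = ["user_id", "event_time", "browser", "country", "plan"]
buckets = bucket_columns(columns, bucket_size=2)
# Each bucket would be indexed with its own, narrower query.
```

Here the five columns are split into three buckets: two with two columns each and one with the remaining column.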

Mitzu can generate the insights page from the recognized events and event properties.

Dimension table indexing

When you add a new dimension table, Mitzu fetches a sample from the dimension table and stores the dimension properties found in this sample. You can configure the sample size in several ways:

  • You can configure the dimension sample size on the indexing settings page.
  • You can enable Data Scrambling to randomize the order of the queried records.
  • You can enable Bucketed table indexing. This way, the sample will contain a randomized selection of rows from the source table.

info

Configure the sample size so that the sample covers as many of your distinct dimension properties as possible; otherwise, Mitzu will not recognize the properties that are missing from the sample. If you need help with this configuration, please contact support at support@mitzu.io.

Mitzu can generate the insights page from the recognized dimension properties. Mitzu only indexes dimension filter values when needed. For example, when you add a new dimension filter on the insights page, Mitzu indexes the possible values of that specific dimension property using the same sampling mechanism.
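The on-demand behavior can be sketched in Python as a lazy lookup. This is an illustration under stated assumptions: the class LazyDimensionValues and the fetch_sample callback are hypothetical, and caching the result after the first lookup is an assumption of this sketch, not a documented Mitzu behavior.

```python
class LazyDimensionValues:
    """Index the values of a dimension property only when they are first requested."""

    def __init__(self, fetch_sample):
        # `fetch_sample` is a hypothetical callable returning sampled rows as dicts.
        self._fetch_sample = fetch_sample
        self._cache = {}

    def values_for(self, prop):
        if prop not in self._cache:
            rows = self._fetch_sample()
            # Keep the distinct, non-null values of the requested property.
            self._cache[prop] = sorted(
                {row[prop] for row in rows if row.get(prop) is not None}
            )
        return self._cache[prop]


calls = []


def fetch_sample():
    calls.append(1)  # count how often the warehouse would be queried
    return [{"country": "HU"}, {"country": "US"}, {"country": "HU"}, {"country": None}]


index = LazyDimensionValues(fetch_sample)
first = index.values_for("country")   # triggers one sample query
second = index.values_for("country")  # served from the cache in this sketch
```

No query runs until a filter on the property is requested, which mirrors the behavior described above.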