Skip to main content

Introduction To Data Modeling

info

Mitzu can work on top of RAW data.

It doesn't require any transformations, normalizations, or other modeling. This makes it a great tool for datasets that are ingested into the data warehouse via:

  • ELT/ETL tools, like Fivetran, Airbyte, Rivery, Stitch, etc.
  • CDPs, like Segment, RudderStack, Jitsu, etc.

However, you might have use cases where other tools and services require your data to be modeled. In this section, we will cover the basics of data modeling for Mitzu.

Together, the event tables, dimension tables, and the relationships between them form the semantic layer that Mitzu queries against. Mitzu builds this layer from your warehouse schema during configuration and indexing.

In this section, we will cover the basics of data modeling for Mitzu.

Events and dimensions​

Mitzu works well on top of event data models. If you are familiar with the facts and dimensions model (Star schema), the event data model is very similar.

The main difference is that every fact is always related to a User. This means each "event table" contains two columns:

  • User ID: The ID of the user that performed the event.
  • Event time: The time when the event occurred.

Optionally, you can also add a Group ID column to the event table. This column is used to mark the team or organization the user belongs to.

Every event performed by the user in the product or service must have a single row in the event table.

info

Some event tables might contain an event_name or event_type column. This column represents the name of the event the user performed at a given time.

Event tables that have an event_name column are called multi event tables. They are also referred to as "one big table" or "wide tables". Tables that don't have an event_name column are called single event tables.

success

Mitzu supports both single and multi event tables.

Event properties​

Any column that is present in an event table is considered an event property. If your tables have complex column types such as JSON, MAP, VARIANT, STRUCT, etc., the nested key-value pairs can be considered event properties as well. More on this subject here.

Dimension tables​

Dimension tables contain information about entities in your business. Typically, these are:

  • Orders
  • Users
  • Products
  • Items

These don't have an event_time or event_name column. However, they can have a "primary key" column used to identify the entity.

info

Mitzu currently supports two types of dimension tables:

  • User
  • Group

In the future, we will introduce support for arbitrary dimension tables.

The primary keys in the dimension tables should match the user_id and group_id columns in the event tables.

What Mitzu's semantic layer covers​

Mitzu's semantic layer captures the elements product analytics depends on: events, event properties, filter values, user and group entities, dimension properties, and the relationships between them. The analytics engine uses this layer to construct funnel, retention, and segmentation queries deterministically.