Skip to main content

Supported Data Types

High-level overview

Mitzu distinguishes several data type categories:

  • scalar - numbers with floating point precision, integers
  • text - strings, varchars, and chars
  • date - date types
  • timestamp - timestamp types with or without timezones
  • map/json/variant - dynamically varying complex type
  • struct - static complex types
  • array - repeated value type
  • exotic types - UUID, Blob, etc.

Generic type support

Mitzu generally supports all type categories except "Exotic types" across all data warehouse and data lake types.

TypeSupport
Scalar
Text
Date
Timestamp
Map / JSON / Variant
Struct
Array - only arrays of scalars and arrays of strings
Exotic types (GEO spatial types, UUIDs, Binaries etc)

Exotic types support is not included, as every data warehouse has a different set of exotic types.

Type recommendations per use-case

In this section, we describe our recommended types for each use case.

Data type for user ids

Generally, the data community has agreed to use UUID4 (or later versions once released) or CUIDs to identify users in data. For simplicity, we recommend using UUID4s. Except for a single character in the middle, these are strings of random characters.

The recommended type category for user ids is text.

Data type for event times

Event times can mean four things in product event tracking:

  • client time - the client's timestamp for when the event was logged
  • client upload time - the client's timestamp for when the event was sent to a server or CDP
  • server time - the timestamp for when the server received the event
  • corrected client time - the timestamp calculated from the client time and the difference between the server time and client upload time

Generally, we recommend using the corrected client time for product analytics. Most CDPs store the corrected client timestamp as the client time in the data warehouse.

The recommended type category for event times is timestamp (without timezones).

Data type for event names

Event names should be human-readable or quasi-human-readable text.

The recommended type category for event names is text.

Data type for date partitions

Date partitions help efficiently read data from a data lake. Data lakes store files in "folders" with formats such as:

/date=2024-04-01/raw_data.parquet

Naturally, you might assume that date partitions should use a date type. However, experience shows that many data teams keep this information in text format in the table definition. Text format is acceptable as long as the date string is in ISO format.

Mitzu supports both text and date formats for partition columns.

The recommended type category for date partitions is DATE or TEXT.

Other recommendations

Dynamic complex type support for data warehouses

This section discusses which data types can be used in each data warehouse for storing dynamic types.

Data warehouseSupported dynamic type
Snowflake Flat Object type - no nesting
Databricks Map<TEXT,SCALAR> or MAP<TEXT, TEXT>
BigQuery JSON Strings
Trino Map<TEXT,SCALAR> or MAP<TEXT, TEXT>
Athena Map<TEXT,SCALAR> or MAP<TEXT, TEXT>
Postgres Json or jsonb
Clickhouse Json types

Dynamic complex types are essential for modeling a large number of events with different sets of properties.

Static complex type support

Generally, we don't recommend using static complex types when storing events in a data warehouse or data lake.

These types are common in:

  • Databricks, Athena, Trino - struct types
  • Clickhouse - tuples

Try to avoid these types and use a flat-column hierarchy instead.