Supported Data Types

Page lists the supported data types for each data warehouse

High-level overview

Mitzu distinguishes several data type categories

  • scalar - numbers with floating point precision, integers

  • text - strings, varchars, and chars

  • date - date types

  • timestamp - timestamp types with or without timezones

  • map/json - dynamically varying complex type

  • struct - static complex types

  • array - repeated value type

  • exotic types - UUID, Blob, etc.

Generic type support

Mitzu generally supports all type categories except "Exotic types" for all data warehouse or data lake types.

TypeSupport

Scalar

Text

Date

Timestamp

Map / JSON

Struct

Array

Exotic types

Exotic types support is not added as every data warehouse have different set of exotic types.

Type recommendations per use-case

In this section, we are describing our recommended types per use case.

Data type for user ids

Generally, the data communities have agreed to use UUID4 (or later versions once released) or CUIDs to identify users in data. For simplicity, we recommend using UUID4s. Except for a single character in the middle, these are strings of random characters.

The recommended type category for user ids is text.

Data type for event times

Event times can mean 4 things in product event tracking:

  • client time - the client's timestamp for logging the event

  • client upload time - the client's timestamp for sending the event to a server or CDP

  • server time - the timestamp for the server receiving the event.

  • corrected client time - the calculated timestamp based on the client time and the difference between the server time and client upload time.

Generally, the recommendation is to use the corrected client time for product analytics. Most CDPs store the corrected client timestamp as the client time in the data warehouse.

The recommended type category for event times is the timestamp (without timezones)

Data type for event names

Event names should be human-readable or quasi-human-readable text.

The recommended type category for event names is text.

Data type for date partitions

Date partitions help efficiently read the data from a data lake. Data lakes store files in "folders" that have formats, e.g.:

/date=2024-04-01/raw_data.parquet

Naturally, it would come to that date partitions should have date type. However, experience shows that many data teams keep this information in text format in the table definition. Text format is acceptable if the date string is in ISO format.

Mitzu supports both text and date formats for partition columns.

The recommended type category for event names is DATE or TEXT.

Other recommendations

Dynamic complex type support for data warehouses

This section discusses which data types can be used in each data warehouse for storing dynamic types.

Data warehouseSupported dynamic type

Snowflake

Databricks

BigQuery

Trino

Athena

Postgres

Clickhouse

Dynamic complex types are essential for modeling a large number of events with a different set of properties.

Static complex type support

Generally, we don't recommend using static complex types while storing events in a data warehouse or data lake.

These types are common in:

  • Databricks, Athena, Trino - struct types

  • Clickhouse - tupples

Try avoiding these types and use a flat-column hierarchy instead.

Last updated