Supported Data Types
Page lists the supported data types for each data warehouse
High-level overview
Mitzu distinguishes several data type categories
scalar - numbers with floating point precision, integers
text - strings, varchars, and chars
date - date types
timestamp - timestamp types with or without timezones
map/json - dynamically varying complex type
struct - static complex types
array - repeated value type
exotic types - UUID, Blob, etc.
Generic type support
Mitzu generally supports all type categories except "Exotic types" for all data warehouse or data lake types.
Type | Support |
---|---|
Scalar | |
Text | |
Date | |
Timestamp | |
Map / JSON | |
Struct | |
Array | |
Exotic types |
Exotic types support is not added as every data warehouse have different set of exotic types.
Type recommendations per use-case
In this section, we are describing our recommended types per use case.
Data type for user ids
Generally, the data communities have agreed to use UUID4 (or later versions once released) or CUIDs to identify users in data. For simplicity, we recommend using UUID4s. Except for a single character in the middle, these are strings of random characters.
✅The recommended type category for user ids is text.
Data type for event times
Event times can mean 4 things in product event tracking:
client time - the client's timestamp for logging the event
client upload time - the client's timestamp for sending the event to a server or CDP
server time - the timestamp for the server receiving the event.
corrected client time - the calculated timestamp based on the client time and the difference between the server time and client upload time.
Generally, the recommendation is to use the corrected client time for product analytics. Most CDPs store the corrected client timestamp as the client time in the data warehouse.
✅The recommended type category for event times is the timestamp (without timezones)
Data type for event names
Event names should be human-readable or quasi-human-readable text.
✅The recommended type category for event names is text.
Data type for date partitions
Date partitions help efficiently read the data from a data lake. Data lakes store files in "folders" that have formats, e.g.:
/date=2024-04-01/raw_data.parquet
Naturally, it would come to that date partitions should have date type. However, experience shows that many data teams keep this information in text format in the table definition. Text format is acceptable if the date string is in ISO format.
Mitzu supports both text and date formats for partition columns.
✅The recommended type category for event names is DATE or TEXT.
Other recommendations
Dynamic complex type support for data warehouses
This section discusses which data types can be used in each data warehouse for storing dynamic types.
Data warehouse | Supported dynamic type |
---|---|
Snowflake | |
Databricks | |
BigQuery | |
Trino | |
Athena | |
Postgres | |
Clickhouse |
Dynamic complex types are essential for modeling a large number of events with a different set of properties.
Static complex type support
Generally, we don't recommend using static complex types while storing events in a data warehouse or data lake.
These types are common in:
Databricks, Athena, Trino - struct types
Clickhouse - tupples
Try avoiding these types and use a flat-column hierarchy instead.
Last updated