Skip to content

Braze Cloud Data Ingestion overview

Braze Cloud Data Ingestion allows you to set up a direct connection from your data warehouse or file storage system to Braze to sync relevant user or catalog data. When synced to Braze, this data can be leveraged for use cases such as personalization, triggering, or segmentation.

How it works

With Braze Cloud Data Ingestion (CDI), you set up an integration between your data warehouse instance and Braze workspace to sync data on a recurring basis. This sync runs on a schedule you set, and each integration can have a different schedule. Syncs can run as frequently as every 15 minutes or as infrequently as once per month. For customers who need syncs to occur more frequently than 15 minutes, please speak with your customer success manager, or consider using REST API calls for real-time data ingestion.

When a sync runs, Braze will directly connect to your data warehouse instance, retrieve all new data from the specified table, and update the corresponding data on your Braze dashboard. Each time the sync runs, any updated data will be reflected in Braze.

Supported data sources

Cloud Data Ingestion can sync data from the following sources to Braze:

  • Data warehouse sources
    • Amazon Redshift
    • Databricks
    • Google BigQuery
    • Snowflake
  • File storage sources
    • Amazon S3

Supported data types

Cloud Data Ingestion supports the following data types:

  • User attributes, including:
    • Nested custom attributes
    • Arrays of objects
    • Subscription statuses
  • Custom events
  • Purchase events
  • Catalog items
  • User delete requests

User data can be updated by external ID, user alias, Braze ID, email, or phone number. Users can be deleted by external ID, user alias, or Braze ID.

What gets synced

Each time a sync runs, Braze looks for rows that have not previously been synced. We check this using the UPDATED_AT column in your table or view. Any rows where UPDATED_AT is later than the last synced row will be selected and pulled into Braze.

In your data warehouse, add the following users and attributes to your table, setting the UPDATED_AT time to the time you add this data:

UPDATED_AT EXTERNAL_ID PAYLOAD
2022-07-19 09:07:23 customer_1234 {
    “attribute_1”:”abcdefg”,
    “attribute_2”: {
        “attribute_a”:”example_value_2”,
        “attribute_b”:”example_value_2”
    },
    “attribute_3”:”2019-07-16T19:20:30+1:00”
}
2022-07-19 09:07:23 customer_3456 {
    “attribute_1”:”abcdefg”,
    “attribute_2”:42,
    “attribute_3”:”2019-07-16T19:20:30+1:00”,
    “attribute_5”:”testing”
}
2022-07-19 09:07:23 customer_5678 {
    “attribute_1”:”abcdefg”,
    “attribute_4”:true,
    “attribute_5”:”testing_123”
}

During the next scheduled sync, all rows with a UPDATED_AT timestamp later than the most recent timestamp will be synced to the Braze user profiles. Fields will be updated or added, so you do not need to sync the full user profile each time. After the sync, users will reflect the new updates:

1
2
3
4
5
6
7
8
9
10
11
12
{
  "external_id":"customer_1234",
  "email":"[email protected]",
  "attribute_1":"abcdefg",
  "attribute_2":{
        "attribute_a":"example_value_1",
        "attribute_b":"example_value_2"
    },
  "attribute_3":"2019-07-16T19:20:30+1:00",
  "attribute_4":false,
  "attribute_5":"testing"
}
1
2
3
4
5
6
7
8
9
{
  "external_id":"customer_3456",
  "email":"[email protected]",
  "attribute_1":"abcdefg",
  "attribute_2":42,
  "attribute_3":"2019-07-16T19:20:30+1:00",
  "attribute_4":true,
  "attribute_5":"testing"
}
1
2
3
4
5
6
7
8
9
{
  "external_id":"customer_5678",
  "email":"[email protected]",
  "attribute_1":"abcdefg",
  "attribute_2":42,
  "attribute_3":"2017-08-10T09:20:30+1:00",
  "attribute_4":true,
  "attribute_5":"testing_123"
}

Use case: First time sync and subsequent updates

This example shows the general process for syncing data for the first time, then only updating changing data (deltas) in the subsequent updates. Let’s say we have a table EXAMPLE_DATA with some user data. On day 1, it has the following values:

external_id attribute_1 attribute_2 attribute_3 attribute_4
12345 823 blue 380 FALSE
23456 28 blue 823 TRUE
34567 234 blue 384 TRUE
45678 245 red 349 TRUE
56789 1938 red 813 FALSE

To get this data into the format that CDI expects, you could run the following query:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT
    CURRENT_TIMESTAMP AS UPDATED_AT,
    EXTERNAL_ID AS EXTERNAL_ID,
    TO_JSON(
        OBJECT_CONSTRUCT(
            'attribute_1', attribute_1,
            'attribute_2', attribute_2,
            'attribute_3', attribute_3,
            'attribute_4', attribute_4
        )
    ) AS PAYLOAD
FROM EXAMPLE_DATA;

None of this has synced to Braze before, so add all of it to the source table for CDI:

A sync runs, and Braze records that you synced all available data up until “2023-03-16 15:00:00”. Then, on the morning of day 2, you have an ETL that runs and some fields in your users table are updated (highlighted):

external_id attribute_1 attribute_2 attribute_3 attribute_4
12345 145 red 380 TRUE
23456 15 blue 823 TRUE
34567 234 blue 495 FALSE
45678 245 green 349 TRUE
56789 1938 red 693 FALSE

Now you need to add only the changed values into the CDI source table. These rows can be appended rather than updating the old rows. That table now looks like this:

CDI will only sync the new rows, so the next sync that runs will only sync the last five rows.

Use case: Update a field in an existing array of objects

This example shows how to update a field in an existing array of objects. Let’s say we have a source table with the following definition:

1
2
3
4
5
6
7
8
Create table BRAZE_CLOUD_INGESTION_DEMO.BRAZE_SCHEMA.pet_list (
    pet_id int IDENTITY(1,1), 
    breed VARCHAR, 
    type VARCHAR, 
    name VARCHAR, 
    owner_id VARCHAR, 
    age int
);

In this example, we want to add an array of pets owned by each user, which corresponds to owner_id. Specifically, we want to include identification, breed, type, and name. We can use the following query to populate a table or view:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT 
CURRENT_TIMESTAMP as UPDATED_AT,
owner_id as EXTERNAL_ID,
TO_JSON(
    OBJECT_CONSTRUCT(
        '_merge_objects','true',
       'pets',
        OBJECT_CONSTRUCT(
           '$add', ARRAY_AGG( OBJECT_CONSTRUCT(
                'id',
                pet_id,
                'breed',
                breed,
                'type',
                type,
                'name',
                name
                )) WITHIN GROUP (ORDER BY type ASC)    
        )
    )
)
as PAYLOAD from BRAZE_CLOUD_INGESTION_DEMO.BRAZE_SCHEMA.pet_list group by EXTERNAL_ID;

The expected output would look like this:

1
2
3
4
UPDATED_AT	EXTERNAL_ID	PAYLOAD
2023-10-02 19:56:17.377 +0000	03409324	{"_merge_objects":"true","pets":{"$add":[{"breed":"parakeet","id":5,"name":"Mary","type":"bird"}]}}
2023-10-02 19:56:17.377 +0000	21231234	{"_merge_objects":"true","pets":{"$add":[{"breed":"calico","id":2,"name":"Gerald","type":"cat"},{"breed":"beagle","id":1,"name":"Gus","type":"dog"}]}}
2023-10-02 19:56:17.377 +0000	12335345	{"_merge_objects":"true","pets":{"$add":[{"breed":"corgi","id":3,"name":"Doug","type":"dog"},{"breed":"salmon","id":4,"name":"Larry","type":"fish"}]}}

Next, to send an updated name field and new age field for each owner, we can use the following query to populate a table or view:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SELECT 
CURRENT_TIMESTAMP as UPDATED_AT,
owner_id as EXTERNAL_ID,
TO_JSON(
    OBJECT_CONSTRUCT(
        '_merge_objects','true',
       'pets',
        OBJECT_CONSTRUCT(
           '$update', ARRAY_AGG( OBJECT_CONSTRUCT(
                '$identifier_key','id',
                '$identifier_value',pet_id,
                '$new_object',OBJECT_CONSTRUCT(
                    'name',name,
                    'age',age
                )
                )) WITHIN GROUP (ORDER BY type ASC)    
        )
    )
)
as PAYLOAD from BRAZE_CLOUD_INGESTION_DEMO.BRAZE_SCHEMA.pet_list group by EXTERNAL_ID; 

The expected output would look like this:

1
2
3
4
UPDATED_AT	EXTERNAL_ID	PAYLOAD
2023-10-02 19:50:25.266 +0000	03409324	{"_merge_objects":"true","pets":{"$update":[{"$identifier_key":"id","$identifier_value":5,"$new_object":{"age":7,"name":"Mary"}}]}}
2023-10-02 19:50:25.266 +0000	21231234	{"_merge_objects":"true","pets":{"$update":[{"$identifier_key":"id","$identifier_value":2,"$new_object":{"age":3,"name":"Gerald"}},{"$identifier_key":"id","$identifier_value":1,"$new_object":{"age":3,"name":"Gus"}}]}}
2023-10-02 19:50:25.266 +0000	12335345	{"_merge_objects":"true","pets":{"$update":[{"$identifier_key":"id","$identifier_value":3,"$new_object":{"age":6,"name":"Doug"}},{"$identifier_key":"id","$identifier_value":4,"$new_object":{"age":1,"name":"Larry"}}]}}

Data point usage

Data point billing for Cloud Data Ingestion is equivalent to billing for updates through the /users/track endpoint. Refer to Data points for more information.

Data setup recommendations

Only write new or updated attributes to minimize consumption

Each time a sync runs, Braze looks for rows that have not previously been synced. We check this using the UPDATED_AT column in your table or view. Any rows where UPDATED_AT is later than the last synced row will be selected and pulled into Braze, regardless of whether they are the same as what’s currently on the user profile. Given that, we recommend only syncing attributes you want to add or update.

Data point consumption is identical using CDI as for other ingestion methods like REST APIs or SDKs, so it is up to you to make sure that you’re only adding new or updated attributes into your source tables.

Use a UTC timestamp for the UPDATED_AT column

The UPDATED_AT column should be in UTC to prevent issues with daylight savings time. Prefer UTC-only functions, such as SYSDATE() instead of CURRENT_DATE() whenever possible.

Make sure the UPDATED_AT time isn’t the same time as your sync

Your CDI sync might have duplicate data if any UPDATED_AT fields are at the exact same time as your previous sync time. This is because the CDI will choose an “inclusive boundary” when it spots any row that is the same time as the previous sync, and will make the rows viable to sync. The CDI will re-ingest those rows and create duplicate data.

Separate EXTERNAL_ID from PAYLOAD column

The PAYLOAD object should not include an external ID or other ID type.

Remove an attribute

You can set it to null if you want to omit an attribute from a user’s profile. If you want an attribute to remain unchanged, don’t send it to Braze until it’s been updated. To completely remove an attribute, use TO_JSON(OBJECT_CONSTRUCT_KEEP_NULL(...)).

Make incremental updates

Make incremental updates to your data so you can prevent unintentional overwrites when simultaneous updates are made.

In the following example, a user has two attributes:

  • Color: “Green”
  • Size: “Large”

Then Braze receives the following two updates to that user simultaneously:

  • Request 1: Change color to “Red”
  • Request 2: Change size to “Medium”

Because Request 1 occurs first, the user’s attributes are updated to the following:

  • Color: “Red”
  • Size: “Large”

However, when Request 2 occurs, Braze starts with the original attribute values (“Green” and “Large”), then updates the user’s attributes to the following:

  • Color: “Green”
  • Size: “Medium”

When the requests are finished, Request 2 will overwrite the update from Request 1. Due to this, it’s best to stagger your updates so you can prevent requests from being overwritten.

Create JSON string from another table

If you prefer to store each attribute in its own column internally, you need to convert those columns to a JSON string to populate the sync with Braze. To do that, you can use a query like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
CREATE TABLE "EXAMPLE_USER_DATA"
    (attribute_1 string,
     attribute_2 string,
     attribute_3 number,
     my_user_id string);

SELECT
    CURRENT_TIMESTAMP as UPDATED_AT,
    my_user_id as EXTERNAL_ID,
    TO_JSON(
        OBJECT_CONSTRUCT (
            'attribute_1',
            attribute_1,
            'attribute_2',
            attribute_2,
            'yet_another_attribute',
            attribute_3)
    )as PAYLOAD FROM "EXAMPLE_USER_DATA";
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
CREATE TABLE "EXAMPLE_USER_DATA"
    (attribute_1 string,
     attribute_2 string,
     attribute_3 number,
     my_user_id string);

SELECT
    CURRENT_TIMESTAMP as UPDATED_AT,
    my_user_id as EXTERNAL_ID,
    JSON_SERIALIZE(
        OBJECT (
            'attribute_1',
            attribute_1,
            'attribute_2',
            attribute_2,
            'yet_another_attribute',
            attribute_3)
    ) as PAYLOAD FROM "EXAMPLE_USER_DATA";
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
CREATE OR REPLACE TABLE BRAZE.EXAMPLE_USER_DATA (attribute_1 string,
     attribute_2 STRING,
     attribute_3 NUMERIC,
     my_user_id STRING);

SELECT
    CURRENT_TIMESTAMP as UPDATED_AT,
    my_user_id as EXTERNAL_ID,
    TO_JSON(
      STRUCT(
        'attribute_1' AS attribute_1,
        'attribute_2'AS attribute_2,
        'yet_another_attribute'AS attribute_3
      )
    ) as PAYLOAD 
  FROM BRAZE.EXAMPLE_USER_DATA;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
CREATE OR REPLACE TABLE BRAZE.EXAMPLE_USER_DATA (
    attribute_1 string,
    attribute_2 STRING,
    attribute_3 NUMERIC,
    my_user_id STRING
);

SELECT
    CURRENT_TIMESTAMP as UPDATED_AT,
    my_user_id as EXTERNAL_ID,
    TO_JSON(
      STRUCT(
        attribute_1,
        attribute_2,
        attribute_3
      )
    ) as PAYLOAD 
  FROM BRAZE.EXAMPLE_USER_DATA;

Use the UPDATED_AT timestamp

We use the UPDATED_AT timestamp to track what data has been synced successfully to Braze. If many rows are written with the same timestamp while a sync is running, this may lead to duplicate data being synced to Braze. Some suggestions to avoid duplicate data:

  • If you are setting up a sync against a VIEW, do not use CURRENT_TIMESTAMP as the default value. This will cause all data to sync every time the sync runs because the UPDATED_AT field will evaluate to the time our queries are run.
  • If you have very long-running pipelines or queries writing data to your source table, avoid running these concurrently with a sync, or avoid using the same timestamp for every row inserted.
  • Use a transaction to write all rows that have the same timestamp.

Table configuration

We have a public GitHub repository for customers to share best practices or code snippets. To contribute your own snippets, create a pull request!

Data formatting

Any operations that are possible through the Braze /users/track endpoint are supported through Cloud Data Ingestion, including updating nested custom attributes, adding subscription status, and syncing custom events or purchases.

Fields within the payload should follow the same format as the corresponding /users/track endpoint. For detailed formatting requirements, refer to the following:

Data type Formatting specifications
attributes See user attributes object
events See events object
purchases See purchases object

Note the special requirement for capturing dates in nested attributes.

You may include nested custom attributes in the payload column for a custom attributes sync.

1
2
3
4
5
6
7
8
9
10
11
12
{
      "most_played_song": {
        "song_name": "Solea",
        "artist_name": "Miles Davis",
        "album_name": "Sketches of Spain",
        "genre": "Jazz",
        "play_analytics": {
            "count": 1000,
            "top_10_listeners": true
        }
      }
}

To sync events, an event name is required. The time field should be formatted as an ISO 8601 string or in yyyy-MM-dd'T'HH:mm:ss:SSSZ format. If the time field is not present, the UPDATED_AT column value is used as the event time. Other fields including app_id and properties are optional.

1
2
3
4
5
6
7
8
9
{
    "app_id" : "your-app-id",
    "name" : "rented_movie",
    "time" : "2013-07-16T19:20:45+01:00",
    "properties": {
        "movie": "The Sad Egg",
        "director": "Dan Alexander"
    }
} 

To sync purchase events, event name, product_id, currency, and price are required. The time field, which is optional, should be formatted as an ISO 8601 string or in yyyy-MM-dd'T'HH:mm:ss:SSSZ format. If the time field is not present, the UPDATED_AT column value is used as the event time. Other fields, including app_id, quantity and properties are optional.

1
2
3
4
5
6
7
8
9
10
11
12
{
    "app_id" : "11ae5b4b-2445-4440-a04f-bf537764c9ad",
    "product_id" : "Completed Order",
    "currency" : "USD",
    "price" : 219.98,
    "time" : "2013-07-16T19:20:30+01:00",
    "properties" : {
        "products" : [ { "name": "Monitor", "category": "Gaming", "product_amount": 19.99, },
        { "name": "Gaming Keyboard", "category": "Gaming ", "product_amount": 199.99, }
        ]
    }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
{
    "subscription_groups" : [
        {
            "subscription_group_id": "subscription_group_identifier_1",
            "subscription_state": "unsubscribed"
        },
        {
            "subscription_group_id": "subscription_group_identifier_2",
            "subscription_state": "subscribed"
        },
        {
            "subscription_group_id": "subscription_group_identifier_3",
            "subscription_state": "subscribed"
        }
      ]
}

Avoiding timeouts for data warehouse queries

We recommend that queries be completed within one hour for optimal performance and to avoid potential errors. If queries exceed this timeframe, consider reviewing your data warehouse configuration. Optimizing resources allocated to your warehouse can help improve query execution speed.

Product limitations



HOW HELPFUL WAS THIS PAGE?
New Stuff!