How to sync batched data with workflowStaticData in n8n (Baserow + Crunchbase API)

— Juliet Edjere

Building robust data synchronization workflows in n8n often involves handling large datasets, interacting with paginated APIs, and deciding whether to create new records or update existing ones. A common challenge in such scenarios is maintaining state: tracking information that must persist and remain accessible across different nodes, especially when processing data in batches with nodes like SplitInBatches.

Unlike traditional programming environments where variables are easily accessible throughout a script's execution, n8n's architecture processes data item by item or in chunks between nodes. This means a variable or piece of data available in one node doesn't automatically carry over to the next node in a way that allows for easy accumulation or global lookup within a single workflow run.

This is where n8n's workflowStaticData comes in. It provides a mechanism to store and mutate data globally for the duration of a single workflow execution. This tutorial walks through building a practical data sync workflow that syncs product feature data from an external API (like Crunchbase) into a database (like Baserow). Along the way, it demonstrates how to use workflowStaticData to manage state across batches and to distinguish records that need to be created from those that need to be updated.

Why workflowStaticData for batch processing

When dealing with large datasets that need to be processed in batches (to respect API limits, manage memory, or improve resilience), you often need to accumulate results or maintain a lookup reference across those batches.

Consider our sync scenario:

  1. You fetch thousands of records from Crunchbase.
  2. You split these into batches of 50.
  3. For each batch, you need to determine if the items already exist in Baserow.
  4. You need to collect all the "new" items into one list for a batch create operation later.
  5. You need to collect all the "existing" items (with their database IDs) into another list for a batch update operation later.

Without workflowStaticData, collecting these lists reliably across batches processed by SplitInBatches is tricky. Each execution of the nodes after SplitInBatches only sees the items from the current batch. workflowStaticData provides that necessary global, persistent container within that single workflow run to achieve this.

Key characteristics of workflowStaticData:

  • Persistent within execution: Data stored here remains available and mutable across all nodes in a single workflow run.
  • Global scope: Accessible from any node via $getWorkflowStaticData('global'). The argument selects the scope: 'global' for workflow-wide data, 'node' for data visible only to the current node.
  • Ideal for accumulation: Perfect for building lists, lookup maps, counters, or flags that need to persist as data flows through different nodes and branches.
  • Persistence caveat: During manual test executions, changes to static data are discarded when the run finishes. For active (production) workflows, however, n8n saves static data with the workflow, so values can carry over into later executions. Treat it as scratch memory and re-initialize your keys at the start of each run, as Step 2 does.
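
Before diving into the full workflow, here is a minimal sketch of reading and mutating static data from a Code node; the batchesSeen key is an arbitrary name chosen for this illustration:

// Minimal sketch: accumulate a counter across batches in a Code node.
// 'global' scopes the data to the whole workflow; 'batchesSeen' is an
// arbitrary key used only for this example.
const staticData = $getWorkflowStaticData('global');
staticData.batchesSeen = (staticData.batchesSeen || 0) + 1;
console.log(`Processed batch #${staticData.batchesSeen}`);
return items;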

Let's build the Crunchbase to Baserow sync workflow.

The workflow structure

Here's the sequence of nodes we'll use:

  1. Baserow List Rows: Fetch existing records from the target database.

  2. Code: Process fetched data to build a lookup map and initialize state in workflowStaticData.

  3. HTTP Request: Fetch source data from the external API.

  4. SplitInBatches: Divide source data into manageable chunks.

  5. Code: Process each batch against the lookup map, populating create/update lists in workflowStaticData.

  6. Baserow Batch Create: Insert new records using the list accumulated in workflowStaticData.

  7. Baserow Batch Update: Update existing records using the list accumulated in workflowStaticData.

Step-by-step: Track global state with workflowStaticData

Step 1: Fetch existing rows from Baserow

  • Add a Baserow node.

  • Set the Operation to List Rows.

  • Configure it to connect to your Baserow database and the specific table where you store the product data.

  • Purpose: This fetches all current records in your Baserow table. We need this data to determine which Crunchbase records already exist and to capture their Baserow row IDs for potential updates.

Step 2: Build a Lookup Map and initialize state

  • Add a Code node after the Baserow node.
  • Purpose: This node takes the Baserow data, transforms it into an easy-to-use lookup map, and initializes the global state container (workflowStaticData) that will be used by subsequent nodes.
// Step 2: Build Lookup Map and Initialize Lists
const productMap = {};
// Iterate through the items received from the previous Baserow node
for (const item of items) {
    const data = item.json;
    const permalink = data.crunchbase_permalink; // Assuming this field exists in Baserow
    const rowId = data.id; // Get the Baserow row ID
    const status = data.status;
    const lastLlmRunAt = data.last_llm_run_at; // Example fields for update logic

    if (permalink) {
        // Store Baserow data, keyed by the unique permalink
        productMap[permalink] = {
            row_id: rowId,
            status,
            last_llm_run_at: lastLlmRunAt
        };
    } else {
        // Log a warning if a record is missing the key field
        console.warn(`Skipping row ${rowId} due to missing permalink.`);
    }
}

// Get the workflowStaticData object. 'global' scopes the data to the
// whole workflow; 'node' would scope it to this node only.
const workflowStaticData = $getWorkflowStaticData('global');

// Store the built lookup map in workflowStaticData
workflowStaticData.baserowProductMap = productMap;

// Initialize empty lists for records to be created or updated.
// These lists will be populated by subsequent nodes processing batches.
workflowStaticData.batch_create_list = [];
workflowStaticData.batch_update_list = [];

// Return the original items or an empty array, as this node's primary
// output is setting static data, not necessarily transforming input items
return items;
  • Explanation: The code iterates through the items received from the Baserow node. It builds an object, productMap, where the crunchbase_permalink (from the Baserow data) is the key and the value holds essential Baserow details such as the row_id. It then accesses workflowStaticData via $getWorkflowStaticData('global'); the argument must be 'global' (workflow-wide) or 'node' (scoped to the current node), not an arbitrary string. Finally, it stores productMap there and initializes two empty arrays, batch_create_list and batch_update_list, which will be filled as the Crunchbase data is processed in batches.

Step 3: Fetch Crunchbase product list

  • Add an HTTP Request node after the Code node.
  • Configure it to connect to the Crunchbase API endpoint that lists products.
  • Set up parameters for any required authentication, pagination, or filtering.
  • Ensure the response includes necessary fields like name and a unique identifier like permalink.
  • Purpose: This node retrieves the source data that we want to sync into Baserow.
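
The later steps assume each item exposes flat name and permalink fields. If your endpoint nests these differently, a Code node placed after the HTTP Request can normalize the response. The sketch below is assumption-heavy: the entities key and properties paths are placeholders, so adjust them to the actual shape your Crunchbase endpoint returns.

// Normalize the API response into flat items with the two fields the
// rest of the workflow relies on. The response paths below are
// assumptions; adjust them to match your actual endpoint.
const response = items[0].json;
const entities = response.entities || []; // hypothetical response key
return entities.map((e) => ({
    json: {
        name: e.properties?.name,           // assumed field path
        permalink: e.properties?.permalink  // assumed field path
    }
}));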

Step 4: Split into manageable batches

  • Add a SplitInBatches node after the HTTP Request node.
  • Set the "Batch Size" to a reasonable number (e.g., 50 or 100).
  • Purpose: This is crucial for handling large datasets from Crunchbase. It breaks the list of products into smaller groups, and the nodes following SplitInBatches execute once per batch. In newer n8n versions this node is called Loop Over Items and has two outputs: loop, which fires for each batch, and done, which fires once after the final batch.
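
If you are on an older version without a done output, n8n exposes the loop state through node context. In an IF node, an expression like the following (assuming your batching node is named SplitInBatches) evaluates to true on the final batch, letting you route to post-processing:

{{ $node["SplitInBatches"].context["noItemsLeft"] }}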

Step 5: Process batches & track state

  • Add a Code node after the SplitInBatches node.
  • Purpose: This node receives items for the current batch from the SplitInBatches node. It accesses the productMap stored in workflowStaticData to determine if each item in the batch is new or existing in Baserow and appends it to the corresponding list (batch_create_list or batch_update_list) also stored in workflowStaticData.
// Step 5: Process Batches & Track State
// Get the workflowStaticData object again to access the shared state
const workflowStaticData = $getWorkflowStaticData('global');

// Access the previously built lookup map and the batch lists from static data
const productMap = workflowStaticData.baserowProductMap;
let createList = workflowStaticData.batch_create_list; // Get the current state of the create list
let updateList = workflowStaticData.batch_update_list; // Get the current state of the update list

// Define the date threshold for the LLM re-run logic
const oneMonthAgo = new Date();
oneMonthAgo.setMonth(oneMonthAgo.getMonth() - 1);
const today = new Date().toISOString().split('T')[0]; // e.g. '2024-05-01'

// Iterate through the items IN THE CURRENT BATCH received from SplitInBatches
for (const item of items) {
    const cbProduct = item.json; // Data for the current item from Crunchbase
    const existing = productMap[cbProduct.permalink]; // Check if permalink exists in our Baserow lookup map

    if (!existing) {
        // If the permalink is NOT in the map, it's a new record for Baserow
        createList.push({
            // Structure the data for Baserow creation
            name: cbProduct.name,
            crunchbase_permalink: cbProduct.permalink, // Assuming 'permalink' is the Crunchbase field name
            created_at: today,
            last_synced_cb_at: today,
            status: 'New - Needs LLM' // Example initial status
        });
    } else {
        // If the permalink IS in the map, it's an existing record
        const { row_id, status, last_llm_run_at } = existing; // Get existing Baserow data from the map
        const updatePayload = {
            id: row_id, // Include the Baserow row ID for the update operation
            last_synced_cb_at: today // Mark when it was last synced from CB
        };

        // Example logic: Check if the status indicates needing an LLM run
        const statusNeedsLLM = ['New - Needs LLM', 'Needs LLM Re-run', 'LLM Failed'].includes(status);
        const llmNeverRan = !last_llm_run_at;
        let llmRunIsOld = false;

        if (last_llm_run_at) {
            const lastRun = new Date(last_llm_run_at);
            // new Date() returns an Invalid Date rather than throwing on
            // bad input, so check validity explicitly and treat
            // unparsable dates as stale
            llmRunIsOld = isNaN(lastRun.getTime()) || lastRun < oneMonthAgo;
        }

        // If the status doesn't already flag an LLM run but one is due
        // (never ran, or last ran more than a month ago), mark it for a re-run
        if (!statusNeedsLLM && (llmNeverRan || llmRunIsOld)) {
            updatePayload.status = 'Needs LLM Re-run';
        }

        // Add the update payload to the list for batch updates
        updateList.push(updatePayload);
    }
}

// Write the lists back to workflowStaticData. Because createList and
// updateList reference the same arrays stored in static data, push()
// already mutates the shared state in place; re-assigning here is a
// cheap safeguard in case the lists are ever rebuilt as new arrays
// (e.g. via filter or spread) instead of mutated.
workflowStaticData.batch_create_list = createList;
workflowStaticData.batch_update_list = updateList;

// Return the original items or an empty array. The output of this node
// is less important than its side effect of updating workflowStaticData.
return items;
  • Explanation: This Code node executes once for each batch output by SplitInBatches. Inside the loop, it looks up the permalink of the current Crunchbase item in the productMap retrieved from workflowStaticData.

    • If !existing is true, the item is new, and a payload for creating a new Baserow row is added to createList.
    • If existing is true, the item exists. We retrieve its Baserow row_id and add a payload for updating that row to updateList. The example includes logic to conditionally set a 'Needs LLM Re-run' status based on existing data.
    • After processing the batch, createList and updateList are written back to workflowStaticData.batch_create_list and workflowStaticData.batch_update_list. Because push() mutates the shared arrays in place, this write-back is strictly a safeguard, but it guarantees the accumulated state survives for the next batch, and eventually for the final Baserow nodes, even if the lists are later rebuilt rather than mutated.
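
To make the accumulated state tangible, this is roughly what a quick inspection from a temporary Code node might print after the last batch; all values are illustrative:

// Illustrative inspection after the final batch (values are made up):
const staticData = $getWorkflowStaticData('global');
console.log(staticData.batch_create_list);
// e.g. [{ name: 'Acme AI', crunchbase_permalink: 'acme-ai',
//         created_at: '2024-05-01', last_synced_cb_at: '2024-05-01',
//         status: 'New - Needs LLM' }]
console.log(staticData.batch_update_list);
// e.g. [{ id: 42, last_synced_cb_at: '2024-05-01', status: 'Needs LLM Re-run' }]
return items;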

Step 6: Baserow Batch Create

  • Add a Baserow node after the Code node from Step 5.

  • Set the Operation to Batch Create Rows.

  • In the "Items" field, configure it to read from the accumulated list in workflowStaticData. The expression should look something like:

    {{ $getWorkflowStaticData('global').batch_create_list }}
    
  • Configure Batch Size and Batch Interval as needed for performance and API limits (e.g., 100 items per batch, 500ms interval).

  • Purpose: This node takes all the items collected in batch_create_list across every processed batch and performs a single batch create operation in Baserow. To guarantee it runs only once, after the final batch, place it on the post-loop path (the SplitInBatches done output in newer versions) rather than inside the loop branch.
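
If your Baserow node (or a community variant) expects incoming items rather than a single expression in an Items field, a small Code node just before it can expand the accumulated list into one n8n item per row. This is a sketch assuming the list shape built in Step 5; the same pattern applies to batch_update_list before Step 7:

// Expand the accumulated create list into one n8n item per row so a
// downstream node can consume it as regular input items.
const staticData = $getWorkflowStaticData('global');
return (staticData.batch_create_list || []).map((row) => ({ json: row }));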

Step 7: Baserow Batch Update

  • Add another Baserow node after the Batch Create node.

  • Set the Operation to Batch Update Rows.

  • In the "Items" field, configure it to read from the accumulated list in workflowStaticData:

    {{ $getWorkflowStaticData('global').batch_update_list }}
    
  • Configure Batch Size and Batch Interval similarly to the Batch Create node.

  • Purpose: This node takes all the update payloads collected in batch_update_list across every processed batch and performs a single batch update operation in Baserow. Like the Batch Create node, it must run only after all batch processing has finished, so keep it on the same post-loop path.

Conceptual data flow

Think of the workflow like this:

  1. Data from Baserow enters.
  2. The first Code node extracts key info and stores it in the global workflowStaticData bucket, also initializing empty lists in the bucket. The Baserow data then exits or is ignored.
  3. Data from Crunchbase enters.
  4. SplitInBatches holds the full list and releases items in batches.
  5. The second Code node receives a batch of items. It accesses the workflowStaticData bucket (seeing the lookup map and the current state of the lists). It processes the batch, modifies the lists in the bucket, and then the batch items exit or are ignored. This repeats for every batch.
  6. After all batches are processed, the Batch Create node runs. It accesses the workflowStaticData bucket, retrieves the final, complete batch_create_list, and sends it to Baserow.
  7. Finally, the Batch Update node runs, accesses the workflowStaticData bucket, retrieves the batch_update_list, and sends it to Baserow.

Best practices for using workflowStaticData

  • Keep it manageable: Avoid storing extremely large, deeply nested objects or millions of simple items if possible. While powerful, it still uses memory during the workflow run. Pre-validate or transform data before storing it.

  • Flat keys: Store data using simple keys (like baserowProductMap, batch_create_list). Avoid excessive nesting within workflowStaticData itself.

  • Scope it per execution: Treat workflowStaticData as state for a single workflow execution; it is not shared between concurrently running executions or users. Remember the persistence caveat above: on active workflows, saved values can survive into later runs, so never assume it starts empty.

  • Initialize per run: Always initialize or clear the relevant keys in workflowStaticData at the beginning of your workflow run (as done in Step 2) to ensure a clean state for each execution; a minimal reset sketch follows this list.

  • Access and re-assign in loops: When modifying lists or objects within SplitInBatches loops (like in Step 5's Code node), ensure you:

    1. Retrieve the current state from workflowStaticData.
    2. Modify the local copy of the state (e.g., createList.push(...)).
    3. Write the modified state back to workflowStaticData (workflowStaticData.batch_create_list = createList;). If you only push() onto the retrieved array, the shared state is already mutated in place, but the write-back also covers cases where you replace the list with a new array (filter, map, spread), which would otherwise be lost.
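
A minimal reset sketch, using the same keys as Steps 2 and 5:

// Run this at the very start of the workflow (or fold it into Step 2's
// Code node) so stale values from a previous production execution
// cannot leak into the current run.
const staticData = $getWorkflowStaticData('global');
staticData.baserowProductMap = {};
staticData.batch_create_list = [];
staticData.batch_update_list = [];
return items;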

Conclusion

By effectively leveraging workflowStaticData, you've built a sophisticated data synchronization pipeline in n8n. You can now reliably fetch large datasets, process them in batches, accurately identify which records are new versus existing using a dynamic lookup map, collect items for distinct batch operations across the entire dataset, and finally, execute efficient batch create and update operations in your target database. This pattern is highly adaptable and forms a solid foundation for many complex data integration tasks in n8n that involve state management across multiple processing steps.


ABOUT ME

I'm Juliet Edjere, a no-code professional focused on automation, product development, and building scalable solutions with no coding knowledge.

I document all things MVP validation and how designs, data, and market trends connect.

Click. Build. Launch.

Visit my website → built with Carrd and designed in Figma
