maxgrinev
Good catch! Yes, recalculating metrics across all historical data every run would be expensive in Snowflake. I chose this example for simplicity to show how the three operations work together, but you're absolutely right about the inefficiency. The flow can easily be optimized for incremental processing - pull only recent orders and update metrics for just the affected customers:

steps:

  # Step 1: Pull only NEW orders since last run

  - op: http_request
    request:
      source: "shopify"
      url: "https://{{ var('store_name') }}.myshopify.com/admin/api/{{ var('api_version') }}/orders.json"
      method: GET
      parameters:
        status: any
        updated_at_min_expression: "{{ last_run_timestamp() or '2024-01-01' }}"
      headers:
        "Accept": "application/json"
    response:
      success_status: [200]
      tables:
        - source: "snowflake"
          table: "shopify_orders_incremental"
          columns: { ... }
          data_expression: response.json()['orders']

  # Step 2: Update metrics ONLY for customers with new/changed orders
  - op: transform
    source: "snowflake"
    query: |
      MERGE INTO customer_metrics cm
      USING (
        SELECT 
          customer_id,
          SUM(total_price::FLOAT) as total_spend,
          COUNT(*) as order_count
        FROM shopify_orders 
        WHERE customer_id IN (
          SELECT DISTINCT customer_id 
          FROM shopify_orders_incremental
        )
        GROUP BY customer_id
      ) new_metrics
      ON cm.customer_id = new_metrics.customer_id
      WHEN MATCHED THEN 
        UPDATE SET 
          total_spend = new_metrics.total_spend,
          order_count = new_metrics.order_count,
          updated_at = CURRENT_TIMESTAMP()
      WHEN NOT MATCHED THEN
        INSERT (customer_id, total_spend, order_count, updated_at)
        VALUES (new_metrics.customer_id, new_metrics.total_spend, new_metrics.order_count, CURRENT_TIMESTAMP())

  # Step 3: Sync only customers whose metrics were just updated
  - op: http_request
    input:
      source: "snowflake"
      query: |
        SELECT customer_id, email, total_spend, order_count
        FROM customer_metrics 
        WHERE updated_at >= '{{ run_start_timestamp() }}'
    request:
      source: "mailchimp"
      url_expression: |
        f"https://us1.api.mailchimp.com/3.0/lists/{var('list_id')}/members/{hashlib.md5(record['email'].lower().encode()).hexdigest()}"
      method: PATCH
      body_expression: |
        {
          "merge_fields": {
            "TOTALSPEND": record['total_spend'],
            "ORDERCOUNT": record['order_count']
          }
        }
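One detail worth noting in Step 3: Mailchimp's list-member endpoint identifies a subscriber by the MD5 hash of the lowercased email address, so the address should be normalized before hashing. A standalone sketch of that computation (the helper name is mine, not part of the workflow DSL):

```python
import hashlib

def subscriber_hash(email: str) -> str:
    # Mailchimp keys list members on the MD5 of the
    # lowercased email address, so normalize case first.
    return hashlib.md5(email.lower().encode()).hexdigest()
```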

This scales much better: if you have 100K customers but only 50 new orders, you're recalculating metrics for ~50 customers instead of all 100K. Same simple workflow pattern, just production-ready efficiency.
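To make the cost difference concrete, here's a small Python sketch of the same pattern outside the DSL: metrics are recomputed only for customers who appear in the incremental batch, mirroring the MERGE above. The function name and data shapes are illustrative assumptions, not part of the product:

```python
from collections import defaultdict

def incremental_metrics(all_orders, new_orders, metrics):
    """Recompute metrics only for customers touched by new orders.

    all_orders: full order history, list of {customer_id, total_price}
    new_orders: orders pulled since the last run
    metrics:    existing {customer_id: {"total_spend", "order_count"}}
    """
    affected = {o["customer_id"] for o in new_orders}
    totals = defaultdict(lambda: {"total_spend": 0.0, "order_count": 0})
    for o in all_orders:
        cid = o["customer_id"]
        if cid in affected:  # untouched customers are skipped entirely
            totals[cid]["total_spend"] += float(o["total_price"])
            totals[cid]["order_count"] += 1
    metrics.update(totals)   # upsert, like the MERGE's matched/not-matched branches
    return affected
```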

Does this address your concern or did you mean something else? Would you suggest I use a slightly more complex but optimized example for the main demo? Your feedback is welcome and appreciated!


bz_bz_bz
I appreciate the response and detail. The code in your response definitely piqued my interest in the product more than the initial demo code does, but I do understand why you’d want simplicity on your homepage.