bz_bz_bz
Recalculating customer metrics like that in your main example seems like a massive waste of Snowflake resources, no?
Good catch! Yes, recalculating metrics across all historical data every run would be expensive in Snowflake. I chose this example for simplicity to show how the three operations work together, but you're absolutely right about the inefficiency.
The flow can easily be optimized for incremental processing - pull only recent orders and update metrics for just the affected customers:
steps:
  # Step 1: Pull only NEW orders since the last run
  - op: http_request
    request:
      source: "shopify"
      url: "https://{{ var('store_name') }}.myshopify.com/admin/api/{{ var('api_version') }}/orders.json"
      method: GET
      parameters:
        status: any
        updated_at_min_expression: "{{ last_run_timestamp() or '2024-01-01' }}"
      headers:
        "Accept": "application/json"
    response:
      success_status: [200]
      tables:
        - source: "snowflake"
          table: "shopify_orders_incremental"
          columns: { ... }
          data_expression: response.json()['orders']
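  # Step 1b: Land the new rows in the base table too. The MERGE below
  # aggregates from shopify_orders, so it has to see them. (Assuming the
  # transform op accepts plain DML; dedupe on order id if a run can
  # replay the same rows.)
  - op: transform
    source: "snowflake"
    query: |
      INSERT INTO shopify_orders
      SELECT * FROM shopify_orders_incremental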
  # Step 2: Update metrics ONLY for customers with new/changed orders
  - op: transform
    source: "snowflake"
    query: |
      MERGE INTO customer_metrics cm
      USING (
          SELECT
              customer_id,
              SUM(total_price::FLOAT) AS total_spend,
              COUNT(*) AS order_count
          FROM shopify_orders
          WHERE customer_id IN (
              SELECT DISTINCT customer_id
              FROM shopify_orders_incremental
          )
          GROUP BY customer_id
      ) new_metrics
      ON cm.customer_id = new_metrics.customer_id
      WHEN MATCHED THEN
          UPDATE SET
              total_spend = new_metrics.total_spend,
              order_count = new_metrics.order_count,
              updated_at = CURRENT_TIMESTAMP()
      WHEN NOT MATCHED THEN
          INSERT (customer_id, total_spend, order_count, updated_at)
          VALUES (new_metrics.customer_id, new_metrics.total_spend, new_metrics.order_count, CURRENT_TIMESTAMP())
  # Step 3: Sync only customers whose metrics were just updated
  - op: http_request
    input:
      source: "snowflake"
      query: |
        SELECT customer_id, email, total_spend, order_count
        FROM customer_metrics
        WHERE updated_at >= '{{ run_start_timestamp() }}'
    request:
      source: "mailchimp"
      # Mailchimp's subscriber hash is the MD5 of the LOWERCASED email
      url_expression: |
        f"https://us1.api.mailchimp.com/3.0/lists/{var('list_id')}/members/{hashlib.md5(record['email'].lower().encode()).hexdigest()}"
      method: PATCH
      body_expression: |
        {
          "merge_fields": {
            "TOTALSPEND": record['total_spend'],
            "ORDERCOUNT": record['order_count']
          }
        }
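One gotcha in step 3: Mailchimp keys members by the MD5 of the lowercased email address (hence the .lower() above). Outside the DSL, that call boils down to roughly this (a sketch using requests; the list ID, API key, and us1 datacenter prefix are placeholders):

import hashlib

import requests

MAILCHIMP_API_KEY = "xxxx-us1"  # placeholder; dc suffix must match the URL
LIST_ID = "abc123"              # placeholder list ID

def sync_member(record: dict) -> None:
    # Mailchimp addresses members by the MD5 of the lowercased email
    subscriber_hash = hashlib.md5(record["email"].lower().encode()).hexdigest()
    url = f"https://us1.api.mailchimp.com/3.0/lists/{LIST_ID}/members/{subscriber_hash}"
    resp = requests.patch(
        url,
        auth=("anystring", MAILCHIMP_API_KEY),  # basic auth; username is ignored
        json={
            "merge_fields": {
                "TOTALSPEND": record["total_spend"],
                "ORDERCOUNT": record["order_count"],
            }
        },
        timeout=30,
    )
    resp.raise_for_status()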
The incremental version scales much better: if you have 100K customers but only 50 new orders, you recalculate metrics for the ~50 affected customers instead of all 100K. Same simple workflow pattern, just production-ready efficiency.

Does this address your concern, or did you mean something else? Would you suggest I use a slightly more complex but optimized example for the main demo? Your feedback is welcome and appreciated!
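P.S. The whole incremental pattern hinges on last_run_timestamp() persisting a high-water mark between runs. If you were wiring that up by hand, it looks roughly like this (a sketch; the state file and function names are mine, not the tool's):

import json
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path("state.json")  # hypothetical watermark store

def last_run_timestamp() -> str | None:
    # Return the previous run's high-water mark, or None on the first run
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text()).get("last_run")
    return None

def save_run_timestamp() -> None:
    # Record the current time so the next run only pulls newer orders
    now = datetime.now(timezone.utc).isoformat()
    STATE_FILE.write_text(json.dumps({"last_run": now}))

# Usage mirroring the YAML: fall back to a fixed date on the first run
updated_at_min = last_run_timestamp() or "2024-01-01"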