You seem to have stepped on the same landmine that Ansible did, by defaulting to the jinja2 [aka text/template silliness in golang] convention of using double mustaches in YAML. I hope you enjoy quoting things, because you're going to be quoting everything for all time, since "{" is a meaningful character in YAML. Contrast
parameters:
  status: "{{ var('order_status') }}"

with

parameters:
  # made famous by GitHub Actions
  status: ${{ var('order_status') }}
  # or the ASP.Net flavor:
  status2: <%= var('order_status2') %>
  # or the PHP flavor:
  status3: <?= var('order_status3') ?>
and, just like Ansible, it's going to get insaneo when your inner expression has a quote character, too, since you'll need to escape it from the YAML parser, leading to leaning toothpick syndrome.
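A hypothetical illustration (not from the thread) of that escaping problem: as soon as the inner Jinja expression itself needs a double quote, you are escaping for the YAML parser on top of quoting for Jinja. The `default` filter here is just a stand-in expression:

```yaml
parameters:
  # the value must be quoted because it begins with "{", and the inner
  # double quotes must then be backslash-escaped for the YAML parser
  status: "{{ var(\"order_status\") | default(\"open\") }}"
```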
If you find my "but what about the DX?" compelling, also gravely consider why in the world `data_expression:` seems to get a pass, in that it is implicitly wrapped in the mustaches
---
edit: ah, that's why https://github.com/paloaltodatabases/sequor/blob/v1.2.0/src/... but https://github.com/paloaltodatabases/sequor/blob/v1.2.0/src/... is what I would suggest changing before you get a bunch of tech debt and have to introduce a breaking change, to, per https://jinja.palletsprojects.com/en/stable/api/#jinja2.Temp...:

str_rendered = Template(template_str, undefined=StrictUndefined,
                        variable_start_string="${{",
                        variable_end_string="}}",
                        ).render(jinja_context)
# et al, if you want to fix the {# and {%, too
Thank you for such an insightful suggestion and deep dive into the code - this is amazing feedback! I'll definitely switch to the ${{}} syntax you suggested.
Quick clarification on _expression: we intentionally use two templating systems - Jinja {{ }} for simple variable injection, and Python *_expression for complex logic that Jinja can't handle.
Actually, since we only use Jinja for variable substitution, should I just drop it entirely? We have another version implemented in Java/JavaScript that uses simple ${var-name} syntax, and we already have Python expressions for advanced scenarios. Might be cleaner to unify on ${var-name} + Python expressions.
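A minimal sketch of that unified `${var-name}` substitution, using only the Python stdlib. This is hypothetical, not Sequor's actual code; the one assumption is that hyphens should be legal inside names, which `string.Template`'s default `idpattern` rejects:

```python
from string import Template


class VarTemplate(Template):
    # default idpattern is [_a-z][_a-z0-9]*; extend it so that
    # hyphenated names like ${store-name} are recognized
    idpattern = r"[a-z][a-z0-9_-]*"


# stdlib string.Template already uses ${...} delimiters, so the
# proposed syntax needs no third-party templating engine at all
rendered = VarTemplate(
    "https://${store-name}.myshopify.com/admin/api/${api-version}/orders.json"
).substitute({"store-name": "acme", "api-version": "2024-04"})
print(rendered)
```

Unlike Jinja's `{{ }}`, a leading `$` is not YAML-significant, so values like these need no quoting at all.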
Given how deeply you've looked into our system, would you consider using Sequor? I can promise full support including fundamental changes like these - your technical insight would be invaluable for getting the design right early on.
mdaniel
I'm not the target audience for this product, but I experience the pain from folks who embed jinja2/golang templates in YAML every single day, so I am trying to do whatever I can to nip those problems in the bud, so that maybe one day it'll stop being the default pattern
As for "complex logic that jinja can't handle," I am not able to readily identify what that would mean, given that jinja has executable blocks, but I do agree with you that its mental model can make writing imperative code inside those blocks painful (e.g. {% set _ = my_dict.update({"something": "else"}) %} type silliness)
it ultimately depends on whether those _expression: stanzas are always going to produce a Python result or they could produce arbitrary output. If the former, then I agree with you jinja2 would be terrible for that since it's a templating language[1]. If the latter, then using jinja2 would be a harmonizing choice so the author didn't have to keep two different invocation styles in their head at once
1: one can see that in ansible via this convolution:
body: >-
  {%- set foo = {} -%}
  {%- for i in ... -%}
  {%- endfor -%}
  {# now emit the dict as json #}
  {{ foo | to_json }}
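What that Jinja convolution is emulating is a few lines of plain imperative code: build a dict, then serialize it once at the end (the `| to_json` step). The loop body below is a made-up stand-in for the elided `...` above:

```python
import json

# build the dict imperatively (the {%- set -%}/{%- for -%} dance above)
foo = {}
for i in range(3):
    foo[f"key{i}"] = i

# then emit it as JSON ({{ foo | to_json }})
body = json.dumps(foo)
print(body)
```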
vivzkestrel
Forgive me for asking a few daft questions, but I want to know a few things:
- who is the target audience for this (programmers / sql admins / companies with these guys)
- what are they gaining using this tool
- who are some other providers that offer similar stuff
- how is your offering different from theirs
- is this a commercial product? do you have plans to commercialize it, e.g. by turning it into a subscription-based model?
maxgrinevOP
Great questions! Let me break this down:
Target audience:
1) Enterprise IT teams who already know SQL/YAML - they can build complex integrations after ~1 hour of training using our examples, no prior Python needed
2) Modern data teams using dbt - Sequor complements it perfectly for data ingestion and activation
What they gain:
Full flexibility with structure. Enterprise IT folks go from zero to building end-to-end solutions in an hour without needing developer support. Think "dbt but for API integrations."
Competitors & differentiation:
1) Zapier/n8n: GUI looks easy but gets complex fast, poor database integration, can't handle bulk data
2) Fivetran/Airbyte: Pre-built connectors only, zero customization, ingestion-only
3) Us: the only code-first solution built on an open tech stack (SQL+YAML+Python) - gives you flexibility with Fivetran-level reliability
Business model:
1) Core engine: Open source, free forever
2) Revenue: On-premise server with enterprise features (RBAC, observability and execution monitoring with notifications, audit logs) - flat fee per installation, no per-row costs like competitors
3) Services: Custom connector development and app-to-app integration flows (we love this work!)
4) Cloud version maybe later - everyone wants on-premise now
The key difference:
we're the only tool that's both easy to learn AND highly customizable for all major API integration patterns: data ingestion, reverse ETL, and multi-step iPaaS workflows - all in one platform.
bz_bz_bz
Recalculating customer metrics like that in your main example seems like a massive waste of snowflake resources, no?
maxgrinevOP
Good catch! Yes, recalculating metrics across all historical data every run would be expensive in Snowflake. I chose this example for simplicity to show how the three operations work together, but you're absolutely right about the inefficiency.
The flow can easily be optimized for incremental processing - pull only recent orders and update metrics for just the affected customers:
steps:
  # Step 1: Pull only NEW orders since last run
  - op: http_request
    request:
      source: "shopify"
      url: "https://{{ var('store_name') }}.myshopify.com/admin/api/{{ var('api_version') }}/orders.json"
      method: GET
      parameters:
        status: any
        updated_at_min_expression: "{{ last_run_timestamp() or '2024-01-01' }}"
      headers:
        "Accept": "application/json"
    response:
      success_status: [200]
      tables:
        - source: "snowflake"
          table: "shopify_orders_incremental"
          columns: { ... }
          data_expression: response.json()['orders']

  # Step 2: Update metrics ONLY for customers with new/changed orders
  - op: transform
    source: "snowflake"
    query: |
      MERGE INTO customer_metrics cm
      USING (
        SELECT
          customer_id,
          SUM(total_price::FLOAT) AS total_spend,
          COUNT(*) AS order_count
        FROM shopify_orders
        WHERE customer_id IN (
          SELECT DISTINCT customer_id
          FROM shopify_orders_incremental
        )
        GROUP BY customer_id
      ) new_metrics
      ON cm.customer_id = new_metrics.customer_id
      WHEN MATCHED THEN
        UPDATE SET
          total_spend = new_metrics.total_spend,
          order_count = new_metrics.order_count,
          updated_at = CURRENT_TIMESTAMP()
      WHEN NOT MATCHED THEN
        INSERT (customer_id, total_spend, order_count, updated_at)
        VALUES (new_metrics.customer_id, new_metrics.total_spend, new_metrics.order_count, CURRENT_TIMESTAMP())

  # Step 3: Sync only customers whose metrics were just updated
  - op: http_request
    input:
      source: "snowflake"
      query: |
        SELECT customer_id, email, total_spend, order_count
        FROM customer_metrics
        WHERE updated_at >= '{{ run_start_timestamp() }}'
    request:
      source: "mailchimp"
      url_expression: |
        f"https://us1.api.mailchimp.com/3.0/lists/{var('list_id')}/members/{hashlib.md5(record['email'].encode()).hexdigest()}"
      method: PATCH
      body_expression: |
        {
          "merge_fields": {
            "TOTALSPEND": record['total_spend'],
            "ORDERCOUNT": record['order_count']
          }
        }
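One assumption worth flagging in the `url_expression` of step 3: Mailchimp's API identifies list members by the MD5 hash of the lowercased email address, while the expression above hashes the email as-is, which would miss members with mixed-case addresses. A stdlib sketch of what that expression computes, with the lowercasing added:

```python
import hashlib


def mailchimp_member_url(list_id: str, email: str) -> str:
    # Mailchimp's subscriber_hash is the MD5 of the *lowercased* email
    subscriber_hash = hashlib.md5(email.lower().encode()).hexdigest()
    return f"https://us1.api.mailchimp.com/3.0/lists/{list_id}/members/{subscriber_hash}"


print(mailchimp_member_url("abc123", "Jane@Example.com"))
```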
This scales much better: if you have 100K customers but only 50 new orders, you're recalculating metrics for ~50 customers instead of all 100K. Same simple workflow pattern, just production-ready efficiency.
Does this address your concern or did you mean something else? Would you suggest I use a slightly more complex but optimized example for the main demo? Your feedback is welcome and appreciated!
bz_bz_bz
I appreciate the response and detail. The code in your response definitely piqued my interest in the product more than the initial demo code does, but I do understand why you’d want simplicity on your homepage.
maxgrinevOP
Dynamic YAML with computed properties could have applications beyond API integrations. We use Python since it's familiar to data engineers, but our original prototype with JavaScript had even more compact syntax. Would love feedback on our approach and other use cases for dynamic YAML.
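To make the "computed properties" idea concrete, here is a hypothetical sketch of how it could work (an assumed design, not Sequor's actual implementation): any key ending in `_expression` is evaluated as a Python expression against a context, and everything else passes through literally:

```python
def render(node, context):
    """Recursively resolve *_expression keys in a YAML-derived structure."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key.endswith("_expression"):
                # evaluate the value as a Python expression; the context
                # dict supplies names like 'host' or 'record'
                out[key.removesuffix("_expression")] = eval(value, {}, dict(context))
            else:
                out[key] = render(value, context)
        return out
    if isinstance(node, list):
        return [render(item, context) for item in node]
    return node


step = {"method": "GET", "url_expression": "f'https://{host}/orders'"}
print(render(step, {"host": "api.example.com"}))
```

Using `eval` here keeps the sketch short; a production version would want a restricted evaluator.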
seebeen
What a great idea - let's combine one of the worst languages ever invented with a database backend that was never meant to be used as a "processing engine"
/s