
I agree that YAML templating is kind of insane, but I will never understand why we don't stop using fake languages and simply use a real language.

If you need complex logic, use a programming language and generate the YAML/JSON/whatever with it. There you go. Fixed it for you.

Ruby, Python, or any other language really (I only favor scripting ones because they're generally easier to run), will give you all of that without some weird pseudo-language like Jsonnet or Go templates.

Write the freaking code already and you'll get bitten way less by obscure weird issues that these template engines have.

Seriously, use any real programming language and it'll be WAY better.
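A minimal sketch of the "just generate it" approach in Python. Plain dictionaries and an ordinary `if` take the place of template conditionals, and `json.dumps` writes the result. JSON output is used only because it needs no third-party library (with minor edge cases, JSON is also valid YAML 1.2). The service name, environments, and replica counts are made-up values for illustration.

```python
import json

# Hypothetical per-environment settings; real values would come from your own inventory.
ENVIRONMENTS = {
    "staging": {"replicas": 1, "debug": True},
    "production": {"replicas": 3, "debug": False},
}


def render_service(name: str, env: str) -> dict:
    """Build the config for one service in one environment as plain data."""
    settings = ENVIRONMENTS[env]
    config = {
        "name": f"{name}-{env}",
        "replicas": settings["replicas"],
        "env": {"LOG_LEVEL": "debug" if settings["debug"] else "info"},
    }
    # An ordinary if-statement instead of a templating conditional.
    if env == "production":
        config["alerts"] = {"pager": True}
    return config


def render_all(name: str) -> str:
    """Serialize every environment's config; YAML 1.2 parsers accept this JSON."""
    docs = {env: render_service(name, env) for env in ENVIRONMENTS}
    return json.dumps(docs, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_all("billing"))
```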


I once took a job that involved managing Ansible playbooks for an absolutely massive number of servers that would run them semi-regularly for things like bootstrapping and patching. I had used Chef before for a similar task, and I loved it because it's just Ruby and I could easily define any logic I wanted while using loops and proper variables.

I understand that Ansible was designed for non-programmers, but there is no worse hell for someone who is actually familiar with basic programming than being confined to the hyper-verbose nonsense that is Jinja templating of Ansible playbooks when you need to have a lot of conditional tasks and loops.

I agree. And to make matters worse, the DSLs layered on top of YAML have grown so feature-rich that they may as well be programming languages now.
https://yamlscript.org/ was posted here a while back: https://www.hackerneue.com/item?id=38726370

I thought I remembered more comments on that thread, but I guess nothing more than what's there needs to be said.

It technically is. Long ago, as a junior sysadmin, I created Turing-complete nightmares in Jinja.
Chef vs Ansible was the first example that popped into my mind. I had a very love/hate relationship with Chef when I used it, but writing cookbooks was definitely one of the good parts.
Ansible has a great module/plugin system. It's trivial to handle complex tasks or computations in a custom module or action.
So why is there this massive ecosystem around not writing modules, then? Red Hat invented automation controller just so they didn't have to implement proper error handling in Ansible.
The 'not writing modules' approach is for people that aren't comfortable writing code. I think most capable users for non-trivial things should write custom modules a lot of the time.
That's not how Ansible is meant to be used by default though. Modules are, in general, meant to be generic.

I bet you if I started writing modules for everything in most companies, people would complain. Unfortunately defaults matter.

I think language embedding is kind of a lost architecture in modern stacks. It used to be if you had a sufficiently complex application you'd code the guts in C/C++/Java/Whatever and then if you needed to script it, you'd embed something like a LISP/Lua/whatever on top.

But today, you have plenty of off-the-shelf JSON/TOML/YAML parsers you can just import into your app and a function called readConfig in place of where an embedded interpreter might be more appropriate.

It's just easier for developers to add complexity to a config format than to provide a full language embedding with bindings into the application. So people have forgotten how to do it (or even that they can do it; I don't think it occurs to people anymore).
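A rough sketch of the embedding pattern, with Python playing both host and scripting language for brevity (a C or Java host would typically embed Lua or similar instead). The host exposes a small, hypothetical API (`add_server`, `set_option`) as the script's only globals and collects whatever the script registers. Note that `exec` with a trimmed globals dict is not a security sandbox; it only keeps the script's surface small.

```python
from dataclasses import dataclass, field


@dataclass
class AppConfig:
    servers: list = field(default_factory=list)
    options: dict = field(default_factory=dict)


def load_script_config(source: str) -> AppConfig:
    """Run a config *script* against bindings the host chooses to expose."""
    config = AppConfig()

    # Hypothetical bindings: the only application hooks the script can call.
    def add_server(host: str, port: int = 80) -> None:
        config.servers.append((host, port))

    def set_option(key: str, value) -> None:
        config.options[key] = value

    # Expose the bindings plus a few harmless builtins. This narrows the
    # surface but is NOT a sandbox; don't run untrusted scripts this way.
    script_globals = {
        "__builtins__": {"range": range, "len": len},
        "add_server": add_server,
        "set_option": set_option,
    }
    exec(source, script_globals)
    return config


# The "config file": a real script with loops and arithmetic.
SCRIPT = """
for i in range(3):
    add_server(f"web{i}.internal", 8080)
set_option("workers", len(range(3)) * 2)
"""

if __name__ == "__main__":
    cfg = load_script_config(SCRIPT)
    print(cfg.servers, cfg.options)
```

The design point is that the application, not the config format, decides what is scriptable: anything not passed in `script_globals` simply does not exist for the script.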

Pulumi is enticing because it allows you to write in your preferred language and abandon HCL, but in my opinion it is strictly worse. IaC should be declarative. That allows for greater predictability, reproducibility and maintainability. In general, I think wanting to use Python or Ruby or whatever language you're going to use with Pulumi is not a good basis for choosing the tool.

There are many graveyards filled with places that tried to start writing logic into their IaC back in the Chef/Puppet era and made a huge mess that was impossible to upgrade or maintain (recall that Chef is more imperative/procedural, whereas in Puppet you describe the desired end state). The Chef/Pulumi approach can work, but it requires one person who is draconian about style and maintenance. Otherwise, it turns into a pile of garbage very quickly.

Terraform/Puppet's model is a lot more maintainable for longer terms with bigger teams. It's just a better default for discouraging patterns that necessitate an outsized investment to maintain. Yes HCL can be annoying and it feels freeing to use Python/TS/whatever, but pure declarative code prevents a lot of spaghetti.

Pulumi is declarative. The procedural code (Python, Go, etc) generates the declaration of the desired state, which Pulumi then effects on the providers.
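A toy illustration of that split, under deliberately simplified assumptions (this is not Pulumi's API or engine): imperative code builds a desired-state description, and a separate step diffs it against the current state to plan creates, updates, and deletes. The resource names and fields are invented.

```python
def desired_state(regions: list[str]) -> dict[str, dict]:
    """Imperative code whose *output* is a declarative description."""
    resources = {}
    for region in regions:
        resources[f"bucket-{region}"] = {"type": "bucket", "region": region, "versioned": True}
    return resources


def plan(current: dict[str, dict], desired: dict[str, dict]) -> dict[str, list[str]]:
    """Compare declared state with actual state, like an IaC engine's preview."""
    return {
        "create": sorted(k for k in desired if k not in current),
        "update": sorted(k for k in desired if k in current and current[k] != desired[k]),
        "delete": sorted(k for k in current if k not in desired),
    }


if __name__ == "__main__":
    current = {
        "bucket-us": {"type": "bucket", "region": "us", "versioned": False},
        "bucket-old": {"type": "bucket", "region": "old", "versioned": True},
    }
    print(plan(current, desired_state(["us", "eu"])))
    # {'create': ['bucket-eu'], 'update': ['bucket-us'], 'delete': ['bucket-old']}
```

Running the same desired state twice yields an empty plan, which is the idempotence both sides of this argument care about.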

HCL is also not pure declarative code either. It can invoke non-declarative functions and can do loops based on environment variables, so in that sense there is really no difference between Pulumi and Terraform. The only real difference is that HCL is a terrible language compared to say Python.

I'm actually fairly sure HCL is Turing complete; it has loops and variables. But even if it is not fully Turing complete, it's pretty close.

Pulumi may be declarative, but you use imperative languages to define your end state. The language you're actually writing your Pulumi in is what's most relevant to the point I'm making about maintainability. HCL isn't Turing complete, but even if it were, the point is that doing the types of things you can do in Python or other "real" languages is a major pain in HCL, which effectively discourages you from doing them. I'm arguing that is actually a good thing for maintainability.
> recall that Chef is more imperative/procedural, whereas in Puppet you describe the desired end state

Chef's resources and resource collection and notifications scheme is entirely declarative. And after watching users beat their heads against Chef for a decade the thing that users really like is using declarative resources that other people wrote. The thing that they hate doing is trying to think declaratively themselves and write their own declarative resources or use the resource collection properly. People really want the glue code that they need to write to be imperative and simple.

The biggest issue that Chef had was the "two-pass parsing" design (build the entire resource collection, then execute the entire resource collection) along with the way that the resource collection and attributes were two enormous global variables which were mutable across the entire collection of recipe code which was being run, and then the design encouraged you to do that. And recipes were kind of a shit design since they weren't really like procedures or methods in a real programming language, but more like this gigantic concatenated 'main context' script. Local variables didn't bleed through so you got some isolation but attributes and the resource collection flowing through all of them as god-object global variables was horrible. Along with some people getting a bit too clever with Ruby and Chef internals.
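To make the two-pass problem concrete, here is a toy model in Python (not Chef's actual implementation): "recipes" first run to build a resource collection while mutating a shared, global attribute dict, and only afterwards is the collection executed. A value read during the build pass is captured before a later recipe changes it, which is the classic compile-time vs. converge-time surprise. The recipe and attribute names are made up.

```python
# Global, mutable state shared by every "recipe", as in the model described above.
node_attributes = {"app_port": 80}
resource_collection = []


def recipe_webserver():
    # Reads the attribute NOW (pass 1) and bakes the value into the resource.
    port = node_attributes["app_port"]
    resource_collection.append(("template", "/etc/nginx/site.conf", {"port": port}))


def recipe_override():
    # A later recipe mutates the global attribute after it was already read.
    node_attributes["app_port"] = 8080
    resource_collection.append(("service", "app", {"port": node_attributes["app_port"]}))


def converge(recipes):
    # Pass 1: run all recipe code to build the collection.
    for recipe in recipes:
        recipe()
    # Pass 2: "execute" the collected resources in order.
    return [f"{kind} {name} port={props['port']}" for kind, name, props in resource_collection]


if __name__ == "__main__":
    for line in converge([recipe_webserver, recipe_override]):
        print(line)
    # template /etc/nginx/site.conf port=80
    # service app port=8080   (the two resources disagree about the port)
```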

I had dreams of freezing the entire node attribute tree after attribute file processing before executing resources to force the whole model into something more like a functional programming style of "here's all your immutable description of your data fed into your functional code of how to configure your system" but that would have been so much worse than Python 2.7-vs-3.0 and blown up the world.
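The "freeze the attribute tree" idea can be sketched with the standard library: after attribute processing, recursively wrap dicts in `types.MappingProxyType` and turn lists into tuples, so resource code only gets a read-only view. This illustrates the idea only; it is not how Chef implements node attributes, and the attribute names are invented.

```python
from types import MappingProxyType


def deep_freeze(value):
    """Return a read-only copy of nested dicts/lists for the 'execute' phase."""
    if isinstance(value, dict):
        return MappingProxyType({k: deep_freeze(v) for k, v in value.items()})
    if isinstance(value, list):
        return tuple(deep_freeze(v) for v in value)
    return value


# Hypothetical attribute tree, as it might look after attribute files are processed.
attributes = {"nginx": {"port": 80, "modules": ["gzip", "ssl"]}}
frozen = deep_freeze(attributes)

if __name__ == "__main__":
    print(frozen["nginx"]["port"])  # 80
    try:
        frozen["nginx"]["port"] = 8080  # resource code can no longer mutate attributes
    except TypeError as exc:
        print("refused:", exc)
```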

Just looking at imperative-vs-declarative is way too simplistic of an analysis of what went wrong with Chef.

The fact that HCL has poor/nonexistent multi-language parsing support makes building tooling around terraform really annoying. I shouldn't have to install Python or a Go library to read my HCL.
The limitations of HCL are actually a good thing!

I have never seen Pulumi or CDKTF stuff work well. At some point you are simply writing a script and abandoning the advantages of a declarative approach.

Right. That's what I'm arguing.
The existence of the YAML language for Pulumi and the CDK for TF both confound this explanation, it’s just not grounded in reality.
> I agree that YAML templating is kind of insane, but I will never understand why we don't stop using fake languages and simply use a real language.

The problem is language nerds write languages for other language nerds.

They all want it to be whatever the current sexiness is in language design and want it to be self-hosting and be able to write fast multithreaded webservers in it and then it becomes conceptually complicated.

What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book. It probably needs to be dynamically typed, have control structures that you can learn in a weekend, not have any threading or concurrency, not be object oriented or have inheritance and be functional/modular in design. And have a very easy to use FFI model so it can call out to / be called from other languages and frameworks.

The problem is that language nerds can't control themselves and would add stuff that would grow the language to be more complex, and then they'd use that in core libraries and style guides so that newbies would have to learn it all. I myself would tend towards adding "each/map" kinds of functions on arrays/hashmaps instead of just using for loops, and towards having first-class functions and closures, which might be mistakes. There's that immutable FP language for configuration which already exists (I can't google this morning yet) which is exactly the kind of language that will never gain any traction, because >95% of the people using templated YAML don't want to learn to program that way.

> What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book.

I would argue that Tcl is exactly that. It's hard to make things any simpler than "everything is a string, and then you get a bunch of commands to treat strings as code or data". The entire language definition boils down to 12 simple rules ("dodekalogue"); everything else is just commands from the standard library. Simple Tcl code looks pretty much exactly like a typical (pre-XML, pre-JSON, pre-YAML) config file, and then you have conditionals, loops, variables etc added seamlessly on top of that, all described in very simple terms.

> What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book. It probably needs to be dynamically typed, have control structures that you can learn in a weekend, not have any threading or concurrency, not be object oriented or have inheritance and be functional/modular in design. And have a very easy to use FFI model so it can call out to / be called from other languages and frameworks.

I think Scheme would work, as long as you ban all uses of call/cc and user-defined macros. It's simple and dynamically typed, and doesn't have built-in classes or hash maps. Only problem is that it seems like most programmers dislike Lisp syntax, or at least aren't used to it.

There's also Awk, although it's oriented towards text, and doesn't have modules (the whole program has to be in one file).

It probably wouldn't be that hard to make this language yourself. Read the book Crafting Interpreters, which guides you through making a toy language called Lox. It's close to the toy language you describe.

If you combine Awk with the C preprocessor, you have a way for an Awk program to load modules, relative to where that file is located.

There is such a combination project: cppawk.

https://www.kylheku.com/cgit/cppawk/about/

Thanks for the link! It seems interesting.
There’s plenty to choose from that support embedding: Python, Perl, Lua. Heck, even ECMAScript (JavaScript), VBA, etc.

As another commenter rightfully stated, this used to be the norm.

I wouldn’t say LOGO is the right example though. It’s basically a LISP and is tailored for geometry (of course you can do a heck of a lot more with it but its strength is in geometry).

You're really missing the point. Logo was super simple and we learned it in elementary school as children, that's all that I'm talking about. And those other languages have accreted way too many features to be simple enough.
> You're really missing the point.

I got your point. I think it is you who is missing mine:

> You're really missing the point. Logo was super simple and we learned it in elementary school as children

You wouldn't have learned conditionals and other such things, though. That stuff wasn't as easy to learn in LOGO because LOGO is basically a LISP. E.g.:

    IFELSE :num = 1 [print [Number is 1]] [print [Number is 0]]
vs

    if { $num == 1 } then { puts "number is 1" } else { puts "number is 0" }
or

    if num == 1:
        print("number is 1")
    else:
        print("number is 0")
I'm not saying these modern languages don't have their baggage. But LOGO wasn't exactly a walk in the park for anything outside of its main domain either. Your memory of LOGO here is rose-tinted.

> And those other languages have accreted way too many features to be simple enough.

I agree (though less so with Lua) but you don't need to use those features. Sure, my preference would be "less is more" and thus my personal opinion of modern Python isn't particularly high. And Perl is rather old fashioned these days (though I think modern Perl gets more criticism than it deserves). But the fact is we don't need to reinvent the wheel here. Visual Basic could make raw DLL calls meaning you had unfettered access to Win32 APIs (et al) but that doesn't mean every VBScript out there was making DLL calls left right and centre. Heck, if you really want to distil things down then there's nothing even stopping someone implementing a "PythonScript" type language which is a subset of Python.

I just don't buy "simplicity of the language" as the reason languages aren't often embedded these days. I think it's the opposite problem: "simplicity of the implementation". It's far easier to load a JSON or YAML document into a C(++|#|Objective|whatever) struct than it is to add API hooks for an embedded scripting language. And that's precisely why software written in dynamic languages often does expose its language runtime for configuration. E.g. Ruby in Puppet and Chef, half of PHP applications having config written in PHP, XMPP servers written in Haskell, etc. In those kinds of languages, it is easy to read config from source files (sometimes even importing via `eval`), so there often isn't any need to stick config in JSON documents.
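As a sketch of the "config is just source code" pattern for a dynamic language, Python's standard `runpy.run_path` executes a config file and returns its top-level names. The file name, keys, and the upper-case-names convention are hypothetical; like `eval`, this runs arbitrary code, so it only suits trusted config.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical config file: ordinary Python, so it can compute values.
CONFIG_SOURCE = '''
ENV = "production"
WORKERS = 4 if ENV == "production" else 1
DATABASE = {"host": "db.internal", "pool_size": WORKERS * 5}
'''


def load_python_config(path: Path) -> dict:
    """Execute a trusted config file and keep its upper-case top-level names."""
    namespace = runpy.run_path(str(path))
    return {k: v for k, v in namespace.items() if k.isupper()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = Path(tmp) / "settings.py"
        cfg_path.write_text(CONFIG_SOURCE)
        print(load_python_config(cfg_path))
```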

I'm deeply uninterested in continuing to have this discussion with you.
I mean... Nix satisfies every single one of the things you mentioned and people say it's too complicated. It's literally just the JSON data structure with lambdas, which really is basic knowledge for any computer scientist, and yet people complain about it.

It's fairly straightforward to 'embed' and as a bonus it generates json anyway (you can use the Nix command line to generate JSON). Me personally, I use it as my templating system (independent of nixpkgs) and it works great. It's a real language, but also restrictive enough that you don't do anything stupid (no IO really, and the IO it does have is declarative, functional and pure -- via hashing).

In Nix's favor:

1. Can be described in a one page flier. An in-depth exhaustive explanation of the language's features is a few pages (https://nixos.org/manual/nix/stable/language/)

2. dynamically typed

3. Turing complete and based on the lambda calculus so has access to the full suite of functional control structures. Also has basic if/then/else statements for the most common cases and for intuition.

4. no threading, no concurrency, no real IO

5. definitely not object-oriented and no inheritance

6. It is functional in design and has an extremely thin set of builtins

7. FFI model is either embed libnix directly (this does not require embedding the nix store stuff, which is a completely separate modular system), or use the command line to generate json (nix-instantiate --eval --json).

Note: do not confuse nixpkgs and NixOS with the nix language. The former is a system to build linux packages and entire linux distributions that use the latter as a configuration language. The nix language is completely independent and can be used for whatever.

Tried to use Nix as a homebrew replacement and failed to get it installed correctly with it blowing up with crazy error messages that I couldn't google. I didn't even get to the point of assessing the language. It really seems like the right kind of idea, but it doesn't seem particularly stable or easy enough to get to that initial payoff. If there's a nice language under there it is crippled by the fact that the average user is going to have a hard time getting to it.
You can use nix without using nixpkgs (you seemed to be trying to use nixpkgs). The nix language is accessible via several command line tools (nix repl, nix eval, nix-instantiate, etc.) and can emit JSON via several flags, as well as a builtin function.
I agree with the points in Nix's favor except for 2, dynamically typed. Defining structs as part of the language would be nice. As it is, type checking is done ad hoc by passing data through type-checking functions.
I think I'd rather just have logicless templates than use anything dynamically typed...

Jinja2 makes a lot of sense when you're trying to make it hard to add bugs, and you also don't want everyone to have to learn Rust or Elixir or something.

It would be interesting to extend a template language with a minimal FP language that could process data before the template gets it.
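One way to picture that suggestion using only the standard library: a pipeline of small pure functions reshapes the data, and a logic-free `string.Template` does nothing but substitution. The function names, field names, and the nginx-style output are hypothetical choices for illustration.

```python
from string import Template

# A logic-free template: placeholders only, no conditionals or loops.
TEMPLATE = Template("upstream $name {\n$servers\n}\n")


def only_healthy(hosts):
    return [h for h in hosts if h["healthy"]]


def to_server_lines(hosts):
    return "\n".join(f"    server {h['addr']};" for h in hosts)


def prepare(service):
    """Pure preprocessing: all decisions happen here, before templating."""
    return {"name": service["name"], "servers": to_server_lines(only_healthy(service["hosts"]))}


if __name__ == "__main__":
    service = {
        "name": "api",
        "hosts": [
            {"addr": "10.0.0.1:8080", "healthy": True},
            {"addr": "10.0.0.2:8080", "healthy": False},
        ],
    }
    print(TEMPLATE.substitute(prepare(service)))
```

Because every decision lives in `prepare`, the template itself has nothing to debug, and the functions can be tested without rendering anything.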

Dhall is the FP config language you're thinking of, I think.
I agree, and I just want to highlight what you said about generating a config file. It's extremely useful to constrain the config itself to something that can go in a json file or whatever. It makes the config simpler, easier to consume, and easier to document. But when it comes to _writing_ the config file, we should all use a programming language, and preferably a statically typed language that can check for errors and give nice auto complete and inline documentation.

I think aws cdk is a good example of this. Writing plain cloudformation is a pain. CDK solves this not by extending cloudformation with programming capabilities, but by generating the cloudformation for you. And the cloudformation is still a fairly simple, stable input for aws to consume.
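A small Python sketch of that idea (not the CDK itself): typed dataclasses are the authoring layer, and the output is a plain CloudFormation-style JSON template. `AWS::S3::Bucket`, `BucketName`, and `VersioningConfiguration` are real CloudFormation names; the `VersionedBucket` class and the logical IDs are hypothetical. Python's type hints are only enforced by an external checker such as mypy, so this approximates the statically typed workflow rather than fully providing it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedBucket:
    """Hypothetical 'construct': one typed object expands to raw CloudFormation."""
    logical_id: str
    bucket_name: str

    def to_resource(self) -> dict:
        return {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "BucketName": self.bucket_name,
                # The best practice is encoded once here instead of copy-pasted.
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }


def synthesize(buckets: list[VersionedBucket]) -> str:
    """Emit a simple, stable template for CloudFormation to consume."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {b.logical_id: b.to_resource() for b in buckets},
    }
    return json.dumps(template, indent=2)


if __name__ == "__main__":
    print(synthesize([VersionedBucket("LogsBucket", "example-logs-bucket")]))
```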

You shouldn't need the full complexity and power of a Turing complete programming language to do config. The point of config is to describe a state, it's just data. You don't need an application within an application to describe state.

Inevitably, the path of just using a programming language for config leads to your config becoming more and more complex until it needs its own config, and so on. You wind up with a sprawling, Byzantine mess.

The complexity is already there. If you only need static state like you say, then YAML/JSON/whatever is fine. But that's not what happens as software grows.

You need data that is different depending on environments, clouds, teams, etc. This complexity will still exist if you use YAML, it'll just be a ridiculous mess where you can break your scripts because you have an extra space in the YAML or added an incorrect `True` somewhere.

Complexity growth is inevitable. What is definitely avoidable is shoving concepts that in fact describe a "business" rule (maybe "operational rule" is a better name?) into unreadable templates.

Rules like "a deployment needs to add these things in production, or change those in staging" exist whether they are hidden behind shitty Go templates or structured inside a class/struct, a method with a descriptive name, etc.

The only downside is that you need to understand some basics of programming. But for me that's not a downside at all, since it's a much more useful skill than only knowing how to stitch Go templates together.
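For instance, the "add these things in production, change those in staging" rules could live in small, named functions instead of template conditionals. A hypothetical sketch: the field names loosely follow a Kubernetes Deployment, but the rules and function names are invented.

```python
import copy


def base_deployment(app: str) -> dict:
    return {"metadata": {"name": app, "labels": {"app": app}}, "spec": {"replicas": 1}}


def require_high_availability(deployment: dict) -> dict:
    """Operational rule: production runs at least three replicas."""
    result = copy.deepcopy(deployment)
    result["spec"]["replicas"] = max(3, result["spec"]["replicas"])
    return result


def mark_as_disposable(deployment: dict) -> dict:
    """Operational rule: staging resources are labelled for cleanup."""
    result = copy.deepcopy(deployment)
    result["metadata"]["labels"]["lifecycle"] = "disposable"
    return result


# Which rules apply where is itself plain, reviewable data.
RULES_BY_ENV = {
    "production": [require_high_availability],
    "staging": [mark_as_disposable],
}


def build(app: str, env: str) -> dict:
    deployment = base_deployment(app)
    for rule in RULES_BY_ENV.get(env, []):
        deployment = rule(deployment)
    return deployment
```

Each rule can be unit-tested on its own, and its name documents intent in a way `{{ if eq .Values.env "production" }}` does not.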

Why are we writing software that needs so much configuration? Not all of it is needed. We could do things more like consumer software, which assumes nobody will even consider your app if they have to edit a config file.
> your config becoming more and more complex until it inevitably needs its own config, etc. You wind up with a sprawling, Byzantine mess.

We're already there with Helm.

People write YAML because it's "just data". Then they want to package it up so they put it in a helm chart. Then they add variable substitution so that the name of resources can be configured by the chart user. Then they want to do some control flow or repetitiveness, so they use ifs and loops in templates. Then it needs configuring, so they add a values.yaml configuration file to configure the YAML templating engine's behaviour. Then it gets complicated so they define helper functions in the templating language, which are saved in another template file.

So we have a YAML program being configured by a YAML configuration file, with functions written in a limited templating language.

But that's sometimes not enough, so sometimes variables are also defined in the values.yaml and referenced elsewhere in the values.yaml with templating. This then gets passed to the templating system, which then evaluates that template-within-a-template, to produce YAML.

At the end of the day, Helm's issues stem from two competing interests:

(1) I want to write something where I can visualize exactly what will be sent to Kubernetes, and visually compare it to the wealth of YAML-based documentation and tutorials out there

(2) I have a set of resources/runners/cronjobs that each require similar, but not identical, setups and environments, so I need looping control flow and/or best-in-class template inclusion utilities

--

People who have been working in k8s for years can dispense with (1), and thus can use various abstractions for generating YAML/JSON that don't require the user to think about {toYaml | indent 8}.

But for a team that's still skilling up on k8s, Helm is a very reasonable choice of technology in that it lets you preserve (1) even if (2) is very far from a best-in-class level.
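A sketch of the kind of abstraction that avoids thinking about indentation: a Python function produces several similar-but-not-identical CronJob manifests as data, and serialization handles the nesting. The manifest fields follow the `batch/v1` CronJob schema; the job names, image, and schedules are invented. `kubectl apply -f` accepts JSON as well as YAML.

```python
import json

IMAGE = "registry.example.com/tools:1.0"  # hypothetical image


def cronjob(name: str, schedule: str, args: list[str]) -> dict:
    """One CronJob manifest; nesting is handled by data structures, not text."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [{"name": name, "image": IMAGE, "args": args}],
                        }
                    }
                }
            },
        },
    }


# Similar, but not identical, jobs: just a loop over plain data.
JOBS = [
    ("cleanup-tmp", "0 * * * *", ["cleanup", "--path", "/tmp"]),
    ("nightly-report", "30 2 * * *", ["report", "--format", "csv"]),
]

if __name__ == "__main__":
    manifest = {"apiVersion": "v1", "kind": "List", "items": [cronjob(*job) for job in JOBS]}
    print(json.dumps(manifest, indent=2))
```

The trade-off is exactly point (1) above: the output is no longer visually identical to the YAML in tutorials, so this suits teams that have already internalized the resource shapes.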

I have a recent example of rolling out IPv6 in AWS:

1. Create a new VPC, get an auto-assigned /56 prefix from AWS.

2. Create subnets within the VPC. Each subnet needs an explicitly-specified /64 prefix. (Maybe it can be auto-assigned by AWS, but you may still want to follow a specific pattern for your subnets).

3. Add those subnet prefixes to security / firewall rules.

You can do this with a sufficiently-advanced config language - perhaps it has a built-in function to generate subnets from a given prefix. But in my experience, using a general-purpose programming language makes it really easy to do this kind of automation. For reference, I did this using Pulumi with TypeScript, which works really well for this.
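The subnet step maps neatly onto Python's standard `ipaddress` module. This sketch assumes a /56 from the IPv6 documentation range (`2001:db8::/32`) in place of the real AWS-assigned prefix, an invented one-subnet-per-availability-zone numbering pattern, and a hypothetical firewall-rule shape.

```python
import ipaddress


def plan_subnets(vpc_prefix: str, zones: list[str]) -> dict[str, ipaddress.IPv6Network]:
    """Assign /64s out of a VPC's /56 in a fixed, predictable order."""
    vpc = ipaddress.IPv6Network(vpc_prefix)
    if vpc.prefixlen != 56:
        raise ValueError(f"expected a /56, got /{vpc.prefixlen}")
    subnets = vpc.subnets(new_prefix=64)  # iterator over the 256 /64s
    return {zone: next(subnets) for zone in zones}


def ingress_rules(subnets: dict[str, ipaddress.IPv6Network], port: int) -> list[dict]:
    """Hypothetical rule shape; adapt to your provider's schema."""
    return [{"cidr": str(net), "port": port, "comment": f"from {zone}"} for zone, net in subnets.items()]


if __name__ == "__main__":
    subnets = plan_subnets("2001:db8:1234:ab00::/56", ["a", "b", "c"])
    for zone, net in subnets.items():
        print(zone, net)
    # a 2001:db8:1234:ab00::/64
    # b 2001:db8:1234:ab01::/64
    # c 2001:db8:1234:ab02::/64
```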

That kind of ignores the entire pipeline involved in computing the correct config. Nobody wants to be manually writing config for dozens of services in multiple environments.

The number of configurations you need to create is multiplicative: take the number of applications, multiply by the number of environments, multiply by the number of complete deploys (i.e. multiple customers running multiple envs), and you very quickly end up with an unmanageable number of unique configurations.

At that point you need something at least approaching Turing completeness to correctly compute all the unique configs. Whether you decide to achieve that by embedding that computation into your application, or into a separate system that produces pure static config, is kind of academic. The complexity exists either way, and tools are needed to make it manageable.
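The multiplication is easy to see with `itertools.product`. The application, environment, and customer names below are placeholders, and the per-environment rule is invented for illustration.

```python
from itertools import product

# Placeholder inventories; real lists would come from your own systems.
APPS = ["api", "worker", "frontend"]
ENVIRONMENTS = ["dev", "staging", "prod"]
CUSTOMERS = ["acme", "globex"]  # each customer runs its own full set of environments


def render(customer: str, env: str, app: str) -> dict:
    """Compute one concrete config from shared defaults plus per-axis rules."""
    config = {"app": app, "replicas": 1, "log_level": "debug"}
    if env == "prod":
        config.update(replicas=3, log_level="info")
    config["hostname"] = f"{app}.{env}.{customer}.example.com"
    return config


def all_configs() -> dict[tuple[str, str, str], dict]:
    """One config per (customer, environment, app) combination."""
    return {combo: render(*combo) for combo in product(CUSTOMERS, ENVIRONMENTS, APPS)}


if __name__ == "__main__":
    configs = all_configs()
    print(len(configs))  # 2 customers * 3 environments * 3 apps = 18
```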

That's not my experience after using AWS CDK since 2020 in the same company.

Most of our code is plain boring declarative stuff.

However, tooling is light-years ahead of YAML (we have types, methods, etc.), we can encapsulate best practices and distribute them as libs, and, finally, escape hatches are possible when declarative code won't cut it.

We need turing completeness in the strangest of places. We can often limit these places to a smaller part of the code. But it's really hard to know beforehand where those places will occur. Whenever we think we have found a clear separation we invent a config language.

And then we realize that we need scripting, so we invent a templating language. Then everybody loses their minds and invents 5 more config languages that surely will make us not need the templating language.

Let's just call it code and use clever types to separate the Turing-complete parts from the non-Turing-complete ones?

A really good solution here is to use a full programming language but run the config generator on every CI run and show the diff in review. This way you have a real language to make conditions as necessary but also can see the concrete results easily.

Unfortunately few review tools handle this well. Checked-in snapshot tests are the closest approximation that I have seen.
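A minimal version of the snapshot idea using only the standard library: regenerate the config, compare it with the checked-in copy via `difflib.unified_diff`, and return a non-zero status if they differ, printing the diff for reviewers. The `snapshots/config.json` path and the `generate()` function are hypothetical stand-ins.

```python
import difflib
import json
from pathlib import Path

SNAPSHOT = Path("snapshots/config.json")  # hypothetical checked-in snapshot


def generate() -> str:
    """Stand-in for the real config generator."""
    return json.dumps({"replicas": 3, "image": "app:1.2"}, indent=2, sort_keys=True) + "\n"


def snapshot_diff(old: str, new: str) -> str:
    """Unified diff between the checked-in and freshly generated config."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="checked-in",
            tofile="generated",
        )
    )


def main() -> int:
    old = SNAPSHOT.read_text() if SNAPSHOT.exists() else ""
    diff = snapshot_diff(old, generate())
    if diff:
        print(diff)
        return 1  # fail the CI job so the change is reviewed and the snapshot updated
    return 0
```

In CI this would be wired up as something like `sys.exit(main())`; the printed diff is what reviewers read instead of the generator code alone.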

> You don't need an application within an application to describe state.

As shown in the article, you apparently do.

It happens because config is dual-purpose: it's state, but it's also the text UI for your program. It spirals out of control because people want the best of both: it being "just text" and it being a nice clean UI.
I agree, I think a language like dhall (https://dhall-lang.org/) strikes a good balance.
Yeah, YAML is good at declarative things. It’s when you start using it imperatively, e.g. for CI/CD, that it really starts to get ugly.
Agreed, and I almost feel silly for pointing this out, but for writing JSON (JavaScript Object Notation), I'd recommend using JavaScript...
For JSON I'd stick with TypeScript, to be honest. You end up executing JavaScript and producing JavaScript-native objects, but the typing in TypeScript to ensure the objects you produce are actually valid will save a lot of debugging.
JS is actually not that great for this IMO. You probably need an NPM package to even deal with YAML because JS has a shitty standard library.

Sticking to a scripting language with a strong standard library is way better.

Any unix system can get Ruby/Python and read/write YAML/JSON immediately without caring too much about versions.

Of course in today's upside down world most developers seem to only know JS, so it would at least be "familiar". Still a bad choice in my view.

The way this industry is going, give it a few years and we'll have React-Kubernetes for generating templates. And I wish I was joking.

Parent is talking specifically about writing JSON, not YAML.
Yeah, but the article is about YAML and my original comment was about configuration in multiple formats.

So, to clarify, for JSON, JS is definitely not the worst option. For me though, even for JSON, there are much better options.

I'm very happy using TypeScript to templatize JSON. You can define a template as a class, compose them if needed, and when you are done, just write an object to a file.
The problem with imperative languages in configs is that they become harder to read. Webpack configs always devolve into this.

We need better tooling to allow tracing how final configuration values are being generated.

And a _live programming_ environment so we can see the final generated configuration in one view.
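Tracing where a value came from gets much easier if provenance is recorded during the merge rather than reconstructed afterwards. A small sketch, assuming config is assembled from ordered, flat layers (nested merges would need a recursive version); the layer names and keys are invented.

```python
def merge_with_provenance(layers: list[tuple[str, dict]]) -> tuple[dict, dict]:
    """Merge flat config layers in order; remember which layer set each key last."""
    merged, origin = {}, {}
    for layer_name, values in layers:
        for key, value in values.items():
            merged[key] = value
            origin[key] = layer_name
    return merged, origin


def explain(merged: dict, origin: dict) -> str:
    """One view showing every final value next to its source."""
    return "\n".join(f"{key} = {merged[key]!r}  # from {origin[key]}" for key in sorted(merged))


if __name__ == "__main__":
    merged, origin = merge_with_provenance([
        ("defaults", {"timeout": 30, "retries": 3}),
        ("production", {"timeout": 10}),
        ("env:APP_RETRIES", {"retries": 5}),
    ])
    print(explain(merged, origin))
    # retries = 5  # from env:APP_RETRIES
    # timeout = 10  # from production
```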

Completely agree, my wish is that anything that risks getting complex uses a Ruby-based DSL.

For example, I like using Capistrano, which is a wrapper around Rake, a Ruby-based DSL. That means that if things get tricky I can just drop down to using a programming language. I can split stuff into logical parts that I load where needed and, for example, do something like YAML.load(..file..).dig('attribute name') or JSON.load from somewhere else.

Yes, you risk someone building spaghetti that way, but the flip side is that a good DevOps engineer can build something much easier to maintain than dozens of YAML and JSON files, and you get all the power from your IDE and the linters that are already available for the programming language, so silly syntax errors are caught without needing to run anything.

This. It's why things like Cloud Development Kit and Pulumi are quite interesting to me.
Throwing in a plug for https://dhall-lang.org/

> Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports

> I heard you liked configuration languages, so I made this configuration language for your configuration language generation scripts. It supports templates, of course.
Because the security surface of "any language" is tricky and most (all?) popular languages do not have nice data literal syntax better than JSON and YAML.
Helm would probably benefit from something like JSX for YAML/JSON. Just being able to script a chart instead of this templating hell.
I wonder if there isn't a place for both:

1. a full-blown language that can generate complex output

2. a declarative static data file

I hope I'm not just pulling my punches with #2

On the other hand, some complexity spirals out of control, especially when people use it without any need. Some great things come out of creating boundaries.

I argued that point in my article some time ago https://beepb00p.xyz/configs-suck.html also HN discussion at the time news.ycombinator.com/item?id=22787332
This is how config actually works in Scala.

