(comments)

Original link: https://news.ycombinator.com/item?id=39101828

TOML does sound like a cleaner, more straightforward choice. However, there are other alternatives for front matter in Markdown, including "commonmark", which attempts to extend Markdown syntax to cover more features without resorting to extra inline tags. There is also a proposed format called "Mermaid", which includes support for diagrams, tables, and code blocks, among other features. Ultimately, which format to use largely comes down to personal preference and the needs of the specific use case.

Original thread:
Why are we templating YAML? (2019) (leebriggs.co.uk)
427 points by olestr 1 day ago | 626 comments

I'm completely done with configs written in YAML. Easily the worst part of Github Actions, even worse than the reliability. When I see some cool tool require a YAML file for config, I immediately get hit with a wave of apprehension. These same feelings extend to other proprietary config languages like HCL for Terraform, ASL for AWS Step Functions, etc. It's fine that you want a declarative API, but let me generate my declaration programmatically.

Config declared in and generated by code has been a superior experience. It's one of the things that AWS CDK got absolutely right. My config and declarative definition of my cloud infra is all written in a typesafe language with great IDE support, without the need for random plugins that some rando wrote and hasn't updated in two years.



At this point, I even prefer plain JSON to YAML. What pushed me over the edge is that "deno fmt" comes with a JSON formatter, but not a YAML formatter. It's a single binary that runs in milliseconds. For YAML auto-formatting you basically have to use Prettier, and Prettier depends on half of NPM and takes a good 2 seconds to startup and run. So, I literally moved every YAML file in our repository at work that could be JSON to JSON and I think everyone has been much happier. Or, at least I have been, and nobody has complained to me about it.

Various editors also support a $schema tag in the JSON. I added this feature to our product (which has a flow that invokes your editor on a JSON file), and it works great. You can just press tab and make a config file without reading the docs. Truly wonderful.

YAML has this too with the YAML language server, but you need your tab key to indent stuff, so the ergonomics are pretty un-fun. JSON isn't perfect, but at least the text "no" is true.



At work we're currently expanding to another country. Which means that many services now need a country label etc., which is fun when you're adding "no" to all our existing services. Luckily it's quick to catch, but man... why?
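
For anyone who hasn't hit this: YAML 1.1 parses a bare "no" as a boolean, which is exactly the Norway problem above. A minimal illustration (sketch using PyYAML, which implements the 1.1 behavior):

    import yaml

    # YAML 1.1 treats no/yes/off/on as booleans unless quoted.
    print(yaml.safe_load('country: no'))    # {'country': False}
    print(yaml.safe_load('country: "no"'))  # {'country': 'no'}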


Yeah, I'm pretty sure there are exactly two substantive problems with JSON for (static) configuration file use cases, which are comments and multiline strings (especially with sane handling of indentation). YAML fixes these, but it adds so much complexity in the process including such a predictable footgun of unquoted strings (the no/false problem is particularly glaring/absurd, but it's also easy to forget to quote other boolean values or numbers in a long list of other strings).


Solutions abound, but one option is to use either Javascript (config.js):

    // comments!
    ({
       no_quotes: [1, 2, (() => /* code! */ 3)()],
       ...
     })

Or, let the whole thing be a function. Then your config can have parameters, maybe mapped from environment variables or something.

    ({foo, bar, ...kwargs}) => ({
      datacenter: foo === 'old' ? 'useast1' : 'uswest',
      ...
     })

You can do as Crockford says and write in the JSON subset of Javascript, but with comments, and convert it to JSON by running it through a JS minifier. I think you need parens around the object, though, or else it looks like a code block (boooo...):

    ({
      // This is very much like JSON
      "foo": [1, 2, "bar"]
     })

Python also has JSON-like syntax, so you could use config.py:

    {
      'foo': [1, 2, 'bar']
    }

That would require a wrapper script. Or, you can have the self-contained convention:

    import json
    import sys
    
    # Yay, comments!
    json.dump({
      # more comments!
      'foo': [1, 2, 'bar']
    }, sys.stdout)
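
The consuming side of that convention can stay tiny too. Roughly (a sketch, assuming the config script prints JSON to stdout as above):

    import json
    import subprocess
    import sys

    def read_config(path):
        # Run the config script and parse the JSON it prints.
        result = subprocess.run([sys.executable, path], check=True,
                                capture_output=True, text=True)
        return json.loads(result.stdout)

    config = read_config('config.py')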


Yeah, I'm mostly just not sure I want to put a full programming language interpreter in my application, especially Python which is not designed to be embeddable. Moreover, I would really want something that is typed, like TypeScript, but libraries for embedded TypeScript interpreters are even more rare :/.


Can I add "trailing commas are invalid" to the list?


Please do.


json5 is pretty good, if you can use it


AFAIK Prettier has 0 dependencies and runs fast enough that triggering formatting on save wasn't ever noticeable (granted I've never tried it with YAML specifically). Curious what kind of setup you had to push it to 2 seconds - maybe bulk formatting in CI for whole repository?


I prefer JSON to YAML as well. The lack of comments is a problem though. But I feel like this is a false dichotomy. Both kind of suck for this need, but I can accept that JSON is at least reasonable to work with if you need language agnostic config.


An often-heard benefit of using YAML is that JSON does not have comments. What I don't understand is why we would switch to a whole new language. Just add a filter before loading the configuration, which can't be harder than switching to YAML, right?
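
To be concrete, the filter really can be that small (a naive sketch; it breaks if "//" ever appears inside a string value):

    import json
    import re

    def load_json_with_comments(path):
        with open(path) as f:
            text = f.read()
        # Drop whole-line // comments before handing the rest to json.
        return json.loads(re.sub(r'^\s*//.*$', '', text, flags=re.M))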

Another reason given for YAML is that it is easier to read. That I don't understand either. The endless pain of dealing with configuration doesn't seem to come from spending a few extra seconds parsing braces and brackets, but from not being able to easily figure out what went wrong, especially when what's wrong is a missing space or tab embedded in hundreds of lines of configuration.



I like json a lot.

That said, I think json would benefit from only two things:

1) comments

2) allow extra commas, like ["a", "b", "c",] or {"a":"b", "c":"d", }

or more properly:

  {
    "a":"b",
    "c":"d",
  },

EDIT: and json5 does both, plus a few more niceties. (hmm. too much?)




Not sure why people just don't settle with TOML.


It has atrocious arrays. Example: https://youtu.be/n9mGk8_tQtM?t=367


Just make another named list key called "comment". Problem solved.


This is not always an option when JSON is propagated as is, nor does it allow for comments on specific object properties.


GitHub actions would suck whatever you "configured" them in, because you are trying to describe a program in a data structure.

Ansible makes the same mistake, as do countless other tools.



"because you are trying to describe a program in a data structure"

(cries in lisp)



The best interpretation of weebull's comment is not that describing a program in a data structure is "bad" per se, but that doing that in a configuration language (or requiring configuration constructs to be programming constructs) might not be a hot idea.

Even Lisp software that uses Lisp for configuration does not necessarily allow programming in that configuration notation.



Yeah, I think describing a program in a data structure is fine. I honestly prefer it to any syntax that a "real" programming language has brought me. It's so consistent and you can really focus on what you care about. What is unhappy about Github Actions and similar is that your programming language has like 2 keywords; "download a container" and "run a shell script". I would have preferred starting with "func", "handle this error", and "retry this operation if the error is type Foo" ;)

Since this article is about helm, I'll point out that Go templates are very lispy. I often have things in them that look like {{ and (foo bar) (bar baz) }} and it only gets crazier as you add more parentheses ;)



The problem I have with GitHub Actions is that I usually want to metaprogram them. I have a monorepo and I want a particular action to run for each "project" subdirectory. I've written a program that generates GitHub Actions YAML files, but all of the ways to make sure the generator was run before each commit are fairly unsatisfying.
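
For concreteness, the generator is basically this shape (a sketch with hypothetical paths and job contents, dumped with PyYAML):

    import pathlib
    import yaml

    for project in sorted(pathlib.Path('projects').iterdir()):
        if not project.is_dir():
            continue
        workflow = {
            'name': f'ci-{project.name}',
            'on': {'push': {'paths': [f'{project}/**']}},
            'jobs': {'test': {
                'runs-on': 'ubuntu-latest',
                'steps': [
                    {'uses': 'actions/checkout@v4'},
                    {'run': f'make -C {project} test'},
                ],
            }},
        }
        out = pathlib.Path('.github/workflows') / f'{project.name}.yml'
        out.write_text(yaml.safe_dump(workflow, sort_keys=False))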

The problem I have with infra-as-code tools is that what I really want is a pretty simple representation for "the state of the world" that some reconciliation can use, and then I want to generate that stuff in a typesafe, expression-based language like TypeScript or Python (Dhall exists, but its Haskell-like syntax and conventions are too steep a learning curve to get mainstream adoption). Instead we get CloudFormation and Terraform which shoehorn programming language constructs into a configuration language (which isn't strictly an objection to code-as-data generally) or things like Helm which uses text templates to generate a "state of the world" description or these CDKs which all seem to depend on a full JavaScript engine for reasons that don't make sense to me (why do I need JavaScript to generate configuration?).



I often wonder if the only reason we haven't used lisp more as a society, and certainly in the devops world, is because our brains find it easier to parse nested indentation than nested parentheses.

But in doing so, we've thrown out the other important part of lisp, which is that you can use the same syntax for data that you do for control flow. And so we're stuck in this world where a "modern-looking" program is seen as a thing that must be evaluated to make sense, not a data structure in and of itself.

https://www.reddit.com/r/lisp/comments/1pyg07/why_not_use_in... is a fascinating 10 year old discussion. And of course, there's Smalltalk, which guided others to a treasure it could not possess. But most younger programmers have never even had these conversations.



The vast majority of Lisp code is assiduously written with nested indentation! So that can't be it.

Non-lisp languages have parentheses, brackets and braces, using indentation to clarify the structure. Nobody can reasonably work with minified Javascript, without reformatting it first to span multiple lines, with indentation.

Lisp has great support for indentation; reformatting Lisp nicely, though not entirely trivial, is easier than other languages.

Oh, have you seen parinfer? It's an editing mode that infers indentation from nesting, and nesting from indentation (both directions) in real-time. It also infers closing parentheses. You can just delete lines and it reshuffles the closers.

The github.io site has animations:

https://shaunlebron.github.io/parinfer/



To me it seems a lot of the benefit of declarative programming is just that you can use less powerful tools that don't allow constructs you don't want to have to deal with.

LISP seems great for tinkerers and researchers, but not so much corporate devs who want extreme amounts of consistency and predictability, but don't need the absolute most elegant solution.



> you are trying to describe a program in a data structure

This describes 100% of software development, though! Programming is just designing data structures that represent some computation. Each language lends itself better to some computations than to others (and some, like YAML, are terrible for describing any kind of computation at all), but they're all just data structures describing programs.

The problem isn't that GitHub Actions tries to describe a program in a data structure, the problem is that the language that they chose to represent those programs (YAML and the meta language on top) is ill-suited to the task.



> Ansible makes the same mistake, as do countless other tools.

My favorite example of this is chown/chmod taking 4-5 lines, in yaml. Sure you can do it a bunch of different ways, sure it allows for repeatable commands. But, it just sucks.



The same reason I don't like AWS' Step Functions. The spec in JSON is horrible. On the other hand, Step Functions is pretty scalable and reliable and can take practically unlimited throughput. It's a good story for how a product can succeed by getting the primitives right and by removing just the key obstacle for users. Now that Step Functions has gained momentum, they can construct higher-level APIs and SDKs to translate user spec to the low-level JSON/YAML payload.


In the case of GitHub Actions, it's made more painful by the lack of support for YAML anchors, which would provide a bare minimum of composability.

https://github.com/actions/runner/issues/1182
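
For reference, anchors plus merge keys are the missing feature; this is what they buy you (sketch; PyYAML happens to support them):

    import yaml

    doc = """
    defaults: &defaults
      retries: 3
      timeout: 30
    job_a:
      <<: *defaults
      timeout: 60
    """
    # job_a inherits retries=3 and overrides timeout to 60.
    print(yaml.safe_load(doc)['job_a'])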



If configs had well-adopted schema support, it wouldn't be so bad.


Even then, it gets messy. From a tooling standpoint, how will I load your schema? How will my editor respect it? How do I run a validator against it? I know XML kind of solves some of these problems, but it has its own thorns and despite what anyone says, it is not easy to work with. XSD, XSLT, etc. So much complexity that needs to be managed in a different way in every runtime. And then type safety goes out at the boundary where it connects to your code.
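
On the validator question specifically, the closest thing to a common answer today is JSON Schema. A sketch with the Python 'jsonschema' package (file name made up):

    import json
    import jsonschema

    schema = {
        'type': 'object',
        'properties': {'port': {'type': 'integer'}},
        'required': ['port'],
    }
    with open('config.json') as f:
        # Raises jsonschema.ValidationError if the config doesn't match.
        jsonschema.validate(json.load(f), schema)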


That's how it used to be for your suggestion too.

We're living in a dream state now where the creators of IDEs like Visual Studio (Code) or IntelliJ actively implement common languages and frameworks. It used to be 'find a half-baked community plugin so JSON works.'

If someone made a standard schema and people used it, I can assure you the magic you are expecting from your tooling would suddenly pop in just like how JSON support appeared one day. But they can't do nothin' if there is no community support for it.

XSD and XSLT are complicated because XML is complicated.



For real, I want a real language (Lua/JS/Lisp) for configuration but without 3rd party imports so that it's secure and predictable.


living in a yaml-world; i honestly hate it.


> These same feelings extend to other proprietary config languages like HCL for Terraform, ASL for AWS Step Functions, etc. It's fine that you want a declarative API, but let me generate my declaration programatically.

Yeah, I've had the same sort of opinion since the bad old AWS CloudFormation days. I wrote an experimental CloudFormation generator 4 years ago where all of the resources and Python type hints were generated from a JSON file that AWS published and it worked really well (https://github.com/weberc2/nimbus/blob/master/examples/src/n...).

> Config declared in and generated by code has been a superior experience. It's one of the things that AWS CDK got absolutely right.

Is that how CDK works? I've only dabbled with it, but it was pretty far from the "generate cloudformation" experience that I had built; I guess I never "saw the light" for CDK. It felt like trading YAML/templating problems for inheritance/magic problems. I'd really like to hear from more people who have used AWS CDK, Terraform's CDK, and/or Pulumi.



It's an annoyingly OOP model with mutations and side-effects, but if you look past that, it's pretty nice. The core idea is you create an instance of a CDK "App" object. You create new instances of "Stack" objects that take an "App" instance as a context parameter. From there, resources are grouped into logical chunks called "Constructs" which take either a stack or another construct as their parent context param. The only things you should ever inherit from are the base Constructs for Stack, Stage, and Construct. Don't use inheritance anywhere else and you'll be okay.

The code then looks something like this (writing this straight in the comment box, probably has errors):

    // Entrypoint of CDK project like bin/app.ts or whatever
    import * as cdk from 'aws-cdk-lib'
    import { MyStack } from '../lib/my-stack.ts'
    const app = new cdk.App()
    const stack = new MyStack(app, 'StackNameHere', someProps)
    
    // lib/my-stack.ts
    // Imports go here
    export class MyStack extends cdk.Stack {
      constructor(scope: Construct, id: string, props: MyStackProps) {
        super(scope, id, props)
        const bucket = new s3.Bucket(this, 'MyBucket', {
          bucketName: 'example-bucket',
        })
        const lambda = new NodejsFunction(this, 'MyLambdaFn', {
          functionName: 'My-Lambda-Fn',
          entry: 'my-handler.ts',
          memorySize: 1024,
          runtime: Runtime.NODEJS_20_X,
          tracing: Tracing.ACTIVE,
        })
        bucket.grantRead(lambda)
      }
    }

The best part is the way CI/CD is managed. CDK supports self-mutating pipelines where the pipeline itself is a stack in your CDK app. After the pipeline is created, it will update itself as part of the pipeline before promoting other changes to the rest of your environments.

The equivalent CloudFormation for the above example would be ridiculously long. And that's putting aside all the complexity it would take for you to add on asset bundling for code deployed to things like Lambda.

TL;DR: Infrastructure-as-code-as-code



> It's an annoyingly OOP model with mutations and side-effects, but if you look past that, it's pretty nice

I think I was getting hung up on the mutations and side-effects of it all. Thanks for putting words to that. I'll have to give it another try sometime. Have you used Terraform's CDK by chance? I assume it's heavily inspired from AWS's CDK, but my company has since moved to GCP/Terraform.



The mutations and side-effects only last until synthesis. You can imagine a CDK app as a pure function that runs a bunch of mutations on an App object and then serializes the state of that object in the end to static assets that can be deployed. The internals of it all are messy, but at a conceptual level, it's easy to think about.

CDKTF is really promising, IMO. When I last looked, it was still pretty new, but it's maturing, I think. One downside compared to regular AWS CDK is that the higher level constructs from the official AWS CDK can't be used in CDKTF. There is an adapter that exists, but it's one more layer between you and knowing what's going on: https://github.com/hashicorp/cdktf-aws-cdk



> Config declared in and generated by code has been a superior experience.

And here we are, at the point in time when people have plainly forgotten about compiled programming languages.



I agree that YAML templating is kind of insane, but I will never understand why we don't stop using fake languages and simply use a real language.

If you need complex logic, use a programming language and generate the YAML/JSON/whatever with it. There you go. Fixed it for you.

Ruby, Python, or any other language really (I only favor scripting ones because they're generally easier to run), will give you all of that without some weird pseudo-language like Jsonnet or Go templates.

Write the freaking code already and you'll get bitten way less by obscure weird issues that these template engines have.

Seriously, use any real programming language and it'll be WAY better.
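
i.e. something as boring as this (a minimal sketch; the environment names and keys are made up):

    import sys
    import yaml

    env = sys.argv[1]  # e.g. 'staging' or 'production'
    config = {
        'replicas': 5 if env == 'production' else 1,
        'debug': env != 'production',
        'log_level': 'warning' if env == 'production' else 'debug',
    }
    # Emit plain YAML for whatever tool needs to consume it.
    yaml.safe_dump(config, sys.stdout)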



I once took a job that involved managing Ansible playbooks for an absolutely massive number of servers that would run them semi-regularly for things like bootstrapping and patching. I had used Chef before for a similar task, and I loved it because it's just ruby and I could easily define any logic I wanted while using loops and proper variables.

I understand that Ansible was designed for non-programmers, but there is no worse hell for someone who is actually familiar with basic programming than being confined to the hyper-verbose nonsense that is Jinja templating of Ansible playbooks when you need to have a lot of conditional tasks and loops.



I agree. And to make matters worse, the DSL on YAML has grown so large in features, it may as well be a programming language now.


https://yamlscript.org/ was posted here a while back: https://news.ycombinator.com/item?id=38726370

I thought I remembered more comments on that thread, but I guess nothing more than what's there needs to be said.



It technically is. Long ago as a junior sysadmin I created turing complete nightmares in Jinja.


Chef vs Ansible was the first example that popped into my mind. I had a very love/hate relationship with Chef when I used it, but writing cookbooks was definitely one of the good parts.


Ansible has a great module/plugin system. It's trivial to handle complex tasks or computations in a custom module or action.


So why is there this massive ecosystem around not writing modules then? RedHat invented automation controller just so they didn't have to implement proper error handling with Ansible.


The 'not writing modules' approach is for people that aren't comfortable writing code. I think most capable users for non-trivial things should write custom modules a lot of the time.


That's not how Ansible is meant to be used by default though. Modules are, in general, meant to be generic.

I bet you if I started writing modules for everything in most companies, people would complain. Unfortunately defaults matter.



I think language embedding is kind of a lost architecture in modern stacks. It used to be if you had a sufficiently complex application you'd code the guts in C/C++/Java/Whatever and then if you needed to script it, you'd embed something like a LISP/Lua/whatever on top.

But today, you have plenty of off-the-shelf JSON/TOML/YAML parsers you can just import into your app and a function called readConfig in place of where an embedded interpreter might be more appropriate.

It's just easier for developers to add complexity to a config format rather than provide a full language embedding and provide bindings into the application. So people have forgotten how to do it (or even that they can do it - I don't think it occurs to people anymore)



Pulumi is enticing because it allows you to write in your preferred language and abandon HCL, but in my opinion it is strictly worse. IaC should be declarative: that allows for greater predictability, reproducibility and maintainability. In general, I think wanting to use Python or Ruby or whatever language you're going to use with Pulumi is not a good basis for choosing the tool.

There are many graveyards filled with places that tried to start writing logic into their IaC back in the Chef/Puppet era and made a huge mess that was impossible to upgrade or maintain (recall that Chef is more imperative/procedural, whereas in Puppet you describe the desired end state). The Chef/Pulumi approach can work, but it requires one person who is draconian about style and maintenance. Otherwise, it turns into a pile of garbage very quickly.

Terraform/Puppet's model is a lot more maintainable for longer terms with bigger teams. It's just a better default for discouraging patterns that necessitate an outsized investment to maintain. Yes HCL can be annoying and it feels freeing to use Python/TS/whatever, but pure declarative code prevents a lot of spaghetti.



Pulumi is declarative. The procedural code (Python, Go, etc) generates the declaration of the desired state, which Pulumi then effects on the providers.

HCL is not purely declarative either. It can invoke non-declarative functions and can do loops based on environment variables, so in that sense there is really no difference between Pulumi and Terraform. The only real difference is that HCL is a terrible language compared to, say, Python.

I'm actually fairly sure HCL is Turing complete; it has loops and variables. But even if it is not all the way Turing complete, it's pretty close.



Pulumi may be declarative, but you use imperative languages to define your end state. The language you're actually writing your Pulumi in is what's most relevant to the point I'm making about maintainability. HCL isn't Turing complete, but even if it were, the point is that doing the types of things you can do in Python or other "real" languages is a major pain in HCL, which effectively discourages you from doing that. I'm arguing that is actually a good thing for maintainability.


> recall that Chef is more imperative/procedural, whereas in Puppet you describe the desired end state

Chef's resources and resource collection and notifications scheme is entirely declarative. And after watching users beat their heads against Chef for a decade the thing that users really like is using declarative resources that other people wrote. The thing that they hate doing is trying to think declaratively themselves and write their own declarative resources or use the resource collection properly. People really want the glue code that they need to write to be imperative and simple.

The biggest issue that Chef had was the "two-pass parsing" design (build the entire resource collection, then execute the entire resource collection) along with the way that the resource collection and attributes were two enormous global variables which were mutable across the entire collection of recipe code which was being run, and then the design encouraged you to do that. And recipes were kind of a shit design since they weren't really like procedures or methods in a real programming language, but more like this gigantic concatenated 'main context' script. Local variables didn't bleed through so you got some isolation but attributes and the resource collection flowing through all of them as god-object global variables was horrible. Along with some people getting a bit too clever with Ruby and Chef internals.

I had dreams of freezing the entire node attribute tree after attribute file processing before executing resources to force the whole model into something more like a functional programming style of "here's all your immutable description of your data fed into your functional code of how to configure your system" but that would have been so much worse than Python 2.7-vs-3.0 and blown up the world.

Just looking at imperative-vs-declarative is way too simplistic of an analysis of what went wrong with Chef.



The fact that HCL has poor/nonexistent multi-language parsing support makes building tooling around terraform really annoying. I shouldn't have to install Python or a Go library to read my HCL.


The limitations of HCL are actually a good thing!

I have never seen Pulumi or CDKTF stuff work well. At some point are you simply writing a script and abandoning the advantages of a declarative approach



Right. That's what I'm arguing.


The existence of the YAML language for Pulumi and the CDK for TF both confound this explanation; it's just not grounded in reality.


> I agree that YAML templating is kind of insane, but I will never understand why we don't stop using fake languages and simply use a real language.

The problem is language nerds write languages for other language nerds.

They all want it to be whatever the current sexiness is in language design and want it to be self-hosting and be able to write fast multithreaded webservers in it and then it becomes conceptually complicated.

What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book. It probably needs to be dynamically typed, have control structures that you can learn in a weekend, not have any threading or concurrency, not be object oriented or have inheritance and be functional/modular in design. And have a very easy to use FFI model so it can call out to / be called from other languages and frameworks.

The problem is that language nerds can't control themselves and would add stuff that would grow the language to be more complex, and then they'd use that in core libraries and style guides so that newbies would have to learn it all. I myself would tend towards adding "each/map" kinds of functions on arrays/hashmaps instead of just using for loops and having first class functions and closures, which might be mistakes. There's that immutable FP language for configuration which already exists (I can't google it this morning) which is exactly the kind of language which will never gain any traction because >95% of the people using templated YAML don't want to learn to program that way.



I think I'd rather just have logicless templates than use anything dynamically typed...

Jinja2 makes a lot of sense when you're trying to make it hard to add bugs, and you also don't want everyone to have to learn Rust or Elixir or something.

It would be interesting to extend a template language with a minimal FP language that could process data before the templates get it.



> What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book. It probably needs to be dynamically typed, have control structures that you can learn in a weekend, not have any threading or concurrency, not be object oriented or have inheritance and be functional/modular in design. And have a very easy to use FFI model so it can call out to / be called from other languages and frameworks.

I think Scheme would work, as long as you ban all uses of call/cc and user-defined macros. It's simple and dynamically typed, and doesn't have built-in classes or hash maps. Only problem is that it seems like most programmers dislike Lisp syntax, or at least aren't used to it.

There's also Awk, although it's oriented towards text, and doesn't have modules (the whole program has to be in one file).

It probably wouldn't be that hard to make this language yourself. Read the book Crafting Interpreters, which guides you through making a toy language called Lox. It's close to the toy language you describe.



If you combine Awk with the C preprocessor, you have a way for an Awk program to load modules, relative to where that file is located.

There is such a combination project: cppawk.

https://www.kylheku.com/cgit/cppawk/about/



Thanks for the link! It seems interesting.


There’s plenty to choose from that support embedding: Python, Perl, Lua. Heck, even ECMAScript (JavaScript, VBA, etc).

As another commenter rightfully stated, this used to be the norm.

I wouldn’t say LOGO is the right example though. It’s basically a LISP and is tailored for geometry (of course you can do a heck of a lot more with it but its strength is in geometry).



You're really missing the point. Logo was super simple and we learned it in elementary school as children, that's all that I'm talking about. And those other languages have accreted way too many features to be simple enough.


> You're really missing the point.

I got your point. I think it is you who is missing mine:

> You're really missing the point. Logo was super simple and we learned it in elementary school as children

You wouldn't have learned conditionals and other such things though. That stuff wasn't as easy to learn in LOGO because LOGO is basically a LISP. eg

    IFELSE :num = 1 [print [Number is 1]] [print [Number is 0]]
vs

    if { $num == 1 } then { print "number is 1" } else { print "number is 0" }
or

    if num == 1:
        print "number is 1"
    else:
        print "number is 0"

I'm not saying these modern languages don't have their baggage. But LOGO wasn't exactly a walk in the park for anything outside of its main domain either. Your memory of LOGO here is rose tinted.

> And those other languages have accreted way too many features to be simple enough.

I agree (though less so with Lua) but you don't need to use those features. Sure, my preference would be "less is more" and thus my personal opinion of modern Python isn't particularly high. And Perl is rather old fashioned these days (though I think modern Perl gets more criticism than it deserves). But the fact is we don't need to reinvent the wheel here. Visual Basic could make raw DLL calls meaning you had unfettered access to Win32 APIs (et al) but that doesn't mean every VBScript out there was making DLL calls left right and centre. Heck, if you really want to distil things down then there's nothing even stopping someone implementing a "PythonScript" type language which is a subset of Python.

I just don't buy "simplicity of the language" as the reason languages aren't often embedded these days. I think it's the opposite problem: "simplicity of the implementation". It's far easier to load a JSON or YAML document into a C(++|#|Objective|whatever) struct than it is it to add API hooks for an embedded scripting language. And that's precisely why software written in dynamic languages do often expose their language runtime for configuration. Eg Ruby in Puppet and Chef, half of PHP applications having config written in PHP, XMPP servers written in Haskell, etc. In those kinds of languages, it is easy to read config from source files (sometimes even importing via `eval`) so there often isn't any need to stick config in JSON documents.



I'm deeply uninterested in continuing to have this discussion with you.


> What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book.

I would argue that Tcl is exactly that. It's hard to make things any simpler than "everything is a string, and then you get a bunch of commands to treat strings as code or data". The entire language definition boils down to 12 simple rules ("dodekalogue"); everything else is just commands from the standard library. Simple Tcl code looks pretty much exactly like a typical (pre-XML, pre-JSON, pre-YAML) config file, and then you have conditionals, loops, variables etc added seamlessly on top of that, all described in very simple terms.



What are your thoughts on:

- https://dhall-lang.org/
- https://toml.io/en/



Dhall is the FP config language you're thinking of, I think.


I mean... Nix satisfies every single one of the things you mentioned and people say it's too complicated. It's literally just the JSON data structure with lambdas, which really is basic knowledge for any computer scientist, and yet people complain about it.

It's fairly straightforward to 'embed' and as a bonus it generates json anyway (you can use the Nix command line to generate JSON). Me personally, I use it as my templating system (independent of nixpkgs) and it works great. It's a real language, but also restrictive enough that you don't do anything stupid (no IO really, and the IO it does have is declarative, functional and pure -- via hashing).

In Nix's favor:

1. Can be described in a one page flier. An in-depth exhaustive explanation of the language's features is a few pages (https://nixos.org/manual/nix/stable/language/)

2. dynamically typed

3. Turing complete and based on the lambda calculus so has access to the full suite of functional control structures. Also has basic if/then/else statements for the most common cases and for intuition.

4. no threading, no concurrency, no real IO

5. definitely not object-oriented and no inheritance

6. It is functional in design and has an extremely thin set of builtins

7. FFI model is either embed libnix directly (this does not require embedding the nix store stuff, which is a completely separate modular system), or use the command line to generate json (nix-instantiate --eval --json).

Note: do not confuse nixpkgs and NixOS with the nix language. The former is a system to build linux packages and entire linux distributions that use the latter as a configuration language. The nix language is completely independent and can be used for whatever.



Tried to use Nix as a homebrew replacement and failed to get it installed correctly with it blowing up with crazy error messages that I couldn't google. I didn't even get to the point of assessing the language. It really seems like the right kind of idea, but it doesn't seem particularly stable or easy enough to get to that initial payoff. If there's a nice language under there it is crippled by the fact that the average user is going to have a hard time getting to it.


You can use nix without using nixpkgs (you seemed to be trying to use nixpkgs). The nix language is accessible via several command line tools, nix repl, nix eval, nix-instantiate, etc, and can emit json via several flags, as well as a builtin function.


I agree with the points in Nix's favor except for 2 (dynamically typed). Defining structs as part of the language would be nice. In fact, type checking is done ad-hoc now by passing data through type checking functions.


I agree, and I just want to highlight what you said about generating a config file. It's extremely useful to constrain the config itself to something that can go in a json file or whatever. It makes the config simpler, easier to consume, and easier to document. But when it comes to _writing_ the config file, we should all use a programming language, and preferably a statically typed language that can check for errors and give nice auto complete and inline documentation.

I think aws cdk is a good example of this. Writing plain cloudformation is a pain. CDK solves this not by extending cloudformation with programming capabilities, but by generating the cloudformation for you. And the cloudformation is still a fairly simple, stable input for aws to consume.





Agreed, and I almost feel silly for pointing this out, but for writing JSON (JavaScript Object Notation), I'd recommend using JavaScript...


For JSON I'd stick with Typescript to be honest. You end up executing Javascript and producing Javascript-native objects, but the typing in Typescript to ensure the objects you produce are actually valid will save a lot of debugging.


JS is actually not that great for this IMO. You probably need an NPM package to even deal with YAML because JS has a shitty standard library.

Sticking to a scripting language with a strong standard library is way better.

Any unix system can get Ruby/Python and read/write YAML/JSON immediately without caring too much about versions.

Of course in today's upside down world most developers seem to only know JS, so it would at least be "familiar". Still a bad choice in my view.

The way this industry is going, give it a few years and we'll have React-Kubernetes for generating templates. And I wish I was joking.



Parent is talking specifically about writing JSON, not YAML.


Yeah, but the article is about YAML and my original comment was about configuration in multiple formats.

So, to clarify, for JSON, JS is definitely not the worst option. For me though, even for JSON, you have much better options.



I'm very happy using Typescript to templatize JSON. You can define a template as a class, compose them if needed, and when you are done, just write an object to a file.
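
The same pattern works in any language with decent types. A rough Python analogue with dataclasses (field names are hypothetical):

    import dataclasses
    import json

    @dataclasses.dataclass
    class ServiceConfig:
        name: str
        replicas: int = 1

    config = ServiceConfig(name='api', replicas=3)
    with open('service.json', 'w') as f:
        json.dump(dataclasses.asdict(config), f, indent=2)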


Completely agree, my wish is that anything that risks getting complex uses a Ruby-based DSL.

For example, I like using Capistrano, which is a wrapper around rake, which is a Ruby-based DSL. That means that if things get tricky I can just drop down to using a programming language. Split stuff into logical parts that I load where needed and, for example, I can do something like YAML.load(..file..).dig('attribute name') or JSON.load from somewhere else.

Yes, you risk someone building spaghetti that way, but the flip side is that a good devops can build something much easier to maintain than dozens of YAML and JSON files, and you get all the power from your IDE and linters that are already available for the programming language, so silly syntax errors are caught without needing to run anything.



The problem with imperative languages in configs is that they become harder to read. Webpack configs always devolve into this.

We need better tooling to allow tracing how final configuration values are being generated.

And a _live programming_ environment so we can see the final generated configuration in one view.



This. It's why things like Cloud Development Kit and Pulumi are quite interesting to me.


Because the security surface of "any language" is tricky and most (all?) popular languages do not have nice data literal syntax better than JSON and YAML.


> I heard you liked configuration languages, so I made this configuration language for your configuration language generation scripts. It supports templates, of course.


Helm would probably benefit from something like JSX for YAML/JSON. Just being able to script a chart instead of this templating hell.


I argued that point in my article some time ago https://beepb00p.xyz/configs-suck.html also HN discussion at the time news.ycombinator.com/item?id=22787332


You shouldn't need the full complexity and power of a Turing complete programming language to do config. The point of config is to describe a state, it's just data. You don't need an application within an application to describe state.

Inevitably, the path of just using a programming language for config leads to your config becoming more and more complex until it inevitably needs its own config, etc. You wind up with a sprawling, Byzantine mess.



The complexity is already there. If you only need static state like you say, then YAML/JSON/whatever is fine. But that's not what happens as software grows.

You need data that is different depending on environments, clouds, teams, etc. This complexity will still exist if you use YAML, it'll just be a ridiculous mess where you can break your scripts because you have an extra space in the YAML or added an incorrect `True` somewhere.

Complexity growth is inevitable. What is definitely avoidable is shoving concepts that in fact describe a "business" rule (maybe operational rule is a better name?) in unreadable templates.

Rules like: a deployment needs to add these things when in production, or change those when in staging, etc. exist whether they are hidden behind shitty Go templates or structured inside of a class/struct, a method with a descriptive name, etc.

The only downside is that you need to understand some basics of programming. But for me that's not a downside at all, since it's a much more useful skill than only knowing how to stitch Go templates together.



Why are we writing software that needs so much configuration? Not all of it is needed. We could do things more like consumer software, which assumes nobody will even consider your app if they have to edit a config file.


> your config becoming more and more complex until it inevitably needs its own config, etc. You wind up with a sprawling, Byzantine mess.

We're already there with Helm.

People write YAML because it's "just data". Then they want to package it up so they put it in a helm chart. Then they add variable substitution so that the name of resources can be configured by the chart user. Then they want to do some control flow or repetitiveness, so they use ifs and loops in templates. Then it needs configuring, so they add a values.yaml configuration file to configure the YAML templating engine's behaviour. Then it gets complicated so they define helper functions in the templating language, which are saved in another template file.

So we have a YAML program being configured by a YAML configuration file, with functions written in a limited templating language.

But that's sometimes not enough, so sometimes variables are also defined in the values.yaml and referenced elsewhere in the values.yaml with templating. This then gets passed to the templating system, which then evaluates that template-within-a-template, to produce YAML.



At the end of the day, Helm's issues stem from two competing interests:

(1) I want to write something where I can visualize exactly what will be sent to Kubernetes, and visually compare it to the wealth of YAML-based documentation and tutorials out there

(2) I have a set of resources/runners/cronjobs that each require similar, but not identical, setups and environments, so I need looping control flow and/or best-in-class template inclusion utilities

--

People who have been working in k8s for years can dispense with (1), and thus can use various abstractions for generating YAML/JSON that don't require the user to think about {toYaml | indent 8}.

But for a team that's still skilling up on k8s, Helm is a very reasonable choice of technology in that it lets you preserve (1) even if (2) is very far from a best-in-class level.



I have a recent example of rolling out IPv6 in AWS:

1. Create a new VPC, get an auto-assigned /56 prefix from AWS.

2. Create subnets within the VPC. Each subnet needs an explicitly-specified /64 prefix. (Maybe it can be auto-assigned by AWS, but you may still want to follow a specific pattern for your subnets).

3. Add those subnet prefixes to security / Firewall rules.

You can do this with a sufficiently-advanced config language - perhaps it has a built-in function to generate subnets from a given prefix. But in my experience, using a general-purpose programming language makes it really easy to do this kind of automation. For reference, I did this using Pulumi with TypeScript, which works really well for this.
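
The core of that automation really is just a few lines in a general-purpose language. A sketch with Python's ipaddress module (the prefix is made up):

    import ipaddress

    vpc = ipaddress.ip_network('2600:1f18:1234:5600::/56')  # from AWS
    # Carve explicit /64s for subnets, following a fixed pattern.
    for net in list(vpc.subnets(new_prefix=64))[:3]:
        print(net)  # ...5600::/64, ...5601::/64, ...5602::/64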



That kind of ignores the entire pipeline involved in computing the correct config. Nobody wants to be manually writing config for dozens of services in multiple environments.

The number of configurations you need to create is multiplicative, take the number of applications, multiply by number of environments, multiply by number of complete deploys (i.e. multiple customers running multiple envs) and very quickly end up with an unmanageable number of unique configurations.

At that point you need a something at least approaching Turing completeness to correctly compute all the unique configs. Whether you decide to achieve that by embedding that computation into your application, or into a separate system that produces pure static config, is kind of academic. The complexity exists either way, and tools are needed to make it manageable.



That's not my experience after using AWS CDK since 2020 in the same company.

Most of our code is plain boring declarative stuff.

However, tooling is lightyears ahead of YAML (we have types, methods, etc...), we can encapsulate best practices and distribute them as libs and, finally, escape hatches are possible when declarative code won't cut it.



We need Turing completeness in the strangest of places. We can often limit these places to a smaller part of the code. But it's really hard to know beforehand where those places will occur. Whenever we think we have found a clear separation we invent a config language.

And then we realize that we need scripting, so we invent a templating language. Then everybody loses their minds and invents 5 more config languages that surely will make us not need the templating language.

Let's just call it code and use clever types to separate Turing and non-Turing completeness?



A really good solution here is to use a full programming language but run the config generator on every CI run and show the diff in review. This way you have a real language to make conditions as necessary but also can see the concrete results easily.

Unfortunately few review tools handle this well. Checked-in snapshot tests are the closest approximation that I have seen.
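
A cheap approximation that works with plain CI (sketch; generate_config.py and the generated/ directory are hypothetical):

    import subprocess
    import sys

    # Regenerate the config, then fail CI if the checked-in copy drifted.
    subprocess.run([sys.executable, 'generate_config.py'], check=True)
    diff = subprocess.run(['git', 'diff', '--exit-code', '--', 'generated/'])
    if diff.returncode != 0:
        sys.exit('generated config is stale; rerun generate_config.py')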



> You don't need an application within an application to describe state.

As shown in the article, you apparently do.



It happens because config is dual purpose: it's state, but it's also the text-UI for your program. It spirals out of control because people want the best of it being "just text" and being a nice clean UI.


I agree, I think a language like dhall (https://dhall-lang.org/) strikes a good balance.


Yeah, YAML is good at declarative things. It’s when you start using it imperatively eg CI/CD is when it really starts to get ugly.


This is how config actually works in Scala.


Throwing in a plug for https://dhall-lang.org/

> Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports



I just knew this would be about Kubernetes when I saw the title.

The Kubernetes API is fairly straightforward and has a well-defined (JSON) schema. People learning k8s should be spending the bulk of their time understanding how to use the API, but instead they spend it working out how to use a Helm chart.

I don't think Jsonnet, Ksonnet, Nu, or CUE ever gained that much traction. I'm convinced most people just use Kustomize, because it's fairly straightforward and built in to kubectl.

I'd like a tool that:

- Gives definition writers type checking against the k8s schemas - validation, version deprecations, etc.

- Gives users a single artefact that can be inspected easily and will fail (ACID) if deployed against a cluster that doesn't support any objects/versions.

- Is built into the default toolchain

---

I feel like writing a Bun or Deno TypeScript script that exports a function with arguments and returns a list of definitions would work well, esp. with `deno compile`, etc. but that violates the third point.



> The Kubernetes API is fairly straightforward and has a well-defined (JSON) schema. People learning k8s should be spending the bulk of their time understanding how to use the API, but instead they spend it working out how to use a Helm chart.

This is a general pattern in software. Instead of learning the primitives and fundamentals that your system is built on, which would be too hard, instead learn a bunch of abstractions over top of it. Sure, now you are insulated from the lower-level details of the system, but now you have to deal with a massive stack of abstractions that makes diagnosis and debugging difficult once something goes wrong. Now it's much harder to ascertain what exactly is happening in your system, since the details of what is actually going on have been abstracted away from you by design. Further, you are now dependent on that abstraction layer and must support and accommodate whatever updates may be released by the vendor, in addition to whatever else is lurking in your dependency graph.



Yep, I find kustomize and (especially) helm so confusing, while finding kubernetes yaml files very easy to use and understand.


probably doesn't meet the 2nd requirement, most definitely doesn't meet the third, but:

https://cdk8s.io/docs/latest/



The second requirement is actually probably the most important - for someone who has just set up ArgoCD, Flux, or their own GitOps pipeline, how much of a headache does using a new compile step present?

Lots of things are simple in isolation: want to use Cue? Just get your definitions and install the compiler and call it and boom, there are your k8s defs! Ok, but how do I integrate all of that into my existing toolchain? How do I pass config? Etc, etc.

The best, fastest tool won't win. The tool that has the most frictionless user story will.



I was able to get CDK8s working easily by simply committing the built template along with my TypeScript. Then, I just pointed ArgoCD to my repo.


We do the same thing but commit to a second git repo that we treat like the "k8s yaml release database".


I love the idea of keeping it simple and I do try to use kustomize or even plain yaml as installation method as much as possible.

But in practice, when managing large systems, you inevitably end up benefiting from templating.



I've begun thinking that if you start thinking about templating you might be better off building an operator. Operators aren't as well understood and documented. But in my mind an operator is just a pod or deployment that creates on demand resources using the k8s api.


The purpose of an Operator is to realize the resources desired/requested in a (custom) resource manifest, often as YAML or JSON.

You give the apiserver a document describing what resources you need. The Operator actually does the work of provisioning those resources in the "real world" and (should) update the status field on the API object to indicate if those resources are ready.



oh yeah; operators are great and sometimes they are necessary.

On the other hand, most operators I've seen are just k8s manifest templates implemented in Go.

I often end up preferring using Jsonnet to deal with that instead of doing the same stuff in Go.

Jsonnet is much closer to the underlying datamodel (the k8s manifest Json/Yaml document) and comes with some useful functionality out of the box, such as "overlays".

It has downsides too! It's untyped, debugging tools are lacking, people are unfamiliar with it and don't care to learn it. So I totally get why one would entertain the possibility of writing your "templates" using a better language.

However, an operator is often too much freedom. It's not just using Go or Rust or Typescript to "generate" some Json manifests, but it also contains the code to interact with the API server, setup watches, and reactions etc.

I often wish there was a better way to separate those two concerns.

I'm a fan of metacontroller [1], which is a tool that allows you to write operators without actually writing a lot of imperative code that interacts with the k8s API, but instead just provide a general JSON->JSON transformer, which you could write in any language (Go, Python, Rust, Javascript, .... and also Jsonnet if you want).

I recently implemented something similar but much more tailored to just "installing" stuff, called Kubit. An OCI artifact contains some arbitrary tarball (generally containing some template sources) and a reference to a docker image containing an "engine", and Kubit runs the engine with your provided tarball + some parameters passed in a CRD. The OCI artifact could contain a helm chart and the template engine could contain the helm binary, or the template engine could be kubecfg and the OCI artifact could contain a bunch of jsonnet files. Or you could write your own stuff in python or typescript. The kubit operator then just runs your code, gathers the output and applies it with kubectl apply-set.

1. https://metacontroller.github.io/metacontroller/intro.html

2. https://github.com/kubecfg/kubit



> On the other hand, most operators I've seen are just k8s manifest templates implemented in Go.

> I'm a fan of metacontroller [1], which is a tool that allows you to write operators without actually writing a lot of imperative code that interacts with the k8s API, but instead just provide a general JSON->JSON transformer,

That seems... surprising, to me. It's not clear to me how a JSON->JSON transformer (which is essentially a pure function from UTF-8 strings to UTF-8 strings, i.e. an operation without side effects) can actually modify the state of the world to bring your requested resources to life. If the only thing the Operator is being used for is pure computation, then I agree it's overkill.

An example use case for an Operator would be a Pod running on the cluster that is able to receive YAML documents/resource objects describing what kind of x509 certificate is desired, fulfill an ACME certificate order, and populate a Secret resource on the cluster containing the x509 certificate requested. It's not strictly JSON to JSON, from "certificate" custom resource to Secret resource - there's a bunch of side-effecting that needs to take place to, for instance, respond to DNS01 or HTTP01 challenges by actually creating a publicly accessible artifact somewhere. That's what Operators are for.



Metacontroller is actually quite easy to learn. It comes with good examples too. Including a re-implementation of the Stateful Set controller, all done with iterations of an otherwise pure computation. The trick is obviously that the state lives in the k8s api server, from which the inputs of the subsequent invocation of your pure function come.


> an operator is often too much freedom

While that is true I'm a bit afraid that we might be overselling the concept of limiting freedom past a certain point. Limiting freedom has the upside of giving us some guarantees that makes a solution easier to reason about. But once we step out of dumb-yaml I don't see that making additional intermediate trade-offs is worth it. And there are apparently some downsides to introducing additional layers as well.

The main downside of limiting freedom seems to be the chaos of having so many different ways to do things. Imagine what could happen if we agreed that there are two ways of doing things; write yaml without templates or write an operator. Then maybe we could focus efforts on the problem of writing maintainable operators.

Things should be either dumb data or the kitchen sink I think.



I'm not against having actual controllers with powerful logic.

But often is possible to separate the custom logic from the bulk of the parameterized boilerplate.



Helm is a low budget operator.


No... no, no, no. No kidding; Operators are indeed poorly understood. They are not just glorified XSLT for YAML/JSON.

https://kubernetes.io/docs/concepts/extend-kubernetes/operat...



We're using jsonnet for our systems and they have absolutely nothing to do with k8s. I'm not sure it's true to say it has ever gained much traction. It's just a niche case for complex configuration, and isn't the most publicised tool.

It does precisely what we need with zero fuss, cross platform and cross _language_ (we've embedded it in C++, .NET, and JVM executables).

We can use the resulting json config with a vast array of tools that simply don't exist for alternatives such as toml/yaml/hocon/ini. In fact we tried to get HOCON working for non-JVM languages but there was always some edge case.



How would one use the json api without ending up writing a bunch of custom code?


I think custom code is to be expected, and making it maintainable is what's important.

> everything should be made as simple as possible, but no simpler.

Helm et al made it simpler than it was, IMO.



Everyone hand rolling code does not seem like an improvement over tools like helm even if it’s yaml


No, obviously not, and that's not what I've suggested.


Helm is another can of hot garbage. Impossible to vendor without hitting name collisions, can configure only what’s templated.

Jsonnet is the way to go with generated helm manifests transformed later. Kustomize with its post-renderer hooks is another can of even hotter garbage.



> Impossible to vendor without hitting name collisions

What problem exactly are you facing? I can change the name of the chart itself in chart.yaml and if the name of the resources collide I change them with nameOverride/fullnameOverride in the values. All charts have these because they are autogenerated by `helm create`.

I vendor all charts and never had this problem.



You just made a copy of a chart. You modified your chart. What I'm missing is helm having some notion of an org in the chart name, like docker does: repo/name:tag; helm only has name and version. Hence you modify your chart.yaml, when it would be preferable not to have to modify anything.

This is really problematic when a chart pulls dependencies in.



For Helm the value is that it is not a configuration management solution but a package manager. The rest are just methods of writing json/yaml.

I understand the "hate" against yaml, but I don't think it deserves it that much.

Perhaps timoni will take over with its usage of cue. At least it's a package management solution.



k8s makes me miss xml


It's funny how little developers think about how to do configuration right.

It's just a bunch of keys and values, stored in some file, or generated by some code.

But it's actually the whole ball game. It's what programming is.

Everything is configuration. Every function parameter is a kind of configuration. And all the configuration in external files inevitably ends up as a function parameter in some way.

The problem is the plain-text representation of code.

Declarative configuration files seem nice because you can see everything in one place.

If you do your configuration programmatically, it is hard to find the correct place to change something.

If our code ran in real-time to show us a representation of the final configuration, and we could trace how each final configuration value was generated, then it wouldn't be a problem.

But no systems are designed with this capability, even though it is quite trivial to do. Configuration is always an afterthought.

Now extend this concept to all of programming. Imagine being able to see every piece of code that depends upon a single configuration value, and any transformations of it.

Also, most configuration is probably better placed into a central database because it is relational/graph-like. Different configuration values relate to one another. So we should be looking at configuration in a database/graph editor.

Once you unchain yourself from plain-text, things start to become a lot simpler...of course the language capabilities I mentioned above still need to become a thing.



This is something I'm trying really hard to do with a client. They have a bunch of 1500+ line "config" files for products, which are then used to make technical drawings and production files. The configs attempt to use a naming scheme to group related variables together.

I want to migrate to an actual nested data-structure using (maybe) JSON - and these engineers absolutely will not write code, so config-as-code is a no-go, in addition to the disadvantage you mentioned.

My next thought was that there should be a better way to show the configuration, and allow that configuration to be modified. I was thinking maybe some sort of visual UI which where the user can navigate a representation of the final product, select a part and modify a parameter that way.

Is that along the lines of your suggestion? If not will you please expand a little? Configuration is the absolute core of this application.



Sounds like you need an SQL database. You could use SQLite.

Then provide a GUI to modify that database. You could add a bunch of constraints in the database too to ensure the config is correct.

Usually when there is plain-text files though, it's because they want it that way. It's easier to edit a text file sometimes than rows in a database. Cut/copy/paste/duplicate files and text. Simple textual version control.



Sure, I agree - I'm proposing JSON as an intermediate step toward a well-defined data-model since the thousands of copied config files have evolved over time, so the data-model is a smear of backward-compatibility hacks.

What I was trying to do is get you to explain what you mean by this:

> If our code ran in real-time to show us a representation of the final configuration, and we could trace how each final configuration value was generated, then it wouldn't be a problem. [...] But no systems are designed with this capability, even though it is quite trivial to do. Configuration is always an after-thought.



This is only relevant if you allow code to define config.

If you use conditionals and loops to create config, and then view the final json, it quickly becomes annoying when you know the thing you want to change in the final json, but have to trace backwards through the code to figure out where to change it.

So programmatic configs only work if you have this "value tracing" capability. Which nothing really does.



Worse yet, in some places (CI/CD) YAML becomes nearly a programming language. A very verbose, unintuitive, badly specified and vendor-specific one as well.


It's pretty much repeating the mistake of early 2010s Java, where the entire application frequently was glued together by enormous ball of XML that configured all the dependency injection.

It had the familiar properties of (despite DTDs and XML validation) often blowing up late, and providing error messages that were difficult to interpret.

At the time a lot of the frustration was aimed at XML, but the mid 2020s YAML hell shows us that the problem was never the markup language.



You have a loosely coupled bundle of modules that you need to glue together with some configuration language. So you decide to use X. Now you have two problems.


Spot on. We use ytt[0], "a slightly modified version of the Starlark programming language which is a dialect of Python". Burying logic somewhere in a yaml template is one thing I dislike with passion.

[0] https://tanzu.vmware.com/developer/guides/ytt-gs/



TBH, ytt is the only yaml templating approach that I actually like.

The downside is that it is easy to do dumb things and put a lot of loops in your yaml.

The positive is that it is pretty easy to use it like an actual templating language with business logic in starlark files that look almost just like Python. In practice this works pretty well.

The syntax is still fairly clumsy, but I like it more than helm.



In some places working with Kubernetes, people unironically use the term "YAML engineer".


I've seen memes where SREs complain they have just become YAML engineers. :(


I've been there. Not YAML specifically, but basically just configuration (XML, JSON, properties, ...) for some proprietary systems without any good documentation or support available. "It's easy, just do/insert X", half a year and dozens of meetings and experts later, it was indeed not just X. Meanwhile I could've built everything myself from scratch or with common open-source solutions.


I mean...building a data centre / PaaS with YAML is pretty cool

We used to have to shove servers in to racks ! Kids these days :D



I *loved* shoving servers in racks!


I dream of a day there's a physical component of my job, not just the staring at a screen bit.


yamlops is a real thing :)


YAML is the Bradford Pear of serialization formats. It looks good at first, but as your project ages and the YAML grows, it collapses under the weight of its own branches.


I had to look up that tree. Invasive, offensive odour, cyanide-rich fruit. That's a good insult!


You should see what they look like after a 25kph breeze. Which isn't too far off from what templated YAML generates after someone commits a bad template.


YAML is also just as bad as the Linden tree.

https://www.youtube.com/watch?v=aoqlYGuZGVM



Yeah … for CI files (like Github workflows & such), one of the best things I think I've done is just to immediately exec out to a script or program. That is, most of our CI steps look like this:

  run: 'exec ci/some-program'
… and that's it. It really aids being able to run the (failing) CI step offline, too, since it's a single script.
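Spelled out, the whole workflow stays boring; a minimal sketch of the pattern (the script path is illustrative):

  # .github/workflows/ci.yaml
  on: push
  jobs:
    test:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        # all the real logic lives in a version-controlled script
        - run: 'exec ci/run-tests'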

Stuff like Ansible is another matter altogether. That really is programming in YAML, and it hurts.



In such places one frequently has to remind oneself and others not to start programming in that configuration language, if avoidable, so as not to create tons of headache and pain.


Even worse, every generation repeats this mistake. I'm not sure S-expressions are the answer, but Terraform HCL should never have been invented.


I was just telling a colleague today that HCL is great until you need to do a loop. A lot of parallels to this YAML discussion


My favorite pattern in HCL is the if-loop. Since there is no »only do this resource if P« in Terraform, the solution is »run this loop not at all or once«.
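For anyone who hasn't hit this: the idiom is a `count` that evaluates to zero or one. A minimal Terraform sketch (the resource and variable names are made up):

  variable "enable_logging" {
    type    = bool
    default = false
  }

  # "only do this resource if P", expressed as a loop of length 0 or 1
  resource "aws_cloudwatch_log_group" "app" {
    count = var.enable_logging ? 1 : 0
    name  = "/app/logs"
  }

  # ...and every reference elsewhere now needs the index:
  #   aws_cloudwatch_log_group.app[0].name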


I'll take HCL over YAML templating any day. At least it is working with real data structures not bashing strings together.

That being said, yes, it is also an awful language.



This criticism doesn't pass the sniff test though: your average Haskeller loves to extoll the virtues of using Haskell to implement a DSL for some system which is ultimately just doing the same thing in practice (because they're still not going to write documentation for it, but hey, how hard can it be to figure out it's just...)

YAML becomes a programming language because vendors need a DSL for their system, and they need to present it in a form which every other language can mostly handle the AST for, which means it's easiest if it just lives atop a data transfer format.



Hey now. Your average Haskeller would simply recommend you replace YAML with Dhall.

https://dhall-lang.org/



Why not "just" use an embedded DSL?


I don't know what this has to do with Haskell. I understand that they need a DSL for their system. I just don't agree that it is a good idea to use some general purpose serialization format. In the end they always evolve to a nearly full programming language with conditions and loops. Using a full programming language makes much more sense IMHO, for example like Zig build files or how we use Python to build neural networks. That way I can actually use existing tools to do what I need.


maybe yaml should standardise hygienic macros. and a repl.


The lengths people go to avoid using s-expressions never ceases to amaze me.

We're talking countless centuries and great many minds pushed to brink of madness, just to keep the configs looking like Python or JavaScript.



I'd say it's even worse: it's a collective hallucination that complex configs are not code.


Yeah, I'm very sad that helm won. We do OSS k8s stuff at work, and 100% of users have asked for us to make a helm chart. So we had to. It is miserable to work on; your editor can't help you because the files are named like "foo.yaml" but they aren't YAML. You have to make sure you pipe all your data through "indent 4" so that things are lined up correctly in the YAML. What depresses me the most is that you have to re-expose every Kubernetes feature in your own way. Someone wants to add deployment.spec.template.spec.fooBars? Now you have to add deploymentFooBars to your values.yaml file and plumb it in. For every. single. feature.

It's truly "worse is better" gone wrong. I have definitely done some terrible things like "sed -e s/$FOO/foo/g" to implement templating... and that's probably how Helm started. The result is a mess.

I personally grew up on Kustomize before it was in kubectl, and was always exceedingly happy with it. (OK, it has a lot of quirks. But at least it saves you time because it actually understands the semantics of the objects you are creating.)

I like Jsonnet a lot better. As part of our k8s app, we ship an Envoy deployment to do all of our crazy traffic routing (basically... maintaining backwards compatibility with old releases). Envoy configs are... verbose..., but Jsonnet makes it really easy to work on. (The code in question: https://github.com/pachyderm/pachyderm/blob/master/etc/gener...)

I'm seriously considering transpiling jsonnet to the Go template language and just implementing everything with Jsonnet. At least that is slightly maintainable, and nobody will ever know because "helm install" will Just Work ;)

But yeah, I think Helm will be the death of Kubernetes. Some competing computer allocator container runner thingie will have some decent language for configuration, and it will just take over overnight. Mark my words!



> I have definitely done some terrible things like "sed -e s/$FOO/foo/g" to implement templating

Next time you reach for this, check out envsubst for a slightly improved solution that’s somewhat standard (at least common).

On the topic of templating or modifying helm charts using jsonnet, you might find Tanka helpful:

https://tanka.dev/helm



> But yeah, I think Helm will be the death of Kubernetes. Some competing computer allocator container runner thingie will have some decent language for configuration, and it will just take over overnight. Mark my words!

I want to believe this.

Everywhere I've worked we're still rawdogging tf/hcl and helm though, because change is scary.

At least I get some relief in my personal projects. :')



I see a problem here. I'm not certain if the sort of person who would choose YAML as their configuration language sees a problem here.

There is a direct conflict between human-centred data representations and computer-centred. Computers love things that look like a bit like a Lisp. Humans like things that look a bit like Python. If you're the sort of person who wants to use a computer to manipulate their Kubernetes config then you'd be secretly annoyed that Kubernetes uses YAML. However, it appears the Kubernetes community are mainly YAML people, so why would they mind that their config files will be horrible to work with once programming logic gets involved? The downside of YAML is exactly this scenario, and I believe the people involved in K8s are generally cluey enough to see that coming.

> YAML is a superset of JSON

The spec writers can put whatever they want in their document, but I don't think this is true. If you go in and convert all the YAML config to JSON, the DevOps team is going to get upset. The two data formats have the same semantic representation, but so do all languages compiled to the same CPU arch. JSON and YAML are disjoint in practice. Mixing the two isn't a good idea.



The ironic thing is that, IIRC, k8s manifests were supposed to be machine-generated from k8s's inception; you weren't supposed to write them by hand. Of course, people wrote them by hand anyway, until it became unbearable, at which point they started templating them, because that's how these things always seem to progress: manually-written text is almost never replaced by machine-generated config serialized to text; it's replaced by templated-but-originally-still-manually-written text.


> k8s manifests were supposed to be machine-generated from the k8s's inception,

failed spectacularly by not being inconvenient enough for their intended purpose.

one of those cases where unreadable by design would be a most welcome feature.



"YAML is a superset of JSON" only means that any JSON document is a valid YAML document. It does not mean YAML is equal to JSON.


My personal philosophy is that string interpolation should not be used to generate machine-readable code, and template languages are just fancy string interpolation. We've all seen the consequences of SQL injection and cross-site scripting. That's the kind of thing that will keep happening as long as we keep putting arbitrary text into interpreters.

Yes, this means I don't think we should use template files to make HTML at all.

Alternatives to using template languages for HTML include Haml (for Ruby) and Pug (for JavaScript). These languages have defined ways to specify entire trees of tags, attributes, and text nodes.

If you don't like Python-style significant indentation, JavaScript has JSX. The HTML-looking parts of JSX compile down to a bunch of `createElement` expressions that create a web document tree. That tree can then be output as HTML if necessary.

Haml, Pug, and JSX are not template languages even though they can output HTML. Likewise, `JSON.stringify(myObj)` is not a template language for JSON. Generating machine-readable code should be done with a tool that understands and leverages the known structure of the target language when possible.



> Haml, Pug, and JSX are not template languages even though they can output HTML.

That's nonsense, unless we go by your idiosyncratic definition of what a template language is ("fancy string interpolation").

> Haml (HTML Abstraction Markup Language) is a templating system that is designed to avoid writing inline code in a web document and make the HTML cleaner.

> Pug – robust, elegant, feature rich template engine for Node.js

> JSX is an XML-like syntax extension to ECMAScript without any defined semantics.

OK, I'd agree that JSX is not strictly a template language.

But in the end, all of these compile down to HTML. Not by string interpolation, but as a language that is parsed into a syntax tree, then rendered into HTML properly with an internal understanding of valid structure.

YAML with templating is fancy string interpolation, it's not a template language (or at least a poorly implemented one).



I am aware that Haml and Pug call themselves template languages, but they are not. In a template language, the source is a "template" that has some special syntax to fill in some bits. I don't think that's a very idiosyncratic definition. Pretty much any programming language can output a bunch of text, but most of them are not template languages. Java has XMLBuilder, but that doesn't make it a template language for outputting XML. But PHP is a template language, even though it's not recommended to use it that way anymore.


Sorry, reading over my comment, I sounded more antagonistic than I meant to be. After all, we're here to enjoy discussion and not to battle against each other.

As an aside, on another post yesterday, I had a pleasant surprise about "templating" in life itself.

> The familiar distinction between software and hardware loses its meaning in living cells. We propose new ways to study the phylogeny of metabolisms, new astronomical ways to search for life on exoplanets, new experiments to seek the emergence of the most rudimentary life, and the hint of a coherent testable pathway to prokaryotes with template replication and coding.

https://arxiv.org/abs/2401.09514

Maybe DNA is the original templating language. (Hopefully with more sophistication than fancy string interpolation.)



Well, it's true that Haml calls itself a "templating system", and Pug uses the term "template engine". That's 3 out of 3, you win. ;)

PHP is a scripting language that is also a template processor, but I wouldn't call it a template language. So we disagree on several points, but no big deal. A big disadvantage of PHP, in relation to your original point about "fancy string interpolation", is that it does not natively understand the target output HTML syntactically and structurally.



Not all template languages are string template languages, though. If you consider PHP a templating language for text, for example, then by the same logic XQuery is a templating language for XML.


This is the essence of the problem! Yaml and templates are just distractions. It just boils down to the fact that "string" is a very general type and we use it lazily.

My personal rule: Every time a value is inserted into a string it must be properly encoded.

I wrote a full blog post around this a while back https://kevincox.ca/2022/02/08/escape-everything/. But the TL;DR is that every string has a format which needs to be respected, whether that be HTML, SQL or human-readable terminal output. Every time you put some value into a string you should be properly encoding it into that format. But we rarely do.
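To make the rule concrete, a small Python sketch of the difference (the values are made up):

  import html, json

  user = '<script>alert("pwned")</script>'

  # Wrong: raw interpolation into an HTML-formatted string
  bad = f"<p>Hello {user}</p>"

  # Right: encode the value for the target format first
  good = f"<p>Hello {html.escape(user)}</p>"

  # Same rule for JSON: let a serializer do the encoding
  payload = json.dumps({"greeting": user})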



> My personal rule: Every time a value is inserted into a string it must be properly encoded.

This is how Django templates have done it for over a decade. You have to go out of your way to tell it not to escape the values if for some reason you need that.



We are switching to cuelang [1]. IMHO it is better designed than Jsonnet. Since Kubernetes already has state reconciliation, the only thing missing in this setup is deletion. But that can now be accomplished with the prune feature. [2]

[1] https://cuelang.org/docs/integrations/k8s/

[2] https://kubernetes.io/blog/2023/05/09/introducing-kubectl-ap...



I can second cuelang. We started using it at work and it's so nice. Some of the error messages are a little hard to decipher, but that's acceptable because it catches so many errors up front. The few times I have to write yaml directly, it now feels so tedious in comparison.


I love YAML and I curse it every single day that I'm working with Helm charts.

People ask me what I'd use to deploy apps on Kubernetes and I say I hate Helm and would still use it for a single reason: everybody is using it, I don't want to create a snowflake infrastructure that only I understand.

Still, back in the day I thought jsonnet would win this battle but here we are, cursing Helm and templates. That's the power of upstream decisions.



This is where I usually pitch in with "Have your heard of CUELang, our lord and savior?": https://cuelang.org/

- Not turing complete yet sufficiently expressive to DRY

- Define schema and data with the same language, in a separate or same file. With union types.

- Generate YAML or JSON. Can validate itself, or a YAML or JSON file.

The biggest drawback being that the only implementation is currently in go, meaning you may have to subprocess or ffi.
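A minimal sketch of what that looks like (the field names are invented for illustration); `cue export` turns it into JSON or YAML:

  // schema and data in the same language; unification validates the data
  #Service: {
      name:     string
      env:      "dev" | "staging" | "prod"   // union of allowed values
      replicas: int & >=1 & <=10 | *1        // bounded, defaults to 1
  }

  service: #Service & {
      name: "api"
      env:  "prod"
      // replicas omitted: the default of 1 applies
  }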



we have a pipeline that ingests very concise cuelang files.

then it generates json files for each application for a tool that will create xml definitions, which are then applied to an xls which the architects own, to spit out a yaml that we use to apply our helm charts. the charts deploy a k8s client which then interacts with the main cluster via json using the api.

took a while, but we are using the best tool for each job.



just throw in a kafka cluster so you can pipe each step through an event bus and you'll have an enterprise-grade deployment setup


You used JSON twice, how casual.

Your API should clearly be using protobuf.



How does it compare to dhall?


Dhall's lack of any form of type inference makes it very verbose and difficult to refactor, in my opinion (I'm the author of dhall-kubernetes and, funnily enough, never ended up using it in production). Dhall is also extremely slow. We had kubernetes manifests that took _minutes_ to type-check. Cue is basically instant. This matters a lot to me.

I find cue very ergonomic. Its treating both types and values as values is also very neat: you write your types and your values in the same syntax and everything unifies neatly. But I sometimes miss having functions.

Cue also being able to ingest protobuf definitions and openapi schemas makes it very quick and easy to integrate with your project. Have a new Kubernetes CRD you want type-checked in cue? No problem, just run `cue get go k8s.io/api/myapi/v1alpha1` and off you go; you have all your type definitions imported from Go to Cue!

Especially for k8s this makes for very fast development and iteration cycle.

I've wanted to take a look at https://nickel-lang.org/ which is a "what if cue had functions" language. but to be honest Cue kind of serves my needs.



Functions are a nice to have, but:

- It tends to make things less declarative.

- You lose locality of behavior, which is very useful in configuration.

Also, nickel doesn't support injecting data into the nickel file, so an external program can't set variables, query a database and pass the result to the config file, etc.



> Dhall is also extremely slow. We had kubernetes manifests that took _minutes_ to type-check. Cue is basically instant.

Everyone wants type-safety, but no one wants to wait for the type-checker :)

Maybe in this case Cue with type checks equivalent to Dhall's would be slower, but I notice in many places people say "strong type-checking is valuable" while still expecting compile times similar to those of languages with weaker type systems.



People always undervalue the beauty of a short feedback loop until it's taken away from them.

And even then, they won't exactly pinpoint the problem, rather express their general frustration, without realizing that the dynamic system they used did indeed have some great properties and was not popular for no reason.



Speaking of Nickel, they've got a great document detailing the reasons for their design (for example why they chose not embed in a general-purpose language like Pulumi) and how Nickel compares to other config languages like Dhall and CUE: https://github.com/tweag/nickel/blob/master/RATIONALE.md


Cue was designed very much with k8s in mind and developed tutorials and integrations for it early on. Dhall was designed pre-k8s, and had to introduce a defaults feature: before that it was completely unusable for k8s. Dhall has functions, which are natural to programmers; particularly for those from an FP background, Dhall would be trivial to start using, whereas it takes some getting used to cue's unification, but there is enough documentation and integration for getting going with k8s to make up for it. Dhall has unique features for stably importing configurations from remote locations.


In my view, the presence of YAML templating is a red flag in any codebase or system.

YAML got its popularity with the advent of Ruby on Rails, largely due to the simplicity of the database.yml file as an aid in database connection string abstraction that felt extremely clean to Java programmers who were used to complicated XML files full of DSN names and connection string peculiarities.

The evolution of the database.yml file into something arguably as complex as the thing it was intended to replace is described in the article below:

https://dev.to/andreimaxim/the-rails-databaseyml-file-4dm9



I will tell you exactly why we template yaml. It's the exact same reason every code base has ugly parts: the evolution of complexity.

At first, you have a yaml file. No templates, no variables. Just a good old standard yaml. Then, suddenly you need to introduce a single variable. Templating out the one variable is pretty easy, so you do it, and it's still mostly for humans to edit.

Well, now you have a yaml file and template engine already in place. So when one more thing pops up, you template it out.

8 features later, you wonder what you've done. Only, if we go back in time, each step was actually the most efficient. Introducing anything else at step 1 would be over-engineering. Introducing it anywhere else would lead to a large refactor and possible regressions.

To top it off, this is not business logic. Your devs are not touching this yaml all that much. So is it worth "fixing"? Probably not.



The title of TFA was actually my reaction when I learned what Helm was actually doing. Initially I thought Helm would take an input file of YAML-with-template-bits, parse that YAML as an object, then use the provided template bits to fill in the parts of that object, then serialize the object back to YAML and write it out. Sounds reasonable, right? Nope, it's literal text substitution, so if you want to have a valid YAML as the output you better count your indentation on your fingers, and track where the newlines go or don't go.


Wouldn't it be a better idea to use an existing programming language instead of cooking up numerous half baked templating languages?


Yes.

Except you then have to censor that programming language severely. Maybe you can accept the odd endless loop, but you probably don't want the CI orchestrator to start mining Monero instead of bootstrapping and configuring servers and services.

A solution to that censorship problem might be a very limited WASM runtime: one that offers very few APIs, has severely limited resources, and enforces timeouts and such. So people could write their orchestration in Python, Javascript or Rust, or even Brainfuck if they want, but what that orchestration can do, for how long it can do it, and how much memory and space it gets, would all be very limited.

While that may work, it's far harder to think of than "lets make another {{templating|language}}" inside this YAML that we already have and everyone else uses.



I don't see any practical difference w.r.t. cybersecurity between "I blindly applied this pile of YAML to my production kubernetes clusters without looking at it" and "I blindly downloaded and ran this computer program on my CI runner without looking at it".

A supply chain attack on the former means that your environment is compromised. So does the latter.



I do see a difference.

GitHub Actions isn't going to run your Python code on its orchestration infra. Nor is DigitalOcean or Fly.io or CircleCI. They all converged on "YAML" because it's a very limited set of instructions.

I'm quite sure you cannot write a bitcoin miner (or something that opens a backdoor) in Liquid inside YAML in the DSL that Github Actions has. I am 100% sure you can write a bitcoin miner in Python, Javascript, Lua, or any programming language that Github would use to replace their YAML config.



What? GitHub Actions, at the very least, isn't strictly yaml. I run arbitrary code in whatever language I want all the time. I'm pretty sure third party workflows can, too.


you can still have the python code output json and review the diff, in a similar way to how atlantis works.


We wrote a backend service at Lyft in Python and at some point needed to do some string interpolation for experimentation. In a rush, someone implemented this in YAML (no new deps needed). This ended up being the bane of the team's existence: it was almost impossible to test whether something was going to break at runtime (we could only verify it was valid yaml; many other checks were infeasible), and it was super hard to debug. It soured me on YAML for years.


Ansible convinced me that doing programming tasks in YAML is insanity, so I started an experiment: what would Ansible be like if its syntax were more like Python than YAML? https://github.com/linsomniac/uplaybook

I spent around 3 months over the holidays exploring that by implementing a "micro Ansible", I have a pretty solid tool that implements it, but haven't had much "seat time" with it: working on it rather than in it. But what I've done has convinced me that there are some benefits.



I am really sad that jsonnet / ksonnet never really took off. It’s a great way to template, but has a bit of a learning curve in my experience. I suspect that is why it’s niche.

If you like what is presented in this article, take a look at Grafana Tanka (https://tanka.dev).



I was reading the description of Jsonnet and wondering why we don't just use JavaScript. Read a file, evaluate it, take the value of the last expression as the output, and blat it out as JSON.

The environment could be enriched with some handy functions for working with structures. They could just be normal JavaScript functions. For example, a version of Object.assign which understands that "key+" syntax in objects. Or a function which removes entries from arrays and objects if they have undefined values, making it easy to make entries conditional.

Those things are simple enough to write on demand that this might not even have to be a packaged tool. Just a thing you do with npm.



Yeah similarly I'm using Nix to template K8s templates and I've never looked back. Helm is great for deploying 3rd party applications easily but I've never seen the appeal for using it for in house services, templating YAML is gross indeed.


The fact that it's a purely functional programming language with lazy evaluation is really powerful but steepens the learning curve for devs who haven't worked with functional languages.

The stdlib is also pretty sparse, missing some commonly required functions.



> The fact that it's a purely functional programming language with lazy evaluation is really powerful but steepens the learning curve for devs who haven't worked with functional languages.

does it really though? what part do they struggle with?



IME engineers struggle with folds most.


> The stdlib is also pretty sparse, missing some commonly required functions.

This seems to be the general curse of template languages. For some reason, their authors have this near-religious belief in removing every "unneeded" feature, which in practice results in having to write 10 incomprehensible lines of code to do something that could be easily done in one line of readable code in a proper PL.



In my experience there is a near zero uptake of jsonnet or similar amongst "regular" i.e less ops inclined developers.

gotmpl is a lot easier to grok if you are coming in cold. Yes it sucks for anything mildly complex, but the barrier to entry is significantly lower.

Generation via real programming languages is the future I am hoping for.



Jsonnet looks like a case of XKCD-927[0]. I fully agree with you that real programing languages are the way to go for generating anything more complex.

[0] https://xkcd.com/927/



Indeed, why? However, the conclusion I have is not to use JSON but to use a type safe configuration language that can express my intent much better, making illegal states impossible. One example of such a language is Dhall.

https://dhall-lang.org/
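For a taste of the "illegal states impossible" point, a tiny sketch (the names are invented): an environment is a union type, so a typo in the environment name simply fails to type-check instead of slipping into production.

  -- Env can only ever be one of these constructors
  let Env = < Staging | Prod >

  let port = \(env : Env) -> merge { Staging = 8080, Prod = 80 } env

  in  { environment = Env.Prod, port = port Env.Prod }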



If I’m going to use a whole language to generate my config already, why would I use anything but the language my application is written in? Everything can export JSON after all.


Because your language might not have a nice type system. For example, Python -> JSON is going to produce worse guarantees than Dhall.


Different requirements, different guarantees. Principle of least power. Have a look at https://docs.dhall-lang.org/discussions/Safety-guarantees.ht....


This makes no sense to me.

You have complex enough logic to warrant a language, you should use a real language. You'll have more support, less obscure issues, a solid standard library and whatever else you want, because it's a REAL language.

If the argument is "someone in my team uses recursion to write the YAML files, so I'll disallow it", then the issue is not with the language, it's with the team.

What I have found on my career is that many Ops people sell themselves short and hesitate to dive into learning and fully using an actual language. I've yet to understand why, but I've seen it multiple times.

They then end up using pseudo-languages in configuration files to avoid this small step towards using an actual language, and then complain about how awful those pseudo-languages are.



> You have complex enough logic to warrant a language, you should use a real language.

Not sure what you mean. Dhall is a real language:

    Dhall is not a Turing-complete programming language, which is why
    Dhall's type system can provide safety guarantees on par with
    non-programmable configuration file formats. Specifically, Dhall
    is a “total” functional programming language, which means that:

    - You can always type-check an expression in a finite amount of time

    - If an expression type-checks then evaluating that expression
      always succeeds in a finite amount of time


Pulumi over Terraform.

CDK over Cloudformation.

Don't hand craft configuration files, these aren't new lessons. I remember being first introduced to Troposphere, which was pretty awesome.



Can someone help me understand what is the advantage of using jsonnet, cue, or something else vs a simple python script (or dialect, like starlark), when you have the need of dynamically creating some sort of config?

I've used jsonnet in the past to create k8s files, but I don't work in that space anymore. I don't remember it being better or easier than writing a python script that outputs JSON. Not even taking into account maintainability and such. Maybe I'm missing something?



To add to the sibling comments, after going from a jsonnet-based setup to a Typescript-based one (via pulumi), the biggest thing I missed from jsonnet was the native object merge operations which are very useful for this kind of work as it lets you say "I want one of these, but with these changes" even when the objects are highly nested, and you can specify whether to merge or override for each individual key.

But ultimately this was a minor issue and I think it's far more important that you use something like this (whether a DSL or a mainstream PL) and that you're not trying to do string templating of YAML.
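For readers who haven't seen it, the jsonnet merge in question is the `+:` field syntax; a minimal sketch (the imported file is hypothetical):

  local base = import 'deployment.libsonnet';

  // "one of these, but with these changes": +: merges nested objects
  // instead of replacing them wholesale
  base {
    spec+: {
      replicas: 5,
      template+: { metadata+: { labels+: { track: 'canary' } } },
    },
  }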



They're various points along the spectrum from Turing-complete config generators to purely declarative config. Declarative config is ideal in lots of ways for mission-critical things, but hard to create lots of because of boilerplate.

A turing-complete general purpose language is entirely unconstrained in its ability to generate config, so it's difficult to understand all the possible configs it can generate. And it's difficult to write policy that forbids certain kinds of config to be generated by something like Python. And when you need to do an emergency-rollback, it can be hard to debug a Python script that generates your config.

Starlark is a little better because it's deliberately constrained not to be as powerful as Python.

Jsonnet is, IIUC, basically an open source version of the borgcfg tool they've had at Google forever. My recollection is that Borgcfg had the reputation of being an unreadable nightmare that nobody understood. In practice, of course, people did understand it but I don't think anyone loved working with it.

Brian Grant, creator of Kubernetes, wrote up his thoughts on various config approaches in this Google doc: https://docs.google.com/document/d/1cLPGweVEYrVqQvBLJg6sxV-T....



I definitely wouldn't use Python because it isn't sandboxed, and users will end up doing crazy things like network calls in your config.

Starlark is a good option though.

People will talk about Jsonnet not being Turing complete, but IMO that is completely irrelevant. Turing completeness has zero practical significance for configs.



There's 2 things on the horizon here for Kubernetes that give me hope. KCL, its own configuration language, and Timoni, which builds off CUE and corrects some of the shortcomings of Helm.

Though these days, OLM and the Quarkus operator SDK give you a completely viable alternative approach to Helm that enables you to express much more complex functionality and dependency relationships over the lifecycle of resources. An example would be doing a DB backup before upgrading to a new release etc. Obviously this power comes at a cost.



I'm no fan of YAML but for an example of templating YAML that is tolerable, take a look at esphome[1] (and I suppose also home assistant).

In your main yaml file you have something like this:

  packages:
    left_garage_door: !include
      file: garage-door.yaml
      vars:
        door_name: Left
        door_location: left
        open_switch_gpio: 25
        close_switch_gpio: 26
Then in garage-door.yaml you can reference the vars directly with ${door_name} syntax.
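So garage-door.yaml can be an ordinary component definition with the vars spliced in, along the lines of this sketch (the component keys here are illustrative, not taken from a real config):

  # garage-door.yaml
  switch:
    - platform: gpio
      name: "${door_name} Garage Opener"
      pin: ${open_switch_gpio}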

It's the best version of templating YAML that I have experienced.

1. https://esphome.io/guides/configuration-types#packages-as-te...



Yes, templating YAML is crazy. But is the answer jsonnet? That's even more batshit.

Why hasn't anyone opted for a "patch-based" approach? I.e. start with a base YAML/JSON file, apply a second file over it, apply this third one, and use the result as the config. How you generate these files is entirely up to you.
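As a sketch of the idea, with JSON-merge-patch-style semantics (file names and keys invented):

  # base.yaml
  server:
    host: localhost
    port: 8080
  logging:
    level: info

  # prod.yaml, applied on top: only the overrides
  server:
    host: api.example.com
  logging:
    level: warn

  # result: host and level replaced, port: 8080 carried through

Kustomize overlays are roughly this model, specialized for k8s objects.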



Yes. The answer is a "config.d" directory; this has been known to linux package managers for a long time. It is the only way for multiple packages to contribute to configuration without fighting over ownership of the one true config file.


I'm confused on two points:

(A) Why not use the yaml syntax that is not whitespace-sensitive? In the author's example, that could be: {name: Al, address: something}

(B) Don't env variables go a long way toward avoiding the need for a template? Instead of generating a complete YAML, put env variable placeholders in and set those values in the target environment. That way, the same YAML can generally be deployed anywhere. I've seen that style implemented several times; it works pretty well.

I do agree that generating config itself, and not just interpolating values, is potentially really gnarly. I do wonder: instead of interpolating variables at deploy time, why not use env variables and do the interpolation at runtime?



Separate generated content from maintained content; that works for me. But on to the specifics here, from a very python POV.

Strict YAML is easier to maintain than json if you have deeper than one or maybe two levels of nesting, multiline strings, or comments.

So, I build my config systems to _generate_ YAML instead of “templating YAML.”

PyYAML extensions and ruamel.yaml exist, though they're kind of out of date, and more new projects are using TOML. (From the project description: "ruamel.yaml is a YAML parser/emitter that supports roundtrip comment preservation.")
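Concretely, "generate instead of template" is just building the data structure and letting an emitter serialize it; a minimal PyYAML sketch (the values are invented):

  import yaml  # PyYAML; ruamel.yaml has a similar API and can round-trip comments

  config = {
      "service": {"name": "api", "replicas": 3},
      "hosts": ["a.example.com", "b.example.com"],
  }

  # the emitter worries about quoting, indentation and escaping, not you
  print(yaml.safe_dump(config, sort_keys=False))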

Confession: but yeah, not when I use ansible. Ansible double-dog-dares you to “jinja2 all the things” without much in the way of structured semantics.



If you're using Helm to deploy your own apps, I feel that's a code smell. I'll add jsonnet for your own apps to the that list.

Just use dumb YAML, maybe kustomize if you really need, but if that's not sufficient, consider that a sign that you're not carving the wood the way it's telling you to.

Any form of templating for creating your own application manifest is another moving part that allows for new and fun errors, and the further away your source manifest is from the deployed result, the harder it is to debug.

If you really want to append a certain set of annotations to each and every pod in a cluster, instead of using shared templates (and enforcing their usage), there are other approaches in K8s for these kinds of use cases that give you a lot more control.



For those who do not know it yet, the now classic noyaml site: https://noyaml.com/


This article made me think it'd be nice to generate k8s JSON using TypeScript. Just a node script that runs console.log(JSON.stringify(config)), and you pipe that to a yaml file in your deploy script. The syntax seems more sane and has more broad appeal than jsonnet, and I'd wager that the dev tooling would be better given good enough typings.

By the way the answer to the question "why are we templating yaml?" is: people are just more familiar with it and don't want to have to translate examples to jsonnet that they copy and paste from the web. Do not underestimate this downside :) Same downside would probably apply to TypeScript-generated configs I bet.
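A minimal sketch of that idea (the Deployment type here is hand-rolled for illustration; published typings for the k8s API could replace it):

  // generate.ts
  type Deployment = {
    apiVersion: "apps/v1";
    kind: "Deployment";
    metadata: { name: string; labels?: Record<string, string> };
    spec: { replicas: number };
  };

  const config: Deployment = {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: "web", labels: { app: "web" } },
    spec: { replicas: 3 },
  };

  console.log(JSON.stringify(config, null, 2));

Since kubectl accepts JSON anywhere it accepts YAML, the deploy step can be as simple as piping the output to `kubectl apply -f -`.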



Others have mentioned CDK, but I want to say that this is almost the exact approach I took on a project recently and it worked out fine. Node script that validates a few arguments and generates k8s manifests as JSON to be fed into `kubectl apply`.

IME, there's no need to involve anything more complicated if your deployment can be described solely as k8s manifests.



I would recommend implementing a similar API to Grafana Tanka: https://tanka.dev

When you "synthesise", the returned value should be an array or an object.

1. If it's an object, check if it has an `apiVersion` and `kind` key. If it does, yield that as a kubernetes object and do not recurse.

2. If it's an array or any other object, repeat this algorithm for all array elements and object values.

This gives a lot of flexibility to users and other engineers because they can use any data structures they want inside their own libraries. TypeScript's type system improves the ergonomics, too.





You can convert YAML to JSON programmatically, and JSON is valid jsonnet, so you can pretty much copy paste examples from the web into your jsonnet if you find yourself wanting to do that


That's sort of what https://cdks.io does, except the final output is YAML for better readability.




> copy and paste from the web

Hot take, this is a terrible idea, and is why so much cloud infra is monstrously expensive (and bad).

People need to stop making infra easy. It’s not supposed to be easy, because when you make a bad decision, you don’t get to revert a commit and carry on with life. You don’t understand IOPS and now your gp2 disk is causing CPU starvation from IOWAIT? Guess you’re gonna learn some things about operating within constraints while waiting for a faster disk to arrive at the DC! Buckle up, it’ll be good for you.

I’m fully aware that I sound like a grouchy gatekeeper here, and I’m fine with it. People making stupid infra decisions en masse cause me no end of headaches in my day job, and I’m tired of it.



I can't remember how many times I've heard or seen the argument "but that is in YAML", which implies that the configuration (or, god forbid, the code) is simple and well designed. I find it hilarious.

An even worse offender is embedding a text template like jinja in a YAML config and forcing everyone to use such an abomination to change production config via deployment. Yes, I'm talking about Terraform or the like. Why people think this kind of design is acceptable is beyond my comprehension.



I need a restraining order against YAML DSLs.


The cycle seems: Invent a new static information format (XML/JSON/HTML/...), reduce verbosity, add GUI, variables, comments, expressions, control flow, validation, transformation, static typing, compilers, IDE support, dependency management, and maybe a non-backwards-compatible major version etc. And you end up with yet another Java/C# clone, just inferior because it was never meant to support all these things.


I'm designing a simple dev environment from scratch.

My solution for this is a sandboxed lua for programatic configuration:

https://github.com/civboot/civlua/tree/main/lib/luck

I can't stand JSON (for many reasons) so I created a serialization format that combines it and CSV for nested objects

https://github.com/civboot/civlua/tree/main/lib/tso

I wish the industry would standardize on a solution like this. IMO you shouldn't use a "real" language unless you can lock it down to be deterministic. JSON is supposed to be human readable but fails for lots of real-world data like multi-line strings or lists of records.

CSV is more readable but doesn't support nested objects.



I think YAML is a good pick for non-developers / content creators. The front matter section in Markdown files is a good example. Or is there a better, human-friendly alternative?


You just pinpointed my biggest peeve with YAML. It looks like it's "human friendly" because there are no scary curly braces. But you still need to get the syntax exactly right, so that benefit is very small. And now you have to keep your finger on the screen while scrolling in order to figure out what a bullet belongs to.


Then what alternative do you recommend for content creators? Do you use the alternative in Markdown front matter?


You should make what you do / don't do less of your identity. You're limiting yourself because you identify as "not the kind of person who does that".


Note that I am not a content creator myself. I build solutions for web teams and on those teams, some people focus solely on content and Markdown. I want to offer them an easy editing experience. So far YAML has been the easiest format for them.


TOML is pretty easy to grok and forgiving at the same time


I don't think they have a need for configuration files while filming their tiktoks.


What is the best term to use for the people who are writing content on the web team? The ones who write blog entries, documentation, and marketing pages. The ones who mainly touch Markdown files.


Learn JavaScript. Get the fuck out of your "content creator" pigeonhole. JavaScript is content.


I don't think anyone writes their blog entries with JavaScript here.


> But you still need to get the syntax exactly right

I think it's more that it's declarative that makes it simple. Also you just have to remember simpler rules compared to JSON.

E.g.

  - Apple
  - Orange
  - Strawberry
  - Mango
Is simpler than

  [
    "Apple",
    "Orange",
    "Strawberry",
    "Mango"
  ]
Don't forget to skip that last comma! But not all of the others!


> I think it's more that it's declarative that makes it simple

...it's no more or less declarative than other configuration languages?

And yes, I get that it looks simpler. I just think that it applies as long as your file can fit in about half a page. As it grows and becomes deeply nested, IMO, that simplicity disappears.



This is the least of my worries: just use VS Code with a plugin that underlines errors in and formats yaml, and use yamllint in your CI.


Basically you're saying YAML is unreadable without an IDE or a text editor with advanced highlighting functionality.


YAML is anything but human-friendly. It has far too many special features and edge cases for most people. Something simple like Java properties files would handle something like markdown front matter perfectly fine.


Java properties files are a mess. They still require Windows encoding (ISO-8859-1), which is incompatible with UTF-8.


ISO-8859-1 is Latin-1, there's nothing specifically "Windows" about it.


You're correct.


You were probably thinking of Windows-1252, which is an extension of ISO-8859-1 that supports more European character sets.


Only if you're still using java 8.


That's a common misconception.

Look at the documentation [0] or at the OpenJDK code. Both assume ISO-8859-1, unless you're dealing with a special case where resource bundles are involved.

[0]: https://docs.oracle.com/en/java/javase/21/docs/api/java.base...



Why something so complex for front matter? Isn't it typically just a few key/value pairs?


I'd personally go with TOML over YAML for that




