
gavinray
"Better to have it and not need it; than to need it, and not have it..."

jodrellblank
“You can’t have everything. Where would you put it?” - Steven Wright.

“Better to have hoarding disorder than to need a fifty year old carrier bag full of rotting bus tickets and not have one” needs more justification than a quote about how convenient it is to have what you need. Caches exist precisely so that you can keep what you probably need close at hand, because you can’t keep everything close at hand and have to choose. The set of things you might possibly want or need one day, including unforeseen needs, is unbounded, and refusing to make a decision is not good engineering; it’s a cop-out.

Apart from storage cost, the more you keep, the more time and money you spend indexing, cataloging, and searching it. How many companies are going to run an internal Google-2002-sized infrastructure just to search their old hoarded data?

hnlmorg
This is a really easy problem to solve.

Step one: add log severity to your log messages (pretty much every log library supports this out of the box).

Step two: add a log archive (you should have this anyway so that logs can be retained past the initial retention period of your log querying tools; e.g. you might have a compliance requirement to keep logs for two years, but you obviously wouldn’t want anything that old stored in your expensive fast log search).

Step three: create a way to ingest your archived logs (again, something your business should have; otherwise what’s the bloody point in having an archive?).

Step four: have a rule that pushes logs of high severity straight into your log ingestion pipeline, and logs of lower severity into your archive.

Step four seems to be the piece that most people are oblivious to. But it’s generally really easy to implement, particularly if you’re using a reputable observability platform.
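The routing rule in step four can be sketched with nothing more than Python's standard `logging` module. This is a minimal sketch, not any particular observability platform's feature: the `WARNING` cut-off, the logger name `app`, and the two in-memory streams standing in for the ingestion pipeline and the archive are all assumptions for illustration.

```python
import io
import logging

# Hypothetical cut-off: records at WARNING or above count as "high severity".
HOT_THRESHOLD = logging.WARNING


class BelowLevel(logging.Filter):
    """Pass only records strictly below the given level (the archive side)."""

    def __init__(self, level: int) -> None:
        super().__init__()
        self.level = level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.level


def build_logger(hot_stream, archive_stream) -> logging.Logger:
    """Route high-severity records to the hot pipeline, the rest to the archive.

    hot_stream / archive_stream are placeholders for real sinks (e.g. a log
    shipper and cheap object storage); here they are any file-like objects.
    """
    logger = logging.getLogger("app")
    logger.setLevel(logging.DEBUG)  # capture everything; handlers decide routing
    logger.propagate = False
    logger.handlers.clear()

    fmt = logging.Formatter("%(levelname)s %(message)s")

    hot = logging.StreamHandler(hot_stream)
    hot.setLevel(HOT_THRESHOLD)  # step four: high severity -> ingestion pipeline
    hot.setFormatter(fmt)

    archive = logging.StreamHandler(archive_stream)
    archive.addFilter(BelowLevel(HOT_THRESHOLD))  # lower severity -> archive
    archive.setFormatter(fmt)

    logger.addHandler(hot)
    logger.addHandler(archive)
    return logger


hot_buf, archive_buf = io.StringIO(), io.StringIO()
log = build_logger(hot_buf, archive_buf)
log.debug("cache miss for key=42")
log.info("request served")
log.error("payment provider timed out")
```

Swapping the `StringIO` buffers for real sinks is where the platform-specific work would go. Whether high-severity records should also be copied to the archive for long-term retention is a policy choice this sketch leaves out.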

People who think “log everything” means “log PII” or “stick everything in the same log ingestion pipeline” are simply doing logging wrong. I’m not normally one to say “you’re doing it wrong”, but when it comes to logging, these tools have long since matured. The problem isn’t the tooling; it’s people’s awareness of it.

gavinray OP
I'm not sure what poor engineering practices you have seen, but in my painfully-gotten experience, application of this principle usually amounts to having varying levels of a debug log flag that dump this info either to stdout via JSONL that's piped somewhere, or as attributes in OTEL spans.

This has never been a source of significant issues for me.
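As a rough illustration of the JSONL-to-stdout variant, here is a sketch using only Python's standard library. The `--debug=N`-style flag, its mapping to levels (0 to INFO, 1 to DEBUG, 2 to a custom `TRACE` level), and the `ctx` key used for structured fields are hypothetical choices; the OTEL-span variant is not covered.

```python
import io
import json
import logging

# Hypothetical extra-verbose level below DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# Hypothetical mapping from a --debug=N style flag to a logging threshold.
FLAG_TO_LEVEL = {0: logging.INFO, 1: logging.DEBUG, 2: TRACE}


class JsonLineFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line (JSONL)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as extra={"ctx": {...}} (hypothetical convention).
        payload.update(getattr(record, "ctx", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(debug_flag: int, stream) -> logging.Logger:
    """Build a JSONL logger whose verbosity is set by debug_flag (clamped to 0..2)."""
    logger = logging.getLogger("jsonl-demo")
    logger.setLevel(FLAG_TO_LEVEL[min(max(debug_flag, 0), 2)])
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # pass sys.stdout in a real service
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    return logger


buf = io.StringIO()
log = make_logger(1, buf)
log.info("order placed", extra={"ctx": {"order_id": 17}})
log.debug("pricing inputs", extra={"ctx": {"discount": 0.1}})
log.log(TRACE, "raw provider response")  # dropped: flag 1 only enables DEBUG
lines = buf.getvalue().splitlines()
```

Raising the flag to 2 would also emit the `TRACE` line; at 0 only the `INFO` record is written. Because each line is standalone JSON, whatever the stream is piped into can parse records without a custom log grammar.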

lelanthran
> "Better to have it and not need it; than to need it, and not have it..."

Having it is pointless if your SNR is so low that it costs more money than simply waiting for the bug the next time it comes up.

IMO, if a bug never surfaces again, that's not a bug I care about anyway. Keeping all generated data in case someone wants to see the record from a bug 3 months ago is absolutely pointless - if it hasn't surfaced again in the last three weeks, you absolutely have more high-priority things to look at!

I want to see this mythical company, where a paid employee is dedicated by the company to look at a log from 3 months ago, to solve a bug that hasn't resurfaced in that three month period!

Until you’re working with personal information of EU customers, where the opposite maxim applies: "Only store what you absolutely need."

Seriously, storing petabytes of logs guarantees that someone on your team will write sensitive data to logs and/or violate regulations.

jkogara
Or more succinctly, albeit less eloquently: "Better to be looking at it than looking for it."
