Nothing I say should be construed as legal advice. I encourage you to seek licensed counsel in your jurisdiction if you need legal advice.
- At least in New York City and outside the U.S., I regularly see pizzas being delivered by bicycle, moped, and motorcycle. I also see deliveries being made with small trucks (Kei-style) and vans that fit in alleyways.
- Which jurisdiction was this? The behavior of the prosecutor is atrocious.
- > those ingested logs are not logs for you, they are customer payload which are business criticial
Why does that make any difference? Keep in mind that at a large enough organization, even though it's all one company, there will often be an internal observability service team (frequently, but not always, part of a larger platform team). At a highly functioning org, this team is run very much like an external service provider.
> I've yet to see a Logging as a Service provider not have outages where data was lost or severely delayed.
You should take a look at CloudWatch Logs. I'm unaware of any time in its decade-plus history that it has successfully ingested logs and subsequently lost or corrupted them. (Disclaimer: I work for AWS.) Also, I didn't say anything about delays, which we often accept as a tradeoff for durability.
> And for the capacity side, again the design question is "what happens when incoming events exceed our current capacity - all the collectors/relays balloon their memory and become much much slower, effectively unresponsive, or immediately close the incoming sockets, lower downstream timeouts, and so on."
This is one of the many reasons why buffering outgoing logs in memory is an anti-pattern, as I noted earlier in this thread. There should always -- always -- be some sort of non-volatile storage buffer in between a sender and remote receiver. It’s not just about resilience against backpressure; it also means you won’t lose logs if your application or machine crashes. Disk is cheap. Use it.
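To make that concrete, here's a minimal sketch of the pattern in Python. The spool path and `ship_batch()` are hypothetical stand-ins for whatever directory and transport you actually use; the point is that each record is fsynced to local disk before any network delivery is attempted, and is deleted only after the receiver acknowledges it.

```python
import json
import os

SPOOL_DIR = "/var/spool/mylogs"   # hypothetical spool location

def ship_batch(records):
    """Hypothetical delivery call to the remote receiver.

    Must raise if the receiver does not acknowledge the batch.
    """
    raise NotImplementedError

def write_record(record: dict) -> str:
    """Persist a record to the local spool before any network I/O.

    Assumes each record carries 'ts' and 'id' fields (illustrative only).
    """
    os.makedirs(SPOOL_DIR, exist_ok=True)
    path = os.path.join(SPOOL_DIR, f"{record['ts']}-{record['id']}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(json.dumps(record))
        f.flush()
        os.fsync(f.fileno())      # survives a process or machine crash
    os.rename(tmp, path)          # atomic publish into the spool
    return path

def drain_spool(batch_size: int = 100) -> None:
    """Forward spooled records; delete them only after the receiver acks."""
    files = sorted(n for n in os.listdir(SPOOL_DIR) if n.endswith(".json"))
    paths = [os.path.join(SPOOL_DIR, name) for name in files[:batch_size]]
    if not paths:
        return
    records = [json.load(open(p)) for p in paths]
    ship_batch(records)           # raises on failure; spool stays untouched
    for p in paths:
        os.remove(p)              # safe to drop the local copy now
```

If the receiver is down or applying backpressure, the drain simply falls behind; memory stays flat and nothing already written is lost.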
- By all means, make a proposal to the FIDO Alliance that solves all these problems without introducing new ones. You're going to have to anticipate all the objections and be prepared to answer them, though.
- It would, yes. Thus it’s an action I would only recommend if doing so would help prevent serious injury to their customers. If it comes down to a choice between locking a customer out and having, say, their money stolen, then the scale tips towards safety.
- You don’t use AI to perform formal verification. You give the agent access to verification tools whose output can then be fed back to the model.
It’s the same design as giving LLMs the current time, since they can’t tell time themselves, either.
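A rough sketch of that loop, with `call_model()` and `run_verifier()` as hypothetical stand-ins for your LLM API and whatever prover or model checker you invoke:

```python
def call_model(prompt: str) -> str:
    """Hypothetical LLM call; returns a candidate proof/spec/program."""
    raise NotImplementedError

def run_verifier(candidate: str) -> tuple[bool, str]:
    """Hypothetical wrapper around a real verification tool.

    Returns (ok, tool_output). The tool, not the model, decides correctness.
    """
    raise NotImplementedError

def verify_with_feedback(task: str, max_rounds: int = 5) -> str | None:
    prompt = task
    for _ in range(max_rounds):
        candidate = call_model(prompt)
        ok, output = run_verifier(candidate)
        if ok:
            return candidate   # accepted by the verifier, not by the LLM
        # Feed the verifier's complaint back so the model can revise.
        prompt = (f"{task}\n\nPrevious attempt:\n{candidate}\n\n"
                  f"Verifier output:\n{output}\nPlease fix it.")
    return None                # never trust an unverified answer
```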
- So if a client ever goes rogue someday (either intentionally or because it has been compromised) and starts shipping off private material to a malicious third party, you think relying parties shouldn’t have the option not to trust that client anymore?
- If you trust that the cryptography employed in passkeys is effectively unbreakable, then it follows that, for all intents and purposes, passkeys cannot be phished. It’s the same thing as trusting that your browsing sessions cannot be MITMed because the end-to-end encryption is sufficiently strong.
- (2024) - previously discussed at https://www.hackerneue.com/item?id=40310896
- > have you never been asked to ensure data cannot be lost in the event of a catastrophe? Do you agree that this requires synchronous external replication?
I have been asked this, yes. But when I tell them what the cost would be to implement synchronous replication in terms of resources, performance, and availability, they usually change their minds and decide not to go that route.
- > It matters to me, because I don’t want to be dependent on a sync ack between two fault domains for 99.999% of my logs. I only care about this when the regulator says I must.
If you want synchronous replication across fault domains for a specific subset of logs, that’s your choice. My point is that treating them this way doesn’t make them not logs. They’re still logs.
I feel like we’re largely in violent agreement, other than whether you actually need to do this. I suspect you’re overengineering to meet an overly stringent interpretation of a requirement. Which regimes, specifically, dictated that you must have synchronous replication across fault domains, and for which set of data? As an attorney as well as a reliability engineer, I would love to see the details. As far as I know, no one - no one - has ever been held to account by a regulator for losing covered data due to a catastrophe outside their control, as long as they took reasonable measures to maintain compliance. RPO=0, in my experience, has never been a requirement with strict liability regardless of disaster scenario.
- > If you want durability, a single physical machine is never enough.
It absolutely can be. Perhaps you are unfamiliar with modern cloud block storage, or RAID backed by NVRAM? Both have durability far above and beyond a single physical disk. On AWS, for example, EBS io2 Block Express offers 99.999% durability. Alternatively, you can, of course, build your own RAID 1 volumes atop ordinary gp3 volumes if you want to design for similar loss probabilities.
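Back-of-the-envelope math for the RAID 1 case, assuming independent failures and AWS's published durability range for gp3 (99.8%-99.9%). The numbers are illustrative only; correlated failures and rebuild windows make real-world figures worse than a naive product.

```python
# Annual probability of losing a single gp3 volume
# (published durability 99.8%-99.9%).
afr_low, afr_high = 0.001, 0.002

# RAID 1 mirror of two volumes: data is lost only if both fail before the
# mirror can be rebuilt. Assuming independence (a simplification), the loss
# probability is roughly the product of the individual failure rates.
mirror_loss_low = afr_low ** 2     # 1e-6 -> ~99.9999% "durability"
mirror_loss_high = afr_high ** 2   # 4e-6 -> ~99.9996% "durability"

print(f"Single gp3 volume, annual loss probability: {afr_low:.3%} - {afr_high:.3%}")
print(f"Two-way mirror, annual loss probability:    {mirror_loss_low:.6%} - {mirror_loss_high:.6%}")
# Comparable to io2 Block Express's stated 99.999% (loss probability 1e-5).
```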
Again, auditors do not care -- a fact you admitted yourself! They care about whether you took reasonable steps to ensure correctness and availability when needed. That is all.
> when regs say “thou shalt not lose thy data”, I move the other way. Which is why the streams are separate. It does impose an architectural design constraint because audit can’t be treated as a subset of logs.
There's no conflict between treating audit logs as logs -- which they are -- and giving them separate delivery streams with different retention and durability policies. Regardless of how you manage them, it doesn't change their fundamental nature. Don't confuse the nature of logs with the level of durability you want to achieve with them. They're orthogonal matters.
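To make the "orthogonal" point concrete, here's a toy sketch: every record goes through the same logging path, and a routing policy -- not the record's "nature" -- decides which delivery stream and durability treatment it gets. Names like `DurableSink` and `BestEffortSink` are invented for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class LogRecord:
    message: str
    category: str = "app"          # e.g. "app", "access", "audit"
    ts: float = field(default_factory=time.time)

class Sink(Protocol):
    def write(self, record: LogRecord) -> None: ...

class BestEffortSink:
    """Asynchronous delivery; may fall behind under extreme pressure."""
    def write(self, record: LogRecord) -> None:
        ...  # placeholder: enqueue for batched, asynchronous delivery

class DurableSink:
    """Synchronous local persistence, longer retention, stricter delivery."""
    def write(self, record: LogRecord) -> None:
        ...  # placeholder: fsync locally, then replicate per audit policy

ROUTES = {
    "audit": DurableSink(),        # separate stream, stricter policy...
    "app": BestEffortSink(),       # ...but both are still just logs
}

def log(record: LogRecord) -> None:
    ROUTES.get(record.category, ROUTES["app"]).write(record)
```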
- If your hardware and software can’t guarantee that writes are committed when they say they are, all bets are off. I am assuming a scenario in which your hardware and/or cloud provider doesn’t lie to you.
In the world you describe, you don’t have any durability when the network is impaired. As a purchaser I would not accept such an outcome.
- > In practice, this means that durability of that audit record “a thing happened” cannot be simply “I persisted it to disk on one machine”
As long as the record can be located when it is sought, it does not matter how many copies there are. The regulator will not ask so long as your system is a reasonable one.
Consider that technologies like RAID did not exist once upon a time, and backups were slow to produce and expensive. Yet we still considered the storage (which was often just a hard copy on paper) to be sufficient to meet the applicable regulations. If a fire then happened and burned the place down, and all the records were lost, the business would not be sanctioned so long as it had taken reasonable precautions.
Here, I’m not suggesting that “the record is on a single disk, that ought to be enough.” I am assuming that in the ordinary course of business, there is a working path to getting additional redundant copies made, but those additional copies are temporarily delayed due to overload. No reasonable regulator is going to tell you this is unacceptable.
> Depending on the stringency, RPO needs to be zero for audit
And it is! The record is either in local storage or in central storage.
- Take Linux kernel audit logs as an example. So long as they can be persisted to local storage successfully, they are considered durable. That’s been the case since the audit subsystem was first created. In fact, you can configure the kernel to panic as soon as audit records can no longer be written.
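As a user-space analogue of that fail-closed behavior (this is not the kernel's audit API, just an illustration of the principle), an application can refuse to proceed with an action unless its audit record has actually reached local storage:

```python
import json
import os
import sys

AUDIT_LOG = "/var/log/myapp/audit.log"   # hypothetical path

def audit(event: dict) -> None:
    """Append an audit record and fsync it; abort if that is impossible.

    Mirrors the fail-closed idea: if the record cannot be persisted,
    the action it describes must not proceed.
    """
    try:
        with open(AUDIT_LOG, "a") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())          # durable on local storage
    except OSError as exc:
        # Fail closed: better to stop than to act without a record.
        sys.exit(f"audit record could not be persisted: {exc}")
```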
Regulators have never dictated where auditable logs must live. Their requirement is that the records in scope are accurate (which implies tamper-proof) and accessible. Provided those requirements are met, where the records happen to be found is irrelevant. It thus follows that if the union of centralized storage and endpoint storage satisfies those requirements for all logs in scope, the regulator will be satisfied.
- It depends. Some cases like auditing require full fidelity. Others don’t.
Plus, if you’re offering a logging service to a customer, the customer’s expectation is that once logs are successfully ingested, your service doesn’t drop them. If you’re violating that expectation, it needs to be clearly communicated to, and assented to by, the customer.
The right way to think about logs, IMO, is less like diagnostic information and more like business records. If you change the framing of the problem, you might solve it in a different way.
- Of course not. The worst case should be backpressure, which means processing, indexing, and storage delays. Your service might be fine, but your visibility will be reduced.