- > If sqlite had a generic "strictly ascending sequence of integers" type
Is that not what WITHOUT ROWID does? My understanding is that it's precisely meant to physically cluster data in the underlying B-Tree
If that is not what you meant, could you elaborate on the "primary key tables aren't really useful here" footnote?
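For reference, this is the shape I had in mind (hypothetical table, and the rusqlite wrapping is just my own illustration, not from the article): with WITHOUT ROWID the table itself is the primary-key B-tree, so rows keyed by an ascending integer end up physically adjacent in key order.

```rust
use rusqlite::Connection;

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open_in_memory()?;
    // Hypothetical schema: with WITHOUT ROWID there is no separate rowid tree,
    // the rows live directly in the B-tree keyed by `seq`.
    conn.execute_batch(
        "CREATE TABLE events (
             seq   INTEGER PRIMARY KEY,  -- strictly ascending sequence number
             body  BLOB NOT NULL
         ) WITHOUT ROWID;",
    )?;
    // Appends in seq order land at the right edge of the B-tree, so range
    // scans over seq read physically contiguous pages.
    conn.execute(
        "INSERT INTO events (seq, body) VALUES (?1, ?2)",
        rusqlite::params![1_i64, &b"hello"[..]],
    )?;
    Ok(())
}
```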
- Actually, for this kind of workload 15 Gbps is still mediocre. What you want is the `n` variant of the instance types, which have higher NIC capacity.
In the c6in and m6in families, and maybe the upper-end 5th gens, you can get 100 Gbps NICs, and if you look at the 8th gen instances like the c8gn family, you can even get instances with 600 Gbps of bandwidth.
- Honestly this benchmark feels completely dominated by the instance's NIC capacity.
They used a c5.4xlarge, which peaks at 10 Gbps of network bandwidth. At constant 100% saturation, pulling those 650 GB from S3 would take in the ballpark of 9 minutes (650 GB ≈ 5,200 Gbit, and 5,200 Gbit / 10 Gbps ≈ 520 seconds), so those 9 minutes are your best-case scenario just for reading the data, without even considering writing it back!
Minute differences in how these query engines schedule IO would have drastic effects on the benchmark outcomes, and I doubt the query engine itself was kept constantly fed during this workload, especially when evaluating DuckDB and Polars.
The irony of workloads like this is that it might be cheaper to pay for a gigantic instance that finishes the query quickly than for a smaller instance that takes several times longer.
- It seems to talk about Rosetta 2 as a whole, which is what the Containerization framework depends on to support running amd64 binaries inside Linux VMs (even though the kernel still needs to be arm64).
Is there a separate part of Rosetta that is implemented for the VM stuff? I was under the impression that Rosetta was some kind of XPC service that would translate executable pages for the Hypervisor framework as they were faulted in. Did I just misunderstand how the thing works under the hood? Are there two Rosettas?
- They only just released the Containerization framework[0] and the new container[1] tool, and they're already scheduling a kneecapping of it two years down the line.
Realistically, people are still going to be deploying on x64 platforms for a long time, and given that Apple's whole shtick was to serve "professionals", it's really a shame that they're dropping the ball on developers like this. Their new containerization stuff was the best workflow improvement for me in quite a while.
- Do not reach for these kinds of docs when you need practical, actionable advice. They serve their purpose, but they're aimed at a completely different kind of audience.
For anyone perusing this thread, your first resource for this kind of security advice should probably be the OWASP cheat sheets, a living set of documents that packages current practice into direct recommendations for implementers.
Here's what the Password Storage cheat sheet says about tuning Argon2:
https://cheatsheetseries.owasp.org/cheatsheets/Password_Stor...
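To give a concrete flavor (Rust with the `argon2` crate; the numbers below are only my recollection of the cheat sheet's current baseline, so double-check them against the page before using any of this):

```rust
use argon2::{
    password_hash::{rand_core::OsRng, PasswordHasher, SaltString},
    Algorithm, Argon2, Params, Version,
};

fn main() {
    // Roughly the baseline as I remember it: Argon2id, ~19 MiB of memory,
    // 2 iterations, parallelism of 1. Treat these as placeholders and
    // verify against the cheat sheet.
    let params = Params::new(19 * 1024, 2, 1, None).expect("valid params"); // m_cost is in KiB
    let argon2 = Argon2::new(Algorithm::Argon2id, Version::V0x13, params);

    let salt = SaltString::generate(&mut OsRng);
    let hash = argon2
        .hash_password(b"correct horse battery staple", &salt)
        .expect("hashing failed")
        .to_string();
    println!("{hash}");
}
```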
- Documenso[0] is a pretty cool alternative that is becoming compliant with more and more e-signature standards.
- There used to be a similarly named one called CozoDB[0], which was pretty awesome, but it looks like its development has slowed down significantly.
- Ah yes, pretending we can access infinite amounts of memory instantaneously, or at least in a finite/bounded amount of time, is the Achilles' heel of the von Neumann abstract computer model, and it's the point where it completely diverges from physical reality.
Acknowledging that memory access is not instantaneous immediately throws you into the realm of distributed systems, though, and into something much closer to an actor model of computation. It's a pretty meaningful theoretical gap, more so than people realize.
- I was trying to do this in 2023! The hardest part about building a search engine is not the actual searching, though; it is (as others here have pointed out) building your index and crawling the (extremely adversarial) internet, especially when you're running the thing from a single server in your own home without fancy rotating IPs.
I hope this guy succeeds and becomes another reference in the community like the Marginalia dude. This makes me want to give my project another go...
- I also switched away from Tarsnap because I once needed to restore my personal PDF collection of about 20 GB and my throughput was something like 100 Kb/s, maybe less. This has been a problem for at least a decade, with no fix in sight.
I'm keeping a close eye on plakar in this space; does anyone have experience with it that they could share?
- This looks amazing; I've been shopping for an implementation of this that I could play around with for a while now.
They mention promising results on Apple Silicon GPUs and even cite the contributions from Vello, but I don't see a Metal implementation in there and the benchmark only shows results from an RTX 2080. Is it safe to assume that they're referring to the WGPU version when talking about M-series chips?
- For those looking to break free and considering self-hosting, I can strongly recommend Stalwart. I'm surprised that almost no one seems to have heard of it, but it's amazing (and supports JMAP!)
- Feels like the complete opposite of s-expressions, which are the easiest possible thing to parse; this sounds like an absolute nightmare to write a parser for.
It might even be easier to treat the input string as a 2D grid rather than a sequence, and have a parsing head that behaves like a 2x2 convolutional kernel (rough sketch below)...
This would make for either a great Advent of Code puzzle or a nightmare interview question. I love it.
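Something like this toy sketch, just to illustrate the 2x2-window idea (every name here is made up, nothing from the article):

```rust
// Toy sketch of scanning text as a 2D grid with a 2x2 "parsing head".
fn main() {
    let input = "ab\ncd\nef";
    let grid: Vec<Vec<char>> = input.lines().map(|l| l.chars().collect()).collect();

    // Slide a 2x2 window over the grid; a real parser would match each
    // window against its productions instead of just printing it.
    for r in 0..grid.len().saturating_sub(1) {
        for c in 0..grid[r].len().min(grid[r + 1].len()).saturating_sub(1) {
            let window = [
                [grid[r][c], grid[r][c + 1]],
                [grid[r + 1][c], grid[r + 1][c + 1]],
            ];
            println!("head at ({r}, {c}): {window:?}");
        }
    }
}
```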
- Upvoted for using Redbean. I've been using it recently and it has been absolutely amazing; the built-in functionality it exposes through the Lua interface makes it an extensively programmable proxy that you can sandbox the crap out of, if you're familiar with the unixy bits of it.
- The computation model behind this vaguely reminds me of Epic Games' Verse calculus: https://simon.peytonjones.org/verse-calculus/
- Sure, but is EOL really a defense given the absolutely pathetic security posture that created this exploit in the first place? Is there a statute of limitations on mind-boggling levels of incompetence?
I'd usually give the EOL argument some credit, but this exploit is not an accident: someone deliberately wrote unauthenticated remote command execution as a feature, it made it to production, and no one in this long chain of failures thought to themselves, "gee, maybe we shouldn't do this."
- > Then a couple of weeks ago, added [direct] links to the Wayback Machine
Hopefully they are also making substantial donations to the Internet Archive, since they will be directing a lot of traffic to it and basically using the Archive's infrastructure as a feature of their main product...
EDIT:
Apparently they are collaborating, but there aren't many details [0]
[0] https://blog.archive.org/2024/09/11/new-feature-alert-access...
- Since you're using Rust: the Cranelift JIT compiler implements something like this[0] to construct an e-graph for its expression rewriting subsystem (with the rules written in its ISLE DSL). However, if I'm not mistaken, it works on disjoint sets (not intervals), so it doesn't deal with ordering (rough sketch of the structure at the end of this comment).
Maybe you can adapt it for your use case and add those new constraints in?
Keep in mind, though, that this was not written to be in the hot path itself; you could probably do significantly better by pouring your soul into the SIMD rabbit hole (though SIMD in Rust is usually very annoying to write).
Best of luck, hope this helps!
[0] https://github.com/bytecodealliance/wasmtime/blob/7dcb9bd6ea...
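For context, the disjoint-set structure I'm talking about is tiny on its own; a from-scratch sketch (not the actual wasmtime code) looks roughly like this, and the hard part for your use case would be hanging interval/ordering metadata off the representatives:

```rust
/// Minimal union-find with path compression, sketched from scratch.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect() }
    }

    /// Find the representative of `x`, compressing the path as we go.
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root;
        }
        self.parent[x]
    }

    /// Merge the sets containing `a` and `b`.
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}

fn main() {
    let mut uf = UnionFind::new(4);
    uf.union(0, 1);
    uf.union(2, 3);
    assert_eq!(uf.find(0), uf.find(1));
    assert_ne!(uf.find(1), uf.find(2));
    println!("ok");
}
```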
- It's a bit harder to adapt the technique to parsers because the Thompson NFA always increments the sequence pointer by the same amount, while a parser's production usually has a variable size, making it harder to run several parsing heads in lockstep.
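To make "lockstep" concrete, here's a stripped-down sketch in the spirit of the approach described in [0] (a hand-built NFA for a toy pattern, not a real engine): every live state consumes exactly one input byte per step, so you only ever carry a set of states forward and never backtrack.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy)]
enum State {
    Char(u8, usize),     // consume this byte, then move to the given state
    Split(usize, usize), // epsilon-fork into two states
    Match,
}

// Add a state to the set, eagerly following epsilon (Split) edges.
fn add_state(nfa: &[State], set: &mut HashSet<usize>, idx: usize) {
    if !set.insert(idx) {
        return;
    }
    if let State::Split(a, b) = nfa[idx] {
        add_state(nfa, set, a);
        add_state(nfa, set, b);
    }
}

fn accepts(nfa: &[State], start: usize, input: &[u8]) -> bool {
    let mut current = HashSet::new();
    add_state(nfa, &mut current, start);
    for &byte in input {
        let mut next = HashSet::new();
        for &idx in &current {
            if let State::Char(c, to) = nfa[idx] {
                if c == byte {
                    add_state(nfa, &mut next, to); // every head advances exactly one byte
                }
            }
        }
        current = next;
    }
    current.iter().any(|&idx| matches!(nfa[idx], State::Match))
}

fn main() {
    // Hand-built NFA for the toy pattern a(b|c)*d.
    let nfa = vec![
        State::Char(b'a', 1), // 0
        State::Split(2, 6),   // 1: loop around (b|c) or fall through to 'd'
        State::Split(3, 4),   // 2: pick 'b' or 'c'
        State::Char(b'b', 1), // 3
        State::Char(b'c', 1), // 4
        State::Match,         // 5
        State::Char(b'd', 5), // 6
    ];
    assert!(accepts(&nfa, 0, b"abcbd"));
    assert!(!accepts(&nfa, 0, b"abx"));
    println!("ok");
}
```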
[0] https://swtch.com/~rsc/regexp/regexp2.html