http://symbolflux.com/projects
https://twitter.com/Westoncb
westoncb@gmail.com
Looking for work and/or collaborators!
- It sounds more like you just made an overly simplistic interpretation of their statement, "everything works like I think it should," since it's clear from their post that they recognize the difference between some basic level of "working" and a well-engineered system.
Hopefully you aren't discouraged by this, observationist; it's pretty clear hansmayer is just taking potshots. Your first paragraph could very well have been written by a professional SWE who understood what level of robustness was required given the constraints of the specific scenario in which the software was being developed.
- I've been on a break from coding for about a month but was last working on a new kind of "uncertainty reducing" hierarchical agent management system. I have a writeup of the project here: https://symbolflux.com/working-group-foundations.html
- Ah interesting, I missed that possibility. Digging a little more, though, my understanding is that what's universal is a shared basis in weight space, and particular models of the same architecture can express their specific weights via coefficients in a lower-dimensional subspace using that universal basis (so we get weight compression and simplified param search). But it also sounds like the extent of any inference-time gains is still up in the air?
Key point being: the parameters might be picked off a lower-dimensional manifold (in weight space), but this doesn't imply that lower-rank activation-space operators will be found. So the translation to inference time isn't clear.
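To make that concrete, here's a toy sketch of the compression story as I understand it (the numbers and the random "basis" are made up; the paper presumably learns the basis from the model population):

    import numpy as np

    d = 10_000  # size of a model's flattened weight vector (toy number)
    k = 50      # dimension of the shared subspace, k << d

    B = np.random.randn(d, k)  # stand-in for the universal basis (columns = shared directions)
    c = np.random.randn(k)     # this particular model's coefficients in that basis

    w = B @ c  # reconstructed model-specific weights

    # You store/search over c (50 floats) rather than w (10,000 floats):
    # that's the compression + simplified-param-search win. Nothing here
    # says the operators w parameterizes act low-rank on *activations*,
    # which is why inference-time gains don't follow automatically.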
- > So, they found an underlying commonality among the post-training structures in 50 LLaMA3-8B models, 177 GPT-2 models, and 8 Flan-T5 models; and, they demonstrated that the commonality could in every case be substituted for those in the original models with no loss of function; and noted that they seem to be the first to discover this.
Could someone clarify what this means in practice? If there is a 'commonality' why would substituting it do anything? Like if there's some subset of weights X found in all these models, how would substituting X with X be useful?
I see how this could be useful in principle (and obviously it's very interesting), but not clear on how it works in practice. Could you e.g. train new models with that weight subset initialized to this universal set? And how 'universal' is it? Just for like models of certain sizes and architectures, or in some way more durable than that?
- I think the idea is like: it took extra work 'cause Rust makes you be so explicit about allocations and types, but it's also probably faster/more reliable because that work was done.
Of course at the end of the day it's just marketing and doesn't necessarily mean anything. In my experience the average piece of Rust software does seem to be of higher quality though.
- Doing math is not the same as calculating. LLMs can be very useful in doing math; for calculating they are the wrong tool (and even there they can be very useful, but you ask them to use calculating tools, not to do the calculations themselves—both Claude and ChatGPT are set up to do this).
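E.g. with the OpenAI Python SDK you can hand the model a calculator tool, and it will ask your code to run the arithmetic rather than guessing at it. A minimal sketch (the "calculate" tool name and its schema here are made up for illustration):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

    tools = [{
        "type": "function",
        "function": {
            "name": "calculate",  # hypothetical tool name
            "description": "Exactly evaluate an arithmetic expression.",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 1234 * 5678?"}],
        tools=tools,
    )

    # Instead of answering directly, the model emits a tool call like
    # calculate({"expression": "1234 * 5678"}); your code evaluates it and
    # sends the result back for the model to phrase the final answer.
    print(resp.choices[0].message.tool_calls)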
If you're curious, check out how mathematicians like Robert Ghrist or Terence Tao are using LLMs for math research, both have written about it online repeatedly (along with an increasing number of other researchers).
Apart from assisting with research, their ability on e.g. math olympiad problems is periodically measured and objectively rapidly improving, so this isn't just a matter of opinion.
- lol no problem. In reality though there's kind of a funny story behind it, because I suspect the way I ended up using them so much is similar to how ChatGPT did. When I got into writing I studied grammar, then decided to read a bunch of classics and analyze their usage of punctuation until I had a good understanding of every bit of it. Then, in order to practice, I'd apply what I learned to anything I was writing at the time, whether journal notes, conversations on AIM/IRC, etc. That latter step meant I was translating a lot of casual/natural speech into a form that also had a high level of 'correctness'. And if you faithfully translate natural speech into 'correct'ly punctuated sentences, you end up using a lot of em dashes. Because ChatGPT/LLMs are tuned for natural/authentic style, as well as for a high degree of 'correctness,' you get today's state of affairs. Just a theory.
- I actually tweeted like a month ago that I was the reason LLMs use em dashes so much lol: https://x.com/Westoncb/status/1961802304698671407
- The difference is that the incentive to improve, and the actual present rate of improvement, for models like this are far higher than for jetpacks. (That, plus certain intrinsic features, at least suggests the route to improvement is roughly "more of the same" vs "needs a massive unknown breakthrough".)
- Yep, it's surreal.
- lol yep, fully get that. And I mean I'm sure o4 will be great, but the '-mini' variant is weaker. Some of it will come down to taste and what kind of thing you're working on too, but personal preferences aside, from the heavy LLM users I talk to, o3 and Gemini 2.5 Pro seem to be top at the moment if you're dialoguing with them directly (vs using them through an agent system).
- I've seen that specific kind of role-playing glitch here and there with the o[X] models from openai. The models do kinda seem to just think of themselves as being developers with their own machines. I think it usually just doesn't come up but can easily be tilted into it.
- Gotcha. Yeah, give o3 a try. If you don't want to get a sub, you can use it over the API for pennies. They do have you do this biometric registration thing that's kind of annoying if you want to use it over the API though.
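Minimal example with the OpenAI Python SDK (assuming the API model id is just "o3"):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    resp = client.chat.completions.create(
        model="o3",  # reasoning model; billed per token, pennies for most chats
        messages=[{"role": "user", "content": "Explain tail-call optimization briefly."}],
    )
    print(resp.choices[0].message.content)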
You can get the Google pro subscription (I forget what they call it) that's ordinarily $20/mo for free right now (1 month free; can cancel whenever), which gives unlimited Gemini 2.5 Pro access.
- There is a skill to it. You can get lucky as a beginner but if you want consistent success you gotta learn the ropes (strengths, weaknesses, failure modes etc).
A quick way of getting seriously improved results though: if you are literally using GPT-4 as you mention—that is an ancient model! Parent comment says GPT-4.1 (yes, openai is unimaginably horrible at naming, but that ".1" isn't a minor version increment). And even though 4.1 is far better, I would never use it for real work. Use the strongest models; if you want to stick with openai, use o3 (it's now super cheap too). Gemini 2.5 Pro is roughly equivalent to o3 for another option. IMO Claude models are stronger in agentic settings, but won't match o3 or Gemini 2.5 Pro for deep problem solving or nice, "thought out" code.
- This sounds like a cool project and I may sign up. I think the idea of seeding collaborations with side projects vs something necessarily serious right out the gate is pretty solid/clever.
Edit: I do agree with others though that we should be able to see projects/profiles before registering.
- Location: Tucson, AZ (US)
Remote: Yes
Willing to relocate: Possibly!
Technologies: LLMs, Typescript, node, React, three.js, dabbled with Elixir and Rust, lots of Java a long time ago and a bit of Objective C/Swift
Portfolio: https://symbolflux.com
Github: https://github.com/westoncb
LinkedIn: https://www.linkedin.com/in/weston-beecroft-b4a98054
Email: westoncb@gmail.com
I'm an experienced generalist software engineer typically working with early-stage startups or with clients on relatively greenfield projects.
I have strong technical foundations, good product sense, and have gone deep with AI/LLM tech both in the sense of being able to use it effectively, and in exploring the design space for products/tools leveraging it.
I've done a lot of work in developer tools and data visualization of various kinds. Data-rich, non-traditional UIs with highly optimized UX, and rapid prototyping are my forte.
- I've been basically on sabbatical for a while now, mostly taking the time to learn and build with AI. At some point in early 2024 I decided: okay, time to take this seriously. And I have. After the long break, I'm ready to start something new!
- SEEKING WORK | USA | Remote is fine (I'm also considering relocation to Chicago or NYC, maybe SF)
I'm a jack-of-all-trades software engineer who's done extensive "founding engineer" work, both at VC-backed startups and on smaller contract gigs, over a period of about 10 years.
I have strong technical foundations, good product sense, and have gone deep with AI/LLM tech both in the sense of being able to use it effectively, and in exploring the design space for products/tools leveraging it.
Tools I use most frequently these days: Typescript, node, React, pnpm+vite. I've also done extensive work with three.js in the past. I used to write a lot of Java and a bit of Objective C, and have dabbled in Rust and Elixir.
I've done a lot of work in developer tools and data visualization of various kinds. Data-rich, non-traditional UIs with highly optimized UX, and rapid prototyping are my forte.
Email: westoncb@gmail.com
Portfolio: https://symbolflux.com
Github: https://github.com/westoncb
LinkedIn: https://www.linkedin.com/in/weston-beecroft-b4a98054
- > and must say that I'm a bit confused by this presentation, and it's a bit unclear to me what it adds.
I think the disconnect might come from the fact that Karpathy is speaking as someone whose day-to-day computing work has already been radically transformed by this technology (and he interacts with a ton of other people for whom this is the case), so he's not trying to sell the possibility of it: that would be like trying to sell the possibility of an airplane to someone who's already cruising around in one every day. Instead the mode of the presentation is more: well, here we are at the dawn of a new era of computing, it really happened. Now how can we relate this to the history of computing to anticipate where we're headed next?
> ...but sometimes an LLM becomes the operating system, sometimes it's the CPU, sometimes it's the mainframe from the 60s with time-sharing, a big fab complex, or even outright electricity itself?
He uses these analogies in clear and distinct ways to characterize separate facets of the technology. If you were unclear on the meanings of the separate analogies, it seems like the talk may offer some value for you after all, but you may be missing some prerequisites.
> This demo app was in a presentable state for a demo after a day, and it took him a week to implement Googles OAuth2 stuff. Is that somehow exciting? What was that?
The point here was that he'd built the core of the app within a day, without knowing the Swift language or the iOS app dev ecosystem, by leveraging LLMs, but that the OAuth part of the process remains old-fashioned and blocks people from leveraging LLMs the way they can when writing code—and he goes on to show concretely how this could be improved.
- I think the community here would find the methodology behind building this of interest. How all-encompassing a term "vibe coding" has become for "developing software with LLMs" is unfortunate imo. I have experimented with vibe coding too, but it's a very different thing from the process used here. This tweet from Andrej Karpathy is the best description of how I approach development with LLMs: https://x.com/karpathy/status/1915581920022585597
Project is cool overall, love the xkcd-like comic idea—but prompting and/or model-selection could use some work. I'd like to take a crack at tuning it myself :)