https://21337.tech
- Indeed. I can't fault people for wanting to give their careers a boost in these increasingly trying times. As someone who stepped into analytics just in time to catch the wave (10 years ago), I can understand why people would want to hop aboard.
That said, I at least took the time to learn the maths.
- I think we're talking about the same thing. I should be clear that I don't think the selected-token probabilities being reported are enough, but if you're reporting each returned token's probability (both selected and discarded) and aggregating the cumulative probabilities over the given context, it should be possible to see when a generation is trending towards uncertainty.
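To sketch what I mean in Python (illustrative only; token_logprobs here stands in for the per-token logprobs an API would have to return, which is exactly the data that isn't always exposed):

    import math

    def rolling_confidence(token_logprobs, window=20):
        # token_logprobs: logprob of each generated token, in order
        scores = []
        for i in range(len(token_logprobs)):
            chunk = token_logprobs[max(0, i - window + 1):i + 1]
            # geometric-mean probability over the window: a smoothed,
            # cumulative read on how confident the model has been lately
            scores.append(math.exp(sum(chunk) / len(chunk)))
        return scores

A sustained slide in that series is the "trending towards uncertainty" signal I'm talking about.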
- The statistical certainty is indeed present in the model. Each token comes with a probability; if your softmax results approach a uniform distribution (i.e. all the candidate tokens at the given temp have near-equal probabilities), then the next most likely token is very uncertain. Reporting the probabilities of the returned tokens can help the user understand how likely hallucinations are. However, that information is deliberately obfuscated now, to prevent distillation techniques.
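Back-of-the-envelope version of that check, assuming you could actually get the per-step softmax out of the provider (the very thing being withheld):

    import math

    def normalized_entropy(probs):
        # probs: softmax distribution over candidate next tokens at one step
        h = -sum(p * math.log(p) for p in probs if p > 0)
        return h / math.log(len(probs))  # 1.0 == perfectly uniform

    print(normalized_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.12: confident
    print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))  # 1.0: anyone's guess

Values near 1.0 are the near-uniform case: the model is effectively guessing, and that's where hallucinations live.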
- Agreed. All these attempts to benchmark LLM performance based on the interpreted validity of the outputs are completely misguided. It may be the semantics of "context" causing people to anthropomorphize the models (besides the lifelike outputs). Establishing context for humans is the process of holding external stimuli against an internal model of reality. Context for an LLM is literally just "the last n tokens". In that case, performance would be how valid the most probabilistic token was with the prior n tokens present, which really has nothing to do with the perceived correctness of the output.
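To make the mechanics concrete, this is the entirety of what "context" is to the model (GPT-2 via HuggingFace here only because it's small and open; any causal LM has the same shape):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("To be, or not to", return_tensors="pt").input_ids  # the last n tokens
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # scores for the next token, nothing more
    probs = torch.softmax(logits, dim=-1)

    top = torch.topk(probs, 5)
    for p, i in zip(top.values, top.indices):
        print(f"{tok.decode(int(i))!r}: {float(p):.3f}")  # ' be' should dominate

No internal model of reality enters the picture anywhere; the prior n tokens go in, a distribution comes out.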
- Amodei's work history indicates that his background as a software developer amounts to a single part-time job that he held for a year and a half after college. As far as I'm concerned, he wouldn't even make it as a junior on my team. I'm not inclined to believe anything he says about what it takes to write production-ready code.
- I think Graphene gets posted here yearly. Having tested a variety of ROMs dedicated to different elements of security, I can attest that Graphene allows the most "normal" phone usage of any of them. The biggest factor is the sandboxed Google Play Services, which allow you to use a lot of apps that you wouldn't be able to otherwise.
I've used Lineage without microG as a comparison, and that's becoming more and more unusable every time some lousy Android developer tethers their company's app to some feature exclusive to Play Services.
- I'm a native English speaker, and I ask myself the same questions on most emails. You can use LLM outputs all you want, but if you're worried about the tone, LLM edits drive it to a level of generic that ranges from milquetoast, to patronizing, to outright condescending. I expect some will even begin to favor pushy emails, because at least pushiness feels human.
- No, but at that point, why even leverage a stochastic text generator? Placing hard constraints on a generative algorithm is just regular programming with more steps and greater instability.
Edit: Also, one could just look to the world of decision-tree and route-finding algorithms, which could probably do this task better than a language model.
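For illustration, the boring deterministic version is about a dozen lines of Dijkstra over a toy graph, and there's no temperature knob anywhere:

    import heapq

    def shortest_path(graph, start, goal):
        # graph: {node: [(neighbor, cost), ...]} -- plain Dijkstra
        dist, prev = {start: 0}, {}
        pq = [(0, start)]
        while pq:
            d, node = heapq.heappop(pq)
            if node == goal:
                break
            if d > dist.get(node, float("inf")):
                continue  # stale queue entry
            for nbr, cost in graph.get(node, []):
                nd = d + cost
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr], prev[nbr] = nd, node
                    heapq.heappush(pq, (nd, nbr))
        path = [goal]  # walk back to start (assumes goal was reachable)
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1], dist[goal]

    g = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
    print(shortest_path(g, "A", "D"))  # (['A', 'B', 'C', 'D'], 3)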
- I don't know, I think some improved hardware would greatly improve the aesthetics of the Lost Woods, where the frame rate drops severely when docked. Handheld, the diminished fidelity at 720p buys back some frames.
I'd be inclined to agree about some older Zelda games, though, namely Wind Waker. I replayed it on GCN recently, and can attest that the HD Wii U version really didn't add anything to the aesthetics.
- When there are millions of doctors, not only are there going to be more mediocre doctors than anything else, but there has to be a bottom of the barrel as well.
It took me years to be diagnosed with PTSD, a problem I knew I had. Because I am not a vet, I had to go through every other diagnosis first -- schizo, bipolar, borderline -- each with a new set of pills to take. Some of the shrinks who diagnosed me wouldn't do anything but open my file, make some remarks, and fill out a prescription, with nary a moment of eye contact.
Finally got a very expensive doctor who wasn't under the thumb of insurance companies. Her first question, upon hearing my issues, was "how is your sleep?" "I don't, really" was my reply. She screened me for PTSD and I clocked 76/80 pts. She set me up with the proper therapy, and within a year, I was screening at 30/80 pts. All it took was asking me one question that wasn't loaded towards the doctor's favorite diagnosis & prescription.
- An LLM salesman assuring us that $1000/mo is a reasonable cost for LLMs feels a bit like a conflict of interest, especially when the article doesn't go into much detail about the code quality. If anything, their assertion that one should stick to boring tech and "have empathy for the model" just reaffirms that anybody doing anything remotely innovative or cutting-edge shouldn't bother too much with coding agents.
- I have a background in NLP (pre-LLM) and like to study extremist rhetoric, and, while I don't think you're being reductionist, it's a little more removed than that. I'd replace "hate" with "problems and stress". Once you can attribute that stress to a group... that's when the hate develops. There are certain global powers who have recognized this and weaponized it. Agreeing with the most extreme of both sides, loudly, is the modern standard for propaganda.
- To expand on the other comment, if you look under the data folder in nanoGPT, you can see examples of how to train the model using various data sources and encoders. "shakespeare_char" is probably the most rudimentary, only converting the characters of the input into integers.
e.g. https://github.com/karpathy/nanoGPT/blob/master/data/shakesp...
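Paraphrasing the gist of that prepare script from memory (not verbatim, but this is the whole trick):

    import numpy as np

    with open("input.txt") as f:  # the raw Shakespeare text
        data = f.read()

    # the "vocabulary" is just every distinct character in the corpus
    chars = sorted(set(data))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for i, ch in enumerate(chars)}

    encode = lambda s: [stoi[c] for c in s]
    decode = lambda ids: "".join(itos[i] for i in ids)

    # 90/10 train/val split, written out as flat binaries of token ids
    n = len(data)
    np.array(encode(data[:int(n * 0.9)]), dtype=np.uint16).tofile("train.bin")
    np.array(encode(data[int(n * 0.9):]), dtype=np.uint16).tofile("val.bin")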
https://en.wikipedia.org/wiki/1953_Iranian_coup_d%27%C3%A9ta...