[ my public key: https://keybase.io/dacox; my proof: https://keybase.io/dacox/sigs/KaGMMaVyowMM1tCFTS6uwLYZXy5yuW4AntMpDgHOeLs ]
- The iPhone's autocorrect is one of my biggest frustrations coming from Android a few years ago. The worst part for me is the tendency to correct the _second to last word_. I have never gotten used to this. I know I can stop it by "clicking" on the word instead of hitting space - but that feels slow and bad.
- Awesome! I was just looking at a bucket full of Parquet files from last year, trying to recall some things about them.
I tried to install with brew, but it told me my CLI tools were "too out of date". Never seen that before! And I had also just upgraded them.
Will try again tomorrow
- Thanks! Glad to know it's getting fixed. If I have any notes I'll send them your way!
- Cool app!
I think I found a broken problem (or it’s worded strangely?) and I’m unable to progress beyond it
“Pick 2 that beat villain on board”
“QQ552”
I'm submitting Queen and 2 to make a full house, but it just says
“Incorrect. Two pair on board. Win with a full house or ace kicker”
- There is a podcast, Memory Hole, about the recovered memory movement. There is a lengthy section about this book and the people behind it (not a positive one).
- Plugging ourselves: Lumen5, based out of Vancouver. I was one of the first hires, found through the HN hiring thread many moons ago. https://lumen5.com/
- This is happening everywhere where I live (Vancouver)
- I think it should be fairly clear why I'm curious, as the article mentions:
> Although it’s typically not viewed as a partisan board, the Biden administration had installed the entire committee.
After some googling of the history of ACIP I had not found any explanation, and thought maybe someone here (who is actually American and maybe follows this kind of thing more closely?) would just know.
Looks like there are actually some comments now that clarify things.
> Are you just asking questions to smokescreen for this executive power grab?
I’m just trying to understand the background. I get that this is a sensitive topic, but I’d ask that we keep things civil and give people the benefit of the doubt when they’re asking honest questions.
- Yeah, I see that.
Apparently ACIP is very much not new. I am curious about the specifics of the previous mass appointment, however.
- Was this a new committee? There is a quote about this being a coup, but it is also noted that the previous administration selected the entire existing committee.
- It's an open secret that most nutrition research is of extremely low quality - almost all of it relying on decades-old, self-reported nutritional questionnaires.
Sometimes dozens of these studies get wrapped up and analyzed together, and we get headlines that THING IS BAD with a hazard ratio of like 1.05 (we figured out smoking was bad with a hazard ratio that was like 3! - you need a really strong signal when you are analyzing such low-quality data).
- ...k
- They definitely have to be trained to reinforce longer outputs, but I do not believe this adequately explains the low-ish generation limits.
We are starting to see models with longer and longer generation limits (gpt-4o-mini having 16k, the o1 models going up to 64k), as well as longer and longer context limits (often 128k, with Google offering a million).
I find it very unlikely they are actually training with inputs or outputs near these maximums.
If you want to convince yourself, do the attention calculation math for these sequence lengths.
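Something like this back-of-the-envelope sketch (the config numbers here are made up for illustration, not any particular model's):

```python
# Rough attention cost vs. sequence length. Config values are illustrative only.
n_layers = 32      # transformer blocks (assumed)
n_heads = 32       # attention heads (assumed)
d_model = 4096     # hidden size (assumed)
bytes_fp16 = 2

for seq_len in (4_096, 16_384, 65_536, 131_072):
    # QK^T and softmax(QK^T)V are each ~2 * seq^2 * d_model FLOPs per layer.
    attn_flops = 4 * seq_len**2 * d_model * n_layers
    # Naive attention materializes a (seq x seq) score matrix per head per layer.
    score_bytes = n_heads * seq_len**2 * bytes_fp16
    print(f"{seq_len:>7} tokens: ~{attn_flops / 1e12:9.1f} TFLOPs of attention, "
          f"~{score_bytes / 1e9:8.1f} GB of scores per layer (naive)")
```

The quadratic term is the point: FlashAttention-style kernels avoid materializing the score matrix, but the compute still blows up as you push toward 64k-128k sequences, so routinely training on outputs that long would be very expensive.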
You can also see how OpenAI restricts the sequence length for fine-tuning to 64k - almost certainly bound by available GPU memory sizes.
I suspect the 4096 limit has been set as a "reasonable" default for a myriad of reasons.
- Chunked prefill or some similar technique is also necessary for serving long context requests where there is not enough GPU memory available, regardless of concerns about latency.
For example, consider a prompt sent to Llama 3.1 405B that uses 128k input tokens.
The KV cache will be 123GB. No matter how many GPUs you shard the model across, you are not fitting that KV cache in GPU memory (an H100 has 80GB).
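The arithmetic is roughly 2 (K and V) x layers x KV heads x head_dim x bytes x tokens. A quick sketch; the 405B-ish config values here are my assumptions, and the exact figure depends on the real KV-head count and the precision you pick:

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence, in GB (fp16/bf16 by default)."""
    # 2x for keys and values, stored per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len / 1e9

# Assumed 405B-class config: 126 layers, head_dim 128, grouped-query attention.
# Depending on the KV-head count and precision you assume, a single 128k-token
# request lands anywhere from tens of GB up past the 100GB mark quoted above.
for kv_heads in (8, 16):
    print(kv_heads, "KV heads:", round(kv_cache_gb(126, kv_heads, 128, 128_000), 1), "GB")
```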
- When doing inference for an LLM, there are two stages.
The first phase is referred to as "prefill", where the input is processed to create the KV Cache.
After that phase, the "decode" phase runs auto-regressively. Each decode step yields one new token.
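A minimal sketch of the two phases with the Hugging Face `transformers` API (gpt2 as a small stand-in; the same pattern applies to any causal LM):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

input_ids = tok("The quick brown fox", return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: the whole prompt is processed in one pass, producing the KV cache.
    out = model(input_ids, use_cache=True)
    past = out.past_key_values

    # Decode: one forward pass per new token, reusing (and growing) the KV cache.
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    generated = [next_id]
    for _ in range(16):
        out = model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```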
This post on [Inference Memory Requirements](https://huggingface.co/blog/llama31#inference-memory-require...) is quite good.
These two phases have pretty different performance characteristics - prefill processes the whole prompt at once and can really max out the GPU. For long contexts, it can be nigh impossible to do it all in a single pass - frameworks like vLLM use a technique called "chunked prefill".
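For what it's worth, in vLLM this is just an engine flag (names as of recent vLLM releases - check your version's docs); the scheduler then prefills the long prompt in chunks and interleaves them with decode steps:

```python
from vllm import LLM, SamplingParams

# Example model; enable_chunked_prefill caps each scheduling step at
# max_num_batched_tokens, so a very long prompt is prefilled over many passes.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    enable_chunked_prefill=True,
    max_num_batched_tokens=2048,
)
outputs = llm.generate(["<a very long prompt>"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```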
The decode phase, by contrast, does comparatively little compute per step and tends to be bound by memory bandwidth rather than by raw compute.
If you are serving these models, you really want big batch sizes during decode to keep the GPU busy, and that only really comes with scale - a smaller app can't hold requests back to build up batches without making users wait too long.
So long contexts only have to be processed _once_ per request, which is basically a scheduling problem.
But the number of decode passes scales linearly with the output length. If it were unlimited, you could get some requests just _always_ present in an inference batch, reducing throughput for everyone.
- In my experience, mostly using LoRA fine-tunes with OpenAI, it is important to include your prompt in the fine-tuning dataset. It is _also_ important to include that same prompt at inference time. I have not seen any evidence of being able to save on input tokens by using a LoRA, at least not with the chat models.
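Concretely, the pattern that has worked for me looks roughly like this (the prompt, file name, and fine-tuned model id are placeholders; the messages-style JSONL is OpenAI's chat fine-tuning format):

```python
import json
from openai import OpenAI

SYSTEM_PROMPT = "You label support tickets as one of: bug, billing, feature."  # placeholder

# Training data: the same system prompt goes into every example.
example = {
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I was charged twice this month."},
        {"role": "assistant", "content": "billing"},
    ]
}
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Inference: send the *same* system prompt to the fine-tuned model.
client = OpenAI()
resp = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0125:my-org::abc123",  # placeholder model id
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "The export button does nothing."},
    ],
)
print(resp.choices[0].message.content)
```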
This was a bit less true of older models like davinci - you really could basically just send them data (not sure how they were fine-tuned under the hood). However, those models were less powerful in general.
For `gpt-3.5-turbo`, which definitely uses LoRA or something very similar, I have not witnessed this behaviour.
Another way to look at this is that when a LoRA is added to a sufficiently large foundation model like GPT-3, it doesn't really lose its "GPT-ness"
- Which essay is that? Having trouble finding it.
- Wow, I have exactly the same side project in progress, minus the fine tuning part. We even chose the same names and phrasing for parts of the project.
- Engineer is a regulated term and profession in Canada, with professional designations like the P.Eng - they get really mad when people use the term "engineer" more loosely, as is common in the tech industry.
Because of this, there are "B.Seng" programs at some Canadian universities, as well as the standard "B.Sc" computer science program.
The degree was very new when I attended uni, so I went for comp sci instead as it seemed more "real". The B.Seng kids seemed to focus a lot more on industry things (classes on object-oriented programming), which everyone picked up during internships anyway. They also had virtually no room for electives, whereas the CS calendar was stacked with very interesting electives which imo were vastly more useful in my career.
In practice, no one gives a hoot which degree you have, and we tend to just use the term SWeng regardless.
It honestly kinda feels like a bunch of crotchety old civil engineers trying to regulate an industry they're not a part of. I have _never_ seen a job require this degree.
- The unspoken thing here is that PLC code often (usually?) isn't exactly written in text, or in a format readable by anything other than the PLC programming software.
After a year-long foray into the world of PLCs, I felt like I was programming in the dark ages.
I'm assuming it's a bit better at very big plants/operations, but still.