ijk

  1. I'm no expert, but having read some archeological papers that do draw conclusions like that, the evidence is often quite compelling and the conclusions well-supported. The context we find something in can convey a lot of data, and conclusions that aren't supported by the evidence are frequently argued against by other archeologists. Granted, if you only read the university press releases or the popular summaries thereof it can be somewhat misleading, but that's more down to the journalism than the research.
  2. Is there a changing-taste hypothesis? It's honestly the first time I've heard that suggested as the explanation, versus the (to me) more plausible idea of reconstruction from incomplete evidence.
  3. Eh, that's overstating the case. There are clearly some aesthetics that are more appealing to more people, but for many architectural movements in particular, they look the way they do because specific ideological aims interacted with material constraints and the intended message. Brutalism in particular was intended to be cheap and honest; given the constraints many of these buildings were designed under, it makes sense. There are some quite appealing brutalist buildings; a core tenet of the style was integrating the buildings into the natural landscape, in contrast to the artificial styles that had previously been popular. Post-war shortages limited the available materials and shaped the constraints the architects were operating under. Raw concrete was honest, cheap, and allowed to weather naturally.

    There's a lot of ugly brutalist buildings, but there's a lot of ugly buildings in every style. A lot of them look cheap because they were supposed to be cheap; to a certain extent, looking inexpensive was the intent. In some cases the hostile nature of the institutional building was part of the point, conveying strength instead of offering a pleasant experience, but there are also some quite pleasant brutalist buildings that have a lot of nature integrated into the design.

  4. Interestingly to me, generative AI often produces results that commit the opposite error from these statues: they are, essentially, too confident in their choice of details. For any random topic, the average member of the public is likely to believe the AI's results are more accurate than the evidence can back up.
  5. I was hoping this would be about Llama 1 and a comparison with GPT-contaminated models.
  6. Unfortunately, I am also worried that is the case.

    There was an era where there were a lot of completely free sites, because they were mostly academic or passion projects, both of which were subsidized by other means.

    Then there were ads. Banner ads, Google's less obtrusive text ads, etc. There were a number of sites completely supported by ads, including a lot of blogs.

    And forums. Google+ managed to kill a lot of niche communities by offering them a much easier way to host a community and then getting shut down itself.

    Now forums have been replaced by Discord and Reddit. Deep project sites still exist but are rarer. Social media has consolidated. Most people don't have personal home pages. There's a bunch of stuff that's paywalled behind Patreon.

    And all of that was already happening before anyone threw AI into the mix.

  7. Buying a book scanner and frequenting used book stores seems like a pastime to start that'll pay off in the long term.
  8. There is an awful lot of "looking for my keys under the street light" going around these days. I've seen a bunch of projects proposed that are either based on existing data (but have no useful application of that data) or have a specific application (but lack the data and evaluation required to perform that task). It doesn't matter how good your data is if no one has any use for things like it, and it doesn't matter how neat your application would be if the data doesn't match.

    I'm including things like RL metrics as data here, for lack of a better umbrella term. Still, it's maddening how many proposed projects I've seen decide that ongoing evaluation of actual effectiveness was a distraction from the more important task of having expensive engineers turn expensive servers into expensive heatsinks.

  9. Not rainforest, but rather savanna [1].

    The Arabian desert is technically considered to be part of the Sahara, climate-wise, and participates in the same cycle [2].

    This article is about researching evidence for what those transitions looked like, focusing on evidence dated to around the end of that particular dry period, pre-Holocene.

    > Prior to the onset of the Holocene humid period, little is known about the relatively arid period spanning the end of the Pleistocene and the earliest Holocene in Arabia. An absence of dated archaeological sites has led to a presumed absence of human occupation of the Arabian interior. However, superimpositions in the rock art record appear to show earlier phases of human activity, prior to the arrival of domesticated livestock.

    [1]: https://en.wikipedia.org/wiki/African_humid_period

    [2]: https://www.nationalgeographic.com/environment/article/green...

  10. Not strictly true: while this was previously believed to be the case, Anthropic demonstrated that transformers can "think ahead" in some sense, for example when planning rhymes in a poem [1]:

    > Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.

    They described the mechanism that it uses internally for planning [2]:

    > Language models are trained to predict the next word, one word at a time. Given this, one might think the model would rely on pure improvisation. However, we find compelling evidence for a planning mechanism.

    > Specifically, the model often activates features corresponding to candidate end-of-next-line words prior to writing the line, and makes use of these features to decide how to compose the line.

    [1]: https://www.anthropic.com/research/tracing-thoughts-language...

    [2]: https://transformer-circuits.pub/2025/attribution-graphs/bio...

  11. So, what I think most people don't realize is that the amount of computation an LLM can do in one pass is strictly bounded. You can see that here with the layers. (This applies to a lot of neural networks [1].)

    Remember, they feed the context in on one side of the network, pass it through each layer's matrix multiplications, and get a value out the other end that we convert back into our representation space. You can view the bit in the middle as doing a kind of really fancy compression, if you like. The important thing is that there are only so many layers, and thus only so many operations.

    Therefore, past a certain point they can't revise anything, because they run out of layers (see the toy sketch below). This is one reason reasoning tokens can help with more complicated questions: emitting more tokens buys more forward passes. You can even train a special token for this purpose [2].

    [1]: https://proceedings.neurips.cc/paper_files/paper/2023/file/f...

    [2]: https://arxiv.org/abs/2310.02226
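
    To make that bound concrete, here's a toy NumPy sketch (not any real model's code; the layer count and width are made up) of why per-token compute is fixed: every emitted token gets exactly the same number of layer applications, so the only way to spend more compute on a hard question is to emit more tokens.

    ```python
    import numpy as np

    # Toy fixed-depth "transformer": every forward pass applies exactly
    # N_LAYERS transformations, regardless of how hard the question is.
    # (Illustrative only; real blocks also have attention, norms, and MLPs.)
    N_LAYERS = 4
    D_MODEL = 8

    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_LAYERS)]

    def forward(hidden):
        """One token's hidden state in, one out, after a fixed compute budget."""
        for w in layers:
            hidden = np.tanh(w @ hidden)  # one layer = one bounded chunk of work
        return hidden                     # after N_LAYERS steps it must emit

    out = forward(rng.normal(size=D_MODEL))
    print(out.shape)  # (8,) -- the per-token budget never grows
    ```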

  12. There have been a few attempts at training a backspace token, though (a rough sketch of the decoding side is below).

    e.g.:

    https://arxiv.org/abs/2502.04404

    https://arxiv.org/abs/2306.05426
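
    For illustration only (this is not the method from either paper above, and `model_step`, `<bksp>`, and `<eos>` are hypothetical stand-ins): the general idea is that a dedicated token lets the model retract what it just emitted during decoding.

    ```python
    # Hypothetical backspace-token decode loop; the token names and the
    # model_step interface are assumptions, not a real library API.
    BACKSPACE = "<bksp>"
    EOS = "<eos>"

    def decode(model_step, max_tokens=50):
        """model_step(tokens_so_far) -> next token string (assumed interface)."""
        out = []
        for _ in range(max_tokens):
            tok = model_step(out)
            if tok == BACKSPACE:
                if out:
                    out.pop()          # retract the previous token
            elif tok == EOS:
                break
            else:
                out.append(tok)
        return out

    # Dummy "model": emits a mistake, retracts it, then corrects itself.
    script = iter(["the", "cat", BACKSPACE, "dog", EOS])
    print(decode(lambda ctx: next(script)))  # ['the', 'dog']
    ```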

  13. Adding knowledge works, depending on how you define "knowledge" and "works"; given sufficient data you can teach an LLM new things [1].

    However, the frontier models keep improving at a quick enough rate that it's often more effective to just wait for the general solution to catch up with your task than to spend months training a model yourself. Unless you need a particularly tightly controlled behavior, or a smaller, faster model, or what have you. Training new knowledge in can get weird [2].

    And in-context learning takes literal seconds-to-minutes of time if your information fits in the context window, so it's a lot faster to go that route if you can.

    [1] https://arxiv.org/abs/2404.00213

    [2] https://openreview.net/forum?id=NGKQoaqLpo

  14. That's consistent with other research I've seen, where varied presentation of the data is key to effective knowledge injection [1].

    My assumption, based on the research, is that training on different prompts with the same answer gives you more robust Q&A behavior; training on variations of how to express the same concept generalizes. Training on the same prompt with different answers gives you creative diversity [2] (toy illustration below).

    [1] https://arxiv.org/abs/2404.00213

    [2] https://arxiv.org/abs/2503.17126
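
    A toy illustration of the two axes (the example fact and paraphrases are made up for illustration, not taken from either paper):

    ```python
    # Axis 1: varied prompts, fixed answer -> more robust recall of a fact.
    fact_prompts = [
        "What year was the Foobar Protocol published?",   # hypothetical fact
        "When did the Foobar Protocol come out?",
        "The Foobar Protocol dates from which year?",
    ]
    fact_answer = "It was published in 2019."
    knowledge_pairs = [(p, fact_answer) for p in fact_prompts]

    # Axis 2: fixed prompt, varied answers -> more diverse generations.
    story_prompt = "Write an opening line for a mystery novel."
    story_answers = [
        "The lighthouse had been dark for three nights.",
        "Nobody noticed the second set of footprints.",
    ]
    creative_pairs = [(story_prompt, a) for a in story_answers]

    print(len(knowledge_pairs), len(creative_pairs))  # 3 2
    ```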

  15. Some of that is, or at least was, down to the training: extending the context window without training on sufficiently long data, or using weak evaluation metrics, caused issues. More recent models have been getting better, though long-context performance is still not as good as short-context performance, even if the definition of "short context" has been greatly extended.

    RoPE is great and all, but doesn't magically give 100% performance over the lengthened context; that takes more work.
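
    For reference, here's a minimal sketch of the rotation RoPE applies (split-half convention, base 10000 assumed; real implementations fuse this into attention). The rotation itself extrapolates to any position, which is roughly why RoPE alone isn't enough: offsets far beyond the training lengths are ones the model never learned to use, and closing that gap takes further training or interpolation tricks.

    ```python
    import numpy as np

    # Minimal RoPE sketch (split-half convention; the base is an assumption).
    def rope(x, pos, base=10000.0):
        """Rotate one position's query/key vector x (even length) by pos."""
        half = x.shape[-1] // 2
        freqs = base ** (-np.arange(half) / half)  # one frequency per dim pair
        angles = pos * freqs
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:half], x[half:]
        return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

    rng = np.random.default_rng(0)
    q, k = rng.normal(size=8), rng.normal(size=8)

    # Scores depend only on the relative offset (both pairs are 3 apart)...
    print(np.dot(rope(q, 10), rope(k, 7)), np.dot(rope(q, 110), rope(k, 107)))
    # ...but nothing here teaches the model what to do with offsets far
    # beyond anything it saw during training; that part takes more work.
    ```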

  16. Yes, I was using it for structured outputs before the dedicated structured-output support got its act together.
  17. My sense is they need to go back and update previous docs; they release a lot of software updates and a lot of notebooks showing how to use the features, but the two might fall out of sync. Would that match your observations?
  18. It's a little more subtle than that: They're approximating the language used by someone describing the taste of chocolate; this may or may not have had any relation to the actual practice of eating chocolate in the mind of the original writer. Or writers, because the LLM has learned the pattern from data in aggregate, not from one example.

    I think we tend to underestimate how much the written language aspect filters everything; it is actually rather unnatural and removed from the human sensory experience.

  19. Spans coming through as 'unknown' when I definitely labeled them in the code is probably the most annoying part of Phoenix right now.
