patresh · 335 karma

  1. I believe OP's point is that, for a given model quality, inference cost decreases dramatically over time. The article you linked talks about effective total inference costs, which do seem to be increasing.

    Those are not contradictory: a company's total inference costs can increase because it deploys more models (Sora), deploys larger models, does more reasoning, and serves more demand.

    However, if we look purely at how much it costs to run inference on a fixed volume of requests at a fixed model quality, I am quite convinced that inference costs are decreasing dramatically. Here's a model card from late 2025 [1] (see the "Model performance" section) with benchmarks comparing a 72B-parameter model from early 2025 (Qwen2.5) to the late-2025 8B Qwen3 model.

    The 9x smaller model outperforms the larger one from earlier the same year on 27 of the 40 benchmarks they were both evaluated on, which is just astounding. Since inference cost scales roughly with parameter count, that is close to an order-of-magnitude cost reduction at comparable quality within a single year.

    [1] https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct

  2. They're likely of limited use for someone looking for introductory ML material, but for someone who has done some computer vision and used various types of convolution layers, it can be useful to see a summary with visualizations.
  3. Does anyone have experience with longer DeepResearch tasks with Mammouth? How does it compare to using Gemini's / ChatGPT's DeepResearch or GPTResearcher + API-based alternatives?

    For standard questions, I feel like it doesn't matter too much what you use. For multi-step search-and-reasoning flows (look for alternatives, fetch pricing and feature lists, compare them, etc.), the differences are larger: the engineering glue and prompting around the pure LLM inference are what make the tools more or less powerful.

  4. I've had no issues with the app lately, but it's still missing the ability to build a local search index for searching by e-mail content, which the web client can do.
  5. Yes, I don't mean that HN doesn't experience toxicity, but to put things in context: if you read random posts on X versus HN, there is no comparison.

    Moderation certainly helps; would there be ways to make it scale with less manual supervision? Or a system that organizes people under certain rule-sets and distributes them into suitably sized groups?

    I do agree with your statement that "Good discussions evolve naturally and also randomly". But let's say your platform becomes popular: it will attract players who want to exploit it to sway opinions for their own gain, and I believe it is becoming ever cheaper to game such systems and simulate whole crowds. So the limits I have in mind are mostly about this.

    Indeed, perhaps the term "social platform" is vague, and the "optimal rules" could differ between a mega-forum, a network for friends, and generic post sharing.

    I'm wondering whether some sort of taxonomy of these rule-sets or levers exists? Or a review paper on what has been tried and what effects it had? There are so many possible ways to structure online social interactions.

  6. Indeed, there are different societal structures that would attract one type of person more than the other.

    I wonder if it would be possible to simulate this to understand what behaviors emerge under certain types of rules. It is certainly difficult to create coherent personalities with LLMs that act in realistic ways, but I wonder if one could get an approximation.

    Perhaps what I have in mind is also not best described as "pleasant", but rather as net-positive for society: something society as a whole is better off having than not. This is arguably the case for HN, but not necessarily for some of the bigger platforms out there.

  7. I also enjoy watching Charles, a French-Canadian cyclist currently cycling from Canada to Europe. As a geologist, he regularly explains the rock formations and rock types he encounters.

    https://www.youtube.com/c/Charlesenv%C3%A9lo

  8. If the diagram is representative of what is happening, it would seem that each cluster is represented as a hypersphere, possibly using the cluster centroid as the center and the maximum distance from the centroid to any cluster member as the radius. Those hyperspheres can then overlap. Not sure that is what is actually happening, though; a sketch of that reading is below.
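
    As a minimal sketch of that interpretation (my assumption, not confirmed by the post):

      import numpy as np

      def hypersphere(points):
          # Center = cluster centroid; radius = distance to the farthest member.
          centroid = points.mean(axis=0)
          radius = np.linalg.norm(points - centroid, axis=1).max()
          return centroid, radius

      def spheres_overlap(c1, r1, c2, r2):
          # Two balls overlap when their centers are closer than the summed radii.
          return np.linalg.norm(c1 - c2) < r1 + r2
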
  9. What is the clustering performed on? Is another embedding model used to produce the embeddings or do they come from the LLM?

    Typically, LLMs don't produce embeddings that are usable for clustering or retrieval; embedding models trained with contrastive learning are used instead. But there seems to be no mention of any models other than LLMs.

    I'm also curious about what type of clustering is used here.
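
    If a separate embedding model is indeed involved, the usual pipeline looks something like this (model name and cluster count are illustrative, not from the post):

      from sentence_transformers import SentenceTransformer
      from sklearn.cluster import KMeans

      texts = ["reset my password", "billing question",
               "forgot my login", "the invoice is wrong"]

      # A contrastively trained embedding model, not the LLM itself.
      embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
      labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)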

  10. I agree with your premise that there is often an unproductive, pendulum-like phenomenon in public debates, where interpretations swing from one extreme to the other, making nuanced discussion difficult.

    However, I don't believe PG's article was meant to address the elephant; rather, it was a meta-level thesis on how he sees debates being shut down by orthodoxy, and for that he does suggest what he thinks is a possible solution.

    Perhaps the thesis could have gained from being more balanced, to, as you say, "avoid giving tacit permissions for the extremists on the other side"? On the other hand, does one always have to shield one's expressions with disclaimers, or is one free to share thoughts, however raw, in order to express, discuss, learn, and update one's beliefs?

    There is likely a greater responsibility to avoid misinterpretation when one has a larger audience, but ultimately I believe that as long as a rational and nuanced discussion takes the good points and leads to a productive debate, it should be okay.

    How can we create incentives to have a more nuanced discussion?

  11. Some of the disagreement or confusion seems to stem from the definition of the word "woke", which means different things to different people?

    Having read both essays, I don't see them as necessarily in disagreement. pg criticizes the performative and orthodox nature of some social-justice activists' behavior; however, the author's behavior here doesn't seem performative at all.

    Perhaps we should avoid terms like "woke" and just say what we mean, to avoid this societal dissonance? I feel like decent, rational people can talk past each other depending on how they have been exposed to the term.

  12. Some high-paying jobs also come with high pressure and little free time, which could harm life satisfaction. It could be that the high earners likely to participate in such a study are the ones with more free time to dedicate to spontaneous endeavors, and therefore might already have higher life satisfaction.

    This bias can also exist for lower-paying jobs; however, I would guess that, proportionally, there are more 80-hour-per-week, high-responsibility jobs in the higher-paying brackets.

    Another related one from last year, based on the game of Othello (cited in the above paper):

    Do Large Language Models learn world models or just surface statistics? - https://www.hackerneue.com/item?id=34474043 - Jan 2023 (174 comments)

  14. How can one explain the graph you linked, given the recent bull market in stocks?

    Wouldn't this mean that capital is flowing in, which should lead to more hiring? Is the job-market response delayed, or are there other factors?

  15. The high-level API seems very smooth for quickly iterating on RAG experiments. It seems great for prototyping; however, I have doubts about whether it's a good idea to hide the LLM-calling logic in a DB extension.

    Error handling when you get rate-limited, when the auth token has expired, or when the prompt exceeds the token limit would be problematic (a sketch of handling this outside the DB follows below). And from a security point of view, it requires your DB to call OpenAI directly, which can also be risky.

    Personally, I haven't used that many Postgres extensions, so perhaps these risks are mitigated in some way I don't know about?
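
    For comparison, a hedged sketch of how application code outside the DB might handle rate limits with the OpenAI Python client (model name and retry policy are illustrative):

      import time
      from openai import OpenAI, RateLimitError

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      def complete_with_retry(prompt, retries=5):
          for attempt in range(retries):
              try:
                  resp = client.chat.completions.create(
                      model="gpt-4o-mini",  # illustrative model name
                      messages=[{"role": "user", "content": prompt}],
                  )
                  return resp.choices[0].message.content
              except RateLimitError:
                  time.sleep(2 ** attempt)  # exponential backoff, then retry
          raise RuntimeError("still rate-limited after all retries")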

  16. Or ordering the destruction of the public streets leading to houses with alleged criminal activity.
  17. RepRisk | Chief Technology Officer | Full-time | Zurich, Switzerland | https://www.reprisk.com

    RepRisk's goal is to drive transparency and accountability of company practices. As a leading ESG data provider, we monitor media reports worldwide to provide a comprehensive overview of companies' behavior.

    RepRisk, based in Zurich, Switzerland, has been in the ESG business since 2007 and values intellectual honesty, humility, and openness.

    We are looking for a talented CTO to drive the technical vision of the company.

    Apply at: https://www.reprisk.com/careers/chief-technology-officer-zur...

  18. If you need larger batch sizes but don't have the VRAM for it, have a look at gradient accumulation (https://kozodoi.me/python/deep%20learning/pytorch/tutorial/2...).

    You can accumulate the gradients of multiple batches before doing the weight-update step. This effectively lets you train with much larger batch sizes than your GPU memory would otherwise allow; a minimal sketch is below.
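
    A minimal PyTorch sketch of the pattern (the model and data are stand-ins):

      import torch
      import torch.nn as nn

      model = nn.Linear(128, 10)
      optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
      criterion = nn.CrossEntropyLoss()
      accum_steps = 8  # effective batch = 8 x the per-step batch size

      batches = [(torch.randn(16, 128), torch.randint(0, 10, (16,)))
                 for _ in range(32)]

      optimizer.zero_grad()
      for i, (x, y) in enumerate(batches):
          loss = criterion(model(x), y) / accum_steps  # scale to average over the effective batch
          loss.backward()  # gradients accumulate in .grad across iterations
          if (i + 1) % accum_steps == 0:
              optimizer.step()
              optimizer.zero_grad()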

  19. She says that the ROI probability calculation is wrong, which is the last part of her video and a separate topic.

    The part about correlations has not been retracted, AFAIK. I agree that there's a need for a baseline, though; there is one example on her recent Twitter feed, but more samples are needed to get a better picture.

  20. RepRisk | Full-time | Senior Data Engineer | Zurich, Switzerland | Required work permit or Swiss or EU citizenship | ONSITE / HYBRID

    RepRisk is a rapidly growing global company and a pioneer in the environmental, social, and governance (ESG) data science field. Our goal is to make the world a better place by creating transparency in the business world. We combine NLP with human intelligence to analyze public information and identify ESG risks.

    We are looking for a talented data engineer for our NLP team in Zurich.

    We are looking for someone experienced in batch/stream processing (e.g. Kafka), Python, SQL, and AWS, with general experience in building scalable data-processing applications.

    More details on our portal: https://www.reprisk.com/careers/senior-data-engineer-zurich

  21. Also, the next point

    > It should have (and has shown to have) better scaling laws

    is a statement based on two anecdotes, but I don't see a compelling reason why this should be the case in general.

    Active-learning approaches are not mentioned, even though they allow incorporating human feedback during fine-tuning, and this can be done with a purely supervised approach (see the toy sketch at the end of this comment).

    IMO the last point is the only compelling one: having, for example, agents that can browse the web during learning could open up a lot of possibilities. It would have been interesting to develop this last point more: what are the current difficulties in training such agents?
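
    To make the active-learning point concrete, a toy pool-based loop with uncertainty sampling (all data and numbers are made up):

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      X_pool = rng.normal(size=(1000, 20))
      y_pool = (X_pool[:, 0] > 0).astype(int)  # labels revealed only when queried

      labeled = list(range(10))  # small labeled seed set
      for _ in range(5):
          clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
          proba = clf.predict_proba(X_pool)[:, 1]
          uncertain = np.argsort(np.abs(proba - 0.5))  # closest to 0.5 first
          new = [i for i in uncertain if i not in labeled][:10]
          labeled += new  # "ask the human" for these labels, then retrain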

  22. Absolutely. Without proper context, the character giving an answer could be a wacky philosopher, a 5-year-old child, a person from the 1800s, a liar, an uninterested passer-by, or a trickster mage in a novel. If he didn't build the prompt to make it clear that it's a conversation, the context could even be song lyrics, for example.
  23. There is a fundamental difference between AI Dungeon-type chatbots and chatbots you typically encounter on websites e.g. for customer support.

    The former does not really have a goal and is unconcerned with responding with factual information as long as the conversation is coherent, so it makes sense to use large language models, which are quite good at modeling next-word probabilities based on context.

    The latter, however, is goal-oriented (help the customer) and constrained by its known actions and embedded knowledge. This often forces the conversational flows (or at least fragments of them) to be hard-coded, with machine learning used as a compass to determine which flow to trigger next (a toy sketch below).

    For now, controlling GPT-like language models remains an extremely tricky exercise, but if some day we can constrain language models to only output desirable and factual information, at a low cost for maintaining and updating their embedded knowledge, we should see a significant bump in the "intelligence" of the typical website chatbot.
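
    To illustrate the compass pattern (the flows and example utterances are invented, not from any real product):

      from sentence_transformers import SentenceTransformer, util

      FLOWS = {
          "refund":   lambda msg: "Starting the refund flow...",
          "shipping": lambda msg: "Let me look up your shipment...",
          "fallback": lambda msg: "Sorry, could you rephrase that?",
      }
      EXAMPLES = {"refund": "I want my money back",
                  "shipping": "Where is my package?"}

      model = SentenceTransformer("all-MiniLM-L6-v2")

      def route(message, threshold=0.4):
          # Classify the intent, then trigger the matching hard-coded flow.
          emb = model.encode(message)
          scores = {k: float(util.cos_sim(emb, model.encode(v)))
                    for k, v in EXAMPLES.items()}
          intent, score = max(scores.items(), key=lambda kv: kv[1])
          return FLOWS[intent if score >= threshold else "fallback"](message)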

  24. Learning words by translation works, but the book argues it's very inefficient: you anchor them to your original language, so you build a habit of going back and forth between the languages when trying to come up with a word, instead of staying immersed.

    The advantage of images is, first of all, that visual cues are very powerful for memory: the more senses you associate with a memory, the stronger it will be (I wonder if anyone has ever tried to incorporate smells into SRS?). Furthermore, it is not always easy to find a decent image, but the mere search for one makes your brain work with that word in mind and creates associations.

    Granted, it is not easy to find images for words such as "philosophy", but with a bit of creativity it is possible, and if not, you can always explain the target word in the target language to stay immersed.

  25. I agree with your comment. As a sidenote concerning your last point, Gabriel Wyner's book "Fluent Forever: How to Learn Any Language Fast and Never Forget It" explains in detail how to build an SRS system to learn a language and it strongly advises against using translation tasks in your SRS.

    With translation tasks, your mind creates strong associations with the words in your original language, making it difficult to think in the target language because you always have to refer back to the original. A better task design is to be shown images and come up with the word in the target language.

  26. I find that having the functions min and max share names with the variables min and max increases cognitive load, which makes the expression harder to reason about.

    I find the following easier to read:

      Math.min(Math.max(num, lower_bound), upper_bound)
  27. That part of the Readme seems to be out of date; they released the largest GPT-2 model last year: https://www.openai.com/blog/gpt-2-1-5b-release/
  28. I agree that there is no perfect objectivity in journalism; even the choice of whether or not to report something is subjective. However, I find it misleading, if not dangerous, to claim that there is not even a continuum of objectivity or impartiality in journalism.

    If we simply say there is no impartiality, we put news outlets that do a lot of research and strive to paint a balanced view of reality on the same level as propaganda machines that completely disregard facts.

  29. Running an i5 9600K / 2x Nvidia 2070 / 64 GB RAM on 19.10, I managed to get to a fairly stable configuration. The only pain point was, and has always been, Nvidia driver and CUDA stability. I have yet to find a consistent way to install the driver and CUDA versions I want and lock them; a couple of weeks ago, installing Wine deleted my CUDA packages. For now it's stable enough for a few months, until I need to reinstall or tinker with something, but maybe someone has suggestions for improving stability.

