- Thank you. Your comment about using LLMs to semantically parse diverse data, as a first step, makes sense. In fact, come to think of it, in the area of prompt optimization too - such as MIPROv2 [1] - the LLM is used to create initial prompt guesses based on its understanding of the data. And I agree that UMAP still works well out of the box and has been pretty much like this since its introduction.
[1] Section C.1 in the Appendix here https://arxiv.org/pdf/2406.11695
- I was not aware this existed, and it looks cool! I am definitely going to set aside some time to explore it further.
I have a couple of questions for now: (1) I am confused by your last sentence. It seems you're saying embeddings are a substitute for clustering. My understanding is that you usually apply a clustering algorithm over embeddings - good embeddings just ensure that the grouping produced by the clustering algo "makes sense".
(2) Have you tried PaCMAP? I found it to produce high-quality results quickly when I tried it. I haven't tried it in a while though - and I vaguely remember it wouldn't install properly on my machine (a Mac) the last time I reached for it. Their group has some new stuff coming out too (on the linked page).
- Thanks for the example. Yes, true, this is for expensive functions - to be precise, functions that depend on data that is hard to gather, so you interleave computing the value of the function with gathering, strategically, just as much data as is needed to compute it. The video on their page [1] is quite illustrative: calculating the shortest path on a graph where the edge weights are expensive to obtain. Note how the edge weights they end up obtaining form a narrow band around the shortest path they find.
- Timefold looks very interesting. This might be irrelevant but have you looked at stuff like InfoBax [1]?
- You don't - the way I use LLMs for explanations is that I keep going back and forth between the LLM explanation and Google search/Wikipedia. And of course, asking the LLM to cite sources helps.
This might sound cumbersome but without the LLM I wouldn't have (1) known what to search for, in a way (2) that lets me incrementally build a mental model. So it's a net win for me. The only gap I see is coverage/recall: when asked for different techniques to accomplish something, the LLM might miss some techniques - and what is missed depends upon the specific LLM. My solution here is asking multiple LLMs and going back to Google search.
- Love awk. In the early days of my career, I used to write ETL pipelines, and awk helped me condense a lot of stuff into a small number of LOC. I particularly prided myself on writing terse one-liners (some probably undecipherable, ha!), but did occasionally write scripts. Now I mostly reach for Python.
- I'm curious to know if Anthropic mentions anywhere that they use speculative decoding. For OpenAI they do seem to use it based on this tweet [1].
- Wouldn't this be an optimization problem, that's to say, something like z3 should be able to do - [1], [2]?
I was about to suggest probabilistic programming, e.g., PyMC [3], as well, but it looks like you want the optimization to occur autonomously after you've specified the problem - which is different from the program drawing insights from organically accumulated data.
[1] https://github.com/Z3Prover/z3?tab=readme-ov-file
[2] https://microsoft.github.io/z3guide/programming/Z3%20Python%...
- Aside from secondmind [1] I don't know of any companies (only because I haven't looked)... But if I had to look for places with a strong research culture around GPs (I don't know if that's what you're after), I would find relevant papers on arXiv and Google Scholar and see if any of them come from industry labs. If I had to guess at industries using Bayesian tools at work, maybe advertising and healthcare. I would also look out for places that hire econometricians.
Also thank you for the book recommendation!
- This is the definitive reference on the topic! I have some notes as well, if you want something concise that doesn't ignore the math [1].
[1] https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs#gaus...
- Active Learning is a very tricky area to get right ... over the years I have had mixed luck with it for text classification, to the point that my colleague and I decided to perform a thorough empirical study [1] that normalized the various experiment settings individual papers had reported. We observed that, post-normalization, randomly picking instances to label is better!
- Evals somehow seem to be very, very underrated, which is concerning in a world where we are moving (or trying to move) towards systems with more autonomy.
Your skepticism of "llm-as-a-judge" setups is spot on. If your LLM can make mistakes/hallucinate, then of course your judge LLM can too. In practice, you need to validate your judges and possibly adapt them to your task based on sample annotated data. You might adapt them by trial and error, by prompt optimization, e.g., using DSPy [1], or by learning a small correction model on top of their outputs, e.g., LLM-Rubric [2] or Prediction-Powered Inference [3].
In the end, using the LLM as a judge confers just these benefits:
1. It is easy to express complex evaluation criteria. This does not guarantee correctness.
2. Seen as a model, it is easy to "train", i.e., you get all the benefits of in-context learning, e.g., prompt-based few-shot learning.
But you still need to evaluate and adapt them. I have notes from a NeurIPS workshop from last year [4]. Btw, love your username!
[2] https://aclanthology.org/2024.acl-long.745/
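For the "small correction model" option, the simplest version is Platt-style calibration: fit a sigmoid mapping the judge's raw score to the probability that a human would agree. A dependency-free sketch - the scores and labels below are made up, and real judge outputs would replace them:

```python
# Platt-style calibration of an LLM judge: fit p(pass) = sigmoid(a*s + b)
# on a small human-annotated sample, by gradient descent on logistic loss.
import math

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit sigmoid(a*s + b) to binary labels."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1 / (1 + math.exp(-(a * s + b)))
            ga += (p - y) * s / n   # d(loss)/da
            gb += (p - y) / n       # d(loss)/db
        a -= lr * ga
        b -= lr * gb
    return a, b

# Hypothetical judge scores (1-5) with human pass/fail annotations.
scores = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
a, b = fit_platt(scores, labels)
calibrated = lambda s: 1 / (1 + math.exp(-(a * s + b)))
```

LLM-Rubric and PPI do something more sophisticated than this, but the spirit is the same: a handful of human labels turns raw judge scores into something you can trust quantitatively.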
- I noticed the conflation of terms too but it seems to arise out of the original Microsoft announcement!
https://devblogs.microsoft.com/commandline/edit-is-now-open-...
- Note the website (ai-contest.com) that the post links to seems to have been hijacked by a gambling site.
For the use-cases where Genetic Programming was popular, I would recommend looking at Bayesian Optimization (bayesopt) as an alternative today (I know I keep recommending the area - but I hope I only do so when it is relevant :-)). This is mostly because, IMHO, it has a principled foundation that has been productively developed further in the past few years. Here's a good book on the topic [1], and I have a tutorial as well [2]. Interestingly, one of the books I encountered when reading up on Genetic Algorithms years ago was by Melanie Mitchell [3]!
Bayesopt, Genetic Programming, or any other search algorithm that can operate over non-differentiable objective functions is very useful in practice. For example, when performing model selection in the space of hyperparameters of a model that is not differentiable, such as a traditional Decision Tree [4]. Or in exotic use-cases like molecule discovery [5].
You can try out bayesopt using the botorch or hyperopt libraries. The latter only implements a specific bayesopt algorithm, which was/is popular but seems to have been bettered of late [4].
[2] Part 1 https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs
[3] Found a free copy online https://www.boente.eti.br/fuzzy/ebook-fuzzy-mitchell.pdf
[4] "... Analysis of the Black-Box Optimization Challenge 2020" https://proceedings.mlr.press/v133/turner21a.html
[5] ChemBO is an example but there are others https://proceedings.mlr.press/v108/korovina20a.html
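To give a feel for the core loop (this is a bare-bones sketch, not a substitute for botorch; the RBF kernel, its lengthscale, the UCB coefficient, and the toy objective are all arbitrary choices of mine):

```python
# Toy 1-D Bayesian optimization: GP surrogate (RBF kernel, zero prior
# mean) + UCB acquisition maximized over a grid.
import numpy as np

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    var = np.clip(1.0 - np.sum(Ks * v, axis=0), 1e-12, None)
    return mu, var

f = lambda x: -(x - 0.7) ** 2          # "black-box" objective, max at 0.7
grid = np.linspace(0, 1, 201)
X = np.array([0.1, 0.9]); y = f(X)     # initial observations
for _ in range(15):
    mu, var = gp_posterior(X, y, grid)
    ucb = mu + 2.0 * np.sqrt(var)      # optimism under uncertainty
    x_next = grid[np.argmax(ucb)]      # query the most promising point
    X = np.append(X, x_next); y = np.append(y, f(x_next))
best = X[np.argmax(y)]                 # should land near 0.7
```

The whole trick is in those two lines of the loop: the surrogate gives you a cheap belief about the expensive function, and the acquisition function trades off exploring high-variance regions against exploiting high-mean ones.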
- The people who founded Ponoc seem to have had creative differences with Miyazaki. They wanted to make a movie [1] that they felt Ghibli wouldn't greenlight [2] - but there seems to have been no deep-seated animosity or desire to rip anyone off. Incidentally, I borrowed this movie from the local library just a few hours ago because the cover art reminded me of Ghibli, but I noticed it wasn't a Ghibli production. Some searching online led me to the cited article.
[1] Mary and the Witch’s Flower https://imdb.com/title/tt6336356/
[2] https://otakuusamagazine.com/hayao-miyazaki-says-he-wont-see...
- I highly recommend the course you've mentioned (by Yaser Abu-Mostafa). In fact I still recommend it for picking up the basics: a very good mix of math and intuition, Abu-Mostafa himself is a terrific teacher, and he is considerate and thoughtful in responding to questions at the end of his lectures. The last part is important if you're a beginner: it builds confidence that it's probably OK to ask what you might consider a simple question - it still deserves a good answer. The series is a bit dated now in terms of what it covers, but still solid as a foundational course.
- I came across this a few days ago, and my excuse to give it a serious look is that Andreas Krause has some deep and interesting research on Gaussian Processes and Bandits [1].
[1] https://scholar.google.com/scholar?start=10&q=andreas+krause...
- I have mentioned this elsewhere online but I was once attending a talk in the computer history museum for which I had turned up early. An elderly gentleman took his seat right before me - it took me a while to process that it was Knuth (in reality, it was less to do with processing and more with accepting!).
Somewhat recently, my wife and I spotted him at a Hitchcock movie festival in Palo Alto.
Random run-ins are surreal :-)
- YouTube recommendations have worked very well for me. I listen to a lot of talks, and if it weren't for the recommendations I wouldn't have discovered some very interesting talks or even areas. The only time the recommendations derail is when I have guests over and they log into my account (because that's the one that's always logged in on my Roku). But that is expected. And it's easy to fix, by deleting the history in a custom date range, or simply by continuing to watch the kind of stuff I was watching earlier.
- I have had some success with PowerPoint animations. Put up the equation and, on each mouse click, surround the symbol(s) you want to explain with a red box or similar and have an infobox show up (which can also contain images, such as plots - very helpful). As you transition to the next symbol(s), fade away the previous infobox and its surrounding box.
- It is pleasantly surprising to see how close your pipeline is to mine. Essentially a good representation layer - usually BERT-based, like MiniLM or MPNet - followed by a calibrated linear SVM. Sometimes I replace the SVM with LightGBM if I have non-language features.
If I am building a set of models for a domain, I might fine-tune the representation layer. On a per-model basis I typically just train the SVM and calibrate it. For the amount of time this whole pipeline takes (not counting the occasions when I fine-tune), it works amazingly well.
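For anyone curious, the classifier head of such a pipeline looks roughly like this in scikit-learn. I'm substituting random vectors for the MiniLM/MPNet embeddings, which in practice would come from an encoder (e.g., via the sentence-transformers library):

```python
# Sketch of the classifier head only: a linear SVM with sigmoid
# (Platt) probability calibration. The 384-d random vectors stand in
# for MiniLM-style sentence embeddings.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 384))        # fake "embeddings"
w = rng.normal(size=384)
y = (X @ w > 0).astype(int)            # fake binary labels

clf = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])       # calibrated class probabilities
```

The calibration step is what makes the SVM scores usable as probabilities, e.g., for thresholding or routing low-confidence items to review.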
EDIT: It shows the side-by-side view by default, but it is easy to toggle to a unified view. There's probably a way to permanently set this somewhere.