- For all of these kinds of releases I ask myself: if it worked well, they would not release it for free
- To me, this only makes sense on the word level, not the sentence level. I can understand that words, especially older ones, have evolved under the energy and comfort constraints of our physiology. But extending this to the sentence level is a rather big step. I would suppose it works for simple, short sentences that had to be efficient in the past. But imagine sentences about computer science, where most words are rather new and were chosen by fairly arbitrary conventions. It would be interesting to see whether this hypothesis holds when applied to longer, more complex sentences and "modern" words.
- If we think of every generation as a compression step of some form of information into our DNA, and early humans existed for ~1,000,000 years with a generation every ~20 years on average, then we have only ~50,000 compression steps to today (quick arithmetic below). Of course, we have genes from both parents, so there is some mixing from others, but especially in the early days the pool of other humans was small. So that still does not look anywhere close in order of magnitude to modern machine learning. Sure, early humans already had a lot of information in their DNA, but still.
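A quick back-of-the-envelope check of the numbers above (the year and generation figures are the comment's rough assumptions, not measured data):

```python
# Rough estimate: how many generational "compression steps" fit into
# early human history, using the figures assumed in the comment.
years_of_human_history = 1_000_000  # rough estimate
years_per_generation = 20           # rough average generation time

generations = years_of_human_history // years_per_generation
print(generations)  # 50000
```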
- ".. Claude estimates that AI reduces task completion time by 80%. We use Claude to evaluate anonymized Claude.ai transcripts to estimate the productivity impact of AI."
What is this? So they take Claude and ask it how much time it thinks was saved here? How can you take this seriously? Chatbots are prone to exaggeration, especially about something positive like this.
- Cool idea! I had a look at the code and have been wondering about the sigmoid gating: it is used to mix some of q_struct and k_struct into the original query and key. But why is this gating independent of the input? I would have expected it to depend on the input, so that if the model sees something more complex it pulls in more of this information (or something similar). But it is just a fixed, learnable parameter per layer, or am I mistaken? What is the intuition behind this? (A rough sketch of the two variants is below.)
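For concreteness, a minimal PyTorch sketch of the two gating variants discussed above. This is my reading of the comment, not the repo's actual code; the class name StructGate and the exact mixing formula q + sigmoid(g) * q_struct are assumptions.

```python
import torch
import torch.nn as nn

class StructGate(nn.Module):
    """Sketch of the gating described above (assumed form, not the repo's code)."""
    def __init__(self, d_model: int, input_dependent: bool = False):
        super().__init__()
        self.input_dependent = input_dependent
        if input_dependent:
            # hypothetical alternative: gate computed from the token itself
            self.gate_proj = nn.Linear(d_model, 1)
        else:
            # what the repo seems to do: one learnable scalar per layer
            self.gate_logit = nn.Parameter(torch.zeros(1))

    def forward(self, q: torch.Tensor, q_struct: torch.Tensor) -> torch.Tensor:
        if self.input_dependent:
            g = torch.sigmoid(self.gate_proj(q))  # shape (..., 1), varies per token
        else:
            g = torch.sigmoid(self.gate_logit)    # scalar, same for every input
        return q + g * q_struct                   # mix structural info into the query
```

The input-dependent variant would let the gate open wider on some tokens than others, which is the behavior the comment is asking about.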
- If you put it against the value created from these hours, the graph almost flips entirely: https://figure.nz/chart/mMmSnWWbULiK4SvY-17BBScq4PaYeiUnz
Also: in some countries, like Germany, a lot of mothers work part time, which impacts this statistic quite a bit.
- Super interesting blog post. I just wonder how this actually differs from LoRA, since LoRA also adds some parameters and freezes the rest of the model (minimal LoRA sketch below for comparison). This seems like a sparse, memory-efficient LoRA with a couple of extra steps, since it uses attention again to make the sparsity work, all while being a lot more effective than LoRA (a performance drop of only 11% compared to 71%).
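For comparison, here is roughly what "adds some parameters and freezes the rest of the model" means in standard LoRA. This is the generic textbook form, not the blog post's method; the rank and alpha values are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: the pretrained weight is frozen,
    only the low-rank A/B factors are trained."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original model weights
        # low-rank trainable factors; B starts at zero so training
        # begins from the unmodified base model
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path plus trainable low-rank update
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```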
- “To be an artist means you must declare a loyalty to your art form and your vision that runs deeper than almost any other, even sometimes deeper than blood kinship.”
That is absurd. Selling out your children's childhood for some sort of pseudo-deep "art" and defending it this way is almost psychopathic.
- For anyone curious about what the Gated Delta Network is: https://arxiv.org/pdf/2412.06464
- This is super cool! One thing I find counterintuitive is that GPT5 and o3 do not have better performance. GPT5 gets about 800k on average per round, but I would have expected it to be nearly perfect, since these are not particularly hard questions, mostly trivia or simple lookup questions. There is little reasoning involved, so I expected the big models to do much better.
- I have been anticipating this release for some time now. I am an avid user of Ecosia and find the results passable compared to Google. These days there are a couple of queries a week I am unsatisfied with, where I switch to Google or a chatbot. I hope this new index is comparable in quality and one step closer to a bit more competition in this market. Sadly, I could not find an official blog post or anything on the Qwant news site.
- I clicked because I was very curious what this actually is, but it really is just what it says. The problem is, there is no twerking: the girls are just swaying from left to right really slowly, which is not really twerking, but that is beside the point. Why are things like this being built? How can someone look themselves in the eye after this?
- This seems very interesting: they took a big sensor dataset and generated text from it. I guess this involves things like maximum values, mean values, maybe simple trends, and whether the person was walking or biking, etc. (a sketch of what I imagine that templating looks like is below). It would be interesting to see whether the model identifies things that were not so easily provided in the training data. Otherwise this is just teaching the model to sort of calculate the mean from sensor data instead of using tools to do it.
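To illustrate the kind of templated caption generation I am guessing at here (entirely hypothetical; describe_window and its wording are made up, not from the paper):

```python
import statistics

def describe_window(samples: list[float], activity: str) -> str:
    """Hypothetical sketch: turn simple statistics over a sensor
    window into a templated text description."""
    mean = statistics.mean(samples)
    peak = max(samples)
    trend = "rising" if samples[-1] > samples[0] else "falling"
    return (f"The signal averaged {mean:.1f} with a peak of {peak:.1f}, "
            f"a {trend} trend, while the person was {activity}.")

print(describe_window([0.8, 1.1, 1.9, 2.4], activity="walking"))
```

If the training captions were built like this, the model would mostly be learning to approximate these statistics from raw readings, which is the concern raised above.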