CIFAR-10 is an image classification dataset (32x32 pixel images).

Llama 3.3 70B is a text-only, non-multimodal language model. Just look up the Hugging Face page that your own repo points to.

> The Llama 3.3 instruction tuned text only model...

I might be wrong, but I'm pretty sure a text model is going to be no better than chance at classifying images.
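
For reference, CIFAR-10 has 10 balanced classes, so "chance" here means roughly 10% accuracy. A minimal sketch of that baseline, assuming uniform random guessing over a balanced 10,000-image test set (the function name and defaults are mine for illustration, not anything from the repo):

```python
import random

def chance_accuracy(num_classes=10, num_samples=10_000, seed=0):
    """Simulate uniform random guessing on a balanced label set."""
    rng = random.Random(seed)
    # Balanced labels: each class appears num_samples / num_classes times.
    labels = [i % num_classes for i in range(num_samples)]
    # A guess is correct only when it happens to match the true label.
    correct = sum(rng.randrange(num_classes) == y for y in labels)
    return correct / num_samples

if __name__ == "__main__":
    print(f"random-guess accuracy: {chance_accuracy():.3f}")
```

Any reported CIFAR-10 score far above that would need the model to actually see the pixels somehow.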

Another comment pointed out that your test suite cheats slightly on HellaSwag. It doesn't seem unlikely that Grok set up the project so it could cheat at the other benchmarks, too.

https://www.hackerneue.com/item?id=46215166

> The repo contains the full pipelines, configuration files, and benchmark scripts, and those show the precise datasets, metrics, and evaluation flows.

There's nothing there, really.

I'm sorry that Grok/Ani lied to you, I blame Elon, but this just doesn't hold up.