
cyanydeez
For vision models, I wonder if they can train on things like photo negatives, rotated images, etc. Or madlib-like sentences, where a Q/A is something like "the _____ took first place in the horse show."

bearseascape
The madlib-like sentences approach is actually how masked token prediction works! It was one of the pretraining tasks for BERT, but nowadays I think all (?) LLMs are trained with next-token prediction instead.
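
If you want to see it in action, here's a minimal sketch using the Hugging Face transformers fill-mask pipeline (the model choice is just illustrative, and it downloads weights on first run):

    from transformers import pipeline

    # BERT-style masked language modeling: the model fills in the blank,
    # just like the madlib example above.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for pred in unmasker("The [MASK] took first place in the horse show."):
        print(f"{pred['token_str']}: {pred['score']:.3f}")

The predictions come back ranked by probability, which is basically the fill-in-the-blank signal BERT was pretrained on.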
latency-guy2
For photo negatives - it usually doesn't matter. I'm not up to date with what the vision folks are doing at these companies, but images are usually single-channel, and for regular images more likely than not grayscale. Otherwise they're in the complex domain for the radar folks, and those aren't RGB-based images at all, rather scatterer-defined.

Additional channels usually didn't matter in training for the experiments and models I dealt with before 2022, and when they did, it certainly wasn't because of color. Then again, the work I was doing was object detection and classification on known classes (plus some additional confusers), where color pretty much didn't matter in the first place.
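
To make the negatives point concrete: on a single-channel 8-bit image, a photo negative is just a pixel-wise inversion, so generating them at training time costs basically nothing (a minimal sketch with numpy/PIL; the filename is made up):

    import numpy as np
    from PIL import Image

    # On an 8-bit single-channel image, a photo negative is just a
    # pixel-wise inversion: 255 - pixel value.
    img = np.asarray(Image.open("horse.png").convert("L"))  # grayscale
    negative = 255 - img
    Image.fromarray(negative).save("horse_negative.png")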
