A couple of notes for newcomers:

1. This is a VLM, not a text-to-image model. You can give it images, and it can understand them. It doesn't generate images back.

2. It seems like Pixtral 12B benchmarks significantly below Qwen2-VL-7B [1], so if you want the best local model for understanding images, probably use Qwen2-VL-7B. If you want a large open-source model, Qwen2-VL-72B is most likely the best option.

[1]: https://qwenlm.github.io/blog/qwen2-vl/


>If you want a large open-source model, Qwen2-VL-72B is most likely the best option.

Only the 2B and 7B models have been "open sourced". From your link:

>We opensource Qwen2-VL-2B and Qwen2-VL-7B with Apache 2.0 license, and we release the API of Qwen2-VL-72B!
