>If you want a large open-source model, Qwen2-VL-72B is most likely the best option.
Only the 2B and 7B models have been "open sourced". From your link:
>We opensource Qwen2-VL-2B and Qwen2-VL-7B with Apache 2.0 license, and we release the API of Qwen2-VL-72B!
1. This is a VLM (vision-language model), not a text-to-image model. You can give it images and it can understand them, but it doesn't generate images.
2. Pixtral 12B seems to benchmark significantly below Qwen2-VL-7B [1], so if you want the best local model for understanding images, Qwen2-VL is probably the one to use. If you want a large open-source model, Qwen2-VL-72B is most likely the best option.
[1] https://qwenlm.github.io/blog/qwen2-vl/