https://mistral.ai/news/mistral-ocr, a recent release. It's been a step-function improvement for my pipelines.
I’m feeding PDFs directly to Gemini to extract tables, and so far the results are pretty good. There was a post on HN a few days ago about using Gemini for this task.
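For anyone curious what "feeding PDFs directly" can look like, here is a minimal sketch rather than my exact pipeline. It assumes the public Gemini REST endpoint (`generateContent`) with the PDF sent inline as base64, which only suits small files. The function names, the prompt, and the model name you pass in are placeholders of mine.

```python
import base64
import json
import urllib.request

# Assumed base URL of the public Gemini REST API.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(pdf_bytes, prompt, model, api_key):
    """Build (but do not send) a generateContent request with an inline PDF."""
    body = {
        "contents": [{
            "parts": [
                # The PDF travels inside the JSON body as base64 text.
                {"inline_data": {
                    "mime_type": "application/pdf",
                    "data": base64.b64encode(pdf_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }
    return urllib.request.Request(
        f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "x-goog-api-key": api_key},
        method="POST",
    )


def extract_tables(pdf_path, model, api_key):
    """Send one PDF and return the model's text reply (needs network access)."""
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    prompt = "Extract every table in this PDF as a Markdown table."
    req = build_request(pdf_bytes, prompt, model, api_key)
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # Assumes the usual response shape: first candidate, first text part.
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

You would call it as `extract_tables("report.pdf", model, api_key)` with whichever Gemini model you have access to. The output still needs spot-checking against the source tables.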
I assume there's some reasonable tool out there to convert PDFs to Markdown and then feed it to some LLM API with okay costs (Gemini? DeepSeek?). Any suggestions?