A lot of customers choose us for our handwriting, checkbox, and table performance. To handle complex handwriting, we've built an agentic OCR correction layer that uses a VLM to review low-confidence OCR output and correct it.
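To give a rough sense of the shape of it, here's a heavily simplified sketch (not our production pipeline; `vlm_reread` is a hypothetical stand-in for whatever VLM call you'd make):

```python
# Keep high-confidence OCR tokens as-is; route low-confidence ones to a VLM
# that re-reads the cropped region, with nearby text passed in as context.

from typing import TypedDict

class Token(TypedDict):
    text: str
    confidence: float
    bbox: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff; in practice this gets tuned

def vlm_reread(page_image, bbox, context: str) -> str:
    """Hypothetical stand-in for a VLM call that re-reads one region."""
    raise NotImplementedError("plug in your VLM provider of choice")

def correct_low_confidence(page_image, tokens: list[Token]) -> list[Token]:
    corrected: list[Token] = []
    for tok in tokens:
        if tok["confidence"] < CONFIDENCE_THRESHOLD:
            # Give the model the last few accepted tokens as surrounding context.
            context = " ".join(t["text"] for t in corrected[-10:])
            tok = {**tok, "text": vlm_reread(page_image, tok["bbox"], context)}
        corrected.append(tok)
    return corrected
```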
Tables are a tricky beast, and the long tail of edge cases here is immense. Two things we've found really impactful are (1) semantic chunking that detects table boundaries (so a table spanning multiple pages doesn't get chopped in half) and (2) table-to-HTML conversion (in addition to markdown). Markdown represents most simple tables well, but it can't express things like nested or merged cells (there's a rough sketch of this below).
You can see examples of both in our demo! https://dashboard.extend.ai/demo
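On the HTML point, here's a stripped-down illustration (a toy cell model, not our actual schema): cells with row/column spans render naturally to HTML, while standard markdown table syntax has no way to express the merge.

```python
# Render a small table model with row/column spans to HTML. Markdown tables
# can't represent the merged "Q1" header cell below at all.

from dataclasses import dataclass

@dataclass
class Cell:
    text: str
    rowspan: int = 1
    colspan: int = 1
    header: bool = False

def to_html(rows: list[list[Cell]]) -> str:
    out = ["<table>"]
    for row in rows:
        out.append("  <tr>")
        for c in row:
            tag = "th" if c.header else "td"
            spans = ""
            if c.rowspan > 1:
                spans += f' rowspan="{c.rowspan}"'
            if c.colspan > 1:
                spans += f' colspan="{c.colspan}"'
            out.append(f"    <{tag}{spans}>{c.text}</{tag}>")
        out.append("  </tr>")
    out.append("</table>")
    return "\n".join(out)

# "Region" spans two rows and "Q1" spans two columns.
table = [
    [Cell("Region", header=True, rowspan=2), Cell("Q1", header=True, colspan=2)],
    [Cell("Units", header=True), Cell("Revenue", header=True)],
    [Cell("EMEA"), Cell("1,200"), Cell("$48,000")],
]
print(to_html(table))
```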
Accuracy and data verification are challenging. We have a set of internal benchmarks that gets us pretty far, but it's not always representative of specific customer situations. That's why one of the earliest things we built was an evaluation product, so customers can easily measure performance on their exact docs and use cases. We recently added support for LLM-as-a-judge and semantic similarity checks, which have been really impactful for measuring accuracy before going live.
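The semantic similarity check is conceptually simple: embed the extracted value and the expected value and compare. A minimal sketch (not the actual product; `embed` is a stand-in for whatever embedding model you use, and the threshold is an assumed default):

```python
# Score an extracted field against its expected value with cosine similarity
# of embeddings, so "Acme Corp." vs "ACME Corporation" can still pass even
# though an exact string match would fail.

import math

def embed(text: str) -> list[float]:
    """Hypothetical stand-in for any text-embedding model or API."""
    raise NotImplementedError("plug in your embedding provider")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_match(extracted: str, expected: str, threshold: float = 0.9) -> bool:
    # Threshold would be tuned per field/use case in practice.
    return cosine(embed(extracted), embed(expected)) >= threshold
```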