https://superdocs.io
- https://citellm.com
Working on CiteLLM, an API that extracts structured data from PDFs and returns citations for each field (page + coordinates + source snippet + confidence).
Instead of blindly trusting the LLM, you can verify every value by linking it back to its exact location in the original PDF.
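The verification step described above can be sketched in a few lines. This is a minimal illustration, not CiteLLM's actual API or response schema. The names `Citation`, `ExtractedField`, `verify_field` and `page_texts` are hypothetical, and the sketch assumes the plain text of each PDF page has already been extracted. It checks that the cited snippet really appears on the cited page and that the extracted value appears inside that snippet.

```python
import re
from dataclasses import dataclass


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so PDF line breaks don't defeat matching.
    return re.sub(r"\s+", " ", text).strip().lower()


@dataclass
class Citation:  # hypothetical shape of a per-field citation
    page: int                                # 1-based page number
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), kept for highlighting
    snippet: str                             # source text the value was taken from
    confidence: float                        # 0.0 to 1.0


@dataclass
class ExtractedField:  # hypothetical extracted value plus its citation
    name: str
    value: str
    citation: Citation


def verify_field(field: ExtractedField, page_texts: dict[int, str]) -> bool:
    """True if the snippet is on the cited page and the value is inside the snippet."""
    page_text = page_texts.get(field.citation.page)
    if page_text is None:
        return False  # cited page does not exist
    snippet = normalize(field.citation.snippet)
    return snippet in normalize(page_text) and normalize(field.value) in snippet


pages = {1: "Invoice No. INV-0042\nTotal due:  $1,250.00"}
total = ExtractedField(
    "total", "$1,250.00",
    Citation(1, (72.0, 640.0, 200.0, 655.0), "Total due: $1,250.00", 0.97),
)
print(verify_field(total, pages))  # True
```

Exact substring matching is a deliberate simplification here; a real checker would likely need fuzzier matching for OCR noise or reformatted numbers.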
- > verifying their claims ends up taking time.
I've been working on this problem with https://citellm.com, specifically for PDFs.
Instead of relying on the LLM answer alone, each extracted field links to its source in the original document (page number + highlighted snippet + confidence score).
Checking any claim becomes simple: click and see the exact source.
- I'm working on SuperCurate (https://getsupercurate.com), which is geared towards note retrieval and curation rather than note creation. Think of it as a filing cabinet for your notes, web clippings, images, and PDFs.
I wanted fast search and filters for my Evernote archive so I could drill down and surface exactly what I was looking for.
There's also a Web Clipper extension for Chrome.
Demos:
Search and curation: https://www.youtube.com/watch?v=z4QSIoUL4Uk
Web Clipper: https://www.youtube.com/watch?v=8F7QoC7X3fs
Search inside PDFs (jumps to page + highlights snippet): https://www.youtube.com/watch?v=t0X9sD-938Q
It's free while in beta; I'd love feedback if you try it.
- https://getsearchablepdf.com (I'm the founder)
- We've built an app like that, but for PDF table extraction: https://table2xl.com
- Shameless plug: https://getsearchablepdf.com
There's a free trial so you can check if it works for your handwriting.
- If you're on Windows try https://table2xl.com (disclosure: I'm the founder), it's more accurate than Excel's camera import. No API though.
Every extracted field comes with a precise citation back to the source document (page + snippet + bounding box + confidence score) so reviewers can verify where each value came from.
Hallucinated values get flagged automatically because they have no supporting text in the source.
The goal is to make human-in-the-loop (HITL) review fast, so reviewers don't have to read through the whole document.
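One way to picture the flagging and review flow described above is the sketch below. It is an assumption-laden illustration, not the commenter's implementation: `find_support` and `build_review_queue` are hypothetical helpers, "supporting text" is approximated as an exact (whitespace- and case-normalized) substring match against each page's extracted text, and confidence scores are taken as given. Fields with no support are flagged and placed at the front of the queue; the rest are ordered from lowest to highest confidence so reviewers see the riskiest values first.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase before comparing.
    return re.sub(r"\s+", " ", text).strip().lower()


def find_support(value: str, pages: list[str]) -> Optional[int]:
    """Return the 1-based page where `value` appears, or None if nothing supports it."""
    needle = normalize(value)
    for page_no, text in enumerate(pages, start=1):
        if needle and needle in normalize(text):
            return page_no
    return None


def build_review_queue(fields: dict[str, tuple[str, float]], pages: list[str]) -> list[dict]:
    """Flag unsupported fields and order everything for human review."""
    queue = []
    for name, (value, confidence) in fields.items():
        page = find_support(value, pages)
        queue.append({
            "field": name,
            "value": value,
            "page": page,
            "confidence": confidence,
            "flagged": page is None,  # no supporting text: possible hallucination
        })
    # Flagged rows first (not True == False sorts first), then lowest confidence.
    queue.sort(key=lambda row: (not row["flagged"], row["confidence"]))
    return queue


pages = ["Invoice INV-0042\nDate: 2024-03-01", "Total due: $1,250.00"]
fields = {
    "invoice_no": ("INV-0042", 0.99),
    "total": ("$1,250.00", 0.95),
    "po_number": ("PO-7781", 0.88),  # not present in either page
}
for row in build_review_queue(fields, pages):
    print(row["field"], row["page"], row["flagged"])
# po_number None True
# total 2 False
# invoice_no 1 False
```

Ranking by confidence rather than hiding high-confidence fields keeps every value reviewable while still pointing attention at the likeliest errors first.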