This reduces to parsing PDFs, which is an unsolved hard problem.
At low volumes, my preferred approach is to select and extract text (copy/paste, perhaps using the poppler library for larger-scale work), dump that to plain-text and convert that (manually / scripted) to Markdown. From there you can get to PDF or pretty much any other format through tools such as pandoc.
At low volumes, my preferred approach is to select and extract text (copy/paste, perhaps using the poppler library for larger-scale work), dump that to plain-text and convert that (manually / scripted) to Markdown. From there you can get to PDF or pretty much any other format through tools such as pandoc.