Preferences

Try with cpdf (disclaimer, wrote it):

  cpdf -output-json -output-json-parse-content-streams in.pdf -o out.json
Then you can play around with the JSON, and turn it back to PDF with

  cpdf -j out.json -o out.pdf
No live back-and-forth though.

The live back-and-forth is the main point of what I'm asking for — I tried your cpdf (thanks for the mention; will add it to my list) and it too doesn't help; all it does is, somewhere 9000-odd lines into the JSON file, turn the part of the content stream corresponding to what I mentioned in the earlier comment into:

        [
          [ { "F": 0.0 }, "g" ],
          [ { "F": 0.0 }, "G" ],
          [ { "F": 0.0 }, "g" ],
          [ { "F": 0.0 }, "G" ],
          [ "BT" ],
          [ "/F19", { "F": 10.9091 }, "Tf" ],
          [ { "F": 88.93600000000001 }, { "F": 709.0410000000001 }, "Td" ],
          [
            [
              "Subsequen",
              { "F": 28.0 },
              "t",
              { "F": -374.0 },
              "to",
              { "F": -373.0 },
              "the",
              { "F": -373.0 },
              "p",
              { "F": -28.0 },
              "erio",
              { "F": -28.0 },
              "d",
              { "F": -373.0 },
              "analyzed",
              { "F": -373.0 },
              "in",
              { "F": -374.0 },
              "our",
              { "F": -373.0 },
              "study",
              { "F": 83.0 },
              ",",
              { "F": -383.0 },
              "Bridge's",
              { "F": -373.0 },
              "paren",
              { "F": 27.0 },
              "t",
              { "F": -373.0 },
              "compan",
              { "F": 28.0 },
              "y",
              { "F": -373.0 },
              "Ne",
              { "F": -1.0 },
              "wGlob",
              { "F": -27.0 },
              "e",
              { "F": -374.0 },
              "reduced"
            ],
            "TJ"
          ],
          [ { "F": -16.936 }, { "F": -21.922 }, "Td" ],
This is just a more verbose restatement of what's in the PDF file; the real questions I'm asking are:

- How can a user get to this part, from viewing the PDF file? (Note that the PDF page objects are not necessarily a flat list; they are often nested at different levels of “kids”.)

- How can a user understand these instructions, and “see” how they correspond to what is visually displayed on the PDF file?

This might actually be something very valuable to me.

I have a bunch of documents right now that are annual statutory and financial disclosures of a large institute, and they are just barely differently organized from each year to the next to make it too tedious to cross compare them manually. I've been looking around for a tool that could break out the content and let me reorder it so that the same section is on the same page for every report.

This might be it.

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal