Actually, it's quite simple. We parse from left to right, and when we hit EOL, we return to the beginning of the line and increase Y by one.
Blocks are parsed as follows: when we reach the opening run of block characters, we move Y down by one and keep looping rightward while there is whitespace, until we encounter the closing run of the same number of block characters.
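Here is a minimal Python sketch of that scan. It rests on one possible reading of the block rule, which is an assumption because the comment leaves the details open. The delimiter is three double quotes. The block's content sits in the column directly below the opener, one word per row, read rightward until whitespace. The block ends at the first row whose word is the delimiter again. Cells consumed by a block are remembered so that the ordinary left-to-right scan skips them when it reaches later rows. `tokenize`, `read_block` and `read_word` are hypothetical names.

```python
DELIM = '"""'  # assumed block delimiter: three double quotes


def read_word(lines, x, y):
    """Read rightward from column x on row y until whitespace or end of line."""
    line = lines[y] if y < len(lines) else ""
    end = x
    while end < len(line) and not line[end].isspace():
        end += 1
    return line[x:end]


def read_block(lines, x, y, used):
    """Collect a vertical block whose opener sits at (x, y).

    Moves down one row at a time, reads one word starting at column x,
    and stops at the first word equal to DELIM (the assumed closing rule).
    """
    body = []
    for row in range(y + 1, len(lines)):
        word = read_word(lines, x, row)
        # Remember consumed cells so the main scan does not re-read them.
        used.update((x + k, row) for k in range(len(word)))
        if word == DELIM:
            return "\n".join(body)
        body.append(word)
    raise SyntaxError(f"unterminated block opened at column {x}, row {y}")


def tokenize(lines):
    """Row-major scan: left to right, and at EOL back to x = 0 with y + 1."""
    used = set()
    tokens = []
    for y, line in enumerate(lines):
        x = 0
        while x < len(line):
            ch = line[x]
            if (x, y) in used or ch.isspace():
                x += 1  # skip whitespace and cells already owned by a block
            elif ch in "()":
                tokens.append(ch)
                x += 1
            elif line.startswith(DELIM, x):
                tokens.append(("block", read_block(lines, x, y, used)))
                x += len(DELIM)
            else:
                start = x
                while (x < len(line) and not line[x].isspace()
                       and line[x] not in "()" and (x, y) not in used):
                    x += 1
                tokens.append(line[start:x])
    return tokens
```

For example, `tokenize(['(a """ c', '   x1  d', '   y2', '   """ e)'])` returns `['(', 'a', ('block', 'x1\ny2'), 'c', 'd', 'e', ')']`. The block is emitted at the position of its opener, and the atoms to its right on later rows (`d`, `e`) are still picked up by the ordinary row-by-row scan.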
In a transposed block, we just swap X and Y, which is easily done with pointers, and reuse the same code.
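Python has no pointers to swap, but a view object whose `get(x, y)` forwards to `get(y, x)` achieves the same effect. With that view, a single block reader handles both orientations. The reader steps "down" and reads "right", and in the transposed view those directions become "right" and "down". This is a hypothetical sketch: `Grid`, `Transposed`, the `rows` attribute and the three-quote delimiter are assumptions. The comment does not say how the parser decides that a block is transposed, so the sketch shows only the shared reader.

```python
DELIM = '"""'  # assumed delimiter, read along the block's "rows"


class Grid:
    """Ragged 2D character grid; cells outside the text read as spaces."""

    def __init__(self, lines):
        self.lines = lines
        self.rows = len(lines)

    def get(self, x, y):
        if 0 <= y < len(self.lines) and 0 <= x < len(self.lines[y]):
            return self.lines[y][x]
        return " "


class Transposed:
    """The same grid with X and Y swapped (stand-in for swapping pointers)."""

    def __init__(self, grid):
        self.grid = grid
        # In the swapped view, "rows" are the columns of the original grid.
        self.rows = max((len(line) for line in grid.lines), default=0)

    def get(self, x, y):
        return self.grid.get(y, x)


def read_word(view, x, y):
    """Read "rightward" from (x, y) until whitespace (or off the grid)."""
    chars = []
    while not view.get(x, y).isspace():
        chars.append(view.get(x, y))
        x += 1
    return "".join(chars)


def read_block(view, x, y):
    """One reader for both orientations: step "down", read "right"."""
    body = []
    for row in range(y + 1, view.rows):
        word = read_word(view, x, row)
        if word == DELIM:
            return body
        body.append(word)
    raise SyntaxError(f"unterminated block opened at {(x, y)}")


normal = Grid(['"""', 'ab', 'cd', '"""'])
print(read_block(normal, 0, 0))  # ['ab', 'cd']

# The same block written sideways: opener and closer are columns of quotes.
sideways = Grid(['"ac"',
                 '"bd"',
                 '"  "'])
print(read_block(Transposed(sideways), 0, 0))  # ['ab', 'cd']
```

Wrapping the grid rather than copying a transposed version keeps a single source of truth. Nested or mixed-orientation blocks could then share the same underlying text.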
(fst-atom """ trd-atom frt-atom
""" 00001
asdf 00002 """ fth-atom)
qwer 00003 hahaha
zxcv """ hehehe
""" hohoho
"""
I'm not sure I'd like the above to be parseable.
It might even be easier to treat the input string as a 2D grid than as a sequence and have a parsing head that behaves like a 2x2 convolutional kernel...
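One way to picture the 2x2 kernel idea is as a hypothetical sketch rather than a worked-out parser. Slide a 2x2 window over the grid, padded with spaces. For each non-space cell, the window's right and lower neighbours show whether the text continues horizontally, vertically, both or neither. That is the kind of local information a 2D parsing head would need to choose between a normal and a transposed read. `windows`, `classify` and the label strings are illustrative names only.

```python
def windows(lines):
    """Yield (x, y, ((a, b), (c, d))) for every 2x2 window, padding with spaces."""
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width + 1) for line in lines] + [" " * (width + 1)]
    for y in range(len(lines)):
        for x in range(width):
            yield x, y, ((padded[y][x], padded[y][x + 1]),
                         (padded[y + 1][x], padded[y + 1][x + 1]))


def classify(lines):
    """Label each non-space cell by which neighbours continue it."""
    labels = {
        (True, True): "both",
        (True, False): "right",
        (False, True): "down",
        (False, False): "single",
    }
    out = {}
    for x, y, ((a, b), (c, _d)) in windows(lines):
        if a.isspace():
            continue  # the head only reacts to windows anchored on text
        out[(x, y)] = labels[(not b.isspace(), not c.isspace())]
    return out


print(classify(['ab', 'c']))
# {(0, 0): 'both', (1, 0): 'single', (0, 1): 'single'}
```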
This would make for either a great Advent of Code puzzle or a nightmare interview question. I love it.