Show HN: Parqeye – A CLI tool to visualize and inspect Parquet files

105 points 1 day ago

25 comments kaushiksrini github.com

I built a Rust-based CLI/terminal UI for inspecting Parquet files—data, metadata, and row-group-level structure—right from the terminal. If someone sent me a Parquet file, I used to open DuckDB or Polars just to see what was inside. Now I can do it with one command.

Repo: https://github.com/kaushiksrini/parqeye

MayeulC 8 minutes ago

This looks very handy, thank you for working on this and making it open source.

I did submit a feature request for vi keybindings, though I could look into contributing this myself if I find a bit of spare time.

The other thing that surprised me was the size of the binaries: 90MB for a TUI tool (x64 Linux)? I wonder what the bulk of that is. Is there an issue with LTO? Another commenter noticed as well.

It also looks like you are building against a relatively recent glibc (2.34), which limits compatibility with older systems. Building against an older glibc can be hard to do, so I am not faulting you here, and you do provide a musl fallback, which is appreciated (mandatory notice that the musl allocator can dramatically degrade the performance of Rust programs, just in case you were not aware of this).

A few more ideas for improvements (you probably already have your own laundry list):

- Mouse support?

- Seeing that you do have graphs, it would be fun to see a scatter plot as well as a distribution plot under statistics in the "Row Groups" tab (though you probably pull these from the metadata, so that would require further processing, which may be out of scope).
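
To illustrate the "further processing" point: footer statistics only give a min, max, and null count per row group, so a distribution plot needs the column values themselves. Below is a minimal sketch in Python (not parqeye's Rust code, and not how the tool works) of equal-width binning plus a text rendering; the function names `histogram` and `render` are hypothetical.

```python
from typing import List, Sequence, Tuple

Bin = Tuple[float, float, int]  # (lower edge, upper edge, count)


def histogram(values: Sequence[float], bins: int = 10) -> List[Bin]:
    """Count values into equal-width bins spanning [min, max]."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single degenerate bin.
        return [(lo, hi, len(values))]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin index `bins`; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def render(hist: List[Bin], max_width: int = 40) -> str:
    """Render the histogram as one text bar per bin, scaled to the tallest bin."""
    if not hist:
        return ""
    peak = max(c for _, _, c in hist)
    lines = []
    for lo, hi, c in hist:
        bar = "#" * round(c / peak * max_width) if peak else ""
        lines.append(f"[{lo:10.2f}, {hi:10.2f}) {bar} {c}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5)))
```

The values would have to come from actually decoding the column, which is the extra cost the comment refers to.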

alentred 3 hours ago

Very nice that it can show the metadata. If you'd rather focus on the data itself, a Swiss Army knife in the terminal is VisiData [1]. It works with many formats, from CSV to Parquet. I think you'd need to install PyArrow to read Parquet files. VisiData is great not only for peeking into a file but also for filtering, sorting, and computing simple metrics, and it can even plot a histogram or scatterplot. I avoided a lot of Jupyter notebooks by using VisiData :)

[1] https://www.visidata.org/

hilti 2 hours ago

Similar tool for JSONL files: I built JSONL Viewer Pro after repeatedly crashing VS Code trying to inspect multi-GB training datasets and IoT device logs with nested objects.

Native Mac/Windows app with multi-threaded parsing (simdjson), automatic nested object flattening, and handles 10M+ rows instantly.

For HN: Use code HN100 for free access

https://iotdatasystems.gumroad.com/

Built with C++ for native performance (~6MB app, not Electron).

Would love feedback from folks working with large JSONL files.

tomtom1337 27 minutes ago

Super quick feedback: opening that link on my phone shows me two options next to each other, seemingly with the same name/description (followed by …) and the same price tag. I had to turn my phone sideways to see that there is a Windows and a Mac version.

I think you can afford the extra characters to show the whole page in portrait mode. (iPhone 16 Pro, Safari)

https://imgur.com/a/aTxO3sp

nathanscully 2 hours ago

I found a similar tool called nail-parquet[1] which has some nice query functions. I packaged[2] it up for nixpkgs but it’s stuck in merge limbo…

[1] https://github.com/Vitruves/nail-parquet

[2] https://github.com/NixOS/nixpkgs/pull/449066

bigshik 5 hours ago

Nice work—this hits a real pain point with Parquet. My main use case is debugging partitioned datasets on S3 with schema drift and skew, where I care about: which files/partitions have schema mismatches, weird row-group stats (all-null, out-of-range, huge skew), and doing that via metadata only.

Right now parqeye looks mainly single-file focused. Do you have plans for a “dataset mode” that takes a dir/S3 prefix and surfaces per-file/row-group summaries (row counts, min/max, null %, schema diffs vs a reference file) using just Parquet stats so it scales to tens of GB? Or do you see parqeye intentionally staying a single-file inspector?
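
A metadata-only pass of the kind described here can be sketched in Python. This is a rough illustration under stated assumptions, not a parqeye feature: `ColumnStats`, `read_footer`, `schema_diff`, and `all_null_columns` are hypothetical names, the footer reader assumes the third-party `pyarrow` package and a local path (an S3 prefix would need a filesystem layer on top), and the all-null check assumes flat, non-nested columns.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ColumnStats:
    """Statistics for one column chunk, as recorded in the Parquet footer."""
    column: str
    num_rows: int               # rows in the enclosing row group
    null_count: Optional[int]   # None when the writer stored no null count
    min: Any = None
    max: Any = None


def read_footer(path: str) -> Tuple[Dict[str, str], List[List[ColumnStats]]]:
    """Read the schema and per-row-group statistics from the footer only."""
    import pyarrow.parquet as pq  # assumption: pyarrow is installed; imported lazily

    meta = pq.read_metadata(path)
    arrow_schema = meta.schema.to_arrow_schema()
    schema = dict(zip(arrow_schema.names, (str(t) for t in arrow_schema.types)))
    groups = []
    for i in range(meta.num_row_groups):
        rg = meta.row_group(i)
        stats = []
        for j in range(rg.num_columns):
            col = rg.column(j)
            st = col.statistics  # None when no statistics were written
            has_nulls = st is not None and st.has_null_count
            has_range = st is not None and st.has_min_max
            stats.append(ColumnStats(
                column=col.path_in_schema,
                num_rows=rg.num_rows,
                null_count=st.null_count if has_nulls else None,
                min=st.min if has_range else None,
                max=st.max if has_range else None,
            ))
        groups.append(stats)
    return schema, groups


def schema_diff(reference: Dict[str, str], candidate: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare a file's {column: type} mapping against a reference file's."""
    shared = set(reference) & set(candidate)
    return {
        "missing": sorted(set(reference) - set(candidate)),
        "extra": sorted(set(candidate) - set(reference)),
        "type_changed": sorted(n for n in shared if reference[n] != candidate[n]),
    }


def all_null_columns(stats: List[ColumnStats]) -> List[str]:
    """Columns that are entirely null in a row group (flat columns only)."""
    return [s.column for s in stats
            if s.null_count is not None and s.num_rows > 0 and s.null_count == s.num_rows]
```

Because only footers are read, the cost grows with the number of files and row groups rather than with the data size, which is what makes the "tens of GB" case plausible.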

jasonjmcghee 5 hours ago

Yours looks much better for your use case, but fwiw you can do it in a single command with duckdb too (but not interactive etc.):

    duckdb -c "from 'foo.parquet'"

but maybe still useful for other formats or multi-file or remote situations

kylebarron 6 hours ago

Looks great!

Another seemingly extremely similar project released in the last few days: https://github.com/raulcd/datanomy

el_oni 2 hours ago

Beautiful. I'm currently deep into getting our data into Iceberg from Firehose, and I'm really curious what metadata is written. Are Bloom filters being written for the columns I want? Have my compaction and sort jobs helped the min-max statistics on those columns?

Will take a look when I get to my laptop!

papers1010 9 hours ago

It’s crazy how long we’ve gone without a tool like this. This is huge. Thank you for finally building this!

0cf8612b2e1e 7 hours ago

It is really incredible how poor the parquet tooling has been for years. The cornerstone of data engineering, yet just inspecting a file is needlessly clunky.

joelthelion 3 hours ago

What is really missing for parquet's wide adoption is support in Excel.

lolive 8 hours ago

Can DuckDB be included in the tool, so you can run queries directly from the UI? [that would avoid opening DBeaver whenever you need that kind of feature]

lolive 8 hours ago

Hu huuum... https://harlequin.sh/

mrasong 7 hours ago

This tool actually feels pretty solid too.

banga 8 hours ago

Looks like a nice tool, but failed for me when reading a geoparquet file created using duckdb.

lolive 8 hours ago

Apart from some visual glitches, this is an INSTANT BUY !

Note: must the Windows binary really be 78MB ?

ch2026 7 hours ago

CLIs are bulky

WorldPeas 1 day ago

thank you so much! this was an annoyance of mine for so long. edit: any chance you make a brew package? if you'd like I'd be happy to PR it in.

kaushiksrini OP 1 day ago

yep! it’s available as a homebrew tap — you can install it with: `brew install kaushiksrini/parqeye/parqeye`

dacox 5 hours ago

awesome! i was just looking at a bucket full of parquet files from last year trying to recall some things about them.

i tried to install with brew, but it told me my cli tools were "too out of date". Never seen that before! and also just upgraded.

Will try again tomorrow

WorldPeas 1 day ago

wondrous.

jspanos2 6 hours ago

This is very impressive. Look forward to using this

swety101 5 hours ago

Such a cool idea!! So helpful

dionian 5 hours ago

tried it out. love it.
