paddy_m
Joined 2,208 karma
Hacking on buckaroo the exploratory data analysis table widget for pandas and jupyter https://github.com/paddymul/buckaroo hnchat.com:yPQM3az7FbxcBqz4ZUyD hn@paddymullen.com

  1. I wish more programmers would pay attention to how productive power users in different fields can be with their tools. Look at CAD competitions. I wonder if there are video editing competitions?
  2. Buckaroo - the data table viewer for jupyter.

    I recently integrated Lazy Polars and background-process analytics, so I can reliably provide a fast table viewing experience on dataframes that would normally exhaust the memory of the jupyter kernel. Analytics are run column by column and results are written to a cache. As long as each column fits into memory individually, summary stats can be computed for the entire dataframe.

    Here's a demo video of scrolling through 19M rows, and running background summary stats.

    https://www.youtube.com/shorts/x1UnW4Y_tOk
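
    A minimal sketch of the column-at-a-time idea described above. This is not Buckaroo's code and does not use Polars: it assumes a CSV file with numeric columns and only the Python standard library, and the names `column_values`, `column_stats`, and `summarize` are hypothetical. Only one column is held in memory at a time, and results are cached per column.

```python
import csv
import os
import statistics


def column_values(path, column):
    """Stream one numeric column from a CSV file, row by row."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield float(row[column])


def column_stats(path, column, cache):
    """Compute stats for a single column, reusing a cached result if present."""
    # Hypothetical cache key: file path, modification time, and column name.
    key = f"{path}:{os.path.getmtime(path)}:{column}"
    if key not in cache:
        values = list(column_values(path, column))  # only this column is in memory
        cache[key] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.fmean(values),
        }
    return cache[key]


def summarize(path, cache):
    """Summarize every column of the file, one column at a time."""
    with open(path, newline="") as f:
        columns = next(csv.reader(f))  # header row
    return {col: column_stats(path, col, cache) for col in columns}
```

    Calling `summarize(path, cache)` a second time with the same `cache` dict returns the stored results without rereading the file, as long as the file's modification time is unchanged.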

  3. I love listing the number of dependencies in the title. It tells me that serious engineering went into this. I will be incorporating this "feature" into READMEs of my own projects.
  4. I’d love to read about how emissions / fuel economy is causing the oiling problems. Any articles?

    Would putting an aftermarket oil pump in these modern engines protect them or is it a deeper design issue?

  5. When should you reach for a data catalog vs a data warehouse or data lake? If you are choosing a data catalog this is probably obvious to you; if you just happened on this HN post, less so.

    Also, what key decisions do other data catalogs make vs your choices? What led to those decisions and what is the benefit to users?

  6. Blame the Obama CAFE regulations that accounted for wheelbase and car volume, giving manufacturers lower fuel economy standards for larger cars. Then the CAFE standards that hold trucks/SUVs to a lower standard.

    The economically efficient way to get the fuel economy result would have been to increase gasoline taxes, but that's a non starter politically. Higher gas prices would allow people to choose to keep a cheap gas guzzling truck/car, buy a new more efficient and expensive car, or buy a new slightly more efficient slightly more expensive car. It would have been simpler though and given consumers more choice.

  7. I have been working on Buckaroo - my table display library for dataframes in notebook environments. Buckaroo adds table and analytics features like histograms, summary stats, sorting, and search to every dataframe. Recently I have been working to make it work better with large datasets.

    This involves making it lazy for polars, allowing it to read arbitrarily large files without loading the entire dataframe into memory. When a large dataframe initially displays, no summary stats will be available. Summary stats are computed in the background in groups of columns. Then results are cached per column. To accomplish this I wrote a polars plugin in Rust that computes hashes of columns. Dealing with large data like this is tricky: operations sometimes crash, sometimes take all available memory, and sometimes just run for a very long time. I have also been building an execution framework for Buckaroo. It uses multiprocessing-based timeouts and the caching to execute summary stats in the background.

    Being able to control the execution, recover from timeouts, crashes and memory exhaustion opens up some interesting debugging tools. I have written methods that take arbitrary groups of polars expressions and produce a minimal reproduction test case through a git-bisect like process.

    All of this ensures that if the individual columns of a dataframe fit into memory, summary stats will be computed for the entire dataframe in the background. And because it is cached, the next time you open the same dataframe, the stats will be displayed instantly. When exploring data I do this manually in an ad hoc way (splitting up a dataframe by columns and rows), but it is error prone. This should all be automatic.

    I will be presenting this at PyData Boston in December.

    The Column's the limit: interactive exploration of larger than memory data sets in a notebook with Polars and Buckaroo
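
    A rough sketch of the two ideas above: isolating work in a child process with a timeout, and a git-bisect-like search for a minimal failing case. This is not Buckaroo's execution framework. It swaps in `subprocess` for `multiprocessing` to keep the example short, and `run_isolated` and `bisect_failure` are hypothetical names. The bisect assumes a single item causes the failure on its own.

```python
import subprocess
import sys


def run_isolated(code, timeout_s):
    """Run a Python snippet in a child process and classify the outcome."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "crashed"


def bisect_failure(items, fails):
    """Narrow a failing group down to a single failing item.

    Assumes one item causes the failure by itself, so at least one half
    of any failing group also fails.
    """
    assert fails(items), "the full group must fail to start"
    while len(items) > 1:
        mid = len(items) // 2
        left, right = items[:mid], items[mid:]
        # Keep whichever half still reproduces the failure.
        items = left if fails(left) else right
    return items[0]
```

    In practice `fails` could wrap `run_isolated` and treat anything other than "ok" as a failure, so a crash or hang in one group never takes down the parent process.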

  8. Location: Boston
    Remote: Yes
    Willing to relocate: Yes
    Technologies: Python, Pandas/Numpy, Jupyter, JS/TS and something many devs skip: actually talking to users.
    Résumé/CV: https://www.linkedin.com/in/paddymullen/
    Email: paddy@paddymullen.com

    I'm a developer who believes code isn't valuable unless it's used. That's why I start with conversations, not commits. I work to understand real problems before reaching for shiny new packages. I build tools that are simple, effective, and easy to adopt. My goal is always to solve the problem and make sure people know there’s a solution.

    In my next role, I'm looking to work with a team that values clarity and impact. I'm especially interested in data heavy environments where thoughtful tooling can make workflows better.

    Most recently, I built [Buckaroo](https://github.com/paddymul/buckaroo) an open source data table for Jupyter using Pandas/Polars. It combines fast rendering, summary stats, and a low-code UI. It scratches an itch I've had for over a decade and has already streamlined my own analysis workflow.
  9. https://github.com/paddymul/emacs-from-scratch/blob/master/p...

    and

    https://github.com/paddymul/emacs-from-scratch/blob/master/p...

    The emacs-lisp for changing the behaviour of consult-grep was quite complex and took a while to write.

  10. How does this deal with numeric types like NaN, Infinity...?
  11. What's the story for JS? I see that there is a javascript directory, but it only mentions nodejs. I don't see an npm package. So does this work in web browsers?
  12. I customized grep-find on my setup. I have a shell script so that it does the following (in typescript-mode)

    first search ts, tsx, js, jsx files in the current project. Exclude .git, node_modules...

    then search node_modules with the ts.. extensions, still exclude .git

    finally search the whole tree for .py files still exclude .git, /dist...

    This is coupled with consult-find-grep. I basically want to find a string in the most relevant file type. I never want a result from node_modules first in the results. It took some work, but the results are quite nice!
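
    The same tiered ordering can be sketched in Python. This is not my shell script, just an illustration under some assumptions: it uses a plain substring match instead of grep, only looks at the top-level node_modules directory in the second tier, and `tiered_search` is a hypothetical name.

```python
import os

JS_EXTS = (".ts", ".tsx", ".js", ".jsx")


def iter_files(root, exts, skip_dirs):
    """Yield files under root with one of the extensions, pruning skip_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            if name.endswith(exts):
                yield os.path.join(dirpath, name)


def grep_file(path, needle):
    """Return (path, line number, line) for every line containing needle."""
    hits = []
    with open(path, errors="ignore") as f:
        for lineno, line in enumerate(f, start=1):
            if needle in line:
                hits.append((path, lineno, line.rstrip("\n")))
    return hits


def tiered_search(root, needle):
    """Search the most relevant files first; results keep tier order."""
    tiers = [
        (root, JS_EXTS, {".git", "node_modules"}),               # project sources
        (os.path.join(root, "node_modules"), JS_EXTS, {".git"}),  # dependencies
        (root, (".py",), {".git", "dist"}),                      # python files
    ]
    results = []
    for start, exts, skip in tiers:
        if os.path.isdir(start):
            for path in iter_files(start, exts, skip):
                results.extend(grep_file(path, needle))
    return results
```

    Because the tiers run in order and results are appended as they are found, a hit in node_modules can never appear ahead of a hit in the project's own sources.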

  13. Buses are loud, but not nearly as loud and polluting as cars in aggregate.
  14. citation needed.
  15. If anyone has an open source project and wants a review of their README and docs, please get in touch. I enjoy providing this type of feedback, and I think every project can benefit from it.
  16. This comes up in testing a lot. I want testing data included in test source files to look tabular. I want it to be indented such that I can spot order of magnitude differences.
  17. I have heard multiple people claim that macros are incompatible with strong or static typing and I don't see why.

    If there were a lisp with optional static typing like typescript, it would seem to me to be completely possible to write macros that write types. In many cases it would do away with the need for generic types (and allow multiple competing syntaxes for dynamic types). Most interestingly it would allow you to write new generic forms instead of waiting for whatever the language designer gives you. It would also allow you access to types at runtime (which the typescript language designers took away).

    Maybe people were telling me that lisp style macros were incompatible with Hindley-Milner typing, but I still don't see how. The macros would just emit a Hindley-Milner subset.

    What am I missing?

  18. I was recently wondering if that was still running.
  19. That would be a reasonable stance if the difference were in incubating a new project vs excluding functionality from blessing because it interfaces with non free software. The functionality I'm talking about is excluded because of the latter.
  20. Well the linking into GCC was a C code issue. For emacs, there is a large collection of elisp that is shipped with the official package. Preventing worthy enhancements of that core package solely in the name of a distorted view of freedom hinders emacs and the adoption of emacs.

