df = ( pl.scan_parquet('hf://datasets/minimaxir/mtg-embeddings/mtg_embeddings.parquet') .filter( pl.col("type").str.contains("Sorcery"), pl.col("manaCost").str.contains("B"), ) .collect()
Polars is awesome to use, would highly recommend. Single node it is excellent at saturating CPUs, if you need to distribute the work put it in a Ray Actor with some POLARS_MAX_THREADS applied depending on how much it saturates a single node.
This item has no comments currently.
It looks like you have JavaScript disabled. This web app requires that JavaScript is enabled.
Please enable JavaScript to use this site (or just go read Hacker News).
Polars is awesome to use, would highly recommend. Single node it is excellent at saturating CPUs, if you need to distribute the work put it in a Ray Actor with some POLARS_MAX_THREADS applied depending on how much it saturates a single node.