- https://camelai.com/hackernews/? Worked for me.
- I've seen (but not used) this tool recently, which seems to do a similar thing. Curious whether the experience is any better.
- I haven't/wouldn't use it because I have a decent K8s ollama/open-webui setup, but Docker announced this a month ago: https://www.docker.com/blog/introducing-docker-model-runner
- I'm not sure if they are "apps" per se, but both of these are running Python code via Pyodide in the browser:
- Perfect use case for https://github.com/urchade/GLiNER
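For context, GLiNER does zero-shot NER: you hand it the label set at inference time instead of fine-tuning per schema. A minimal sketch, closely following the example in the repo's README (model name and result keys taken from there; the input text is made up):

```python
# pip install gliner
from gliner import GLiNER

# Load a pretrained model from the Hugging Face Hub.
model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Apple announced the Vision Pro at WWDC in Cupertino last June."

# Zero-shot: the entity labels are supplied at inference time.
labels = ["company", "product", "event", "location", "date"]

for entity in model.predict_entities(text, labels, threshold=0.5):
    print(entity["text"], "=>", entity["label"])
```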
- Love Go (and async Python, for different reasons), but miss me with the gRPC unless you are building hardened internal large-enterprise systems. We adopted it at a late-stage startup for a microservices architecture, and the pain is immense.
So many issues with type duplication due to weird footguns around the generated types: in lots of places we had to essentially duplicate a model because the generated types wouldn't let us modify or copy parts of a value, and so on.
- > The problem with Parquet is it’s static. Not good for use cases that involve continuous writes and updates. Although I have had good results with DuckDB and Parquet files in object storage. Fast load times.
You can get around this by using glob patterns in DuckDB to query remote Parquet files, though? Maybe break things up using a Hive partitioning scheme or similar.
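A minimal sketch of that workaround, assuming a hypothetical `s3://my-bucket/events/date=.../part-*.parquet` layout (bucket, columns, and credential setup are placeholders): new data lands as additional files, and the glob picks them up on the next query, so nothing has to be rewritten in place.

```python
import duckdb

con = duckdb.connect()
# httpfs gives DuckDB ranged HTTP reads against S3-compatible object storage.
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# With hive_partitioning=true the 'date=...' directory names become a queryable
# column, so the WHERE clause prunes whole partitions before any row data is read.
rows = con.execute("""
    SELECT user_id, count(*) AS n
    FROM read_parquet('s3://my-bucket/events/*/*.parquet', hive_partitioning = true)
    WHERE date = '2024-01-01'
    GROUP BY user_id
""").fetchall()
```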
- I thought about this some more and did some research, and found an indexing approach using HNSW, serialized to Parquet, and queried from the browser here:
https://github.com/jasonjmcghee/portable-hnsw
This opens up efficient query patterns on larger datasets for RAG projects where you may not have the resources to run an expensive vector database.
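I haven't dug into how that repo serializes the graph, but for a feel of the underlying HNSW mechanics, here's a local sketch using hnswlib (not the repo's implementation; dimensions and parameters are arbitrary):

```python
# pip install hnswlib numpy
import hnswlib
import numpy as np

dim, n = 384, 10_000
vectors = np.random.rand(n, dim).astype(np.float32)

# Build the approximate-nearest-neighbor graph.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(vectors, np.arange(n))
index.set_ef(50)  # query-time accuracy/speed trade-off

query = np.random.rand(1, dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)
print(labels, distances)
```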
- I have tinkered with using DuckDB as a poor man's vector database for a POC and had great results.
One thing I'd love to see is some sort of row-group-level metadata statistics for embeddings within a Parquet file - something that would let readers push predicates down to the HTTP request level and avoid pulling non-relevant rows into the database from a remote file at all, particularly one stored on S3-compatible storage that supports byte-range requests. I'm not sure what the implementation would look like - how to define the sorting algorithm that organizes the "close" rows together, how the metadata would be calculated, or what the reader side would look like - but I'd love to be able to implement some of the same patterns with vector search as with GeoParquet.
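For reference, the POC-style version of this is just a brute-force scan - no row-group pruning, which is exactly what the wished-for metadata would fix. A sketch assuming a recent DuckDB with fixed-size ARRAY support (table, sizes, and data are made up):

```python
import duckdb
import numpy as np

con = duckdb.connect()
con.execute("CREATE TABLE docs (id INTEGER, body VARCHAR, embedding FLOAT[384])")

# Toy data; in practice the vectors would come from an embedding model.
for i in range(100):
    con.execute("INSERT INTO docs VALUES (?, ?, ?)",
                [i, f"doc {i}", np.random.rand(384).astype(np.float32).tolist()])

query = np.random.rand(384).astype(np.float32).tolist()
top5 = con.execute("""
    SELECT id, body,
           array_cosine_similarity(embedding, ?::FLOAT[384]) AS score
    FROM docs
    ORDER BY score DESC
    LIMIT 5
""", [query]).fetchall()
print(top5)
```

DuckDB's vss extension can layer an HNSW index over this for in-database search, but as far as I know that still doesn't help with the remote byte-range pruning described above.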
- Nice, thanks for the suggestion. I got it set up just before leaving town for a few days, so I've been doing a little tinkering with it. I was hoping to have a setup with LM Studio where my laptop could use the API server on the mini over the TS network. Unfortunately that doesn't seem to be the case, so I'll set up a configuration like you mentioned to just have a global client from any device on the network.
It's very cool to have access to such a high-horsepower machine from anywhere, though. The next step is figuring out the networking interface to be able to access the host GPU/ollama API from pods running in a Colima VM/k3s cluster setup.
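In the meantime, anything on the tailnet can hit the mini's ollama HTTP API directly, as long as ollama is bound to a non-loopback interface (e.g. OLLAMA_HOST=0.0.0.0). A sketch with a made-up MagicDNS hostname and model name:

```python
import requests

# Hypothetical tailnet hostname for the mini; 11434 is ollama's default port.
OLLAMA_URL = "http://mac-mini.tailnet-example.ts.net:11434"

resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "llama3.2", "prompt": "Say hello", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```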
- Thanks for all your writing on these topics, Simon! It has turned me from a bit of a naysayer into an optimist about this tooling, especially being able to run stuff locally with access to tools. I have an M4 Pro Mac mini arriving this week to build a similar self-hosted setup over Tailscale.