Profile: jaffee - Hacker Neue

jaffee

Joined Jan 26, 2014 244 karma

[ my public key: https://keybase.io/jaffee; my proof: https://keybase.io/jaffee/sigs/m6DntPZSNkB2cmrKeaITaqkrHBbXJpgYP3QX_fQWS6I ] Twitter: @mattjaffee

jaffee Jun 12, 2025 parent

1Password can be your 2FA and autofill those fields. It has a built-in scanner which will look at your screen and read the QR code displayed there (no separate device needed).
jaffee Mar 13, 2024 parent

> text to servo movement
yeah this was super impressive. If this is at the point where you can put an arbitrary object in front of it and ask it to move it somewhere, that's going to be huge for industrial automation type stuff I'd imagine.
I do wonder how much of that demo was pre-baked/trained though. Could they repeat the same thing with a banana? What if the table was more cluttered? What if there were two people in the frame?
jaffee Mar 5, 2024 parent

> embedding vectors you've calculated from the code? If so, those are likely quite easily reversible
I don't think embeddings are generally reversible... you're usually projecting onto a lower dimensional space, and therefore losing information.
jaffee Nov 3, 2023 parent

wait... but why did it work in the development environment?
jaffee Aug 8, 2023 parent

Well... sure. But OpenAI and MSFT have gone to a lot of trouble to build up the mystique around GPT-4 by being secretive about its architecture and publishing papers with tantalizing phrases like "sparks of AGI" and so on. I think this type of thing provides a useful counterbalance.
jaffee Dec 10, 2022 parent

This was my thought when we first wrote this back in... 2018 or whatever. The papers referenced derive this technique in a way that feels rather roundabout. For the actual implementation we took the more direct approach, though I think we did switch from two's complement to a sign/magnitude representation at one point, which allows us to dynamically vary the bit depth used and can save some space and computation time.
As far as performance goes, we represent almost everything in this system with compressed bitmaps, so there's some advantage to using them for integers and range queries as well: the output of a range query is very naturally a bitmap, which can easily be combined with more typical categorical bitmaps when evaluating more complex queries.
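One way to see why sign/magnitude makes a variable bit depth easy: new high-order slices start out all zero, so existing rows never need rewriting, whereas widening two's complement would mean sign-extending every negative row. The sketch below is only an illustration under that assumption, with Python ints standing in for compressed bitmaps; the class and method names are hypothetical and not taken from the actual implementation.

```python
class SignMagnitudeBSI:
    """Toy bit-sliced integer field: one bitmap per magnitude bit, plus a sign bitmap."""

    def __init__(self):
        self.exists = 0   # rows that hold a value
        self.sign = 0     # rows whose value is negative
        self.slices = []  # slices[i] = rows whose magnitude has bit i set

    @property
    def depth(self):
        return len(self.slices)

    def set(self, row, value):
        bit = 1 << row
        magnitude = abs(value)
        # Grow the depth on demand; older rows are implicitly 0 in the new slices.
        while len(self.slices) < magnitude.bit_length():
            self.slices.append(0)
        self.exists |= bit
        if value < 0:
            self.sign |= bit
        else:
            self.sign &= ~bit
        for i in range(len(self.slices)):
            if (magnitude >> i) & 1:
                self.slices[i] |= bit
            else:
                self.slices[i] &= ~bit

    def get(self, row):
        if not (self.exists >> row) & 1:
            return None
        magnitude = sum(((s >> row) & 1) << i for i, s in enumerate(self.slices))
        return -magnitude if (self.sign >> row) & 1 else magnitude


field = SignMagnitudeBSI()
field.set(0, 5)
field.set(1, -3)
small_depth = field.depth   # small values only need a few slices
field.set(2, 100)           # a larger value adds slices without touching rows 0 and 1
print(small_depth, field.depth, [field.get(r) for r in range(3)])
```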
jaffee Oct 7, 2022 parent

Depends what you want out of life and what career you're moving to... you can become pretty competent in a career in tech in just a few years and it will likely become lucrative pretty quickly. I feel that there will likely be increasing demand for programmers/engineers for decades to come. Check out Steve Yegge's YouTube show for thoughts on this... https://www.youtube.com/watch?v=C8332hz8c2s&list=PLZfuUWMTtM...
jaffee Oct 6, 2022 parent

That's Sir Arthur C. Clarke to you!
jaffee Sep 27, 2022 parent

Andrew, see if anything here catches your eye... we've got a few openings. You can email me at my username at featurebase.com.
https://www.featurebase.com/careers
jaffee Sep 22, 2022 parent

Generally speaking, the nice thing about bitmap indexes is that you're able to access the data in a very granular way. If you have a WHERE clause that's calling out specific values, you only access the data which is pertinent to those values within a column, you don't have to scan the whole column. This is simply due to the structure of a bitmap index where you have a separate bitmap for each value in the domain of a column.
Furthermore, access patterns for bitmaps tend to be very linear and cache/prefetch friendly.
I think it's very feasible that adding SIMD could result in a real-world speedup in an otherwise well-optimized in-memory system. I agree that if you need to go to disk, that will likely dominate the overall performance of a single query, but it may still be more efficient overall, which can help in a multi-user situation.
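A minimal sketch of the "separate bitmap for each value" structure described above, using Python ints as uncompressed bitmaps (bit i set means row i has that value). The column data and function names are made up for illustration; a real system would use compressed bitmaps.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to a bitmap of the rows that contain it."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows(bitmap):
    """List the row numbers whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical toy column
colors = ["red", "blue", "red", "green", "blue", "red"]
index = build_bitmap_index(colors)

# WHERE color = 'red' OR color = 'green' reads only those two bitmaps;
# the 'blue' bitmap is never touched.
matches = index["red"] | index["green"]
print(rows(matches))
```

The query cost depends on the bitmaps named in the WHERE clause rather than on the number of distinct values in the column.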
jaffee Sep 19, 2022 parent

wait I'm thinking of fgprof.... this looks awesome too though
jaffee Sep 19, 2022 parent

this thing is awesome, have used it many times to quickly track down tricky performance issues.
jaffee Sep 14, 2022 parent

I wish there was a comparison to how dangerous the same commute is by car. I feel like you should also factor in the benefits of getting extra exercise twice a day for 20 years vs sitting still.
Though maybe the alternative is walking, not driving...
jaffee Sep 14, 2022 parent

many! It was originally developed for marketing use cases: helping marketers understand up-to-date user behavior and find interesting segments.
But really it's useful anytime you need low latency analytics on fresh data.
jaffee Sep 4, 2022 parent

The full arrays do get expensive, although not too bad. I work at FeatureBase and we have a whole analytics DB built on a roaring variant... for perf reasons it's usually worth it to bias toward the bitmap representation when you get past about 2k set bits, though it does take a bit more space.
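To put rough numbers on that trade-off, here is a back-of-the-envelope sketch assuming the standard Roaring container layout: each container covers 2^16 values, an array container stores 2 bytes per set bit, and a bitmap container is a fixed 8 KiB. The 2048 threshold is an assumption taken from the "about 2k" above, and the variant discussed here may lay things out differently.

```python
CONTAINER_VALUES = 1 << 16             # values covered by one container
BITMAP_BYTES = CONTAINER_VALUES // 8   # fixed size of a bitmap container
ARRAY_BYTES_PER_VALUE = 2              # one 16-bit entry per set bit


def array_bytes(set_bits):
    """Size of an array container holding `set_bits` values."""
    return set_bits * ARRAY_BYTES_PER_VALUE


def choose_container(set_bits, threshold=2048):
    """Pick a representation, biased toward bitmaps past `threshold` (assumed value)."""
    return "bitmap" if set_bits > threshold else "array"


for n in (1000, 2048, 3000, 4096):
    print(n, choose_container(n), array_bytes(n), BITMAP_BYTES)
```

Between the threshold and 4096 set bits the bitmap container is the larger of the two, which is the extra space being traded for speed.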
1 point Sep 30, 2019

Summer of Reading – short book reviews

0 comments jaffee thegreenplace.net
3 points Jul 3, 2019

Go compiler internals: adding a new statement to Go

0 comments jaffee thegreenplace.net
jaffee Jun 3, 2019 parent

Great question! We have actually done some experiments with this in the past and will likely be rolling out features like this on top of Pilosa as part of Molecula https://www.molecula.com/is-your-data-ai-ready/
jaffee Jun 1, 2019 parent

One does have to maintain some understanding of how the integer row and column IDs are linked to what they actually represent.
Sometimes this is a function which might map (for example) row 3 to the letter 'd', 4 to 'e', and so on. Sometimes it has to be a lookup table which can be kept within Pilosa, or externally. Sometimes the IDs map directly to what they represent (day-of-month, year, passenger count, etc.)
So strictly speaking, not everything is a bitmap, but the bulk of the heavy lifting in terms of serving queries is computation on bitmaps.
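The three kinds of mapping mentioned above could look something like this. It is a sketch only: the function and class names are hypothetical and do not reflect Pilosa's API.

```python
def row_to_letter(row_id):
    """Function mapping: row 0 -> 'a', so row 3 -> 'd' and row 4 -> 'e'."""
    return chr(ord("a") + row_id)


class KeyTranslator:
    """Lookup-table mapping: assigns the next free integer ID to each new key."""

    def __init__(self):
        self._id_by_key = {}
        self._key_by_id = []

    def id_for(self, key):
        if key not in self._id_by_key:
            self._id_by_key[key] = len(self._key_by_id)
            self._key_by_id.append(key)
        return self._id_by_key[key]

    def key_for(self, row_id):
        return self._key_by_id[row_id]


def passenger_count_to_row(count):
    """Direct mapping: the ID is the value itself."""
    return count


airports = KeyTranslator()
print(row_to_letter(3), airports.id_for("JFK"), airports.id_for("LGA"), airports.key_for(1))
```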
jaffee May 31, 2019 parent

Bit-sliced indexing is the clever magic here. This post goes very deep on it https://www.pilosa.com/blog/range-encoded-bitmaps/
But really, you use one bitmap for each binary bit of an integer, and it turns out you can generate arbitrary range queries on your dataset by doing various combinations of boolean operations on those bitmaps.
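A minimal sketch of that idea for unsigned integers, with Python ints standing in for bitmaps (bit i set means row i). The function names and sample values are made up, and this is the textbook slice-by-slice comparison rather than Pilosa's actual code: walk from the most significant slice down, tracking which rows are still equal to the constant so far and which are already known to be smaller.

```python
def build_slices(values, depth):
    """slices[i] has bit `row` set when bit i of values[row] is 1."""
    slices = [0] * depth
    for row, value in enumerate(values):
        for i in range(depth):
            if (value >> i) & 1:
                slices[i] |= 1 << row
    return slices


def less_than(slices, all_rows, c):
    """Bitmap of rows whose value is < c, using only AND/OR/NOT on the slices."""
    if c <= 0:
        return 0
    if c >> len(slices):
        return all_rows  # c is larger than anything the slices can represent
    lt, eq = 0, all_rows
    for i in reversed(range(len(slices))):
        if (c >> i) & 1:
            lt |= eq & ~slices[i]  # equal so far, but 0 where c has 1: smaller
            eq &= slices[i]
        else:
            eq &= ~slices[i]       # a 1 where c has 0 means larger: drop it
    return lt


def rows(bitmap):
    """List the row numbers whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


values = [3, 7, 12, 5, 9, 0]
slices = build_slices(values, depth=4)
all_rows = (1 << len(values)) - 1

below_8 = less_than(slices, all_rows, 8)
# 5 <= value < 10 is "less than 10" AND NOT "less than 5"
in_range = less_than(slices, all_rows, 10) & ~less_than(slices, all_rows, 5)
print(rows(below_8), rows(in_range))
```

Because the result is itself a bitmap, it can be ANDed directly with an ordinary per-value bitmap from another field.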
jaffee May 31, 2019 parent

You definitely can... the feature set keeps growing. We have multi-field filtered GROUP BY now. It's amazing to see how flexible Roaring Bitmaps can be!
jaffee May 31, 2019 parent

Pilosa is best used in conjunction with something like Kafka with (e.g.) separate consumers for Pilosa and a persistent data store.
jaffee May 31, 2019 parent

Good catch... that sounds pretty silly. It should probably read more like "converting relationships to be represented by single bits"
As a concrete example, we took the NYC taxi ride data set, which is something like 300GB of CSV files, and when it was indexed in Pilosa the total size of all the bitmap files was closer to 40GB.
jaffee May 31, 2019 parent

Source code:
https://github.com/pilosa/pilosa
3 points Feb 13, 2019

I Hope You Like Charts

0 comments jaffee pilosa.com
11 points Jan 17, 2019

Oracle, AWS, and Azure Benchmarking Shootout

0 comments jaffee pilosa.com
jaffee Dec 14, 2018 parent

You can rant about an "attack on freedom" all you want, but how do you propose that Confluent protect its business so that they can continue paying their engineers to keep working on tools for which they publish all the source code?
The alternatives seem to be 1. they keep all their stuff proprietary or 2. they leave it truly open and AWS takes the majority of their market and they slowly suffocate.
Aren't both of those strictly worse than the path they've taken?
2 points Sep 13, 2018

Go and Algebraic Data Types

0 comments jaffee thegreenplace.net
2 points Aug 24, 2018

Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks [pdf]

0 comments jaffee amis.nl
