
jaffee
Joined 244 karma
[ my public key: https://keybase.io/jaffee; my proof: https://keybase.io/jaffee/sigs/m6DntPZSNkB2cmrKeaITaqkrHBbXJpgYP3QX_fQWS6I ] Twitter: @mattjaffee

  1. 1Password can be your 2FA and autofill those fields. It has a built-in scanner which will look at your screen and read the QR code on the screen (no separate device needed).
  2. > text to servo movement

    yeah this was super impressive. If this is at the point where you can put an arbitrary object in front of it and ask it to move it somewhere, that's going to be huge for industrial automation type stuff I'd imagine.

    I do wonder how much of that demo was pre-baked/trained though. Could they repeat the same thing with a banana? What if the table was more cluttered? What if there were two people in the frame?

  3. > embedding vectors you've calculated from the code? If so, those are likely quite easily reversible

    I don't think embeddings are generally reversible... you're usually projecting onto a lower dimensional space, and therefore losing information.

  4. wait... but why did it work in the development environment?
  5. Well... sure. But OpenAI and MSFT have gone to a lot of trouble to build up the mystique around GPT-4 by being secretive about its architecture and publishing papers with tantalizing phrases like "sparks of AGI" and so on. I think this type of thing provides a useful counterbalance.
  6. This was my thought when we first wrote this back in... 2018 or whatever. The papers referenced sort of derive this technique in a way that feels rather roundabout. For the actual implementation we took the more direct approach... though I think we did switch from a two's complement to a sign/magnitude representation at one point, which allows us to dynamically vary the bit depth used and can save some space and computation time.

    As far as the performance goes, in this system we represent almost everything with compressed bitmaps, so there's some advantage to using them for integers and range queries as well: the output of a range query is very naturally a bitmap, which can easily be combined with more typical categorical bitmaps when evaluating more complex queries.
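
To illustrate why sign/magnitude makes a variable bit depth easy, here is a minimal sketch. It is not FeatureBase's actual code: the class name `SignMagnitudeBSI` and its methods are hypothetical, and plain Python integers stand in for compressed bitmaps. Growing the depth only appends an empty magnitude slice; with two's complement, every existing negative value would need its new high bits set.

```python
class SignMagnitudeBSI:
    """Toy bit-sliced integer field using sign/magnitude (hypothetical names)."""

    def __init__(self):
        self.exists = 0    # bitmap of columns that hold a value
        self.sign = 0      # bitmap of columns whose value is negative
        self.slices = []   # magnitude bit slices, least significant first

    @property
    def bit_depth(self):
        return len(self.slices)

    def set(self, col, value):
        magnitude = abs(value)
        # Grow the depth on demand; existing columns are untouched because
        # their new high magnitude bits are simply 0.
        while len(self.slices) < magnitude.bit_length():
            self.slices.append(0)
        mask = 1 << col
        self.exists |= mask
        if value < 0:
            self.sign |= mask
        else:
            self.sign &= ~mask
        for i in range(len(self.slices)):
            if (magnitude >> i) & 1:
                self.slices[i] |= mask
            else:
                self.slices[i] &= ~mask

    def get(self, col):
        mask = 1 << col
        if not self.exists & mask:
            return None
        magnitude = sum(1 << i for i, s in enumerate(self.slices) if s & mask)
        return -magnitude if self.sign & mask else magnitude


field = SignMagnitudeBSI()
field.set(0, 3)
field.set(1, -5)
depth_before = field.bit_depth
field.set(2, 100)   # needs more magnitude bits, so the depth grows
depth_after = field.bit_depth
```

In this sketch the field only ever uses as many slices as the largest magnitude seen so far requires, which is where the space and computation savings would come from.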

  7. Depends what you want out of life and what career you're moving to... you can become pretty competent in a career in tech in just a few years and it will likely become lucrative pretty quickly. I feel that there will likely be increasing demand for programmers/engineers for decades to come. Check out Steve Yegge's YouTube show for thoughts on this... https://www.youtube.com/watch?v=C8332hz8c2s&list=PLZfuUWMTtM...
  8. That's Sir Arthur C. Clarke to you!
  9. Andrew, see if anything here catches your eye... we've got a few openings. You can email me at my username at featurebase.com.

    https://www.featurebase.com/careers

  10. Generally speaking, the nice thing about bitmap indexes is that you're able to access the data in a very granular way. If you have a WHERE clause that's calling out specific values, you only access the data which is pertinent to those values within a column, you don't have to scan the whole column. This is simply due to the structure of a bitmap index where you have a separate bitmap for each value in the domain of a column.

    Furthermore, access patterns for bitmaps tend to be very linear and cache/prefetch friendly.

    I think it's very feasible that adding SIMD could result in a real-world speedup in an otherwise well-optimized in-memory system. I agree if you need to go to disk, that will likely dominate the overall performance of a single query, but it may still be overall more efficient which can still help in a multi-user situation.
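
The "separate bitmap for each value" structure can be sketched in a few lines. This is an illustration only, assuming plain Python integers as uncompressed bitmaps; the helper names and the sample columns are made up for the example. Note that the query touches just the three bitmaps named in the WHERE clause and never scans the columns.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to a bitmap whose bit `row` is set when column[row] == value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows(bitmap):
    """List the row numbers whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical sample data.
color = ["red", "blue", "red", "green", "blue", "red"]
size = ["L", "M", "L", "L", "S", "M"]

color_idx = build_bitmap_index(color)
size_idx = build_bitmap_index(size)

# WHERE color IN ('red', 'blue') AND size = 'L'
match = (color_idx["red"] | color_idx["blue"]) & size_idx["L"]
matching_rows = rows(match)
```

The OR and AND here run over contiguous words of memory, which is the linear, prefetch-friendly access pattern mentioned above.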

  11. wait I'm thinking of fgprof.... this looks awesome too though
  12. this thing is awesome, have used it many times to quickly track down tricky performance issues.
  13. I wish there was a comparison to how dangerous the same commute is by car. I feel like you should also factor in the benefits of getting extra exercise twice a day for 20 years vs sitting still.

    Though maybe the alternative is walking, not driving...

  14. many! It was originally developed for marketing use cases: helping marketers understand up-to-date user behavior and find interesting segments.

    But really it's useful anytime you need low latency analytics on fresh data.

  15. The full arrays do get expensive, although not too bad. I work at FeatureBase and we have a whole analytics DB built on a roaring variant... for perf reasons it's usually worth it to bias toward the bitmap representation when you get past about 2k set bits, though it does take a bit more space.
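
As a rough sketch of that trade-off, assume the standard Roaring layout, where each container covers 65,536 positions and is stored either as a sorted array of 2-byte values or as a fixed 8 KiB bitmap (the FeatureBase variant may differ). The function names and the `threshold` default are hypothetical, chosen to mirror the "about 2k" figure above.

```python
ARRAY_BYTES_PER_VALUE = 2            # one 16-bit value per set bit
BITMAP_CONTAINER_BYTES = 65536 // 8  # fixed size: 8192 bytes


def container_sizes(cardinality):
    """Approximate payload size in bytes of each representation."""
    return {
        "array": cardinality * ARRAY_BYTES_PER_VALUE,
        "bitmap": BITMAP_CONTAINER_BYTES,
    }


def choose_container(cardinality, threshold=2048):
    """Bias toward the bitmap representation once cardinality passes the threshold."""
    return "bitmap" if cardinality > threshold else "array"
```

Under these assumptions the two representations are the same size at 4,096 set bits, so switching at around 2k trades up to roughly 4 KiB per container for faster bitmap operations.
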
  16. Great question! We have actually done some experiments with this in the past and will likely be rolling out features like this on top of Pilosa as part of Molecula https://www.molecula.com/is-your-data-ai-ready/
  17. One does have to maintain some understanding of how integer row and column IDs are linked to what they actually represent.

    Sometimes this is a function which might map (for example) row 3 to the letter 'd', 4 to 'e', and so on. Sometimes it has to be a lookup table which can be kept within Pilosa, or externally. Sometimes the IDs map directly to what they represent (day-of-month, year, passenger count, etc.)

    So strictly speaking, not everything is a bitmap, but the bulk of the heavy lifting in terms of serving queries is computation on bitmaps.
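
The three kinds of ID mapping described above might look like this. It is a sketch with hypothetical names and sample data, not Pilosa's API: a function, a lookup table, and a direct mapping.

```python
import string


# 1. A function: row 3 maps to 'd', row 4 to 'e', and so on.
def row_to_letter(row_id):
    return string.ascii_lowercase[row_id]


def letter_to_row(letter):
    return string.ascii_lowercase.index(letter)


# 2. A lookup table (hypothetical data), kept alongside the index or externally.
CITY_BY_ROW = {0: "Austin", 1: "Boston", 2: "Chicago"}
ROW_BY_CITY = {city: row for row, city in CITY_BY_ROW.items()}


# 3. A direct mapping: the ID is the value itself (e.g. passenger count).
def passenger_count_to_row(count):
    return count
```

Only the translation at the edges deals with letters or city names; the queries themselves operate on the integer IDs.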

  18. Bit-sliced indexing is the clever magic here. This post goes very deep on it https://www.pilosa.com/blog/range-encoded-bitmaps/

    But really, you use one bitmap for each binary bit of an integer, and it turns out you can generate arbitrary range queries on your dataset by doing various combinations of boolean operations on those bitmaps.
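
A minimal sketch of the idea, assuming unsigned integers and using plain Python integers as uncompressed bitmaps. The function names are hypothetical and this is not Pilosa's implementation; it follows the usual bit-sliced comparison loop, walking from the most significant bit down and tracking which columns are already known to be less than, greater than, or still equal to the target.

```python
def build_bsi(values, bit_depth):
    """Return (exists, slices); slices[i] has bit `col` set when bit i of values[col] is 1."""
    exists = 0
    slices = [0] * bit_depth
    for col, v in enumerate(values):
        exists |= 1 << col
        for i in range(bit_depth):
            if (v >> i) & 1:
                slices[i] |= 1 << col
    return exists, slices


def compare(exists, slices, c):
    """Return bitmaps (lt, eq, gt) of columns whose value is <, ==, > c.

    Assumes 0 <= c < 2 ** len(slices).
    """
    lt, gt, eq = 0, 0, exists
    for i in reversed(range(len(slices))):
        if (c >> i) & 1:
            lt |= eq & ~slices[i]   # column has 0 where c has 1: smaller
            eq &= slices[i]
        else:
            gt |= eq & slices[i]    # column has 1 where c has 0: larger
            eq &= ~slices[i]
    return lt, eq, gt


def between(exists, slices, low, high):
    """Bitmap of columns with low <= value <= high, using only boolean ops."""
    _, eq_lo, gt_lo = compare(exists, slices, low)
    lt_hi, eq_hi, _ = compare(exists, slices, high)
    return (gt_lo | eq_lo) & (lt_hi | eq_hi)


def columns(bitmap):
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


values = [5, 0, 12, 7, 3, 9]          # hypothetical sample column
exists, slices = build_bsi(values, bit_depth=4)
in_range = columns(between(exists, slices, 3, 9))
```

Here `in_range` holds the columns whose values are 5, 7, 3 and 9. The result of the range query is itself a bitmap, so it can be intersected directly with other bitmaps.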

  19. You definitely can... the feature set keeps growing. We have multi-field filtered GROUP BY now. It's amazing to see how flexible Roaring Bitmaps can be!
  20. Pilosa is best used in conjunction with something like Kafka with (e.g.) separate consumers for Pilosa and a persistent data store.
  21. Good catch... that sounds pretty silly. It should probably read more like "converting relationships to be represented by single bits".

    As a concrete example, we took the NYC taxi ride data set which is something like 300GB of CSV files and when it was indexed in Pilosa, the total size of all the bitmap files was closer to 40GB.

  22. You can rant about an "attack on freedom" all you want, but how do you propose that Confluent protect its business so that they can continue paying their engineers to keep working on tools for which they publish all the source code?

    The alternatives seem to be 1. they keep all their stuff proprietary or 2. they leave it truly open and AWS takes the majority of their market and they slowly suffocate.

    Aren't both of those strictly worse than the path they've taken?

This user hasn’t submitted anything.
