
rahimnathwani
Joined 15,144 karma
I've lived in San Francisco since 2019.

2010-2019 I lived in China.

Before that I lived in London, where I was born.

Some random stuff:

- English is my native language, but I also speak (to varying degrees of fluency) Mandarin, Gujarati and Spanish.

- My career so far has been a mix of startups/cofounding and big tech (I was a PM at Amazon and at Google).

- I'm a qualified accountant (CIMA), have two undergrad degrees and an MBA.

- I'm better at writing software than an average PM, and better at product management than an average software engineer :)

- You should use Anki

Email: rahim AT encona DOT com

LinkedIn: http://www.linkedin.com/in/rahimnathwani


  1. No it's not.

    With a cluster of two 512GB nodes, you have to send half the weights (350GB) over a TB5 connection. But you have to do this exactly once on startup.

    With a single 512GB node, you'll be loading weights from disk each time you need a different expert, potentially for each token. Depending on how many experts you're loading, you might be loading 2GB to 20GB from disk each time.

    Unless you're going to shut down your computer after generating a couple of hundred tokens, the cluster wins.
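A rough sketch of the arithmetic behind that comparison. The 350GB one-time transfer and the 2GB to 20GB per-token loads come from the comment; the two throughput figures are illustrative assumptions, not measurements, and the sketch ignores every other cost (initial load from disk, compute, caching).

```python
# Assumed throughputs, for illustration only.
TB5_GB_PER_S = 10.0   # assumption: usable Thunderbolt 5 throughput
SSD_GB_PER_S = 5.0    # assumption: sustained SSD read throughput

ONE_TIME_TRANSFER_GB = 350.0  # half the weights, sent once at startup


def startup_cost_s(transfer_gb=ONE_TIME_TRANSFER_GB, link_gb_per_s=TB5_GB_PER_S):
    """Seconds spent sending weights to the second node, paid once."""
    return transfer_gb / link_gb_per_s


def per_token_disk_cost_s(load_gb, disk_gb_per_s=SSD_GB_PER_S):
    """Seconds spent reading experts from disk for one token."""
    return load_gb / disk_gb_per_s


def break_even_tokens(load_gb):
    """Tokens after which the one-time transfer costs less than repeated disk reads."""
    return startup_cost_s() / per_token_disk_cost_s(load_gb)


for load_gb in (2.0, 20.0):
    tokens = break_even_tokens(load_gb)
    print(f"{load_gb:.0f} GB per token from disk: break-even after about {tokens:.1f} tokens")
```

Under these assumed numbers the break-even point lands well inside a couple of hundred tokens at either end of the 2GB to 20GB range; different throughput assumptions move the number but not the shape of the argument.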

  2. Even with MoE you still need enough memory to load all experts. For each token, only 8 experts (out of 256) are activated, but which experts are chosen changes dynamically based on the input. This means you'll be constantly loading and unloading experts from disk.

    MoE is great for distributed deployments, because you can maintain a distribution of experts that matches your workload, and you can try to saturate each expert and thereby saturate each node.
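A minimal sketch of top-k expert routing, to show why the active set keeps changing from token to token. The 8-of-256 figures come from the comment; the random scores are a stand-in for a real router's output (an assumption, since a real router's choices are input-dependent and correlated), and `route` is a hypothetical helper, not any library's API.

```python
import random

NUM_EXPERTS = 256  # from the comment
TOP_K = 8          # from the comment


def route(scores, k=TOP_K):
    """Return the indices of the k highest-scoring experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


rng = random.Random(0)  # fixed seed so the run is repeatable
previous = set()
new_experts_per_token = []
for _ in range(5):
    # Stand-in for router scores; a real model computes these from the token.
    scores = [rng.random() for _ in range(NUM_EXPERTS)]
    chosen = set(route(scores))
    # Experts needed now that were not active for the previous token.
    new_experts_per_token.append(len(chosen - previous))
    previous = chosen

print(new_experts_per_token)
```

Each entry counts the experts that would have to be fetched if only the previous token's experts were resident, which is the churn the comment describes when the full set does not fit in memory.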

  3. The largest nodes in his cluster each have 512GB RAM. DeepSeek V3.1 is a 671B parameter model whose weights take up 700GB RAM: https://huggingface.co/deepseek-ai/DeepSeek-V3.1

    I would have expected that going from one node (which can't hold the weights in RAM) to two nodes would have increased inference speed by more than the measured 32% (21.1t/s -> 27.8t/s).

    With no constraint on RAM (4 nodes) the inference speed is less than 50% faster than with only 512GB.

    Am I missing something?
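The percentages above can be checked directly from the two throughput figures quoted in the comment. The four-node throughput is not quoted, so the sketch only derives the ceiling that "less than 50% faster" implies.

```python
single_node_tps = 21.1  # tokens/s on one 512GB node (from the comment)
two_node_tps = 27.8     # tokens/s on two nodes (from the comment)

# Relative speedup from one node to two.
speedup = two_node_tps / single_node_tps - 1
print(f"two nodes vs one: {speedup:.1%} faster")

# "Less than 50% faster" means the four-node figure is below this ceiling.
four_node_ceiling_tps = single_node_tps * 1.5
print(f"four-node throughput is below {four_node_ceiling_tps:.2f} t/s")
```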

  4.   Call me when you're arrested or fined for buying/selling any book in US.
    
    Are you offering pro bono representation?
  5. When you share an app you created in Google AI Studio, it will use quota from the logged in user, instead of your own quota.
  6. After he described the rules, my immediate reaction was 'this is like Mastermind'. Sure enough, further down the page:

      Other than that, in my research I came across a boardgame called Mastermind, which has been around since the 70s. This is a very similar premise - think of it as "Guess Who?" on hard mode.
  7. A couple of weeks ago, I bought a 'sensor kit' from Amazon for my son to use with his Raspberry Pi. It includes some input devices (e.g. button, moisture sensor) and output devices (e.g. LED) that can be plugged onto breadboard.

    The setup instructions included something to do with CircuitPython. I had not heard of it before then: https://github.com/sunfounder/universal-maker-sensor-kit/blo...

  8. This is cool.

    When you change the sliders and click 'Create', the preview on the right doesn't change.

  9.   Does one stay locked in place? Unclear.
    
    If you set C1=A1+B1 then, when you set a value for C1, A1 and B1 are each half of that value, even if they started off unbalanced.
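A small sketch of the behaviour described above, next to one alternative. `split_evenly` mirrors what the comment observed; `split_proportionally` is a hypothetical alternative added only for contrast and is not something the tool is known to do.

```python
def split_evenly(total, n=2):
    """Give every input the same share of the new total (the observed behaviour)."""
    return [total / n] * n


def split_proportionally(total, current):
    """Hypothetical alternative: keep the inputs' existing ratio."""
    current_sum = sum(current)
    if current_sum == 0:
        # No ratio to preserve, so fall back to an even split.
        return split_evenly(total, len(current))
    return [total * value / current_sum for value in current]


a1, b1 = 30, 10  # unbalanced starting values
new_c1 = 100     # value typed into C1

print(split_evenly(new_c1))                # both inputs become 50.0
print(split_proportionally(new_c1, [a1, b1]))  # inputs keep their 3:1 ratio
```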
  10. It's super-annoying that the article begins with a photo of a Kindle e-reader, and it's only once you read the last sentence that you find:

      "Ask this Book is currently only available in the Kindle iOS app in the US, but Amazon says it “will come to Kindle devices and Android OS next year.”"
  11. Regarding the last point, you can do better than sleep (lower power state). You can have the microcontroller cut its own power once it's done its work:

    https://randomnerdtutorials.com/latching-power-switch-circui...

  12. There's a web site where different people share what they think of each course, and how many hours they devote per week: https://www.omscentral.com/

    That might help you decide whether it's doable.

    My first (and only) course was somewhere in the middle in terms of effort, and the courses I was most interested in would have required another 50% on top, which wasn't going to work for me, between work, parenting, other learning etc.

  13. Oh yeah I forgot to mention the class discussion board.

    I wasn't in any Discord groups but the class discussion forum was a nice community.

  14. Sorry.

    I commented before realizing that someone else already made the same point earlier, and you already explained that the law covers more than what you mentioned in your first comment.

  15. OMSCS requires ten courses to graduate. I completed one course (with an A grade) before realizing that, even at a pace of one course per semester, it was not a high enough priority for me to devote the time required to do each course well.

    That course was great, though, and I definitely learned some things I'm glad to have learned!

    IMO the instructional materials are a small part of the value. The things that stood out to me were:

    - the assignments

    - the autograding of programming assignments

    - giving and receiving peer feedback about written assignments

    - learning some LaTeX for those assignments

    - having an artificial reason (course grade) to persist in improving my algorithm and code [on the problems taught in that course, I wouldn't have been self-motivated enough if they were just things I came across during a random weekend]

  16.   at the very least in my state (Illinois), it's not lawful for public bodies to disclose the license plate numbers read from ALPR cameras, so this data set is necessarily incomplete
    
    It's not a dataset of license plate numbers read from ALPR cameras. It's a dataset of license plate numbers that have been entered into search tools.

      Enter a license plate to see if it's one of the 2,207,426 plates seen in the 27,177,268 Flock searches we know about.
  17. Wow that's great experience.

    My son is 9yo and loves to make little animations in Scratch. He recently started to learn a bit of Python (just the syntax so far, no projects).

    I wonder whether you can share anything about your journey, especially if you have any tips for the stage my son is at.

  18. Many developers do this, and it's explicitly allowed under Apple's Developer Agreement (section 3.3.1).

      Interpreted code may be downloaded to an Application but only so long as such code: (a) does not change the primary purpose of the Application by providing features or functionality that are inconsistent with the intended and advertised purpose of the Application (b) does not bypass signing, sandbox, or other security features of the OS; and (c) for Applications distributed on the App Store, does not create a store or storefront for other Applications.
    
    The app store review guidelines (section 2.5.1) seem more narrow, but I think the above is what's enforced.
