
menaerus
Joined 545 karma

  1. I actually wish the audience would take the opposite, or perhaps a more balanced, view. Being pragmatic here means taking an extreme position, and as much as this article is a great resource, containing legitimate advice that is otherwise difficult to find elsewhere in such a concise form, folks need to be aware that this advice is what Google found, for its unfathomably large codebase, to yield some real-world benefits.

    The things this article describes are more nuanced than just "think about performance sooner rather than later". I say this as someone who does this kind of optimization for a living, and all too often I see teams wasting time trying to micro-optimize codepaths which, at the end of the day, do not provide any real, demonstrable value.

    And this is a real trap you can fall into really easily if you read this article as general wisdom, which it is not.

  2. I think because of capacity. This race is mainly driven by AI power demand, estimated to increase 10x in the next 5 years: currently it's 5 GW, and by 2030 it is expected to be 50 GW.
  3. > If somebody is in the EU already that calculation completely flips.

    Would you find it compelling to move your whole life for ~100k EUR when you can make as much or more in your home city, with a job that is almost certainly more stable?

    And I meant the Europeans. People in the EU don't have a culture of moving between cities or countries unless they have a really strong reason to, e.g. they can't find a job at home.

    > would it really be that surprising if there was more unallocated talent in the EU, at this point?

    I am pretty sure there is. It has changed over the course of the last few years, primarily because of COVID and companies becoming willing to offer remote contracts, but it's still far from enough to utilize the talent.

  4. They can't hire the best talent because the most experienced people will not leave their homes to chase a high-risk role with questionable remuneration by relocating their whole lives to Paris or London.

    This goes to show that leaders at Mistral don't quite get that they are not as special as they seem to think. Anthropic or OpenAI also require their talent to relocate, but with stakes that at least promise a high reward: $500k or $1M a year is a good start that is maybe worth investing in.

  5. Sure, now imagine answering 10 different people's questions. It's the largest hindrance I have ever seen, but I agree with the comment above that it largely depends on the team.
  6. > in the most general case the encapsulation and local reasoning here is between modules that use unsafe and everything else

    This would be the same narrative as in, say, C++: wrap the difficult, low-level memory-juggling stuff into "modules", harden the API, return references and/or smart pointers, and then just deal with the rest of the code with ease, right? Theoretically possible but practically impossible.

    The first reason is that abstractions get really leaky, especially in code that demands the utmost performance. Anyone who has implemented their own domain- or workload-specific hash map, mutex, or anything similarly foundational will understand this sentiment. Anyway, if we just have a look at the NVMe driver above, there are no "unsafe modules".

    Second, as I already argued, UB in the module library transcends into the rest of your code, so I fail to understand how dozens of unsafe sections make reasoning or debugging any simpler, when reasoning is actually not a function of the number of unsafe sections but of the interactions between the different parts of the code that end up touching the memory in the unsafe block in a way that was not anticipated. This is almost always the case when dealing with undefined behavior.

    > I don't get that sense from the statement at all.

    It is a bit of an exaggerated example of mine, but I do - their framing suggests ~exactly that, which is simply not true.
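    To make the point concrete, here is a minimal sketch (the type and all names are mine, not from any driver under discussion) of why the soundness of an unsafe block is a property of the interactions around it, not of the block alone: the unchecked read below is sound only while every safe method upholds `len <= buf.len()`, so a bug in a safe method elsewhere in the module turns the untouched unsafe block into UB.

```rust
/// Hypothetical example: a tiny fixed-capacity vector.
/// Invariant that all *safe* code must maintain: len <= buf.len().
struct SmallVec {
    buf: [u32; 8],
    len: usize,
}

impl SmallVec {
    fn new() -> Self {
        SmallVec { buf: [0; 8], len: 0 }
    }

    // Safe code that the unsafe block below silently depends on.
    // Changing this to `self.len += 2` would make `get_fast` UB
    // without touching a single unsafe line.
    fn push(&mut self, v: u32) {
        assert!(self.len < self.buf.len(), "capacity exceeded");
        self.buf[self.len] = v;
        self.len += 1;
    }

    fn get_fast(&self, i: usize) -> Option<u32> {
        if i < self.len {
            // SAFETY: i < len <= buf.len(), guaranteed by every safe method.
            Some(unsafe { *self.buf.get_unchecked(i) })
        } else {
            None
        }
    }
}

fn main() {
    let mut v = SmallVec::new();
    v.push(10);
    v.push(20);
    assert_eq!(v.get_fast(1), Some(20));
    assert_eq!(v.get_fast(5), None);
}
```

    Debugging a corruption introduced by such a change would start at the unsafe block yet end in `push`, which is exactly the kind of cross-section interaction described above.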

  7. No, this is just an example of confirmation bias. You're given a totally unrealistic figure of 1 vuln per 200K/5M LoC and now you're hypothesizing about why that could be so. Google, for anyone unbiased, lost credibility when they put this figure into the report. I wonder what their incentive was for doing so.

    > But unsafe block are, by definition, limited in scope and assuming you design your codebase properly, they shouldn't interact with other unsafe blocks in a different module. So the complexity related to one unsafe block is in fact contained to his own module, and doesn't spread outside. And that makes everything much more tractable since you never have to reason about the whole codebase, but only about a limited scope everytime.

    Anyone who has written low-level code of substantial complexity knows that this is just wishful thinking. In such code, abstractions fall apart, and "So the complexity related to one unsafe block is in fact contained to his own module, and doesn't spread outside" is just wrong, as I explained in my other comment here: UB taking place in an unsafe section will transcend into the rest of the "safe" code. UB is not "caught" or put into quarantine by some imaginary safety net at the boundary between the safe and unsafe sections.

  8. Yes, but my point is: when things blow up, how exactly do you know which unsafe block to look into? From their statement it appears as if there's a simple correlation between "here's your segfault" and "here's the unsafe block that caused it", which I believe there isn't, and which is why I said there's no encapsulation, local reasoning, etc.
  9. 1 per 5M or 1 per 200K is pretty much unbelievable, especially in such a complex codebase, so all I can say is to each their own.
  10. They also say

      The practice of encapsulation enables local reasoning about safety invariants.
    
    which is not fully correct. Undefined behavior in unsafe blocks can and will leak into the safe Rust code, so there is no "local reasoning" or "encapsulation" or "safety invariants" to speak of.

    This whole blog has always read to me too much like marketing material, disguised with some data so that it is not so obvious. IMHO.

  11. They say "With roughly 5 million lines of Rust in the Android platform and one potential memory safety vulnerability found (and fixed pre-release), our estimated vulnerability density for Rust is 0.2 vuln per 1 million lines (MLOC).".

    Do you honestly believe that there is 1 vulnerability per 5 MLoC?

  12. What exactly am I doing again? I am providing my reasoning; sorry if that rubs you the wrong way. I guess you don't have to agree, but let me express my view, OK? My view is not as extremist or polarized as you make it out to be. I see the benefit of Rust, but I say the benefit is not what Internet cargo-cult programming suggests. There's always a price to be paid, and in the case of kernel development I think it outweighs the positive sides.

    If I spend 90% of my time debugging freakishly difficult issues, and Rust solves the other 10% for me, then I don't see it as a good bargain. I'd need to learn a completely new language, surround myself with a team that is also not hesitant to learn it, and all that under the assumption that it won't make some other aspects of development worse. And it surely will.

  13. No, I am not saying keep the status quo. I am simply challenging the idea that the kernel will enjoy the benefits that Rust is supposed to provide.

    The distribution of bugs across the whole codebase does not follow a normal distribution but a multimodal one. Now, imagine where the highest concentration of bugs will be, and how many bugs there will be elsewhere. Easy to guess.

  14. And

      It is now the only software in the world still written in C89.
    
    Hilarious.
  15. So, an unsafe block every 70 LoC in a 1500 LoC toy example? Sure, that's a strong argument.
  16. > let's start by prefacing that 'production quality' C is 100% unsafe in Rust terms.

    I don't know what one should even make of that statement.

    > here's where we fundamentally disagree: you listed a couple dozen unsafe places in 1.5kLOC of code; let's be generous and say that's 10%

    It's more than 10%; you didn't even bother to look at the code, but still presented what in reality is a toy driver example as something credible (?) to support your argument that I'm spreading FUD. Kinda silly.

    Even if it were only that much (10%), the fact that it sits in the most crucial part of the code makes the argument around Rust safety moot. I am sure you have heard of the 90/10 rule.

    Time will tell, but I am not holding my breath. I think this is a bad thing for Linux kernel development.

  17. Sorry, but what have I said that's wrong? The nature of code written in kernel development is such that using unsafe is inevitable: low-level code with memory juggling and patterns that you usually don't find in application code.

    And yes, I have had a look at the examples - maybe one or two years ago there was a significant patch submitted to the kernel, and the number of unsafe sections made me realize at that moment that Rust, in terms of kernel development, might not be what it is advertised to be.

    > https://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/l..

    Right? Thank you for the example. Let's first start by saying the obvious: this is not an upstream driver but a fork, and it is also considered by its author to be a PoC at best. You can see this acknowledged on its very web page, https://rust-for-linux.com/nvme-driver, which says "The driver is not currently suitable for general use.". So, I am not sure what point you were trying to make by giving something that is not even production-quality code?

    Now let's move to the analysis of the code. The whole codebase, without crates, counts only 1500 LoC (?). Quite small, but OK. Let's see the unsafe sections:

    rnvme.rs - 8x unsafe sections, 1x SyncUnsafeCell used for NvmeRequest::cmd (why?)

    nvme_mq/nvme_prp.rs - 1x unsafe section

    nvme_queue.rs - 6x unsafe, not mere sections but complete traits

    nvme_mq.rs - 5x unsafe sections, 2x SyncUnsafeCell used, one for IoQueueOperations::cmd second for AdminQueueOperations::cmd

    In total, that is 23x unsafe sections/traits over 1500 LoC, for a driver that is not even production quality. I don't have the time, but I wonder how large this number would become if all the crates this driver uses were pulled into the analysis too.
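    As a quick sanity check of the density implied by those counts (a back-of-the-envelope tally using the per-file numbers listed above, not re-verified against the source):

```rust
fn main() {
    // Per-file counts as listed above: rnvme.rs, nvme_mq/nvme_prp.rs,
    // nvme_queue.rs, nvme_mq.rs.
    let unsafe_uses: [u32; 4] = [8, 1, 6, 5];
    let sync_unsafe_cells: [u32; 4] = [1, 0, 0, 2];
    let total: u32 =
        unsafe_uses.iter().sum::<u32>() + sync_unsafe_cells.iter().sum::<u32>();
    let loc: u32 = 1500;
    // 23 uses over 1500 LoC works out to roughly one every 65 lines.
    println!("{} unsafe uses, ~1 per {} LoC", total, loc / total);
}
```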

    Sorry, I am not buying that argument.

  18. Similarly, in regular SQL systems, the same is achieved by fsyncing the WAL.
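    A minimal sketch of that durability step (the file name, record format, and function are my own illustration, not from any particular database): the commit is acknowledged only after the log record has been fsynced.

```rust
use std::fs::OpenOptions;
use std::io::Write;

// Append a record to the write-ahead log and force it to stable storage.
// Only after sync_all() (fsync(2)) returns may the commit be acknowledged.
fn append_wal_record(path: &str, record: &[u8]) -> std::io::Result<()> {
    let mut wal = OpenOptions::new().create(true).append(true).open(path)?;
    wal.write_all(record)?;
    wal.sync_all()?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    append_wal_record("example.wal", b"BEGIN;INSERT ...;COMMIT\n")?;
    println!("record durable");
    Ok(())
}
```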
  19. Technically it is not, because eventually it will be mutated, and that's one way of achieving scalability in a multiple-writers scenario.
