
bri3d
10,350 karma
https://github.com/bri3d

  1. Congratulations, you have invented AMD Instinct, except without the $100 part!
  2. They're not really "open-sourcing" anything in the sense that I would think about it. As far as I can tell they're doing two things:

    * Removing cloud-server dependency from the app.

    * Publishing API documentation for the speaker.

    I actually think this is worth noting not so much in a "well aktshully it's not open source!" kind of way, but as a good lesson for other manufacturers - because this is meaningfully good without needing to do any of the things manufacturers hate:

    * They didn't have to publish any Super Secret First or Third Party Proprietary IP.

    * They didn't have to release any signing keys or firmware tools.

    * They get to remove essentially all maintenance costs and offload everything onto a "community."

    And yet people are happy! Manufacturers should take note that they don't have to do much to make customers much happier with their products at end of life.

  3. Yes, you're right, and that's what I discuss in the last paragraph of my comment. And yes, E-cores are rough descendants of some processors that were sometimes called Atom. Using Intel marketing names is fraught with peril, though (see: Celeron), as "Atom" has referred to many conceptually different microarchitectures over time; the modern E-core bears no relation to the original in-order "Atom" processor many remember.

    I've had the same experiences as you with Intel mixed-core desktop parts. They're incredibly difficult to optimize for due to the heterogeneous core mixture, whereas AMD mobile parts are generally more reasonable (you're on a slow core or a fast core, basically), and AMD never made a mixed-core desktop part.

    However, Intel server parts switched several years ago to E-core-only or P-core-only designs, so none of the heterogeneous core mixture issues apply there - you basically have two separate processor generations being sold at once, which isn't particularly surprising or uncommon.

    With AMD server processor families (linked in my comment), depending on the part's density you get either "slow" or "fast" cores and either "wide" or "narrow" units, so you do still have to think about things a little bit there too.

    Where Intel really screwed up in general, microarchitecture differences aside, is AVX-512. That's the wrench that prevents the same compiled code from running across most Intel parts - they just couldn't decide what they wanted to do with it, whereas AMD chose to support it and stick with it, even though the throughput for the wide instructions is wildly different between processors.
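
    In practice this means anything that wants AVX-512 has to carry a runtime dispatch. A minimal sketch, assuming GCC or Clang (the two path functions are hypothetical stand-ins):

    ```c
    #include <stdio.h>

    /* Hypothetical leaf implementations; imagine vectorized kernels here. */
    static void process_avx512(void) { puts("wide (AVX-512) path"); }
    static void process_avx2(void)   { puts("narrow (AVX2) fallback"); }

    /* Dispatch at runtime, because AVX-512 support varies across parts. */
    void process(void) {
        if (__builtin_cpu_supports("avx512f")) /* GCC/Clang builtin */
            process_avx512();
        else
            process_avx2();
    }

    int main(void) { process(); return 0; }
    ```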

  4. The 32-core-per-die AMD products are almost certainly Zen 6c, which is the same "idea" as Intel E-cores, albeit way less crappy.

    https://www.techpowerup.com/forums/threads/amd-zen-6-epyc-ve...

    EDIT: actually, now that I think about it some more, my characterization of Zen C cores as the same "idea" as Intel E-cores was pretty unfair too; they serve the same market need, but the implementation is so much less silly that it's a bit daft to compare them. Intel E-cores have different IPC, different tuning characteristics, and different feature support (ie, they are usually a different uarch), which makes them really annoying to deal with. Zen C cores are usually the same cores with less cache and sometimes fewer or narrower ports, depending on the specific configuration.

  5. > I suspect that specifically car / aircraft / spacecraft computers receive regular updates, and these updates change the smallest part they can.

    In the space I am very familiar with, automotive, this is not true for code changes to most automotive control units; the "application software" code for each control unit is treated as a single entity and built, supplied, and modified at that level of granularity. Infotainment and digital cockpit are the only major exceptions, but even then, only for the "unsafe" part (Linux/QNX/Windows); the "safe" part is usually a single-image, single-application build running on a safety processor alongside.

    Sometimes personalization/vehicle-specific "data sets" or calibration _data_ (ie, ECU tunes) can be updated without updating the application software, but the application software for each unit is generally treated as one large unified firmware blob. For example, in every ECU I am aware of, modifying the application software logic (which is usually modeled in something like Simulink/ASCET, not written as code directly) triggers a full code regeneration and recompilation and produces a complete new firmware image with an updated Application Software version. There isn't any notion of shipping a new "turbocharger control" code module, or a new "diagnostics" code module, or whatever, even if they are constructed at this granularity in the code generation suite or run at this task granularity in the RTOS.

  6. Do you have access to Google?

    VAMT proxy activation is airgapped in the exact same way the “old” telephone way was; VAMT acts as the server that you used to call on the phone. It trades one token for another. You side channel the tokens across to and from the airgapped machine.

  7. I don’t think these devices represent a demand in the same way at all. Secure boot firmware is another “demand” here that’s not really a demand.

    All of these things, generally speaking, run unified, trusted applications, so there is no need for dynamic address space protection mechanisms or “OS level” safety. These systems can easily ban dynamic allocation, statically precompute all input sizes, and, given enough effort, can mostly be statically proven correct over their constrained input and output space (as sketched at the end of this item).

    Or, to make this thesis more concise: I believe that OS and architecture level memory safety (object model addressing, CHERI, pointer tagging, etc.) is only necessary when the application space is not constrained. Once the application space is fully constrained you are better off fixing the application (SPARK is actually a great example in this direction).

    Mobile phones are the demand and where we see the research and development happening. They’re walled off enough to be able to throw away some backwards compatibility and cross-compatibility, but still demand the ability to run multiple applications which are not statically analyzed and are untrusted by default. And indeed, this is where we see object store style / address space unflattening mitigations like pointer tagging come into play.
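
    To make the “constrained application” point concrete, here's a minimal C sketch of that style: all buffers statically sized, no dynamic allocation, and the worst-case input bound fixed at build time (names and sizes are illustrative):

    ```c
    #include <stdint.h>
    #include <stddef.h>

    #define MAX_FRAME_LEN 64u /* assumed worst-case input, known at build time */

    static uint8_t rx_frame[MAX_FRAME_LEN]; /* static; no malloc anywhere */

    /* The entire input space is enumerable, so the bound check and the
     * loop below are amenable to static proof. Returns 0 on success. */
    int handle_frame(const uint8_t *src, size_t len) {
        if (len > MAX_FRAME_LEN)
            return -1;
        for (size_t i = 0; i < len; i++)
            rx_frame[i] = src[i];
        return 0;
    }
    ```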

  8. VAMT proxy activation, or full fledged volume licensing with KMS
  9. I mean, sure, but to what end does that madness lead? Who backs up the backups?

    Usually this is to allow different departments / divisions / customers (in the case of an OEM model) to all sign code or encrypt binaries, although this likewise cuts the wrong way: each enrolled key increases the amount of material which is available to leak under the leak model. Or it's to allow model line differentiation with crossover.

  10. This is the same hardware as a PC, but TPM and UEFI “Secure Boot” happen way, way later in the boot process and aren’t present here; this is the hardware root of trust, in this case the AMD PSP boot firmware, which runs on an ARM system alongside the x86 cores. Intel’s version is called Boot Guard and runs on a combination of x86 sub-cores (TXE) and ME.
  11. In this case, by using fault injection to glitch the chip into a test mode which bypasses secure boot and loads code from SPI, combined with a SPI emulator (and I2C to send the boot vectors).

    https://m.youtube.com/watch?v=cVJZYT8kYsI

  12. I have seen some manufacturers enroll multiple manufacturer keys, probably with this notion, but this isn’t useful against almost any threat model.

    If keys are recovered using some form of low level hardware attack, as was almost surely the case here, the attacker can usually recover the unused key sets too.

    If the chip manufacturing provisioning supply chain is leaky the new keys will probably be disclosed anyway, and if the key custody chain is broken (ie, keys are shared with OEMs or third parties) they will definitely be disclosed anyway.

  13. You are both correct and the article discusses it accurately:

    > Then you have 2048 bytes of user data, scrambled for the reasons mentioned before. The best way to look at the sector as a whole is to think of each sector as 12 “rows” consisting of 172 bytes each. After each 172-byte row is 10 bytes of ECC data called Parity Inner (PI), which is based on Reed-Solomon and applied to both the header and scrambled user data per row within the sector itself. Then, after the user data and parity inner data, is the 4-byte EDC, which is calculated over the unscrambled user data only. Then, finally, Parity Outer (PO) is another form of ECC that is applied by “column” that spans over an entire block of multiple sectors stacked horizontally, or in other words, a group of 16 sectors. Altogether, this adds up to 2366 bytes of recorded sector data.
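
    The arithmetic in that quote checks out; a quick sanity check (the amortization of Parity Outer across the 16-sector block is inferred from the quoted total):

    ```c
    #include <stdio.h>

    int main(void) {
        int rows = 12, row_bytes = 172, pi_per_row = 10;
        int sector = rows * row_bytes;       /* 2064: header + 2048 user data + 4 EDC */
        int pi     = rows * pi_per_row;      /* 120 bytes of Parity Inner */
        int po     = row_bytes + pi_per_row; /* 182: the 16 PO rows per block amortize
                                                to one 182-byte row per sector */
        printf("%d\n", sector + pi + po);    /* prints 2366 */
        return 0;
    }
    ```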

  14. At a practical level, I think the thesis holds: "good" process isolation systems (aka, not hosted on Linux) build on years of development that unikernels will struggle to replace.

    At a conceptual level I really disagree with this piece, though:

    > one cannot play up Linux kernel vulnerabilities as a silent menace while simultaneously dismissing hypervisor vulnerabilities as imaginary.

    One can reasonably recognize Linux kernel vulnerabilities as extant and pervasive while acknowledging that hypervisors can be vulnerable. One can also realize that the surface area exposed by Linux is fundamentally much larger than that exposed by most hypervisors, and that the Linux `unshare` mechanism is insecure by default. It's kind of funny - the invocation of Linux really undermines this argument; there's no _reason_ a process / container isolation based system should be completely broken, but Linux _is_, and so it becomes a very weak opponent.

    I really don't think I can agree with the debugging argument here at a conceptual level, either. Issues with debugging unikernels are caused by poor outside-in tooling; with good outside-in tooling, a unikernel should be _easier_ to debug than a container or OS process, because the VM owner / hypervisor will often already have a way to inspect the unikernel machine's entire state from the outside, without the additional struggle of trying to maintain, juggle, and restore multiple contexts within a running system. There is essentially an ISP/ICE debugging probe attached to the entire system end to end by default, in the form of the hypervisor.

    For example, there is no reason a hosting hypervisor could not provide DTrace in a way which is completely transparent to the unikernel guest, and this would be much easier to implement than DTrace self-hosted in a running kernel!

    Done properly, a uni-application basically becomes debugging-agnostic this way: it doesn't need cooperative tracepoints or self-modifying patches (and all of the state juggling that comes with those; think Kprobes), because the hypervisor can do the tracing externally. The unikernel does not need to grow (in security surface area, debug size, blast radius, etc.) to add more trace and debug capability.
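
    To make the "probe attached by default" point concrete: with KVM, for instance, the host can already read a guest vCPU's entire register state with zero cooperation from the guest. A minimal sketch, assuming a vcpu fd obtained earlier through the usual KVM_CREATE_VM / KVM_CREATE_VCPU setup (omitted here):

    ```c
    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdio.h>

    /* Dump the guest instruction pointer from outside; the unikernel needs
     * no tracepoints, agents, or self-modifying patches for this to work. */
    void dump_guest_rip(int vcpu_fd) {
        struct kvm_regs regs;
        if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) == 0)
            printf("guest rip = 0x%llx\n", (unsigned long long)regs.rip);
    }
    ```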

  15. There are lots of examples on YouTube; this one seems succinct: https://youtube.com/shorts/3Eb315vL9uw . They picked good tones to make it satisfying IMO. I don’t know of anyone who’s reversed the bitstream in public, though it doesn’t seem like it should be very hard.
  16. LG appliances at least used to use acoustic signaling for diagnostics: hold a phone up and the washer makes some modem-esque (I think it’s 4-tone / 4-FSK) noises and the app or technician can diagnose issues. It was originally engineered to even work over voice codecs, so a customer without a smartphone could relay the diagnostic session to a technician.
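
    The transmit side of a scheme like that is very simple; here's a hypothetical sketch where each 2-bit symbol selects one of four tones (the frequencies, symbol rate, and sample rate are made up for illustration, not LG's actual parameters):

    ```c
    #include <math.h>
    #include <stdint.h>

    #define SAMPLE_RATE 8000.0 /* voice-codec-friendly rate */
    #define SYMBOL_LEN  400    /* samples per symbol (50 ms) */

    static const double tone_hz[4] = { 1000.0, 1500.0, 2000.0, 2500.0 };

    /* Render one 2-bit symbol (0..3) as 16-bit PCM at half amplitude. */
    void emit_symbol(int16_t *out, unsigned sym) {
        double f = tone_hz[sym & 3];
        for (int i = 0; i < SYMBOL_LEN; i++)
            out[i] = (int16_t)(16383.0 * sin(2.0 * M_PI * f * (double)i / SAMPLE_RATE));
    }
    ```
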
  17. The reasons aren’t exactly unknown, considering that the sensor is diagonally oriented also?

    Processing these does seem like more fun though.

  18. It's the shape of the delivered artifact that's driven the way things are implemented in the ecosystem, not really a fundamental architecture difference.

    The shape of historically delivered ARM artifacts has been embedded devices. Embedded devices usually work once in one specific configuration. The shape of historically delivered ARM Linux products is a Thing that boots and runs. This only requires a kernel that works on one single device in one single configuration.

    The shape of historically delivered x86 artifacts is socketed processors that plug into a variety of motherboards with a variety of downstream hardware, and the shape of historically delivered x86 operating systems is floppies, CDs, or install media that is expected to work on any x86 machine.

    As ARM moves out of this historical system, things improve; I believe, for example, that you could run the same aarch64 Linux kernel on a Pi 2B 1.2+, 3, and 4, with either UEFI/ACPI or just a different DTB for each device, because the drivers for these devices are mainline-quality and capable of discovering the environment in which they are running at runtime.

    People commonly point to ACPI+UEFI vs DeviceTree as causes for these differences, but I think this is wrong; these are symptoms, not causes, and are broadly Not The Problem. With properly constructed drivers you could load a different DTB for each device and achieve similar results to ACPI; it's just a different format (with a different level of complexity and dynamic behavior). In some ways ACPI is "superior," since it enables runtime dynamism (ie - power events or even keystrokes can trigger behavior changes) without driver knowledge, but in some ways it's worse, since it's a complex bytecode system that's usually full of weird bugs and edge cases, versus DTB, where what you see is what you get.
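
    To illustrate the "what you see is what you get" side: a DTB is a plain data structure you can walk offline with libfdt, no bytecode interpreter required (the node path and property name below are illustrative):

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <libfdt.h>

    int main(int argc, char **argv) {
        if (argc != 2) return 1;
        FILE *f = fopen(argv[1], "rb"); /* a compiled .dtb */
        if (!f) return 1;
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        rewind(f);
        void *fdt = malloc((size_t)size);
        if (!fdt || fread(fdt, 1, (size_t)size, f) != (size_t)size) return 1;
        fclose(f);

        /* Look up a node and read a property, exactly as written in the DTS. */
        int off = fdt_path_offset(fdt, "/soc/serial@10000000"); /* illustrative path */
        if (off < 0) return 1;
        const fdt32_t *prop = fdt_getprop(fdt, off, "clock-frequency", NULL);
        if (prop)
            printf("clock-frequency = %u\n", fdt32_to_cpu(*prop));
        free(fdt);
        return 0;
    }
    ```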

  19. Yes, exactly this. I don’t feel that I misattributed anything, but if I had to expound on the idea this is exactly how I would explain it.
  20. It's an interesting debate. The flip side of this coin is getting hires who are more interested in the language or approach than the problem space and tend to either burn out, actively dislike the work at hand, or create problems that don't exist in order to use the language to solve them.

    With that said, Rust was a good language for this in my experience. Like any "interesting" thing, there was a moderate bit of language-nerd side quest thrown in, but overall, it was a good selection metric. I do think it's one of the best "Rewrite it in X" languages available today, due to the availability of good developers with Rewrite-in-Rust project experience.

    The Haskell commentary is curious to me. I've used Haskell professionally but never tried to hire for it. With that said, the other FP-heavy languages that were popular ~2010-2015 were absolutely horrible for this in my experience. I generally subscribe to a vague notion that "skill in a more esoteric programming language will usually indicate a combination of ability to learn/plasticity and interest in the trade," however, using this concept, I had really bad experiences hiring both Scala and Clojure engineers; there was _way_ too much academic interest in language concepts and way too little practical interest in doing work. YMMV :)

  21. They do have a loophole; they import them as kits and “build” them at a Magna facility in Arizona (similar to how early Sprinter vans were re-assembled in the US and sold as Freightliners). But they are FMVSS compliant (besides the steering wheel) and have had several NHTSA-organized recalls like any other compliant car might.
  22. Exactly. The only way this makes sense to me is if the board needed this product in <1 cycle. Which makes no sense for a market player like NV who already have the PDK, volume, and literally everything else in the universe. But here it is, so there is clearly a factor I have not considered :)
  23. The only thing I can think of here is that OpenAI’s DRAM land grab is going to stack on a non-NV target and NV need to hedge with an SRAM design that’s on the market NOW. Otherwise, I can’t see how NV couldn’t eat Groq’s lunch in one development cycle - it’s not like NV can’t attach a TPU to some SRAM and an interconnect. Either that or Groq closed a deep enough book to scare them, but 40x is a lot of scared.
  24. The bottleneck in training and inference isn’t matmul, and once a chip isn’t a kindergarten toy you don’t go from FPGA to tape-out by clicking a button. For local memory he’s going to have to learn to either stack DRAM (not “3000 lines of Verilog,” and it requires a supply chain which OpenAI just destroyed) or diffuse block RAM / SRAM like Groq, which is astronomically expensive bit for bit and torpedoes yields, compounding the issue. Then comes interconnect.
  25. Proxying from the "hot" domain (with user credentials) to a third party service is always going to be an awful idea. Why not just CNAME Mintlify to dev-docs.discord.com or something?

    This is also why an `app.` or even better `tenant.` subdomain is always a good idea; it limits the blast radius of mistakes like this.

  26. This isn’t likely to be a good indicator. Essentially only the network permission and any fingerprint is necessary for the tracking in this accusation; the idea is not that TikTok were spying on Grindr on the device, but that a device fingerprinting firm who broker both TikTok and Grindr data were able to correlate the user.
  27. > Nobody forces you to use a real Unix timestamp.

    Besides the UUIDv7 specification, that is? Otherwise you have some arbitrary kind of UUID.

    > I would not count on the first 48 bits being a "real" timestamp.

    I agree; this is the existential hazard under discussion which comes from encoding something that might or might not be data into an opaque identifier.

    I personally don't agree as dogmatically with the grandparent post that extraneous data should _not_ be incorporated into primary key identifiers, but I also disagree that "just use UUIDv7 and treat UUIDs as opaque" is a complete solution, either.

  28. > You're not going to try and extract a timestamp from a uuid.

    What? The first 48 bits of a UUIDv7 are a Unix timestamp (in milliseconds).

    Whether or not this is a meaningful problem or a benefit to any particular use of UUIDs requires thinking about it; in some cases it’s not to be taken lightly and in others it doesn’t matter at all.

    I see what you’re getting at, that ignoring the timestamp aspect makes them “just better UUIDs,” but this ignores security implications and the temptation to partition by high bits (timestamp).
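
    Extracting it is trivial, for what it's worth; a sketch per RFC 9562, where the first 48 bits are a big-endian count of milliseconds since the Unix epoch:

    ```c
    #include <stdint.h>

    /* Recover the millisecond Unix timestamp from a raw 16-byte UUIDv7. */
    uint64_t uuidv7_timestamp_ms(const uint8_t uuid[16]) {
        uint64_t ms = 0;
        for (int i = 0; i < 6; i++) /* big-endian 48-bit field */
            ms = (ms << 8) | uuid[i];
        return ms;
    }
    ```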

  29. If you absolutely need it, use a separate uC / “trigger” chip for PD negotiation.
  30. Yeah, I've often thought about what I'd do instead, and there's no legitimate alternative. It might help developers feel better if they had some kind of "friendly name" functionality (ie - if registrations in the Registry had a package-identifier-style string alongside), but that also wouldn't have flown when COM was invented, when resources overall were much scarcer than they are today.
