
10000truths
2,902 karma

  1. The biggest risk to this business model isn't the government, but the payment processor. Anonymity makes it easy for unsavory characters to use stolen credit cards to buy your compute. The inevitable barrage of chargebacks will then cause Stripe to cut you off. Hell, if you're particularly unlucky, your payment processor might even cut you off proactively, if they decide that your lack of KYC makes you a risk.
  2. > In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.

    Sure, if you expect to decompress files with high compression ratios, then you'll want to adjust your knobs accordingly.

    > On the other hand, zip bomb described in this blog post relies on decompressing the same data multiple times - so it wouldn't trigger your A/B heuristics necessarily.

    If you decompress the same data multiple times, then you increment A multiple times. The accounting still works regardless of whether the data is the same or different. Perhaps a better description of A and B in my post would be {number of decompressed bytes written} and {number of compressed bytes read}, respectively.

    > Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If deflate authors had this idea when they designed the algorithm, I bet files larger than "unreasonable" 16MB would be forbidden.

    The limitation is imposed by the application, not by the codec itself. The application doing the decompression is supposed to process the input incrementally (in the case of DEFLATE, reading one block at a time and inflating it), updating A and B on each iteration, and aborting if a threshold is violated.

  3. For any compression algorithm, you keep track of A = {uncompressed bytes processed} and B = {compressed bytes processed} while decompressing, and bail out when either of the following occurs (a sketch follows the list):

    1. A exceeds some unreasonable threshold

    2. A/B exceeds some unreasonable threshold
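    A minimal sketch of that bookkeeping (codec-agnostic; the struct, field names, and limits here are mine, to be tuned per application):

    ```rust
    /// Zip-bomb guard for a streaming decompressor: tracks
    /// A = uncompressed bytes written, B = compressed bytes read.
    struct BombGuard {
        a: u64,          // uncompressed bytes produced so far
        b: u64,          // compressed bytes consumed so far
        max_output: u64, // threshold 1: absolute cap on A
        max_ratio: u64,  // threshold 2: cap on A/B
    }

    impl BombGuard {
        /// Call after each incremental step (e.g. one DEFLATE block).
        fn update(&mut self, bytes_in: u64, bytes_out: u64) -> Result<(), &'static str> {
            self.b += bytes_in;
            self.a += bytes_out;
            if self.a > self.max_output {
                return Err("uncompressed size exceeds threshold");
            }
            // A / B > max_ratio, rewritten to avoid dividing by zero.
            if self.a > self.max_ratio.saturating_mul(self.b.max(1)) {
                return Err("expansion ratio exceeds threshold");
            }
            Ok(())
        }
    }
    ```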

  4. That's not a cycle - service B isn't writing any new data to A.
  5. FreeBSD and OpenBSD explicitly mention the prefaulting behavior in the mlock(2) manpage. The Linux manpage alludes to it in that you have to explicitly pass the MLOCK_ONFAULT flag to the mlock2() variant of the syscall in order to disable the prefaulting behavior.
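    A small sketch of the two behaviors (Linux-specific, assuming the libc crate; error handling kept minimal):

    ```rust
    use std::io;

    /// Lock `buf`'s pages into RAM. With `on_fault = false`, mlock()
    /// prefaults every page before returning; with `on_fault = true`,
    /// mlock2(MLOCK_ONFAULT) defers each fault to the first access.
    fn lock_pages(buf: &[u8], on_fault: bool) -> io::Result<()> {
        let ptr = buf.as_ptr() as *const libc::c_void;
        let rc = unsafe {
            if on_fault {
                libc::mlock2(ptr, buf.len(), libc::MLOCK_ONFAULT as _)
            } else {
                libc::mlock(ptr, buf.len())
            }
        };
        if rc == 0 { Ok(()) } else { Err(io::Error::last_os_error()) }
    }
    ```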
  6. Recursion: That's easy, don't. At least, not with a call stack. Instead, use a stack container backed by a bounded allocator, and pop->process->push in a loop (see the sketch below). What would have been a stack overflow is now an error.OutOfMemory value that you can catch and handle as desired. All that said, there is a proposal that addresses making recursive functions friendlier to static analysis [0].

    Function pointers: Zig has a proposal for restricted function types [1], which can be used to enforce compile-time constraints on the functions that can be assigned to a function pointer.

    [0]: https://github.com/ziglang/zig/issues/1006

    [1]: https://github.com/ziglang/zig/issues/23367
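    The explicit-stack pattern, sketched in Rust rather than Zig (the names and the bound are mine; in Zig the bounded allocator would surface error.OutOfMemory instead):

    ```rust
    struct Node {
        value: u32,
        children: Vec<Node>,
    }

    /// Iterative traversal with an explicitly bounded work stack.
    /// Exceeding the bound is a recoverable error, not a stack overflow.
    fn sum_tree(root: &Node, max_stack: usize) -> Result<u64, &'static str> {
        let mut total = 0u64;
        let mut stack = Vec::with_capacity(max_stack);
        stack.push(root);
        while let Some(node) = stack.pop() {
            total += u64::from(node.value); // pop -> process -> push
            for child in &node.children {
                if stack.len() == max_stack {
                    return Err("work stack limit exceeded");
                }
                stack.push(child);
            }
        }
        Ok(total)
    }
    ```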

  7. Which is why I said "allocate and pin". POSIX systems have mlock()/mlockall() to prefault allocated memory and prevent it from being paged out.
  8. Sure, but you can do the next best thing, which is to control precisely when and where those allocations occur. Even if the possibility of crashing is unavoidable, there is still huge operational benefit in making it predictable.

    The simplest example is to allocate and pin all your resources on startup. If the program crashes, it does so immediately and with a clear error message, so the fix is as straightforward as "pass a bigger number to the --memory flag" or "spec out a larger machine".
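    A sketch of that startup pattern (assumes the libc crate; the arena layout and message are illustrative):

    ```rust
    use std::io;

    /// Allocate and pin all working memory up front. Any capacity failure
    /// happens here, at startup, with an actionable message - not later,
    /// mid-request, at peak load.
    fn init_pinned_arena(bytes: usize) -> io::Result<Vec<u8>> {
        let arena = vec![0u8; bytes]; // writing zeroes prefaults every page
        // Pin current and future pages so nothing is paged out later.
        if unsafe { libc::mlockall(libc::MCL_CURRENT | libc::MCL_FUTURE) } != 0 {
            return Err(io::Error::last_os_error()); // often RLIMIT_MEMLOCK too low
        }
        Ok(arena)
    }
    ```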

  9. ...over the course of 8.5 months, which is way too short for a meaningful result. If their strategy could outperform the S&P 500's 10-year return, they wouldn't be blogging about it.
  10. The reason I really like Zig is that there's finally a language that makes it easy to gracefully handle memory exhaustion at the application level. No more praying that your program isn't unceremoniously killed just for asking for more memory - all allocations are assumed fallible and failures must be handled explicitly. Stack space is not treated like magic - the compiler can reason about its maximum size by examining the call graph, so you can pre-allocate stack space to ensure that stack overflows are guaranteed never to happen.

    This first-class representation of memory as a resource is a must for creating robust software in embedded environments, where it's vital to frontload all fallibility by allocating everything needed at start-up, and to allow the application the freedom to use whatever mechanism is appropriate (backpressure, load shedding, etc.) to handle excessive resource usage.
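    Rust offers an opt-in version of the same discipline through try_reserve; a minimal sketch (the function name is mine):

    ```rust
    use std::collections::TryReserveError;

    /// Fallible allocation: exhaustion comes back as a value to handle
    /// (shed load, apply backpressure, ...) rather than aborting.
    fn alloc_frame(len: usize) -> Result<Vec<u8>, TryReserveError> {
        let mut buf = Vec::new();
        buf.try_reserve_exact(len)?; // Err on exhaustion, no abort
        buf.resize(len, 0); // cannot allocate: capacity is already reserved
        Ok(buf)
    }
    ```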

  11. Disclosing an individual student's information to third parties without express consent is a violation of FERPA.
  12. This is addressed in the "information asymmetry" section of the article.
  13. What do you believe needs improving and why?
  14. I think the ambiguity was deliberate.
  15. Indeed, at some point, you can't lower tail latencies any further without moving closer to your users. But of the 7 round trips that I mentioned above, you have control over 3 of them: 2 round trips can be eliminated by supporting HTTP/3 over QUIC (and adding HTTPS DNS records to your zone file), and 1 round trip can be eliminated by server-side rendering. That's a 40-50% reduction before you even need to consider a CDN setup, and depending on your business requirements, it may very well be enough.
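    For reference, the DNS side is a single HTTPS record in the zone file advertising h3 support, which is what lets the browser skip the TCP and TLS handshakes on first contact (the hostname and TTL here are illustrative):

    ```
    example.com. 3600 IN HTTPS 1 . alpn="h3,h2"
    ```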
  16. The table is a bit misleading. Most of the resources of a website are loaded concurrently and are not on the critical path of the "first contentful paint", so latency does not compound as quickly as the table implies. For web apps, much of the end-to-end latency hides lower in the networking stack. Here's the worst-case latency for a modern Chrome browser performing a cold load of an SPA website:

    * DNS-over-HTTPS-over-QUIC resolution: 2 RTTs

    * TCP handshake: 1 RTT

    * TLS v1.2 handshake: 2 RTTs

    * HTTP request/response (HTML): 1 RTT

    * HTTP request/response (bundled JS that actually renders the content): 1 RTT

    That's 7 round trips. If your connection crosses a continent, that's easily a 1-2 second time-to-first-byte for the content you actually care about. And no amount of bandwidth will decrease that, since the bottlenecks are the speed of light and router hop latencies. Weak 4G/WiFi signal and/or network congestion will worsen that latency even further.
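    (Putting numbers on it: at a typical cross-continent RTT of 100-250 ms, 7 round trips come to roughly 0.7-1.75 seconds, which is where the 1-2 second figure comes from.)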

  17. My rule of thumb is that unchecked access is okay in scenarios where both the array/map and the indices/keys are private implementation details of a function or struct, since an invariant is easy to manually verify when it is tightly scoped as such. I've seen it used in:

    * Graph/tree traversal functions that take a visitor function as a parameter

    * Binary search on sorted arrays

    * Binary heap operations

    * Probing buckets in open-addressed hash tables

  18. Yes? Funnily enough, I don't often use indexed access in Rust. Either I'm looping over elements of a data structure (in which case I use iterators), or I'm using an untrusted index value (in which case I explicitly handle the error case). In the rare case where I'm using an index value that I can guarantee is never invalid (e.g. graph traversal where the indices are never exposed outside the scope of the traversal), then I create a safe wrapper around the unsafe access and document the invariant.
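    A sketch of that wrapper pattern (the function and graph representation are mine, not from the comment): the invariant is checked once at the boundary, so the traversal loop can use unchecked access:

    ```rust
    /// Visits every reachable node of a graph stored as adjacency lists.
    /// Invariant (checked once): every edge target is a valid node index.
    fn visit_all(adj: &[Vec<usize>], mut visit: impl FnMut(usize)) {
        assert!(adj.iter().flatten().all(|&n| n < adj.len()));
        let mut seen = vec![false; adj.len()];
        let mut stack: Vec<usize> = (0..adj.len()).collect();
        while let Some(node) = stack.pop() {
            // SAFETY: `node` is either from 0..adj.len() or an edge target,
            // both of which are in bounds by the assert above.
            if unsafe { *seen.get_unchecked(node) } {
                continue;
            }
            unsafe { *seen.get_unchecked_mut(node) = true };
            visit(node);
            for &next in unsafe { adj.get_unchecked(node) } {
                stack.push(next);
            }
        }
    }
    ```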
  19. I realize that this is meant as an exercise to demonstrate a property of variance. But most investors are risk-averse when it comes to their portfolio - for the example given, a more practical target to optimize would be worst-case or near-worst-case return (e.g. p99). For calculating that, a summary measure like variance or mean does not suffice - you need the full joint distribution of the RoR of assets A and B, and to find the value of t that optimizes the p99 of At+B(1-t).
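    A minimal sketch of that computation (the paired samples are assumed to come from elsewhere, e.g. historical data; "p99" is read here as the return that 99% of outcomes beat):

    ```rust
    /// Sweep the allocation fraction t and return the (t, p01) pair that
    /// maximizes the ~1st-percentile return of the portfolio t*A + (1-t)*B.
    fn best_allocation(a: &[f64], b: &[f64]) -> (f64, f64) {
        assert_eq!(a.len(), b.len());
        let p01 = |t: f64| -> f64 {
            let mut r: Vec<f64> =
                a.iter().zip(b).map(|(x, y)| t * x + (1.0 - t) * y).collect();
            r.sort_by(|x, y| x.partial_cmp(y).unwrap());
            r[r.len() / 100] // 99% of sampled outcomes beat this value
        };
        (0..=100)
            .map(|i| i as f64 / 100.0)
            .map(|t| (t, p01(t)))
            .max_by(|(_, p), (_, q)| p.partial_cmp(q).unwrap())
            .unwrap()
    }
    ```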
  20. They are absolutely aware of these sorts of abuses. I'll bet my spleen that it shows up as a line item in the roadmapping docs of their content integrity/T&S teams.

    The root problem is twofold: the inability to reliably automate distinguishing "good actor" from "bad actor", and a lack of will to throw serious resources at solving the problem via manual, high-precision moderation.

