
vitus
2,528 karma

  1. std::ignore's behavior outside of use with std::tie is not specified in any finalized standard.

    https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p29... aims to address that, but that won't be included until C++26 (which also includes _ as a sibling commenter mentions).
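
    A minimal illustration of the distinction (get_pair and compute are hypothetical functions here):

        #include <tuple>

        std::tuple<int, int> get_pair() { return {1, 2}; }   // hypothetical
        [[nodiscard]] int compute() { return 3; }            // hypothetical

        int main() {
            int x = 0;
            std::tie(x, std::ignore) = get_pair();  // the use the standard actually specifies

            std::ignore = compute();  // widespread idiom for discarding a [[nodiscard]] result,
                                      // but only guaranteed once the paper above lands (C++26)
        }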

  2. > This is incorrect, the correct strategy is mostly to check the most probable match (the exception being if the people in that match has less possible pairings remaining than the next most probable match).

    Do you have any hard evidence, or are you just basing this on vibes? Because your proposed strategy is emphatically not how you maximize information gain.

    Scaling up the problem to larger sizes, is it worth explicitly spending an action to confirm a match that has 99% probability? Is it worth it to (most likely) eliminate 1% of the space of outcomes (by probability)? Or would you rather halve your space?

    This isn't purely hypothetical, either. The match-ups skew your probabilities such that your individual outcomes cease to be equally probable, so just looking at raw cardinalities is insufficient.

    If you have a single match out of 10 pairings, and you've ruled out 8 of them directly, then if you target one of the two remaining pairs, you nominally have a 50/50 chance of getting a match (or no match!).

    Meanwhile, you could have another match-up where you got 6 out of 10 pairings, and you've ruled out 2 of them (thus you have 8 remaining pairs to check, 6 of which are definitely matches). Do you spend your truth booth on the 50/50 shot (which actually will always reveal a match), or the 75/25 shot?

    (I can construct examples where you have a 50/50 shot but without the guarantee on whether you reveal a match. Your information gain will still be the same.)
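
    For concreteness, the expected information gain from a single truth booth is just the binary entropy of its match probability; a quick sketch of the numbers above:

        #include <cmath>
        #include <cstdio>

        // Expected information (in bits) from a yes/no test whose "yes" probability is p.
        double bits(double p) { return -p * std::log2(p) - (1 - p) * std::log2(1 - p); }

        int main() {
            std::printf("50/50 booth: %.3f bits\n", bits(0.50));  // 1.000
            std::printf("75/25 booth: %.3f bits\n", bits(0.75));  // ~0.811
            std::printf("99/1  booth: %.3f bits\n", bits(0.99));  // ~0.081
        }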

  3. So, for 10 pairs, 45 guesses (9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1) in the worst case, and roughly half that on average?

    It's interesting how close 22.5 is to the 21.8 bits of entropy for 10!, and that has me wondering how often you would win if you followed this strategy with 18 truth booths followed by one match up (to maintain the same total number of queries).

    Simulation suggests about 24% chance of winning with that strategy, with 100k samples. (I simplified each run to "shuffle [0..n), find index of 0".)

  4. It should be easier to understand the optimal truth booth strategy. Since this is a yes/no type of question, the maximum entropy is 1 bit, as noted by yourself and others. As such, you want to pick a pair where the odds are as close to 50/50 as possible.

    > Employing that approach alone performed worse than the contestants did in real life, so didn't think it was worth mentioning!

    Yeah, this alone should not be sufficient. At the extreme of getting a score of 0, you also need the constraint that you're not repeating known-bad pairs. The same applies for pairs ruled out (or in!) from truth booths.

    Further, if your score goes down, you need to use that as a signal that one (or more) of the pairs you swapped out was actually correct, and you need to cycle those back in.

    I don't know what a human approximation of the entropy-minimization approach looks like in full. Good luck!

  5. It's way more lopsided than your example would suggest.

    My understanding is that Netflix can stream 100 Gbps from a 100W server footprint (slide 17 of [0]). Even if you assume every stream is 4k and uses 25 Mbps, that's still 4,000 concurrent streams. I would guess that the bulk of the power consumption from streaming video comes from the end-user devices -- a backbone router might consume a couple of kilowatts of power, but it's also moving terabits of traffic.

    [0] https://people.freebsd.org/~gallatin/talks/OpenFest2023.pdf

  6. To add to this: rough consensus is defined in BCP 25 / RFC 2418 (https://datatracker.ietf.org/doc/html/rfc2418#section-3.3):

       IETF consensus does not require that all participants agree although
       this is, of course, preferred.  In general, the dominant view of the
       working group shall prevail.  (However, it must be noted that
       "dominance" is not to be determined on the basis of volume or
       persistence, but rather a more general sense of agreement.) Consensus
       can be determined by a show of hands, humming, or any other means on
       which the WG agrees (by rough consensus, of course).  Note that 51%
       of the working group does not qualify as "rough consensus" and 99% is
       better than rough.  It is up to the Chair to determine if rough
       consensus has been reached.
    
    The goal has never been 100%, but it is not enough to merely have a majority opinion.
  7. I get that it's satisfying to tell them to go away because they're being unreasonable. But what's the legal strategy here? Piss off the regulators such that they really won't drop this case, and give them fodder to be able to paint the lawyer and his client as uncooperative?

    Is the strategy really just "get new federal laws passed so UK can't shove these regulations down our throats"? Is that going to happen on a timeline that makes sense for this specific case?

  8. The combative stance that he's taking really doesn't do him any favors in resolving the issue.

    Lawyer: "I've confirmed that at least one UK IP address is blocked."

    Regulators: "We've confirmed that at least one UK IP address is not blocked."

    In what world is the correct response "Dear regulators, you're incompetent. Pound sand." instead of "Can you share the IP address you used so my client can address this in their geoblock?"

  9. China is much more smartphone-centric than the US. QR codes are universal, and WeChat and AliPay are the most common forms of payment (online or in person).
  10. Not by $$$, which is the main focus of the article.

    In the second table, LoL esports is explicitly highlighted as a success by mindshare, but not profitability. And below that:

    > LoL Esports: loses hundreds of millions of dollars annually, exists solely as a marketing mechanism to get people to play the actual game

  11. > "valid, but unspecified"

    Annoyingly, it depends on the type, sometimes with unintuitive consequences.

    Move a unique_ptr? Guaranteed that the moved-from object is now null (fine). Move a std::optional? It remains engaged, but the wrapped object is moved-from (weird).

    Move a vector? Unspecified.
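
    A small self-contained sketch of those three cases:

        #include <cassert>
        #include <memory>
        #include <optional>
        #include <utility>
        #include <vector>

        int main() {
            auto p = std::make_unique<int>(42);
            auto p2 = std::move(p);
            assert(p == nullptr);        // guaranteed: a moved-from unique_ptr is null

            std::optional<std::vector<int>> o{std::vector<int>{1, 2, 3}};
            auto o2 = std::move(o);
            assert(o.has_value());       // still engaged; *o is a moved-from vector

            std::vector<int> v{1, 2, 3};
            auto v2 = std::move(v);
            // v.size() could be anything here: "valid but unspecified"
        }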

  12. I envy your intuition about high-dimensional spaces, as I have none (other than "here lies dragons"). (I think your intuition is broadly correct, seeing as billions of collision tests feels quite inadequate given the size of the space.)

    > Just intuitively, in such a high dimensional space, two random vectors are basically orthogonal.

    What's the intuition here? Law of large numbers?

    And how is orthogonality related to distance? Expansion of |a-b|^2 = |a|^2 + |b|^2 - 2<a,b> = 2 - 2<a,b> which is roughly 2 if the unit vectors are basically orthogonal?

    > Since the outputs are normalized, that corresponds to a ridiculously tiny patch on the surface of the unit sphere.

    I also have no intuition regarding the surface of the unit sphere in high-dimensional vector spaces. I believe it vanishes. I suppose this patch also vanishes in terms of area. But what's the relative rate of those terms going to zero?
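
    As a quick numerical check of the near-orthogonality intuition quoted above (dimension and trial count picked arbitrarily): for random unit vectors in d dimensions, the dot product concentrates around 0 with spread on the order of 1/sqrt(d), so |a-b|^2 = 2 - 2<a,b> clusters tightly around 2.

        #include <cmath>
        #include <cstdio>
        #include <random>
        #include <vector>

        // Draw a random unit vector in R^d: Gaussian components, then normalize.
        std::vector<double> random_unit(int d, std::mt19937& rng) {
            std::normal_distribution<double> gauss(0.0, 1.0);
            std::vector<double> v(d);
            double norm2 = 0.0;
            for (double& x : v) { x = gauss(rng); norm2 += x * x; }
            for (double& x : v) x /= std::sqrt(norm2);
            return v;
        }

        int main() {
            std::mt19937 rng(12345);
            const int d = 1024, trials = 10000;
            double sum = 0.0, sumsq = 0.0;
            for (int t = 0; t < trials; ++t) {
                auto a = random_unit(d, rng), b = random_unit(d, rng);
                double dot = 0.0;
                for (int i = 0; i < d; ++i) dot += a[i] * b[i];
                sum += dot;
                sumsq += dot * dot;
            }
            const double mean = sum / trials;
            // Expect mean ~ 0 and stddev ~ 1/sqrt(1024) ~ 0.031.
            std::printf("mean %.4f, stddev %.4f\n", mean,
                        std::sqrt(sumsq / trials - mean * mean));
        }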

  13. > Are you saying I am being unnecessarily cautious?

    Yes.

    If a game is marked with Linux, that means it has a native Linux port. However, Proton has gotten so good in recent years that some of the native Linux ports actually perform _worse_ than just downloading the Windows exe and running it with the compatibility layer.

    The investment in Proton makes sense in retrospect, since SteamOS is based on Arch Linux, and most of these games you mention should run just fine on a Steam Deck.

  14. Canada is the second-largest country in the world, so if you know that it's in the north and, um, not Russia, you stand a pretty good chance at picking it out. (Doubly so if you just know it's in the Americas.)

    Now, if you asked the same about Pakistan or Nigeria (#5 and #6 in terms of population, but far smaller and with far shorter sea borders), I'd bet that far fewer people would be able to pinpoint those with the same accuracy (whether in the English-speaking world or not).

  15. And if you look at the people behind those names, you'll find that most of them were either born in the Russian Empire or moved there at a young age.

    https://en.wikipedia.org/wiki/Michael_Andreas_Barclay_de_Tol... - moved to St Petersburg around the age of 3

    https://en.wikipedia.org/wiki/Pyotr_Bagration - born in Russia

    https://en.wikipedia.org/wiki/Peter_Wittgenstein - born near Kyiv

    https://en.wikipedia.org/wiki/Alexander_Ivanovich_Ostermann-... - part of Russian nobility; the Ostermann name came from his great-uncles.

  16. For various reasons, I believe the CFR defines flour as a food, not a food additive, but it may have additives that qualify. (I can't find any text that explicitly states that foods are not food additives, but either way, no form of wheat flour is listed in the Substances Added to Food database.)

    Common culprits include chemicals added during the bleaching process, as well as "enzyme" and other ingredients added to help improve baking consistency. Some examples:

    https://www.hfpappexternal.fda.gov/scripts/fdcc/index.cfm?se...

    https://www.hfpappexternal.fda.gov/scripts/fdcc/index.cfm?se...

    https://www.hfpappexternal.fda.gov/scripts/fdcc/index.cfm?se...

  17. > The bill itself calls out using USDA databases for various ingredients and various sections of federal regulations, so I can't comment too much about how they'd feel about xanthum gum without diving deep.

    For reference: xanthan gum specifically would fall afoul of the rules, as... a (ii) stabilizer or thickener, (iv) coloring or coloring adjunct, and (v) emulsifier.

    https://www.hfpappexternal.fda.gov/scripts/fdcc/index.cfm?se...

    It's quite silly that it's classified as a coloring agent and an emulsifier, when it's neither of those things.

  18. > I may use a third party std::function that doesn't have the weird copy semantics though

    Note that C++23 brings std::move_only_function if you're storing a callback for later use, and C++26 adds std::function_ref if you don't need lifetime extension.

    https://en.cppreference.com/w/cpp/utility/functional/move_on...
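
    A quick sketch of where std::move_only_function helps (Widget and the captured resource are just illustrative): std::function would reject a lambda that captures a move-only type.

        #include <cstdio>
        #include <functional>
        #include <memory>
        #include <utility>

        struct Widget {
            std::move_only_function<void()> on_done;  // C++23; never needs to copy the callable
        };

        int main() {
            auto resource = std::make_unique<int>(42);
            Widget w;
            // A lambda capturing a unique_ptr is move-only, which is fine here.
            w.on_done = [r = std::move(resource)] { std::printf("%d\n", *r); };
            if (w.on_done) w.on_done();
        }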

  19. > google.com, youtube, chrome, android, gmail, google map etc

    Of those, it's 50/50. The acquisitions were YT, Android, Maps. Search was obviously Google's original product, Chrome was an in-house effort to rejuvenate the web after IE had caused years of stagnation, and Gmail famously started as a 20% project.

    There are of course criticisms that Google has not really created any major (say, billion-user) in-house products in the past 15 years.

  20. > given that we don't have access to that data?

    Actually, we do have some of that data. You can just look at the H-1B employer datahub, filter by Google (or whichever other employer you want to scrutinize), and then look at the crosstab.

    https://www.uscis.gov/tools/reports-and-studies/h-1b-employe...

    Looking at Google specifically, I observe two things:

    1. The number of new approvals has gone down drastically since 2019 (2019: 2706, 2020: 1680, 2021: 1445, 2022: 1573, 2023: 1263, 2024: 1065, 2025: 1250). (These numbers were derived by summing up all the rows for a given year, but it's close enough to just look at the biggest number that represents HQ.)

    Compared to the overall change in total employees as reported in the earnings calls (which was accelerating up through 2022 but then stagnated around 2025), we don't actually see much of anything noteworthy.

    2. Most approvals are renewals ("Continuation Approval"), external hires who are just transferring their existing H-1B visas ("Change of Employer Approval"), or internal transfers ("Amended Approval").

  21. Yeah, that was what I tried with row-Hamming / col-Hamming (namely: treat entire rows / cols as matches or not). I then used the min of the two to address those issues.

    Either way, I guess my implementation had a bug -- A* does yield a significant speedup, but adding the 0.25x scaling factor to ensure that the heuristic is admissible loses almost all of those gains.

    For some concrete numbers: with the bug (which basically reduced the search to BFS), it ran in about 7s; with the bug fixed but a wildly inadmissible heuristic, it ran in about 0.01s; with the heuristic scaled down by 4x to guarantee its admissibility, it ran in about 5s.

    I think scaling it down by 2x would be sufficient: that lower bound would be tight if the problem is one row move and one column move away from the goal state, but potentially all four rows and columns would not match. In that case, it ran in about 1.6s.

  22. I tried implementing A* using pointwise Hamming distance, found that it was inadmissible (since it yielded a suboptimal result on par with my manual attempt), then tried again with row-wise Hamming distance but was pretty sure that's inadmissible too (although it did yield an optimal result). I then tried min(row-Hamming, column-Hamming) but I'm not convinced that's admissible either.

    I then switched to pure Dijkstra which ended up being faster because evaluation was much cheaper at each step, and despite these heuristics being inadmissible, they didn't result in substantially fewer nodes expanded.

    That's almost certainly a function of the problem size -- if it were 5x5, this approach would not have been as successful.
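
    For reference, a minimal sketch of the heuristics being discussed in these two comments, assuming the state is a square grid of tiles compared against a goal grid (the puzzle's move rules and the search itself aren't shown):

        #include <algorithm>
        #include <vector>

        using Grid = std::vector<std::vector<int>>;

        // Pointwise Hamming: number of cells that differ from the goal.
        int cell_hamming(const Grid& g, const Grid& goal) {
            int d = 0;
            for (size_t r = 0; r < g.size(); ++r)
                for (size_t c = 0; c < g[r].size(); ++c)
                    d += (g[r][c] != goal[r][c]);
            return d;
        }

        // Row-Hamming: number of rows that are not already an exact match.
        int row_hamming(const Grid& g, const Grid& goal) {
            int d = 0;
            for (size_t r = 0; r < g.size(); ++r) d += (g[r] != goal[r]);
            return d;
        }

        // Column-Hamming: same idea, per column.
        int col_hamming(const Grid& g, const Grid& goal) {
            int d = 0;
            for (size_t c = 0; c < g[0].size(); ++c)
                for (size_t r = 0; r < g.size(); ++r)
                    if (g[r][c] != goal[r][c]) { ++d; break; }
            return d;
        }

        // Combined heuristic from the comments above; the 0.25 (or 0.5) factor
        // is the scaling discussed to keep it admissible.
        double heuristic(const Grid& g, const Grid& goal, double scale = 0.25) {
            return scale * std::min(row_hamming(g, goal), col_hamming(g, goal));
        }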

  23. > The GPU implementation's logarithmic scaling becomes evident at longer sequence lengths.

    I don't see logarithmic scaling, actually. From the table for GRU performance, going from 16384 -> 65536 (namely: increasing the input by 4x) is roughly a 4x increase in time whether looking at CPU-scan or GPU-scan. Okay, maybe the inputs need to be bigger. Looking at the next plot, which goes up to 524288, we see the same behavior: the delta between CPU-scan and GPU-scan doubles as we double the input. That's a constant multiplicative factor. Same holds for LSTM performance.

    Is this an artifact of the benchmark setup? Are we actually measuring the amount of time needed to load the full context into RAM? Or perhaps we're bottlenecked on memory bandwidth?

    > Success: The gate extraction kernel, which was a huge bottleneck, now only takes 8% of the total time and is memory-bandwidth bound, saturating L2 bandwidth at 1.9 TB/s. This is a good place to be.

    Sounds like that might be the case.

  24. > San Francisco, San Jose, Los Angeles, Austin, and Phoenix are ~10% of US population.

    Surely you're describing metro areas? There's no way those five cities add up to 34 million people within city limits, given that none of them have 6 million people.

    The MSAs added up to 27 million based on the 2020 census, so "close enough". https://en.wikipedia.org/wiki/Metropolitan_statistical_area

    That said, Waymo's service areas are nowhere close to covering the full MSAs: https://support.google.com/waymo/answer/9059119?hl=en

    - SF doesn't cover East Bay (two thirds of the MSA by population).

    - Silicon Valley doesn't cover San Jose, and barely reaches into Sunnyvale (basically just covering the Google Moffett Park office buildings).

    - The Phoenix area is missing most of the densest parts of Phoenix itself, as well as anything north / west of the city.

    - Los Angeles doesn't even come close to covering the city, much less the rest of LA County or any of Orange County. (Maybe 2-3 million out of 13, from just eyeballing the region.)

    On Uber (https://support.google.com/waymo/answer/16011725?hl=en) there's also Atlanta (which looks like it actually has very nice coverage, other than the western half of the city) and Austin (again focused on downtown / commercial districts) which help drive up the numbers.

    The population that's had opportunity to see Waymo in the wild is probably higher because they're testing in quite a few cities now (a sibling commenter mentions NYC, for instance).

  25. > if an argument fits into the size of a register, it's better to pass by value to avoid the extra indirection.

    Whether an argument is passed in a register or not is unfortunately much more nuanced than this: it depends on the ABI calling conventions (which vary depending on OS as well as CPU architecture). There are some examples where the argument will not be passed in a register despite being "small enough", and some examples where the argument may be split across two or more registers.

    For instance, in the x86-64 ELF ABI spec [0], the type needs to be <= 16 bytes (despite registers only being 8 bytes), and it must not have any nontrivial copy / move constructors. And, of course, only some registers are used in this way, and if those are used up, your value params will be passed on the stack regardless.

    [0] Section 3.2.3 of https://gitlab.com/x86-psABIs/x86-64-ABI
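
    A sketch of that distinction under the x86-64 SysV ABI described above (check the generated assembly for your own target, since other ABIs classify these differently):

        #include <cstdint>

        struct Small {        // 16 bytes, trivially copyable:
            int64_t a, b;     // typically passed in two general-purpose registers.
        };

        struct NonTrivial {   // same 16 bytes, but the user-declared copy constructor
            int64_t a, b;     // makes it non-trivial, so it's passed via a pointer
            NonTrivial(const NonTrivial&);  // to a caller-made copy (invisible reference).
        };

        int64_t f(Small s)      { return s.a + s.b; }
        int64_t g(NonTrivial n) { return n.a + n.b; }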

  26. I'm curious why anyone would pay $19.99/month for Alexa+ rather than just buy a Prime membership (which is $14.99/month).

    Unless of course this is going to be met with a price hike for Prime...

  27.     > •  The curved sides of the projection suggest the spherical form of Earth.
        > •  Straight parallels that make it easier to compare how far north or south places are from the equator.
    
    Okay, we've now added a constraint that this should be pseudocylindrical [0].

    So why pick this over, say, Eckert IV or something from the Tobler Hyperelliptical family?

    There is perhaps an additional argument (present on the wiki page [1], and elaborated on in the paper introducing the projection [2]) that the Equal Earth projection is computationally easier to translate between lat/long and map coordinates, as it explicitly uses a polynomial equation instead of strict elliptical arcs. (This is the main argument presented against Eckert IV.)

    The paper also lists some additional aesthetic goals: poles do not converge to points (ruling out Tobler Hyperelliptical), and meridians do not bulge excessively.

    In fact, the paper describes Equal Earth as a blend of Craster parabolic and Eckert IV (then aesthetically tuned to avoid being stretched too much in either direction). It is also notable that the Equal Earth paper measures both lower scale distortion and lower angular deformation for Eckert IV.

    [0] https://en.wikipedia.org/wiki/List_of_map_projections#pseudo...

    [1] https://en.wikipedia.org/wiki/Equal_Earth_projection

    [2] https://scholar.google.com/scholar?q=doi.org%2F10.1080%2F136...

    edit: I found https://map-projections.net/singleview.php, where you can view a bunch of other possible candidates by selecting Pseudocylindric + Equal-Area.
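
    To make the polynomial point concrete, here's a sketch of the forward projection using the coefficients published for Equal Earth (the inverse direction needs a few Newton iterations, which I've omitted):

        #include <cmath>

        // Equal Earth forward projection: longitude/latitude in radians -> x/y
        // on a unit sphere, using the polynomial coefficients from the paper.
        void equal_earth(double lon, double lat, double& x, double& y) {
            constexpr double A1 = 1.340264, A2 = -0.081106, A3 = 0.000893, A4 = 0.003796;
            const double theta = std::asin(std::sqrt(3.0) / 2.0 * std::sin(lat));
            const double t2 = theta * theta, t6 = t2 * t2 * t2;
            y = theta * (A1 + A2 * t2 + t6 * (A3 + A4 * t2));
            x = 2.0 * std::sqrt(3.0) / 3.0 * lon * std::cos(theta)
                / (A1 + 3.0 * A2 * t2 + t6 * (7.0 * A3 + 9.0 * A4 * t2));
        }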
