sparkie (2,233 karma)

  1. Should really give the big-O notation for common operations for comparison of the approaches.

    ---

    There are some variations of Chunk/Bucket arrays which offer different characteristics.

    HAT (Hashed Array Trees[1]) are close to what is presented - the index block length is always a power of 2 and it points to data blocks of that same length - ie, it grows as: 1x1, 2x2, 4x4, 8x8, 16x16, ...

    The downside is that each time the length crosses a new power of 4, the existing data must be reshuffled into the larger blocks, which may cause latency spikes.

    RAOTS (Resizable Arrays in Optimal Time and Space[2]) is a variation where the index block is approximately the square root of the length. It grows by appending a new data block and resizing only the index block. The old blocks remain part of the structure and don't need reshuffling like HAT, which avoids the latency spikes, but there are more individual allocator calls. However, blocks are always a power of 2 in size, which makes them arena-friendly.

    Both of these have O(1) element access and waste only O(√n) space, which makes them quite optimal for uses where memory may be a constraint.

    ---

    In the double-and-copy style array, one trick is we can avoid storing `capacity` and compute it on demand - determine the MSB of the length (using `lzcnt` or `bsr`), and then determine the next power of 2 sufficient to hold that length. In C23, this is `stdc_bit_ceil` (<stdbit.h>).

    In the "Xal" structure (which has several other names), we can use `stdc_first_leading_one` to determine the index in the index block, then zero out the leading one bit to determine the index in the bucket.

    This structure would be more efficient if processors provided a `log2mask` instruction which computes both the MSB and the remainder after masking the MSB in one instruction. (Similar to how `divmod` instructions work where the quotient and remainder are both computed in one instruction).

    ---

    [1]: https://en.wikipedia.org/wiki/Hashed_array_tree

    [2]: https://cs.uwaterloo.ca/research/tr/1999/09/CS-99-09.pdf

  2. So we can't return the closure?

    Then it's clearly only half a solution.

    The example I gave above should work fine in any language with first-class closures.

  3. Thread locals don't fully solve the problem. They work well if you immediately call the closure, but what if you want to store the closure and call it later?

        #include <stdlib.h>
        #include <threads.h>    /* thread_local macro (a keyword in C23) */

        typedef int (*comp)(const void *untyped_left, const void *untyped_right);

        thread_local int in_reverse = 0;
    
        __attribute__((noinline))
        int compare_impl(const void *untyped_left, const void *untyped_right, int in_reverse) {
            const int* left = untyped_left;
            const int* right = untyped_right;
            return (in_reverse) ? *right - *left : *left - *right;
        }
    
        comp make_sort(int direction) {
            in_reverse = direction;
            /* nested function: a GNU C extension, not standard C */
            int compare(const void *untyped_left, const void *untyped_right) {
                return compare_impl(untyped_left, untyped_right, in_reverse);
            }
            return compare;
        }
    
        int main(int argc, char* argv[]) {
    
            int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    
            comp normal_sort = make_sort(0);
            comp reverse_sort = make_sort(1);
    
            qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), normal_sort);
                
            return list[0];
        }
    
    Because we create `reverse_sort` between creating `normal_sort` and calling it, we end up with a reverse sort despite clearly asking for a normal sort.
  4. > x86 is unusual in mostly having a maximum of two operands per instruction[2]

    Perhaps interesting for those who aren't up to date: the recent APX extension allows 3-operand versions of most of the ALU instructions, with a separate destination operand, so we don't need to clobber a source or use temporary registers - making them more RISC-like.

    The downside is they're EVEX encoded, which adds a 4-byte prefix to the instruction. It's still cheaper to use `lea` for an addition, but now we will be able to do things like

        or rax, rdx, rcx
    
    https://www.intel.com/content/www/us/en/developer/articles/t...
  5. It's due to the way the instruction is encoded. `lea` would've needed special treatment in syntax to remove the brackets.

    In `op reg1, reg2`, the two registers are encoded as 3 bits each in the ModRM byte which follows the opcode. The remaining 2 bits are the mod field, so there's no room to fit a third register - the ModRM byte is only 8 bits.

    In `op reg1, [reg2 + reg3]`, reg1 is encoded in the ModRM byte. The 3 bits that were previously used for reg2 are instead `0b100`, which indicates a SIB byte follows the ModRM byte. The SIB (Scale-Index-Base) byte uses 3 bits each for reg2 and reg3 as the base and index registers.

    In any other instruction, the SIB byte is used for addressing, so the syntax of `lea` is consistent with the way it is encoded.

    Encoding details of ModRM/SIB are in Volume 2, Section 2.1.5 of the ISA manual: https://www.intel.com/content/www/us/en/developer/articles/t...

  6. This is why you should use Graphene's contact scopes and only allow access to contacts that you want to contact on WhatsApp.
  7. It's not only xor that does this - most 32-bit operations zero-extend their result into the full 64-bit register. AMD did this for backward compatibility, so existing programs would mostly continue working, unlike Intel's earlier attempt at 64 bits, which was an entirely new design.

    The reason `xor eax,eax` is preferred to `xor rax,rax` is due to how the instructions are encoded - it saves one byte which in turn reduces instruction cache usage.

    When using 64-bit operations, a REX prefix is required on the instruction (a byte in 0x40..0x4F), which serves two purposes. The MSB of the low nybble (W) being set (ie, prefixes 0x48..0x4F) indicates a 64-bit operation. The low 3 bits of the low nybble (R, X, B) allow using registers r8-r15, by providing an extra bit for the ModRM register field and for the base and index fields in the SIB byte, since x86 itself only provides 3 bits (8 registers) per field.

    A recent addition, APX, adds an additional 16 registers (r16-r31), which need 2 more bits. There's a REX2 prefix for this (0xD5 ...), a two-byte prefix to the instruction. REX2 replaces the REX prefix when accessing r16-r31 and still contains the W bit, but it also includes an `M0` bit selecting which of the two main opcode maps to use. That replaces the 0x0F escape byte, so REX2 has no additional cost over REX when accessing the second opcode map.

  8. The primary focus of Lions/seL4 is security, and formal verification of it. It's a necessary project for a world where CVEs are frequently published for the Kernel and other underlying system services, in part because they're still using a 1970s OS design which didn't have internet-connected computers in mind.

    The seL4 kernel uses an Object-Capability system (which, ironically, has also been around since the '70s) in place of the Access Control Lists used in POSIX systems. The idea behind capabilities is "don't separate designation from authority" - the capability is both a designator of a resource and the authority to use it - there are no separate ACLs. Separation of designation from authority leads to confused deputies - where one process with lower privileges can trick a higher-privileged process into using its higher privileges on its behalf, which very often leads to privilege escalation vulnerabilities (ie, RCE as root).

    Each process has a "cspace" (Capability Space), which holds the capabilities the process may use. When the process performs a system call, it references a capability slot from a set of tables in its own cspace where the capability is held, and the Kernel verifies that the capability is valid in order to perform the syscall. The capabilities can be revoked at any time, and importantly, they cannot be forged - only delegated.

    Additionally, seL4 is a microkernel design which is intended to have a small attack surface. It provides only a small number of system calls like `Send`, `Recv`, `Yield`, and combinations thereof, such as `Call`, `Reply`, `ReplyRecv`. It's a generic interface where an IPC buffer is passed along with a set of delegated capabilities, but there are no specific calls such as `SYS_write` or `SYS_mmap` - these are built on top of the `Send`/`Recv` syscalls and their services are provided by other user-space daemon processes.

    One of the best developments of seL4 is the "Mixed-criticality system" support, which provides capabilities for processing time - to enable some processes to have real-time support without becoming CPU hogs.

    seL4 can also be seen as a hypervisor which can virtualize other kernels, and part of the Lions effort is to be able to utilize Linux drivers using isolated Linux guests.

    To learn more, the seL4 manual is a good place to start: https://sel4.systems/Info/Docs/seL4-manual-latest.pdf. There are some videos on YouTube by Gernot Heiser discussing the MCS support.

  9. I don't specifically mean stdlib `rand` (hence the comment about assuming no inherent bias) - just some arbitrary PRNG. Updated the code above to say `random` so as not to confuse.
  10. We could potentially use a conditional move and an unconditional jump to make the branch target predictor do the work instead - and flood it with a bunch of targets which are intended to mispredict. Eg, we could give 255 different paths for abandon and select one randomly:

        #define BYTE_VALUES \
            X(0x00) X(0x01) X(0x02) X(0x03) X(0x04) X(0x05) X(0x06) X(0x07) X(0x08) X(0x09) X(0x0A) X(0x0B) X(0x0C) X(0x0D) X(0x0E) X(0x0F) \
            X(0x10) X(0x11) X(0x12) X(0x13) X(0x14) X(0x15) X(0x16) X(0x17) X(0x18) X(0x19) X(0x1A) X(0x1B) X(0x1C) X(0x1D) X(0x1E) X(0x1F) \
            X(0x20) X(0x21) X(0x22) X(0x23) X(0x24) X(0x25) X(0x26) X(0x27) X(0x28) X(0x29) X(0x2A) X(0x2B) X(0x2C) X(0x2D) X(0x2E) X(0x2F) \
            X(0x30) X(0x31) X(0x32) X(0x33) X(0x34) X(0x35) X(0x36) X(0x37) X(0x38) X(0x39) X(0x3A) X(0x3B) X(0x3C) X(0x3D) X(0x3E) X(0x3F) \
            X(0x40) X(0x41) X(0x42) X(0x43) X(0x44) X(0x45) X(0x46) X(0x47) X(0x48) X(0x49) X(0x4A) X(0x4B) X(0x4C) X(0x4D) X(0x4E) X(0x4F) \
            X(0x50) X(0x51) X(0x52) X(0x53) X(0x54) X(0x55) X(0x56) X(0x57) X(0x58) X(0x59) X(0x5A) X(0x5B) X(0x5C) X(0x5D) X(0x5E) X(0x5F) \
            X(0x60) X(0x61) X(0x62) X(0x63) X(0x64) X(0x65) X(0x66) X(0x67) X(0x68) X(0x69) X(0x6A) X(0x6B) X(0x6C) X(0x6D) X(0x6E) X(0x6F) \
            X(0x70) X(0x71) X(0x72) X(0x73) X(0x74) X(0x75) X(0x76) X(0x77) X(0x78) X(0x79) X(0x7A) X(0x7B) X(0x7C) X(0x7D) X(0x7E) X(0x7F) \
            X(0x80) X(0x81) X(0x82) X(0x83) X(0x84) X(0x85) X(0x86) X(0x87) X(0x88) X(0x89) X(0x8A) X(0x8B) X(0x8C) X(0x8D) X(0x8E) X(0x8F) \
            X(0x90) X(0x91) X(0x92) X(0x93) X(0x94) X(0x95) X(0x96) X(0x97) X(0x98) X(0x99) X(0x9A) X(0x9B) X(0x9C) X(0x9D) X(0x9E) X(0x9F) \
            X(0xA0) X(0xA1) X(0xA2) X(0xA3) X(0xA4) X(0xA5) X(0xA6) X(0xA7) X(0xA8) X(0xA9) X(0xAA) X(0xAB) X(0xAC) X(0xAD) X(0xAE) X(0xAF) \
            X(0xB0) X(0xB1) X(0xB2) X(0xB3) X(0xB4) X(0xB5) X(0xB6) X(0xB7) X(0xB8) X(0xB9) X(0xBA) X(0xBB) X(0xBC) X(0xBD) X(0xBE) X(0xBF) \
            X(0xC0) X(0xC1) X(0xC2) X(0xC3) X(0xC4) X(0xC5) X(0xC6) X(0xC7) X(0xC8) X(0xC9) X(0xCA) X(0xCB) X(0xCC) X(0xCD) X(0xCE) X(0xCF) \
            X(0xD0) X(0xD1) X(0xD2) X(0xD3) X(0xD4) X(0xD5) X(0xD6) X(0xD7) X(0xD8) X(0xD9) X(0xDA) X(0xDB) X(0xDC) X(0xDD) X(0xDE) X(0xDF) \
            X(0xE0) X(0xE1) X(0xE2) X(0xE3) X(0xE4) X(0xE5) X(0xE6) X(0xE7) X(0xE8) X(0xE9) X(0xEA) X(0xEB) X(0xEC) X(0xED) X(0xEE) X(0xEF) \
            X(0xF0) X(0xF1) X(0xF2) X(0xF3) X(0xF4) X(0xF5) X(0xF6) X(0xF7) X(0xF8) X(0xF9) X(0xFA) X(0xFB) X(0xFC) X(0xFD) X(0xFE) X(0xFF)
            
        void resolve(Transaction *t) {
            void* jt[] = {
                #define X(n) [n] = &&abandon##n,
                BYTE_VALUES
                #undef X
                [0] = &&send,   /* && takes a label's address; this overrides the X(0x00) entry */
            };
            
            static uint8_t branch = 0;
            bool csend = should_send(t);
            branch = csend ? 0 : branch;      // cmov or SETcc
            goto * jt[branch];
            
            #define X(n) \
            abandon##n: \
                abandon(t); \
                branch = random() % 256; \
                return;
            BYTE_VALUES
            #undef X
            
            send:
                if (csend) send(t);
                else abandon(t);
                return;
        }
        
    Assuming no inherent bias in the low byte produced by `random`, there's only a ~1/255 chance that an abandon branch will correctly predict - and the same is true of the indirect jump to the send label. The conditional branch inside send, however, should only mispredict ~1/256 of the time (when `random` returns 0).

    If we're sending significantly more often than 1/256 calls to resolve, it may be possible to train the BTP to prefer the send branch, as it will correctly predict this branch more often than the others which are chosen randomly - though this would depend on how the branch target predictor is implemented in the processor.

  11. The branch taken hint (3EH) was re-added in Redwood Cove (2023), but it's only for static prediction where the branch predictor has not yet encountered the branch - ie, useful for things you would only use once or twice but would likely take the branch. Once the branch predictor has some information the static prediction hint is ignored, so it's best to omit it for anything that will eventually have dynamic branch prediction (ie, run many times), because you'll be consuming bytes in the i-cache which serve no purpose.
  12. It's most likely a combination of both genetics and society - neither are absolutes. There is no concrete evidence that intelligence is purely a social construct, nor that it is genetic. We simply don't know.

    People get cancelled not for saying that it is genetic, but for questioning whether it may be. Of course, we will never know if we're not allowed to ask. Cancel culture is anti-science.

    Watson may have been racist, but questioning whether there is a relationship between genetics and intelligence by itself is not racism.

  13. You realize that LLMs are trained on human discussions right?

    If everyone stops asking questions and asks the LLM instead, there is no new training data for future LLMs to learn from. They will stagnate, or consume their own slop, and regress.

  14. They're one-byte opcodes, but not one-byte ops. Most of them have operands which are encoded in a ModRM byte that follows the opcode. The ModRM may be followed by a SIB byte, and that may be followed by a variable-size displacement and/or immediate. There are also optional prefixes before the opcode.
  15. To state the obvious, IANAL.

    > This is indeed a weak-point in the contract approach: People can't be bound by an contract they never knew about nor agreed-to.

    "Prominent notice" is important in the terms of use approach. Many terms of use claims have been dismissed because there was failure to give prominent notice - however, there have been successes, such as Hubbert vs Dell[1], where an appeals court reversed the trial court's decision and ruled in Dell's favour on the basis that they had given prominent notice of terms.

    [1]: https://caselaw.findlaw.com/court/il-court-of-appeals/124479...

    There are other potential legal avenues besides contract law, such as Unjust Enrichment[2], which, according to Wikipedia, is analysed as:

        1. Was the defendant enriched?
        2. Was the enrichment at the expense of the claimant?
        3. Was the enrichment unjust?
        4. Does the defendant have a defense?
        5. What remedies are available to the claimant?
    
    [2]: https://en.wikipedia.org/wiki/Restitution_and_unjust_enrichm...

    Since AI companies are likely to be enriched (by a large amount), and it could be at the expense of a claimant if the AI (re)produces the claimants work or closely related work based on it (or in the case of a class action suit, potentially many works), the AI companies which violate a terms of use would have to argue in their favour for #3 and #4 - is the enrichment unjust and do they have a defense.

    There are certainly arguments for AI trained on copyrighted works being unjust. Including a terms of use which specifically prevents this would be in the favor of the claimant. An AI company would have to defend their decision to ignore such terms and claim that doing so is not unjust.

    Arguably, if the AI is sufficiently "intelligent", it should be able to make sense of such terms and be used to filter such works from training data (unless specifically prompted to ignore them). If the AI companies are not filtering the data they aggregate then there's a potential argument for negligence.

    There's some efforts being made, such as the IETF aipref[3], which could standardize the way training data is collected/filtered by AI companies. Creative Commons have a related effort called "CC signals". These could also be helpful in a future claim if the AI companies ignore those preferences.

    [3]: https://datatracker.ietf.org/wg/aipref/about/

    So to me it seems having a clause in your license and/or terms of use which prevents use in AI training, providing prominent notice of such terms, and indicating AI preferences in the protocols, is going to be better than not having such clauses - because if you don't have the clause then the defendant can simply claim in their defense that "There was nothing in the terms which said we can't".

    It's up to us to take a proactive approach to protecting our works from AI, even if there is not yet any solid legal basis or case law, so I applaud OP's efforts even if the license doesn't carry any weight. If we don't attempt to protect our works from use in AI training then the AI companies will win at everyone else's expense.

    Creators should decide how their works are used, and we need to normalize it being the case that they decide whether or not use in AI training data should be permitted.

  16. Apologies, there's a mistake in the godbolt link above. `SIGN_BIT` should be `0x8000` and not `0x1000`.
  17. It's fine to use techniques like this in standard C provided you guard them with the preprocessor and provide a fallback option on unsupported hardware.
  18. C23 also has `alignas` and `alignof` (`_Alignas`/`_Alignof` in C11, with the lowercase spellings as macros in <stdalign.h>), and also provides `aligned_alloc` in <stdlib.h>, which C23 pairs with `free_sized` and `free_aligned_sized`.
