Comment by gwerbin - Hacker Neue

gwerbin Oct 12, 2017 parent

Would you be able to go into more detail on this? An example maybe? It might be useful for someone like me who's "interested in Rust" but hasn't tried it (yet).

steveklabnik Oct 12, 2017

There's also a lot of us who do enjoy writing kernels in Rust; https://os.phil-opp.com/ goes through the start of one for x86_64, and really shows off some nice features of Rust.

Sag0Sag0 Oct 13, 2017

The amount of unsafe code necessary for low-level work, the difficulty in sending a stream of bytes to an I/O port, and the innate complexity of the language. C's philosophy is that everything is a series of bits which you can manipulate at will. Rust's is the same, except you have to jump through a huge number of hoops first to make things "safe".

Things like converting a String to an i64 are a pain. There is a large amount of boilerplate code needed.

All in all this is a personal preference. I prefer some unsafeness in C to (in my opinion) the constant inconvenience of programming in Rust.

comex Oct 13, 2017

I know it's just one example, and you're reporting more of a general impression, but since you mentioned converting a String to an i64… well, in what way?

If you mean reading the bytes of the string as an i64, that's

    unsafe { *(s.as_ptr() as *const i64) }

...which is a bit more verbose than C's

    *(int64_t *) ptr

but not that verbose, especially given that there will often be more code in the unsafe block. On the other hand, in C that's often technically undefined behavior due to strict aliasing, so to get strictly conformant code you have to use memcpy. Rust has no strict aliasing.

(I wouldn't actually recommend the above Rust code, because it doesn't verify that the string's length matches i64 - but neither does C, so it's a fair comparison. More idiomatic wrappers can be found in third-party crates like byteorder or pod.)
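For comparison, here is a minimal sketch of a length-checked version that needs no unsafe and no third-party crate. It assumes a toolchain recent enough to have `i64::from_ne_bytes` and array `TryFrom` in the standard library (neither was stable when this thread was written), and the function name `read_i64_ne` is made up for illustration.

```rust
use std::convert::TryInto;

/// Reads the first 8 bytes of `s` as a native-endian i64.
/// Returns None when the string holds fewer than 8 bytes.
/// (`read_i64_ne` is a hypothetical helper name, not a std API.)
fn read_i64_ne(s: &str) -> Option<i64> {
    // `get(..8)` yields None instead of panicking on short input.
    let bytes: [u8; 8] = s.as_bytes().get(..8)?.try_into().ok()?;
    Some(i64::from_ne_bytes(bytes))
}
```

Copying into a `[u8; 8]` first also sidesteps any alignment question, since no pointer to the string data is ever dereferenced as an `i64`.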

If you mean casting the pointer to the bytes to an i64, that's

    s.as_ptr() as i64

which doesn't seem verbose. (No unsafe needed in that case.)

oconnor663 Oct 13, 2017

There are safe interfaces for this stuff too. The byteorder crate (not in the standard library, but widely used) has:

    let i = NativeEndian::read_i64(...);

Or if you're literally parsing a string of digits:

    let i: Result<i64, _> = digits.parse();

The `parse` method on strings is a particularly smooth piece of trait system magic.
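A small sketch of what that looks like in practice, assuming only the standard library. `str::parse` goes through the `FromStr` trait and returns a `Result`, so the target type is chosen by the annotation or turbofish. The helper names `parse_i64` and `parse_f64` are hypothetical.

```rust
use std::num::ParseIntError;

/// Parses a decimal string into an i64, returning the error instead of panicking.
fn parse_i64(digits: &str) -> Result<i64, ParseIntError> {
    digits.trim().parse::<i64>()
}

/// The same `parse` call produces a different type when the expected type
/// changes, because it dispatches through `FromStr`.
fn parse_f64(text: &str) -> Option<f64> {
    text.trim().parse().ok()
}
```

Calling `.ok()` on the result gives an `Option` for callers who don't care why parsing failed.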

makapuf Oct 13, 2017

> to get strictly conformant [C] code you have to use memcpy

Couldn't you use a union there?

> Rust has no strict aliasing.

Does that mean that Rust cannot optimize for things C could? (Real question, I don't know Rust.)

oconnor663 Oct 13, 2017

> does that mean that rust cannot optimize for things C could?

Here's my best understanding of the situation. Someone who actually understands the compiler might have to correct me:

- Pointers in unsafe Rust don't do any strict aliasing optimizations, which C compilers sometimes do. The Rust memory model isn't fully specified, though, and the status quo seems to be related to not actually passing type information to LLVM. Not clear whether this will change in the future. There's some discussion of it here: http://smallcultfollowing.com/babysteps/blog/2016/05/27/the-...

- References in safe Rust (the vast majority of code) have much stronger aliasing information than pointers do in C. This is one of the core features of Rust, that references that allow mutation are guaranteed not to be aliased. I think the status quo is that this information isn't passed to LLVM because of some LLVM bugs getting in the way, but that it should start working in the near future. When all of this is working, I think it should produce code that's faster than C, in the same way that Fortran sometimes does.

comex Oct 13, 2017

You’re on target. Just to clarify: even references (“safe”) do not need type-based alias analysis (aka “strict aliasing”, which is what GCC calls the flag enabling/disabling it). All Rust references have semantics similar to C “restrict”: there should never be any conflicting writes from other sources, because immutable references imply the data shouldn’t change at all (nobody has a mutable reference), and mutable references are exclusive. (Types with interior mutability are an exception, but the compiler knows what types those are and special-cases them.) So the compiler can assume “no alias” most of the time with no need to care about types.
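As a rough illustration of the "restrict"-like guarantee (a sketch with a made-up function name, not a statement about what any particular compiler version emits):

```rust
/// `total` is an exclusive (`&mut`) reference and `step` is a shared one,
/// so the borrow checker guarantees they never refer to the same i64.
fn add_twice(total: &mut i64, step: &i64) {
    *total += *step;
    // Writing through `total` cannot have changed `*step`, so the
    // compiler is permitted to reuse the value it already loaded.
    *total += *step;
}
```

In C the equivalent function taking two `int64_t *` would have to reload `*step` unless both parameters were marked `restrict`. In Rust, a call like `add_twice(&mut x, &x)` is rejected at compile time, which is what makes the assumption sound.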

The formal specification of rules for unsafe code hasn’t been written yet, because, well, it’s an ambitious goal! Even the C standard is sometimes not really clear about what counts as undefined behavior; Rust wants to do better, while being more permissive, and offering a ‘sanitizer’ tool to verify correctness at runtime. And it has to implement all of this on top of LLVM, which was written by other people, is designed for C’s rules, and, like other compilers, doesn’t even get those right in every case (even when the spec is clear).

For now, the effort is still fairly tentative. But I’m pretty confident that type-based aliasing analysis will never be a thing in Rust, so it will always be legal to read data through ‘wrong-typed’ pointers, both raw pointers and references (as long as it’s valid data, alignment is right, etc.).

Actually, I’m embarrassed: my code from earlier isn’t actually legal in all cases. It requires the pointer to be correctly aligned, which in the case of String it probably will be, but it’s not guaranteed. Meh.
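One way to drop the alignment requirement is `std::ptr::read_unaligned`, which is a real standard library function. The sketch below wraps it with a length check; the wrapper name `read_i64_unaligned` is hypothetical.

```rust
use std::mem::size_of;
use std::ptr;

/// Reads the first 8 bytes of `s` as a native-endian i64 without
/// assuming the string data is 8-byte aligned.
fn read_i64_unaligned(s: &str) -> Option<i64> {
    if s.len() < size_of::<i64>() {
        return None;
    }
    // SAFETY: the length check guarantees 8 readable bytes, and
    // `read_unaligned` places no alignment requirement on the pointer.
    Some(unsafe { ptr::read_unaligned(s.as_ptr() as *const i64) })
}
```

Slicing a string at an odd offset, as in `read_i64_unaligned(&text[1..])`, is the kind of input where the plain dereference could be misaligned and this version still works.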

comex Oct 13, 2017

Regarding union: Say you have

    char *p

You definitely can’t do

    union { char *cp; int64_t *ip; } u;
    u.cp = p;
    return *u.ip; // bad

...because the undefined part isn’t casting the pointer, but reading through it. Nor may you read through a pointer to union, if the memory was not already typed as that union type:

    union { char c[8]; int64_t i; } *up;
    up = (void *) p;
    return up->i; // bad

(...well, you probably shouldn’t, anyway. One of the clauses in the C standard seems to condone it, but a WG document[1] suggests that the clause is just misworded, meant to allow accessing a pointer to a structure field with the field’s type, but allowing the opposite instead. And suggests that it will be fixed in future standards. Who knows, though.)

You can do this:

    union { char c[8]; int64_t i; } u;
    for (int i = 0; i < 8; i++)
        u.c[i] = p[i];
    return u.i; // ok

But that’s more verbose than memcpying directly into an int64_t, and the compiler will optimize both versions into a plain load (as long as the target arch supports unaligned loads), so there’s not much point doing it that way.

If your data was declared as a union in the first place, of course, that’s different. But the OP was asking about strings, so I assumed the input in the C case was just a char *.

[1] http://open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm

Rusky Oct 13, 2017

Those sound like relatively straightforward things to write safe and convenient wrappers for. Obviously not as barebones as C but certainly not constantly fighting the language.

Sag0Sag0 Oct 13, 2017

True. I suppose I just prefer the bare bonededness.

hossbeast Oct 13, 2017

This slogan could sell a lot of T shirts.
