Things like converting a String to an i64 are a pain. There is a large amount of boilerplate code needed.
All in all this is a personal preference. I prefer some unsafeness in C to (in my opinion) the constant inconvenience of programming in Rust.
If you mean reading the bytes of the string as an i64, that's
unsafe { *(s.as_ptr() as *const i64) }
...which is a bit more verbose than C's *(int64_t *) ptr
but not that verbose, especially given that there will often be more code in the unsafe block. On the other hand, in C that's often technically undefined behavior due to strict aliasing, so to get strictly conformant code you have to use memcpy. Rust has no strict aliasing.
(I wouldn't actually recommend the above Rust code, because it doesn't verify that the string's length matches i64 - but neither does C, so it's a fair comparison. More idiomatic wrappers can be found in third-party crates like byteorder or pod.)
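For comparison, here is a minimal sketch of a length-checked version using only the standard library. The helper name `first_i64` is made up for illustration, and it assumes "reading the bytes as an i64" means taking the first 8 bytes in native byte order.

```rust
use std::convert::TryInto;

/// Hypothetical helper: reads the first 8 bytes of `s` as a native-endian i64.
/// Returns None if the string holds fewer than 8 bytes.
fn first_i64(s: &str) -> Option<i64> {
    // `get(..8)` yields None instead of panicking when the string is too short.
    let bytes: [u8; 8] = s.as_bytes().get(..8)?.try_into().ok()?;
    Some(i64::from_ne_bytes(bytes))
}

fn main() {
    println!("{:?}", first_i64("abcdefgh"));
    println!("{:?}", first_i64("short"));
}
```

No unsafe is needed here, and copying into a `[u8; 8]` sidesteps any question about the alignment of the string's buffer.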
If you mean casting the pointer to the bytes to an i64, that's
s.as_ptr() as i64
which doesn't seem verbose. (No unsafe needed in that case.)
let i = NativeEndian::read_i64(...);
Or if you're literally parsing a string of digits:
let i: Result<i64, _> = digits.parse();
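A slightly fuller sketch of the digit-parsing case, in case it helps. The wrapper `parse_i64` and the decision to trim whitespace first are assumptions for illustration; `str::parse` itself is the standard library method.

```rust
use std::num::ParseIntError;

/// Hypothetical wrapper: parses a decimal string into an i64,
/// ignoring surrounding whitespace.
fn parse_i64(digits: &str) -> Result<i64, ParseIntError> {
    digits.trim().parse::<i64>()
}

fn main() {
    match parse_i64("42") {
        Ok(n) => println!("parsed {}", n),
        Err(e) => println!("not a number: {}", e),
    }

    // The same method picks a different parser based on the target type.
    let ratio: f64 = "2.5".parse().unwrap();
    println!("{}", ratio);
}
```

The target type alone selects the parser, which is why one `parse` call covers integers, floats, and any other type implementing `FromStr`.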
The `parse` method on strings is a particularly smooth piece of trait system magic.
> Rust has no strict aliasing.
Does that mean that Rust cannot optimize for things C could? (Real question, I don't know Rust.)
Here's my best understanding of the situation. Someone who actually understands the compiler might have to correct me:
- The compiler doesn't do any strict aliasing optimizations on pointers in unsafe Rust, which C compilers sometimes do. The Rust memory model isn't fully specified, though, and the status quo seems to be related to not actually passing type information to LLVM. Not clear whether this will change in the future. There's some discussion of it here: http://smallcultfollowing.com/babysteps/blog/2016/05/27/the-...
- References in safe Rust (the vast majority of code) have much stronger aliasing information than pointers do in C. This is one of the core features of Rust, that references that allow mutation are guaranteed not to be aliased. I think the status quo is that this information isn't passed to LLVM because of some LLVM bugs getting in the way, but that it should start working in the near future. When all of this is working, I think it should produce code that's faster than C, in the same way that Fortran sometimes does.
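To make the second point concrete, here is a small sketch of the kind of function where the reference rules give the compiler extra information. The function `add_twice` is invented for illustration, and whether a given compiler version actually exploits the guarantee is a separate question, as noted above.

```rust
/// Hypothetical example: `total` is a `&mut` reference, so it is guaranteed
/// not to alias `step`. In principle the compiler may load `*step` once and
/// reuse it, whereas a C compiler given two `int64_t *` parameters must
/// assume the first store could have changed the second pointee.
fn add_twice(total: &mut i64, step: &i64) {
    *total += *step;
    *total += *step;
}

fn main() {
    let mut total = 10;
    let step = 5;
    add_twice(&mut total, &step);
    println!("{}", total);

    // The aliased call is rejected at compile time by the borrow checker:
    // add_twice(&mut total, &total);
}
```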
The formal specification of rules for unsafe code hasn’t been written yet, because, well, it’s an ambitious goal! Even the C standard is sometimes not really clear about what counts as undefined behavior; Rust wants to do better, while being more permissive, and offering a ‘sanitizer’ tool to verify correctness at runtime. And to implement this on top of LLVM, which was written by other people, is designed for C’s rules, and, like other compilers, doesn’t even get those right in every case (even when the spec is clear).
For now, the effort is still fairly tentative. But I’m pretty confident that type-based aliasing analysis will never be a thing in Rust, so it will always be legal to read data through ‘wrong-typed’ pointers, both raw pointers and references (as long as it’s valid data, alignment is right, etc.).
Actually, I’m embarrassed: my code from earlier isn’t actually legal in all cases. It requires the pointer to be correctly aligned, which in the case of String it probably will be, but it’s not guaranteed. Meh.
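One way around the alignment requirement is `std::ptr::read_unaligned`, which is a standard library function. The sketch below is only an illustration: the wrapper name `read_i64_unaligned` and the choice to take a byte slice are assumptions.

```rust
use std::ptr;

/// Hypothetical helper: reads 8 bytes from the start of `bytes` as a
/// native-endian i64 without requiring the pointer to be 8-byte aligned.
fn read_i64_unaligned(bytes: &[u8]) -> Option<i64> {
    if bytes.len() < 8 {
        return None;
    }
    // SAFETY: the length check above guarantees 8 readable bytes, and
    // `read_unaligned` places no alignment requirement on the pointer.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const i64) })
}

fn main() {
    let data = [0u8, 1, 2, 3, 4, 5, 6, 7, 8];
    // Starting at offset 1 means the pointer need not be aligned for i64.
    println!("{:?}", read_i64_unaligned(&data[1..]));
}
```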
char *p
You definitely can’t do
union { char *cp; int64_t *ip; } u;
u.cp = p;
return *u.ip; // bad
...because the undefined part isn’t casting the pointer, but reading through it. Nor may you read through a pointer to union, if the memory was not already typed as that union type:
union { char c[8]; int64_t i; } *up;
up = (void *) p;
return up->i; // bad
(...well, you probably shouldn’t, anyway. One of the clauses in the C standard seems to condone it, but a WG document[1] suggests that the clause is just misworded, meant to allow accessing a pointer to a structure field with the field’s type, but allowing the opposite instead. And suggests that it will be fixed in future standards. Who knows, though.)
You can do this:
union { char c[8]; int64_t i; } u;
for (int i = 0; i < 8; i++)
u.c[i] = p[i];
return u.i; // ok
But that’s more verbose than memcpying directly into an int64_t, and the compiler will optimize both versions into a plain load (as long as the target arch supports unaligned loads), so there’s not much point doing it that way.
If your data was declared as a union in the first place, of course, that’s different. But the OP was asking about strings, so I assumed the input in the C case was just a char *.