Comment by sacnoradhq

sacnoradhq Oct 17, 2023

This statement would be technically legal on its own in x86 real mode if the compiler didn't do null pointer checks. However, it would set the divide-by-zero interrupt handler to point at itself, 0000:0000, and when the next division by zero happened, the machine would run into UB (likely a reset or halt) because it would jump there, execute 4x ADD byte ptr [BX + SI], AL (or ADD byte ptr [EAX], AL), and then run the remaining interrupt vectors as instructions.

cjensen Oct 17, 2023

Not quite. (char *) 0 is the null pointer. The null pointer is not necessarily a binary all-zero. On some compilers in x86, the null pointer intentionally points to something which will cause a crash when written to.

sacnoradhq OP Oct 17, 2023

Find me one contemporary example (ANSI C) with a disassembled screenshot.

This is writing sizeof(char) (== 1 almost everywhere) zero to address zero. It is not using a NULL macro or other predefined symbol.

In the real world, this would generally write a byte to address 0000:0000, leading to UB because it would fuck up the divide-by-zero IV.

PS: I used Borland C++ 3.1, Microsoft C++ 3.x and 4.5x, Watcom, and early GNU.

pxx Oct 17, 2023

(void *) 0 is a null pointer constant and is not necessarily an all zeros representation. This has been defined virtually forever.

https://c-faq.com/null/null2.html

https://c-faq.com/null/machexamp.html

Actual ways to do what you want to do are described in

https://c-faq.com/null/accessloc0.html

but technically speaking the pointer with a constant zero assigned to it _is_ a null pointer (which can be implemented as whatever bit pattern), independent of the preprocessor macro.
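A minimal sketch of that distinction, assuming nothing beyond standard C: the constant 0, `(char *)0`, and `NULL` all produce a null pointer when used in a pointer context, while the bytes that pointer occupies are up to the implementation. The function names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: checks that every spelling of "null" compares equal.
   This holds on any conforming implementation, whatever the bit pattern. */
void null_spellings_demo(void)
{
    char *a = 0;          /* integer constant 0 in a pointer context */
    char *b = (char *)0;  /* explicit cast, as in the statement discussed */
    char *c = NULL;       /* the macro from <stddef.h> */

    assert(a == b);
    assert(b == c);
    assert(!a);           /* a null pointer always tests false */
}

/* Hypothetical helper: reports whether this implementation happens to
   store a null char pointer as all-bits-zero. Returns 1 or 0; the answer
   is implementation-specific, so nothing portable should depend on it. */
int null_is_all_bits_zero(void)
{
    char *p = 0;
    unsigned char zeros[sizeof p];

    memset(zeros, 0, sizeof zeros);
    return memcmp(&p, zeros, sizeof p) == 0;
}
```

On mainstream platforms today `null_is_all_bits_zero()` will most likely return 1, but the standard does not promise it.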

monocasa Oct 17, 2023

> Find me one contemporary example (ANSI C) with a disassembled screenshot.

Here in godbolt, clang compiling C simply deletes the code in the function past and including the null pointer dereference.

https://godbolt.org/z/9aqWPazsP

> This is writing sizeof(char) (== 1 almost everywhere)

1 everywhere. sizeof's unit is "how many chars". For instance, there was a Cray machine that could only access 64-bit words. sizeof(char) is still 1, with 64-bit chars.

> zero to address zero. It is not using a NULL macro or other predefined symbol.

NULL is defined as literal 0.

gpderetta Oct 17, 2023

make it a '*(volatile char*)0 = 0' to force the store.
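A rough sketch of what the volatile qualifier buys: an access through a volatile lvalue counts as an observable side effect, so the compiler is expected to emit the store rather than drop it. `poke_byte` is a hypothetical helper, and the demo aims it at an ordinary local variable instead of address 0 so that it can run on a hosted system.

```c
#include <assert.h>

/* Hypothetical helper: store one byte through a volatile-qualified pointer.
   Because the lvalue is volatile, the store is not optimized away. */
void poke_byte(volatile char *addr, char value)
{
    *addr = value;
}

/* Demo against a normal object; on a real-mode target the same helper
   could be pointed at a fixed hardware address instead. */
void poke_demo(void)
{
    char cell = 'x';

    poke_byte(&cell, 0);
    assert(cell == 0);
}
```

Storing through a null pointer is still undefined as far as the standard is concerned; volatile only changes what compilers do with it in practice.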

jcelerier Oct 17, 2023

https://gcc.godbolt.org/z/hWEMnjT83

gpderetta Oct 17, 2023

> sizeof(char) (== 1 almost everywhere)

sizeof char is 1 by definition everywhere.

/pedantic

shric Oct 18, 2023

> sizeof char is 1 by definition everywhere.

Parentheses are required around char because it's a type.

/pedantic

cjensen Oct 18, 2023

That is incorrect :-).

sizeof is an operator in C, and does not need parentheses any more than the pointer operator *. It is true that programmers frequently think of it as a function and use parentheses.
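For reference, a small sketch of the two forms the C grammar allows: `sizeof` applied to an expression needs no parentheses, while `sizeof` applied to a type name requires them. The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Operand is an expression: parentheses are optional. */
size_t size_of_expression(void)
{
    char c = 'a';

    return sizeof c;
}

/* Operand is a type name: parentheses are part of the syntax.
   Writing `sizeof char` here would be a syntax error. */
size_t size_of_type(void)
{
    return sizeof(char);
}

void sizeof_forms_demo(void)
{
    assert(size_of_expression() == 1);
    assert(size_of_type() == 1);
}
```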


lmm Oct 18, 2023

In the mathematical sense of almost, a property that holds everywhere does qualify as holding almost everywhere.

sargstuff Oct 19, 2023

Guess that can be taken as a shortened explanation of what a programming language committee is tasked with making happen. Likely why Lisp is so successful/useful.

-----

unless initial property is start of dynamic operation, in which case, holding almost anywhere begins at the first operation after the start of the dynamic operation. process / lambda / epsilon calculi is just symbolic math. address 0 static, everything else dynamic.

per math, dimension N is static; to be able to "change things up" in dimension N, you need to be almost everywhere higher than dimension N. Edge cases are weird in any dimension. Guess that's why logicians just do the equivalent of C's !0

(cast classic logic) A=1 (cast boolean logic) B=0

C statement !(!B == A) hold everywhere and almost everywhere depends on how read C spec to interpret A & B.

cjensen Oct 17, 2023

Regarding your PS, I used Borland's Turbo C++ 1.0, and I think you've forgotten that memory models existed. Honestly, that's a good nightmare to forget.

HybridCurve Oct 17, 2023

I hated that about DOS, real-mode and BC++. After about 6-8 months of that misery, installing linux and learning to write C code with GCC was the best thing that ever happened to me. I felt like an animal being released from a cage and into the wild.

pjmlp Oct 18, 2023

In those MS-DOS days, Linux was barely usable; by the time Linux became usable, Windows 95 was already around, without those limitations.

My first kernel was 1.0.9 released alongside Slackware 2.0, offering initial support for IDE CD-ROM drives and experimental support for ELF files, by the way.

dbrower Oct 17, 2023

It doesn't have to be the NULL macro, which is correctly defined as plain 0.

The literal 0 is treated specially, so this could indeed be one of those 'turns into a weird bit pattern NULL pointers', if such a thing existed in the wild anymore.

But you're correct in that there probably haven't been any since the turn of the century or whenever the last Univac mainframes got turned off.

pests Oct 18, 2023

Apparently according to the c-faqs link elsethread

    execl takes a variable-length, null-pointer-terminated list of character pointer arguments, and is correctly called like this:
    execl("/bin/sh", "sh", "-c", "date", (char *)0);

Because execl is a variadic function, it cannot take advantage of a prototype to tell the compiler that one of its arguments needs to be treated as being in a pointer context.
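The same pattern can be sketched without calling execl at all. `count_args` below is a hypothetical variadic function that walks its `char *` arguments until it reaches a null pointer. The prototype only covers the first parameter, so the terminator has to arrive as a real pointer, hence the `(char *)0` cast at the call site.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: counts char* arguments up to, but not including,
   the null-pointer terminator. */
int count_args(const char *first, ...)
{
    va_list ap;
    int n = 0;

    va_start(ap, first);
    for (const char *s = first; s != 0; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}

void count_args_demo(void)
{
    /* A bare 0 here would be passed as an int, which need not have the
       same size or representation as a null char pointer. */
    assert(count_args("sh", "-c", "date", (char *)0) == 3);
}
```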

jfbastien Oct 18, 2023

Good thing this was covered in the talk

lmm Oct 18, 2023

Having it in text is much nicer than having it in video.

jfbastien Oct 18, 2023

Having it in .rodata is much nicer than having it in .text

sargstuff Oct 19, 2023

Having it in machine-readable form that runs really keeps things running.

layer8 Oct 17, 2023

Not quite, I think. Since this is a char pointer being used, only the first byte of the interrupt address would be zeroed. Since in real mode those are far pointers stored with the offset word first, the lower byte of the offset would be zeroed. So xxxx:xx00.

But yes, the interrupt table was my first thought when reading the headline.

kevin_thibedeau Oct 18, 2023

Char can be the same size as short or int. You can't assume it is one byte.

_kst_ Oct 18, 2023

You can't assume char is one octet. It is one byte by definition.

A byte is CHAR_BIT bits, where CHAR_BIT >= 8. (It's exactly 8 on most implementations; DSPs are the most common exception).

short and int are both required to be at least 16 bits wide. It's possible for int to be 1 byte (sizeof (int) == 1), but only if CHAR_BIT >= 16.
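These guarantees can be written down as compile-time checks. A small sketch, assuming a C11 compiler for `static_assert`; `BITS_OF` is a hypothetical convenience macro, and it measures storage bits, which can exceed a type's width if there are padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Each of these holds on every conforming implementation. */
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(sizeof(short) * CHAR_BIT >= 16, "short is at least 16 bits");
static_assert(sizeof(int) * CHAR_BIT >= 16, "int is at least 16 bits");

/* Hypothetical helper: number of storage bits occupied by a type. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)
```

On a typical desktop target `BITS_OF(char)` is 8; on a DSP with CHAR_BIT == 16 it would be 16 while `sizeof(char)` stays 1.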

_kst_ Oct 18, 2023

A clarification: You can certainly assume that char is 8 bits if you don't mind losing portability to a small minority of systems.

If I'm being pedantic, I might add something like

    #if CHAR_BIT != 8
    #error "This code assumes 8-bit char"
    #endif

But realistically, if I'm using headers defined by either POSIX or Windows, that's probably enough of a guarantee. (Though I'd still use CHAR_BIT rather than 8 to refer to the number of bits in a byte.)

gpderetta Oct 18, 2023

posix indeed guarantees CHAR_BIT == 8.

layer8 Oct 18, 2023

Yeah, if you're going to be pedantic, check your facts, see the sibling. Since I'm assuming an 8086 interrupt table, I'm also going to assume 8-bit chars, as that's the x86 addressing model. And dereferencing a null pointer is UB, so you can't count on anything anyway without making further assumptions.

saghm Oct 17, 2023

If I'm understanding what you're saying correctly, the memory location with address 0 is actually a writable address, but with the value being used semantically to handle division by zero? It's kind of wild to me that this would even be something that's allowed to be done manually, let alone required by a certain mode. Is this something provided for compatibility reasons that you'd have to opt into, or is it just something enabled by default?

lmm Oct 18, 2023

Which part is wild? "Magic" memory addresses are a fairly normal way to communicate with hardware; nowadays there are more layers to how you set up mappings in the MMU etc., but in the old days it was normal for everything to just have a fixed address (e.g. I remember back on the Apple ][ the screen's framebuffer was in a particular memory range, or rather two - to avoid tearing you'd draw on one and then flip which one it was using). And particularly for the CPU, it's hard to see how else it could do customizable interrupt handling - I guess you could have some kind of special API with dedicated CPU instructions or something for "programming" in an interrupt table, but that would be more complex and have no particular benefit. "It reads your table of pointers from this address in memory, in this format" is pretty straightforward and easy to use.

As for why it's address 0, well, it has to go somewhere, every machine has a CPU so everyone needs an interrupt table even if they don't have much memory. And when memory was precious there was no sense wasting even one byte of it; 0 was a real address on your physical memory chip, so why not use it just like any other?

(The fact that it's "address 0" for "division by 0" is just coincidence as far as I can see; division by 0 just happens to be the first kind of possible CPU interrupt. Perhaps it was the most common one?)

saghm Oct 18, 2023

The part that surprised me is that this would be the way things worked on a modern C++ compiler without any special flags. The article is about C++, and using "magic" memory addresses doesn't seem at all what I'd expect to be the default way to handle division by zero.

From the numerous responses here, it's clear that people interpret my question as about how the hardware itself works, which isn't at all what I was asking about; I'm aware of how stuff like this works at the assembly level, but my understanding was that in C and C++, trying to write arbitrarily to "special" addresses like that would be considered undefined behavior (often resulting in segfaults). When I read the comment I responded to above, it surprised me, so I wanted to check whether I understood what was said correctly. It's honestly kind of confusing to me that so many people seem very upset by the idea that a stranger on the internet might have a misconception about how hardware abstractions are exposed via compiled code to the point that they feel the need to explain in detail how hardware works but not actually answer the question I asked.

mkup Oct 18, 2023

The difference between modern days and days of DOS isn't in C/C++ compiler, it's in virtual memory and address space isolation and privilege isolation. So it's not a job of a C/C++ compiler to enforce protection from writing to "special" addresses, because interrupt table updates (and memory-mapped hardware I/O in general) still must happen somewhere (i.e. in kernel, hypervisor, drivers etc) and that code is still written in C/C++, same as in the DOS era.

sargstuff Oct 18, 2023

Mmmm... the job of a modern OS is to use/manage the MMU. Prior to DEC, an OS was just an automated version of a human feeding punch cards/spooling up tape.

DEC provided the necessary hardware MMU to do actual real-time multi-processing/multi-user access in a feasible/practical manner.

lmm Oct 18, 2023

> The article is about C++, and using "magic" memory addresses doesn't seem at all what I'd expect to be the default way to handle division by zero.

They're not saying this is, like, a portable standard way to handle division by zero in C++. You're right that it would be undefined behaviour under the standard (but a C++ compiler for real-mode x86 would be expected to support it, at least implicitly; obviously this specific case is not particularly useful, but C++ is used in embedded settings and setting a custom interrupt handler is something its users want and expect).

A decent, well-behaved language would do some kind of structured error handling on divide by zero, like throwing an exception. IMO that includes any C++ compiler worth bothering with (though again the standard makes it undefined behaviour so it's possible that some compilers don't). But, the way the runtime of such a decent C++ compiler would actually implement that would be by setting up an interrupt handler for the divide by zero interrupt (that would contain code to construct the exception etc.), and by performing this write to address 0 you're overwriting (the pointer to) that interrupt handler. So, this line of code would cause your program to behave (almost certainly) badly on the next division by zero, even if you were using a well-behaved C++ compiler that normally handled division by zero gracefully.

(OTOH with a maliciously pedantic C++ compiler that division by zero would already be undefined behaviour, so in practice, since most C++ compilers tend to be maliciously pedantic, you might be no worse off than you were before that line).

The original post you replied to was just talking about the somewhat interesting details of what would actually happen because of the quirks of what these addresses are used for on that hardware (e.g. the fact that address 0 is supposed to contain a pointer to the handler, so by setting it to 0 you cause the CPU to start executing the interrupt handler table as code, is kind of interesting - not as a point about C++, but as a point about funny emergent behaviour of hardware), not about what this is specified as doing or the normal way of doing things in C++. I don't know why you were downvoted.

sargstuff Oct 18, 2023

> ".. the fact that address 0 is supposed to contain a pointer to the handler .."

What got missed, though, is that there has to be an "unused"/"reserved" bit(s) space in order for things to run without requiring additional specific hardware operations.

wvenable Oct 17, 2023

Back in the day there were no protections. You could write to any address whether it was used by the CPU for interrupt vectors, part of the OS, hardware addresses, anything.

saghm Oct 18, 2023

Sure, but I'm asking if that's something that's enabled by default today or not. I don't see why it's unreasonable for things that were useful "back in the day" to not be available with the default arguments on current versions of compilers but available with certain flags. I'm not sure why my question touched a nerve, because I'm genuinely asking both if I understand correctly and if it's something that needs a flag to enable.

wvenable Oct 18, 2023

In older CPUs there was no memory management hardware and no virtual memory. You could just read/write in code from any address anywhere and you'd be writing to that actual physical memory in the computer. This wasn't a feature so much as it is a lack of a feature.

Modern CPUs with virtual memory mean the question is a lot more complicated. Every process in a modern OS gets its own address space, so you can write to 0 but it could go anywhere (even virtualized to disk), and all the actual hardware is not directly accessible (you must go through the OS).

I'm not sure I'd call this ability "useful" except if you're writing an operating system. This is a vast simplification, but when your computer boots it's effectively in a mode that allows reading/writing to anywhere. The OS kernel has direct access to all the hardware and then it limits access when running user processes.

sargstuff Oct 17, 2023

pre-MMU, unless using DEC box.

inkyoto Oct 18, 2023

It is still true even today for microcontrollers – many of them come with a minuscule amount of RAM, no MMU and generally unpredictable memory maps.

layer8 Oct 17, 2023

Think of it as part of the “API” of the CPU that a program can make use of however it likes. In the early days (for DOS and the like), the distinction between operating system and application was more one of convention and not enforced by hardware mechanisms. The program was supposed to control the hardware, and not the other way around.

HansHamster Oct 17, 2023

The interrupt vector table on x86 sits by default at 0000:0000 and the CPU uses it to handle interrupts and other exceptions by jumping to the address entry corresponding to the event. Entry 0 is division by 0, but there are also entries for illegal instruction, hardware interrupts and so on.
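To make the table layout concrete, here is a simulation rather than real hardware access: a 1 KiB byte array stands in for physical addresses 0x000 to 0x3FF. It assumes the usual real-mode layout of 256 four-byte entries, each holding a 16-bit offset followed by a 16-bit segment, both little-endian. All names are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

enum { IVT_ENTRIES = 256, IVT_ENTRY_SIZE = 4 };

/* Simulated copy of the first 1 KiB of real-mode memory. */
static uint8_t ivt[IVT_ENTRIES * IVT_ENTRY_SIZE];

/* Store a far pointer (segment:offset) into vector `vec`. */
void ivt_set(unsigned vec, uint16_t segment, uint16_t offset)
{
    uint8_t *e = &ivt[vec * IVT_ENTRY_SIZE];

    e[0] = (uint8_t)(offset & 0xFF);   /* offset, low byte  */
    e[1] = (uint8_t)(offset >> 8);     /* offset, high byte */
    e[2] = (uint8_t)(segment & 0xFF);  /* segment, low byte */
    e[3] = (uint8_t)(segment >> 8);    /* segment, high byte */
}

uint16_t ivt_offset(unsigned vec)
{
    const uint8_t *e = &ivt[vec * IVT_ENTRY_SIZE];

    return (uint16_t)(e[0] | (e[1] << 8));
}

uint16_t ivt_segment(unsigned vec)
{
    const uint8_t *e = &ivt[vec * IVT_ENTRY_SIZE];

    return (uint16_t)(e[2] | (e[3] << 8));
}

/* Physical address the CPU would jump to: segment * 16 + offset. */
uint32_t ivt_linear(unsigned vec)
{
    return ((uint32_t)ivt_segment(vec) << 4) + ivt_offset(vec);
}

void ivt_demo(void)
{
    ivt_set(0, 0x1234, 0x5678);  /* pretend divide-error handler */
    ivt[0] = 0;                  /* simulated effect of *(char *)0 = 0 */

    assert(ivt_offset(0) == 0x5600);   /* only the offset's low byte changed */
    assert(ivt_segment(0) == 0x1234);
    assert(ivt_linear(0) == 0x17940);
}
```

In this model a one-byte store to address 0 redirects vector 0 to a nearby but wrong address; the whole vector becomes 0000:0000 only if all four bytes are cleared.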

The address can be changed with the LIDT instruction and operating systems nowadays will just put it wherever, but for backward compatibility it is expected to still be at 0000:0000 (not sure how this is handled nowadays in UEFI, but it should still be possible to set it up that way).

marcosdumay Oct 17, 2023

The kernel can write almost anywhere. (Well, actually, nothing can write to most addresses on a 64-bit machine, but if it's usable for something, the kernel can use it directly.)

And yes, some addresses are special. (AFAIK, on all current mainstream architectures.) This is the expected way to set those signal handlers, output (and input) data, configure devices, etc.

That said, there are some gotchas on using specific addresses in C. AFAIK none apply to x86, but it's something you usually do in assembly.
