When the 0x01000000p Dropped

So, I got bit

More accurately, 32 of them, and it sucked. It made me realise that my understanding of memory was fundamentally flawed and I hadn’t even noticed. I’ve fixed a lot of low-level memory tramples and alignment issues in a lot of games too, so I felt like I should have known better, but then other people were surprised by the same behaviour.

Specifically, I’ve been bitten by memory accesses that cross a word boundary on little-endian architectures. In game terms, I was probably reading some awesome Fire-Breathing animation for a dapper dragon from a buffer.


Here’s the thing…

It turns out I’ve been leaning on a crutch for years. Visual Studio has an incredibly useful memory view. I’ve used it countless times, and it’s given me crucial information to squash many buffer overruns, memory tramples and the like. One of its core features is that it can display the same memory in a number of different ways, for example:

#include <cinttypes>
#include <cstring> // memcpy, used further down
using namespace std;

struct SomeMemoryBlock
{
int32_t mWord1;
int32_t mWord2;
int32_t mWord3;
};

SomeMemoryBlock block{ 0x01020304, 0x05060708, 0x090a0b0c };

Where the address of block on my last execution was 0x006EFB14.

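[Screenshot: the Visual Studio memory view at 0x006EFB14 in 4-byte integer mode, showing 01020304 05060708 090a0b0c]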

As the capture above shows, 4-byte integer mode is fantastic for looking at most word-aligned data structures. In fact, it’s so useful and so commonly used that it can get a little dangerous if you’re not thinking about the underlying representation. Let’s try to read a value across a word boundary:

[Screenshot: the same memory view in 4-byte integer mode, starting one byte further along]

The second capture starts just one byte further along, so every value straddles a word boundary, and the memory is again interpreted as 32-bit integers. The result feels wrong. Why would:

05060708 090a0b0c

not read as follows when shifted one byte to the right?

06070809 0a0b0ccc

Thanks to the predictable pattern I gave, you can probably see that something is off: the capture shows 0c050607 rather than 06070809, so where does that leading 0c come from? Even more peculiar, what do you think the following code produces?

unsigned char* memPtr = reinterpret_cast<unsigned char*>(&block.mWord2);
memPtr += 1;

int32_t mysteryValue;
memcpy(&mysteryValue, memPtr, sizeof(int32_t));

That’s right! 0c050607, the same as above. The debugger and the code agree, so the behaviour appears to be functionally correct, even if it’s not obvious why yet. So let’s take a step back and look at the fundamentals.
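To try this outside the debugger, here is a minimal, self-contained sketch of the same read. The helper names `readWordAt` and `isLittleEndian` are made up for illustration, and the sketch assumes the struct has no padding (three 4-byte members, 12 bytes in total).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct SomeMemoryBlock
{
    std::int32_t mWord1;
    std::int32_t mWord2;
    std::int32_t mWord3;
};

// Hypothetical helper: true when this machine stores the least
// significant byte of an integer first.
bool isLittleEndian()
{
    const std::uint32_t one = 1;
    unsigned char firstByte;
    std::memcpy(&firstByte, &one, 1);
    return firstByte == 1;
}

// Hypothetical helper: copy 4 bytes starting byteOffset bytes into the block.
// Going through memcpy keeps the read well defined even when it is unaligned.
std::int32_t readWordAt(const SomeMemoryBlock& block, std::size_t byteOffset)
{
    assert(byteOffset + sizeof(std::int32_t) <= sizeof(SomeMemoryBlock));
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&block);
    std::int32_t value;
    std::memcpy(&value, bytes + byteOffset, sizeof(value));
    return value;
}
```

With the block from earlier, `readWordAt(block, 5)` is the same read as the snippet above: one byte past the start of mWord2. On a little-endian machine it should come back as 0x0c050607, while a big-endian machine would give the "expected" 0x06070809.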

1p or 16777216p?

As the title hints, how you interpret 01000000 depends on a lot of factors. If we’re assuming 32-bit integers, it all comes down to endianness.

A single byte can be represented by two hex digits:

01 = 1 = 00000001
02 = 2 = 00000010
10 = 16 = 00010000
11 = 17 = 00010001
FF = 255 = 11111111
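If you want to generate rows like these yourself, a small sketch along the following lines would do it. `toHex` and `toBinary` are illustrative names, not anything from a library.

```cpp
#include <bitset>
#include <cstdio>
#include <string>

// Hypothetical helper: format one byte as two hex digits, e.g. 17 -> "11".
std::string toHex(unsigned char byte)
{
    char text[3]; // two digits plus the terminating null
    std::snprintf(text, sizeof(text), "%02X", static_cast<unsigned>(byte));
    return text;
}

// Hypothetical helper: format one byte as eight binary digits.
std::string toBinary(unsigned char byte)
{
    return std::bitset<8>(byte).to_string();
}
```

Each hex digit covers exactly four bits, which is why two of them always describe one byte.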

32-bit values are made up of 4 bytes and are often referred to as words. How those bytes are stored, though, depends on the endianness of the architecture. Most definitions will describe this in terms of the ‘most significant’ bit or byte, but to me the differentiation is much simpler:

Big endian makes sense intuitively.

From the example above, most numeric representations consider positive 1 in hex to be 01. For a 32-bit integer we need to pad out the other 3 bytes, even though they contain nothing. Endianness only shuffles whole bytes; the bits inside each byte stay where they are.

Big Endian: 00 00 00 01
Little Endian: 01 00 00 00

So far there’s not much to it. But let’s try a better example. In code we can write numbers as hexadecimal literals using a 0x or 0X prefix, and we always write these in big endian order, most significant byte first.

0x01020304
Big Endian: 01 02 03 04
Little Endian: 04 03 02 01
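To see which of those two rows your own machine uses, you can copy a value's bytes out in memory order. This is only a sketch, and `bytesInMemory` is a made-up name.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the four bytes of value, in the order this
// machine actually stores them in memory.
std::array<unsigned char, 4> bytesInMemory(std::uint32_t value)
{
    std::array<unsigned char, 4> bytes{};
    std::memcpy(bytes.data(), &value, bytes.size());
    return bytes;
}
```

Calling `bytesInMemory(0x01020304)` should give 04 03 02 01 on a little-endian machine and 01 02 03 04 on a big-endian one.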

The fact that networks agreed on big endian for their standard speaks volumes, and this is still only a 32-bit example. It gets tough mentally shuffling more bytes than you have to; you might even argue that our brains aren’t wired for it.

Just to fill out the last bit of info I mentioned above.

Dragons Eat Words

Machine architecture is constantly reinventing itself. As the capability of hardware goes up, limitations on its most efficient use also get imposed. One great example is memory alignment. Depending on the data structure, it’s much more efficient to put values on alignment boundaries; the alternatives are either multiple inefficient reads/writes to perform the same operation, or the operation failing outright. 2, 4, 8, 32, 64 and 128 are all common byte alignments depending on the data stored, and it often falls to the programmer to handle this.

For 32-bit integers a 4-byte boundary makes the most sense and is often referred to as the ‘natural boundary’. You’ll notice that these values are stored at memory addresses ending in 0, 4, 8 or C as a result (in the screenshots above, they ended in 4, 8 and C):

0x006EFB10
0x006EFB14
0x006EFB18
0x006EFB1C
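Checking whether an address sits on a boundary is just a remainder test. A minimal sketch, with `isAligned` as an illustrative name:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when address is a whole multiple of alignment.
bool isAligned(std::uintptr_t address, std::size_t alignment)
{
    return address % alignment == 0;
}
```

All four addresses above pass `isAligned(address, 4)`, whereas 0x006EFB15, one byte further along, does not. For a real pointer you would pass `reinterpret_cast<std::uintptr_t>(ptr)` and typically `alignof(std::int32_t)` as the alignment.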

Sometimes though it’s necessary to read over boundaries. You might find some code that already did it or a particular blob of data may balloon in size if padded too much. That’s when you find yourself in a similar position to the memory read at the start of the post.

For context, you have the awesome dragon animation data buffer and are parsing the values individually. Often it can be more efficient to do the load in place, but let’s put that to one side:

unsigned char* position;

// Read 4 bytes from word aligned boundary
int32_t numberOfScaleAnimations = read(&buffer, position, 4);

// Read 1 byte from word aligned boundary
unsigned char hasWings = read(&buffer, position, 1);

// Read another 4 bytes from non-aligned boundary
int32_t heightAboveGround = read(&buffer, position, 4);

If you were to step through the read like we did initially and pay attention to memory you’d see the same sort of surprising behaviour. Also, if a dragon doesn’t have wings is he just a lizard?
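The `read` calls above are pseudocode, so here is one way such a helper could look. This is a sketch under assumptions: `BufferReader` is a hypothetical stand-in for the buffer and position pair, it does no bounds checking, and it copies each value out with memcpy rather than loading in place.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical cursor over a byte buffer.
struct BufferReader
{
    const unsigned char* position;

    // Copy a T out of the buffer and advance past it. memcpy does not
    // care whether position is aligned for T, so the read after the
    // single byte is fine even though it crosses a word boundary.
    template <typename T>
    T read()
    {
        T value;
        std::memcpy(&value, position, sizeof(T));
        position += sizeof(T);
        return value;
    }
};
```

The three reads then become `reader.read<std::int32_t>()`, `reader.read<unsigned char>()` and `reader.read<std::int32_t>()`. The values come back correctly; it is only the word-sized view of the same memory in the debugger that looks surprising.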

The Penny Drops – Endian Debugging

Like a lot of things, this ‘surprising behaviour’ is not actually that surprising. Back to our 32-bit sample representation of 3 words:

01020304 05060708 090a0b0c

In 1-byte integer memory mode, this is actually displayed as:

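[Screenshot: the memory view in 1-byte integer mode, showing the same block as 04 03 02 01 08 07 06 05 0c 0b 0a 09]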

I felt pretty stupid when I first saw this: the values are actually stored in little endian format! This means that when we did the single byte move, the four bytes we actually picked up were:

07 06 05 0c

Which was then converted back to the logical big endian format for use in code as:

0c 05 06 07

The 4-byte integer mode was behaving correctly and interpreting that data as it should do. Problem solved.
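The conversion step can be written out by hand. The sketch below does with shifts what a little-endian machine does when it loads four bytes as one integer; `fromLittleEndianBytes` is an illustrative name.

```cpp
#include <cstdint>

// Hypothetical helper: combine four bytes, given in memory order, the way
// a little-endian load does. The first byte is the least significant.
std::uint32_t fromLittleEndianBytes(unsigned char b0, unsigned char b1,
                                    unsigned char b2, unsigned char b3)
{
    return  static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}
```

Feeding it the bytes found above, `fromLittleEndianBytes(0x07, 0x06, 0x05, 0x0c)`, gives 0x0c050607, which is exactly what both the memory view and the memcpy reported.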

You’ll probably have worked out by now that the examples above are totally endian specific, and depending on your architecture you might not have seen the strange values at all. That doesn’t mean you should just ignore things. If you’re supporting a mixture of endian architectures for your game, it pays to pay attention and be consistent when processing data to avoid issues like this.
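One common way to stay consistent is to pick a byte order for the data itself and build values with shifts instead of copying raw memory. The sketch below fixes the stored order as big endian; the function names are made up, and this is one possible approach rather than the only one.

```cpp
#include <cstdint>

// Hypothetical helper: store value as 4 bytes, most significant first,
// whatever the host's endianness. Shifts act on the value, not on memory.
void writeBigEndian32(unsigned char* out, std::uint32_t value)
{
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// Hypothetical helper: rebuild the value from 4 bytes stored most
// significant first. Works at any address, aligned or not.
std::uint32_t readBigEndian32(const unsigned char* in)
{
    return (static_cast<std::uint32_t>(in[0]) << 24)
         | (static_cast<std::uint32_t>(in[1]) << 16)
         | (static_cast<std::uint32_t>(in[2]) << 8)
         |  static_cast<std::uint32_t>(in[3]);
}
```

Because neither function reinterprets memory as an integer, the same buffer round-trips to the same values on little-endian and big-endian machines alike.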

Also, just because the 4-byte view is easy to work with, that doesn’t mean it’s always showing you the data you expect.
