Hidden Wonders

Programming·Technology

Offsetof: What, How, and Why?



Short Introduction to Pointers


offsetof can wait for now; let’s talk pointers quick.

In C and C++, there is a datatype called a pointer. Here’s an example:

/* Define some number, `x` */
int x = 0;

/* Assign memory location of `x` to a pointer, `p` */
int *p = &x;

/* Dereference `p`, setting what `p` points to (in this case, `x`) to 5 */
*p = 5; // this changes the value of x to 5

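And that’s a very quick introduction to pointers. Play around with them if the concept’s not clear to you. The key points for now are that &x gives the memory location of x, a pointer like p stores such a location, and *p reads or writes whatever p currently points to.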

Now, also cool is that you can do pointer arithmetic. For example, we can (most likely) do this:

int x = 5;
int y = 8;

/* Set pointer to `x` */
int *p = &x;

/* Modify `x` */
*p = 100;

/* Increment `p` */
p++; //same result as p += 1

/* Modify `y` (undefined behavior: nothing guarantees `y` sits right after `x`, but it often works!) */
*p = 1000;

Here, we do the same thing we did before with x and p. However, we then increment p, dereference it, and access the value of y! How is this possible? Well, local variables are allocated in the same segment of memory in the program (the stack, but that’s not important for now), and they’re usually allocated right next to each other in memory, so adding sizeof(int) to a pointer to x can make p point to the next integer in memory, y. The compiler is free to reorder or even eliminate locals, though, so treat this as a curiosity rather than something to rely on.

Now, that p++ operation might be confusing, because ++ usually only increments a number by 1, and sizeof(int) == 4 (for most platforms used today). Don’t worry though; pointer arithmetic is a well-defined C operation:

If `p` is a pointer to some element of an array, then `p++` increments `p` to point to the next element, and `p+=i` increments it to point `i` elements beyond where it currently does.

— The C Programming Language, 2nd edition

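The key thing about pointer arithmetic here: p + n moves p by n * sizeof(*p) bytes, not by n bytes, and it is only well defined while p stays inside (or one past the end of) the same array.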

Short Introduction to Structs


Almost there to offsetof, let’s talk about structs first. A struct is very simple; it’s just a container that holds arbitrary data. For instance:

/* Create struct named my_struct */
struct my_struct {
    int x;
    float z;
    long num;
    char str[4];
};

/* Make an instance of my_struct named s that holds our data */
struct my_struct s = {
    10,
    5.8,
    11342,
    "hey" // 4 bytes, last one is '\0'
};

So, we’ve got a bunch of stuff stored contiguously in memory (ignore struct padding for now). Using what we just learned about pointers, we can access elements of a struct in exactly the same way using a pointer to a struct!

/* Our struct from before */
struct my_struct {
    int x;          // bytes 0-4
    float z;        // bytes 4-8
    long num;       // bytes 8-16
    char str[4];    // bytes 16-20, then padding up to 24 (padding!!!)
};
struct my_struct s = (struct my_struct) {
    10,
    5.8,
    11342,
    "hey"
};

/* Make `p` point to `s` */
struct my_struct *p = &s;

#include <stdio.h>
printf("prints 10: %d\n", *((int*)p));
printf("prints 5.8: %f\n", *(float*)((char*)p + 4));

So we have this pointer to a struct my_struct, and in the first printf we’re casting it to an int pointer, then dereferencing it. This should make perfect sense: the int we want to access, x, is the first element of the struct; therefore, if we cast our my_struct pointer, p, to an int pointer, it will point at the int.

That final printf has a bit more going on in it. Let’s write the relevant part again to get a better look:

*(float*)((char*)p + 4)

This piece of code takes the pointer, p, casts it to type char *, adds 4 to it, casts that incremented pointer to type float *, and dereferences it to access the float z stored inside the struct my_struct pointed to by p.

The final float * cast and dereference is the same as the previous printf, and shouldn’t be confusing. Looking at the potentially confusing part: we cast p to type char * because of what we previously learned about pointer arithmetic: if we do p + n, we move p by n * sizeof(*p) bytes, that is, n times the size of the type p points to. Here p points to a struct my_struct, and on my system this math checks out:

printf("%p\n", (void *)p);                   // printed 0x7ffcda62f9e0 on my system
printf("%p\n", (void *)(p+4));               // printed 0x7ffcda62fa40 on my system
printf("%td\n", (char *)(p+4) - (char *)p);  // prints 96, and 4 * sizeof(struct my_struct) == 96!

If we tried to access the memory at p+4, which is 96 bytes past the start of s, that would be undefined behavior. We’d just be accessing random, garbage memory! Quite possibly the program would just print out 0 if we tried to read that location (it’s not that many bytes away from the rest of our program data), but a segmentation fault is also possible.

We need to increment p by the correct offset value by adding sizeof(int) to p, since the 4 bytes making up int x are all that separates p from the struct member we want to access, float z. By casting p to type char *, we can now do p+4 and get the expected result since sizeof(char) == 1.

Determining the byte offset for each element of a struct manually is beyond the scope of this article: see here for a quick explanation about it, and consult C/C++ documentation for the size of each primitive type on your platform (for instance, long is 4 bytes in Windows, but 8 on most other 64-bit systems).

At this point, everything here should be fairly clear. Let’s recap:

  1. A struct allows us to store data in contiguous locations in memory.
  2. We can calculate the offset of a particular struct data member by adding the bytes needed to store all previous data members to the memory location of the struct object.
  3. A struct has padding, which the compiler inserts so that each member lands on an address suitable for its type and so that sizeof(struct) is a multiple of the strictest member alignment (often 8 or 4, dependent on your system), so it can be confusing and unportable to compute the offset of a struct member manually.

If any points remain confusing, try drawing out a memory diagram, or try it yourself with more examples. If you’ve made it this far, you should be able to understand this stuff.

For further reading, I’d recommend this link, The Lost Art of Structure Packing. Everything I’ve written about applies to a basic struct in both C and C++, but, as the link mentions for C++, “classes that look like structs may ignore the rule that the address of a struct is the address of its first member!” (source).

offsetof


Finally, we’re here! Hopefully you noticed my bold text before and already can guess what offsetof does, but now let’s cover the details. First, some information from the man page:

/* From man offsetof(3) */

#include <stddef.h>
size_t offsetof(type, member);

/* The macro offsetof() returns the offset of the field `member` from
 * the start of the structure `type`.
 */

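The major takeaways here: offsetof is a macro defined in stddef.h; it takes a struct type and the name of one of that struct's members; and it evaluates to a size_t holding the member's offset, in bytes, from the start of the struct.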

Here’s a quick example of a call to offsetof, using our struct named my_struct as an example:

#include <stddef.h>

/* Returns 8 (for a Linux 64-bit system) */
size_t my_offset = offsetof(struct my_struct, num);

Now, let’s actually see how offsetof is implemented. Wikipedia has a very good article on the topic, so let’s examine their implementation of offsetof:

#define offsetof(type, member) \
    ((size_t)&(((type *)0)->member))

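This looks a bit similar to our previous example, but a lot more confusing. Let’s unravel it from the inside out: (type *)0 pretends there is an object of type `type` at address 0; ->member names the member of that imaginary object; & takes the member's address, which, because the struct "starts" at address 0, is numerically equal to the member's offset; and the (size_t) cast turns that address into an integer. No memory at address 0 is read; only an address is computed.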

In practice, this means we can access a struct member in a more programmatic way using this calculated offset value:

/* Our struct from before */
struct my_struct {
    int x;          // bytes 0-4
    float z;        // bytes 4-8
    long num;       // bytes 8-16
    char str[4];    // bytes 16-20, then padding up to 24 (padding!!!)
};
struct my_struct s = (struct my_struct) {
    10,
    5.8,
    11342,
    "hey"
};
struct my_struct *p = &s;

/* Get offset of num (== 8 on Linux) */
size_t my_offset = offsetof(struct my_struct, num);

/* Access the field with the offset */
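/* The cast back to long * matters: dereferencing the char * directly would read a single byte */
long my_num = *(long *)((char *)p + my_offset);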

As a final note about offsetof, a bunch of people like to have a long argument over whether this macro is actually undefined behavior or not (on paper, it dereferences a null pointer), which is why compilers like gcc implement the macro using a compiler builtin, __builtin_offsetof. All I know for sure is that the hand-written version works with gcc and MSVC; therefore, while it’s probably safe to use your own implementation in practice, it’s better to just use offsetof as defined in stddef.h.

Motivations for offsetof


Pointer arithmetic, offsetof, it might all come off as needlessly complex to a freshman computer science student. “Why would I ever need this in real life?” he asks himself while not paying attention to any of his classes because he thinks they’re all pointless, only to come out of school not having learned anything because he decided everything was “pointless.” However, I promise that offsetof has actual uses.

First, I’d just like to highlight how useful of a learning exercise this was. Obtaining a deep understanding of what offsetof is and how to use it has allowed us to strengthen our understanding of the memory layout of a struct and see how pointers work, two of the most important concepts for a C programmer to understand.

Anyways, here is a quick code snippet that roughly illustrates how offsetof might be used in a real production C or C++ program:

struct Database {
    /* A ton of data members in here */
    ...
    int data_member;
};


struct UpdateEntry {
    const char *data_name;
    size_t offset;
    void (*update_func)(void *);
};

Now, we might create an array of struct UpdateEntry, which would look something like this:

struct UpdateEntry table[] = {
    /* Lots of entries here... */
    { "my favorite member", offsetof(struct Database, data_member), &my_update_func }
};

It should hopefully be clear what we’re doing here: the struct UpdateEntry contains some string, an offset into our database, and a pointer to a function update_func. This could allow us to generically update our database items through each of these entries present in the array by, for instance, doing a function call like this:

// Earlier in the file, a struct Database named db is defined
struct UpdateEntry *my_entry = &table[some_madeup_index];

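// The extra parentheses are only for readability; the cast binds tighter than +
// (uint8_t comes from <stdint.h>)
my_entry->update_func(((uint8_t *)&db) + my_entry->offset);

// Note that uint8_t * converts to void * implicitly in both C and C++; it is the
// reverse conversion (void * back to a typed pointer inside update_func) that needs a cast in C++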

We have now passed the member data_member by pointer to the function pointed to by update_func, and we can now update any data member of the struct as we please in a fairly self-contained and generic fashion.

This is the major use case of offsetof: there’s not really another easy way to statically reference a certain member of a struct in C. offsetof allows us to do this, making programming constructs like my struct UpdateEntry possible and, therefore, relatively common among larger C codebases.




Site last updated: 2024-06-08