A detailed explanation of member variable offsets in C structures

The rules that govern member offsets in C structures are simple, but I often forget them, so writing them down is a good way to remember.

There are three principles:

a. The offset of every member within the structure must be an integer multiple of the size of that member's data type, and the offset of the first member is 0.

For example, if the second member is an int, its offset must be a multiple of 4; otherwise padding bytes are inserted in front of it ("leading padding"). The same applies to every later member.

b. The total number of bytes occupied by the structure, that is, the value of sizeof, must be an integer multiple of the size of its largest member; otherwise padding is added at the end ("trailing padding").

c. If structure A contains structure B as a member, the offset of B within A must be an integer multiple of the size of the largest member of B.

For example, if the members of B are int, double, and char, the offset of B must be an integer multiple of 8 (assuming an 8-byte double); otherwise padding is inserted in front of it ("intermediate padding").
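The three rules can be checked with a small sketch like the one below. The type names inner_t and outer_t and the function layout_follows_rules are hypothetical and not part of the original text. The sketch tests against the alignment reported by the C11 _Alignof operator, which equals the type's size on many common platforms but is not guaranteed to.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical types for illustration only. */
struct inner_t {
    int    i;
    double d;
    char   ch;
};

struct outer_t {
    char           tag;  /* first member, offset 0 */
    int            n;    /* padded up to a multiple of int's alignment */
    struct inner_t in;   /* padded up to inner_t's strictest alignment */
};

/* Returns 1 if the default layout obeys the three rules. */
int layout_follows_rules(void)
{
    size_t off_n  = offsetof(struct outer_t, n);
    size_t off_in = offsetof(struct outer_t, in);

    /* Rule a: first member at 0, later members at aligned offsets. */
    int rule_a = (offsetof(struct outer_t, tag) == 0) &&
                 (off_n % _Alignof(int) == 0);
    /* Rule b: total size is a multiple of the strictest alignment. */
    int rule_b = (sizeof(struct outer_t) % _Alignof(struct outer_t) == 0);
    /* Rule c: a nested structure starts at a multiple of its own alignment. */
    int rule_c = (off_in % _Alignof(struct inner_t) == 0);

    return rule_a && rule_b && rule_c;
}
```

Calling layout_follows_rules() should return 1 on a conforming C11 compiler. The concrete offsets themselves depend on the platform, so the sketch does not hard-code them.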

Anyone who has written C programs has used structures. But how well do you understand the offsets of member variables within them? This article shares some recent thoughts and a summary on structure offsets in C.

Example 1

Let's define the requirements first:

The known structure types are defined as follows:

struct node_t{
 char a;
 int b;
 int c;
};

The structure is aligned to 1 byte:

#pragma pack(1)

Find:

The offset of member variable c in struct node_t.

Note: The offset here refers to the offset relative to the starting position of the structure.

Different people will think of different solutions to this problem. Let's analyze several possible ones:

Method 1

If you are familiar with the C standard library, the first thing that comes to mind is offsetof (strictly a macro rather than a function, but we will call it a function here). The prototype shown by man 3 offsetof is:

 #include <stddef.h>

  size_t offsetof(type, member);

With this library function, we can solve the problem in one line of code:

offsetof(struct node_t, c);
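A minimal sketch of Method 1 follows. It assumes a compiler that supports #pragma pack (GCC, Clang, and MSVC do), and it uses the push/pop form so the packing applies only to this structure, which is a choice of this sketch rather than something the text prescribes. The function name offset_of_c is hypothetical.

```c
#include <stddef.h>  /* offsetof, size_t */

#pragma pack(push, 1)   /* 1-byte alignment, as the example requires */
struct node_t {
    char a;
    int  b;
    int  c;
};
#pragma pack(pop)       /* restore the previous packing */

/* Offset of member c from the start of struct node_t. */
size_t offset_of_c(void)
{
    return offsetof(struct node_t, c);
}
```

With 1-byte packing there is no padding, so the offset equals sizeof(char) + sizeof(int), which is 5 when int is 4 bytes.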

Of course, this is not the focus of this article, please continue reading.

Method 2

If we are not familiar with the C library functions, there is no need to worry: we can still solve the problem ourselves.

The most direct idea is: [address of member variable c] minus [start address of the structure]

Let's first define a structure variable node:

struct node_t node;

Next, calculate the offset of member variable c:

(unsigned long)(&(node.c)) - (unsigned long)(&node)

&(node.c) is the address of member variable c, cast to unsigned long;

&node is the start address of the structure, also cast to unsigned long;

Finally, subtracting the second value from the first gives the offset of member variable c.
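Method 2 can be sketched as a small function. The function name offset_of_c_by_subtraction is hypothetical, and the sketch assumes, as the text does, that unsigned long is wide enough to hold an address on the target platform.

```c
#pragma pack(push, 1)   /* 1-byte alignment, as the example requires */
struct node_t {
    char a;
    int  b;
    int  c;
};
#pragma pack(pop)

/* Offset of c, computed as (address of node.c) - (address of node). */
unsigned long offset_of_c_by_subtraction(void)
{
    struct node_t node;  /* contents are never read; only addresses are used */

    return (unsigned long)(&(node.c)) - (unsigned long)(&node);
}
```

Under 1-byte packing the result equals sizeof(char) + sizeof(int), the same value the offsetof macro reports for c.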

Method 3

With Method 2 we can obtain the offset of member variable c without the library function. But as programmers we should keep thinking: can the code be made more concise? Before improving it, let's analyze what is wrong with Method 2.

If you read carefully, you will have noticed that the main problem with Method 2 is that we had to define a structure variable, node. The question does not forbid defining variables, but if a stricter version of the question did forbid it, we would need a new solution.

Before exploring new solutions, let's first explore a small problem about offsets:

A small question

This is a simple geometry problem. Suppose we move along a coordinate axis from point A to point B. What is the offset of B relative to A? The question is easy, and most people will immediately answer B-A.

Can this answer be simplified any further? It can: when A is the origin of the axis, that is, A=0, the answer B-A reduces to B.

What does this small question have to do with our problem?

Combining the idea of Method 2 with the small question above, we quickly see the following correspondence:

(unsigned long)(&(node.c)) - (unsigned long)(&node)

and

B - A

The idea of the small question is that when A is the origin, B-A reduces to B. Applied to Method 2, when the memory address of node is 0, that is, (&node == 0), the code above reduces to:

(unsigned long)(&(node.c))

Since the memory address of node is 0, the expression

  node.c  // member variable c in structure node

can be written another way:

((struct node_t *)0)->c

The above code should be easy to understand. Since we assume the structure lives at memory address 0, we can refer to its members through that address. The expression denotes member variable c of a struct node_t located at memory address 0.

Note: This only relies on the compiler's address calculation to obtain the structure offset; memory address 0 is never actually read or written. The C standard does not formally define this expression, so the standard offsetof macro remains the portable choice. If you still have questions about this, see the related article "Some thoughts on the access method of C structure member variables".

Our offset calculation now no longer needs the variable struct node_t node, and the problem is solved in one line of code:

(unsigned long)(&(((struct node_t *)0)->c))

Isn't this more concise than Method 2?

Let's wrap the code above in a macro that calculates the offset of a member variable in a structure (the later example uses this macro):

#define OFFSET_OF(type, member) ((unsigned long)(&(((type *)0)->member)))

Using the above macro, you can directly get the offset of the member variable c in the structure struct node_t as:

OFFSET_OF(struct node_t, c)
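As a quick sanity check, the sketch below compares the hand-written macro with the standard offsetof for every member of struct node_t. The function name offset_of_matches_standard is hypothetical. The sketch assumes a mainstream compiler such as GCC or Clang, where the address-0 expression is evaluated as a plain offset.

```c
#include <stddef.h>  /* offsetof, used here only for comparison */

#pragma pack(push, 1)   /* 1-byte alignment, as the example requires */
struct node_t {
    char a;
    int  b;
    int  c;
};
#pragma pack(pop)

/* The macro from the text, fully parenthesized. */
#define OFFSET_OF(type, member) ((unsigned long)(&(((type *)0)->member)))

/* Returns 1 if OFFSET_OF agrees with offsetof for every member. */
int offset_of_matches_standard(void)
{
    return OFFSET_OF(struct node_t, a) == offsetof(struct node_t, a) &&
           OFFSET_OF(struct node_t, b) == offsetof(struct node_t, b) &&
           OFFSET_OF(struct node_t, c) == offsetof(struct node_t, c);
}
```

The outer parentheses around the macro body keep the cast from binding to whatever expression follows the macro at the call site.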

Example 2

Like Example 1, we first define the requirements as follows:

The known structure types are defined as follows:

struct node_t{
 char a;
 int b;
 int c;
};

int *p_c is a pointer to the member variable c of the variable x, which has type struct node_t.

The structure is aligned to 1 byte:

#pragma pack(1)

Find:

The value of the member variable b of the structure x.

Let's start with a simple analysis. The question asks how to find the value of one member variable of a structure, given a pointer to another member variable of the same structure.

Several possible solutions are:

Method 1

Since we know that the structure is aligned to 1 byte, the easiest solution is:

*(int *)((unsigned long)p_c - sizeof(int))

The code is very simple. Subtracting sizeof(int) from the address of member variable c gives the address of member variable b; that address is then cast to int * and dereferenced to obtain the value of b.
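The following sketch applies Method 1 to a concrete variable. The names read_b_from_pc and demo_read_b are hypothetical. Because a packed structure may place b at an address that is not aligned for int, the sketch copies the bytes with memcpy instead of dereferencing an int pointer directly; that is a cautious variation, not the exact expression from the text.

```c
#include <string.h>  /* memcpy */

#pragma pack(push, 1)   /* 1-byte alignment, as the example requires */
struct node_t {
    char a;
    int  b;
    int  c;
};
#pragma pack(pop)

/* Given a pointer to member c, read member b, which starts
   sizeof(int) bytes before c when the structure is packed. */
int read_b_from_pc(const int *p_c)
{
    int value;

    memcpy(&value, (const char *)p_c - sizeof(int), sizeof value);
    return value;
}

/* Hypothetical demo: builds a node and recovers b through a pointer to c. */
int demo_read_b(void)
{
    struct node_t x = { 'a', 7, 9 };
    int *p_c = &x.c;  /* some compilers warn about the address of a packed member */

    return read_b_from_pc(p_c);
}
```

Here demo_read_b() returns 7, the value stored in x.b.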

Method 2

Although the code of Method 1 is simple, it does not generalize well. We would rather obtain a pointer p_node to the structure directly from p_c, and then access any member variable of the structure through p_node.

This gives the idea for calculating the start address of the structure, p_node:

[address p_c of member variable c] minus [offset of c in the structure]

From Example 1, the offset of member variable c in struct node_t is:

(unsigned long)&(((struct node_t *)0)->c)

So the pointer p_node to the start of the structure is:

(struct node_t *)((unsigned long)p_c - (unsigned long)(&((struct node_t *)0)->c))

We can also directly use the OFFSET_OF macro defined in Example 1, and the above code becomes:

(struct node_t *)((unsigned long)p_c - OFFSET_OF(struct node_t, c))

Finally, we can use the following code to get the values of member variables a and b:

p_node->a

p_node->b

We also define the function of the above code as the following macro:

#define STRUCT_ENTRY(ptr, type, member) ((type *)((unsigned long)(ptr)-OFFSET_OF(type, member)))

The function of this macro is to obtain a pointer to the structure through a pointer to any member variable of the structure.

Using this macro, the previous code becomes:

STRUCT_ENTRY(p_c, struct node_t, c)

p_c is a pointer to the member variable c of struct node_t;

struct node_t is the structure type;

c is the member variable that p_c points to.
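A short sketch of the macro in use is shown below. The function name demo_struct_entry is hypothetical, and both macros are written fully parenthesized so that an expression such as STRUCT_ENTRY(...)->b parses as intended. Like the text, it assumes unsigned long can hold an address.

```c
#pragma pack(push, 1)   /* 1-byte alignment, as the example requires */
struct node_t {
    char a;
    int  b;
    int  c;
};
#pragma pack(pop)

#define OFFSET_OF(type, member) ((unsigned long)(&(((type *)0)->member)))
#define STRUCT_ENTRY(ptr, type, member) \
    ((type *)((unsigned long)(ptr) - OFFSET_OF(type, member)))

/* Hypothetical demo: returns 1 if the recovered pointer equals &x
   and members a and b read back correctly through it. */
int demo_struct_entry(void)
{
    struct node_t x = { 'a', 7, 9 };
    int *p_c = &x.c;  /* pointer to member c only */
    struct node_t *p_node = STRUCT_ENTRY(p_c, struct node_t, c);

    return p_node == &x && p_node->a == 'a' && p_node->b == 7;
}
```

Without the outer parentheses, STRUCT_ENTRY(p_c, struct node_t, c)->b would apply -> before the cast, because the member-access operator binds more tightly than a cast.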

Note:

A few notes on the address arithmetic used in the examples above:

int a = 10;
int * p_a = &a;

Suppose

p_a == 0x95734104;

Then, with a 4-byte int, the compiler evaluates the following expressions as:

p_a + 10 == 0x95734104 + sizeof(int)*10 == 0x95734104 + 4*10 == 0x9573412C

(unsigned long)p_a + 10 == 0x95734104 + 10 == 0x9573410E

(char *)p_a + 10 == 0x95734104 + sizeof(char)*10 == 0x9573410E
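The three cases can be measured directly. In the sketch below the array values and the three function names are hypothetical; the array exists only so that p_a + 10 stays inside a valid object.

```c
#include <stddef.h>  /* ptrdiff_t */

static int values[16];  /* hypothetical storage so p_a + 10 stays in bounds */

/* Bytes advanced by p_a + 10 when p_a is an int pointer. */
ptrdiff_t bytes_moved_int_ptr(void)
{
    int *p_a = values;

    return (char *)(p_a + 10) - (char *)p_a;
}

/* Bytes advanced by (char *)p_a + 10. */
ptrdiff_t bytes_moved_char_ptr(void)
{
    int *p_a = values;

    return ((char *)p_a + 10) - (char *)p_a;
}

/* Amount added by (unsigned long)p_a + 10, which is plain integer arithmetic. */
unsigned long bytes_moved_integer(void)
{
    int *p_a = values;

    return ((unsigned long)p_a + 10) - (unsigned long)p_a;
}
```

The first function returns 10 * sizeof(int), while the other two return 10. This is why the macros cast the pointer to an integer (or could cast to char *) before subtracting a byte offset.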

These three cases should make the point clear: pointer arithmetic is scaled by the size of the pointed-to type, while integer arithmetic is not. (Note: A later blog post will explain this in detail from the compiler's perspective.)

Conclusion

This article used several examples to describe some interesting aspects of offsets in C structures, and I hope it is helpful to you. Some readers may already have guessed why these questions came up; that is the topic of later blog posts.

If there are any errors in the article, please point them out.