Surprises in C code

Article By : Colin Walls


The C language is quite flexible and expressive; these are some of the reasons why it has been successful and has resisted being replaced by “better” languages. An example of its flexibility is the ability to write an expression in a number of ways that are functionally equivalent. This enables the style of coding to be adapted to personal needs. However, there is a catch: sometimes, apparently equivalent code has subtle differences. This can occur in even the simplest code, and we will explore some possibilities in this article.

It is common for C to provide several different ways to do something, all of which are exactly equivalent. For example, given that x is a normal int variable, each of the following statements will do exactly the same job:

x = x + 1;
x += 1;
x++;
++x;

In each case 1 will be added to x. The only possible difference is that a less capable compiler might generate slightly better code for the last two options (which would be a hint that getting a better compiler would be worthwhile).

The two forms of the ++ operator, used in this way, produce the same result. However, if the value of the expression is used, the pre-increment and post-increment are different, thus:

y = x++;   // y has the value of x before the increment
y = ++x;   // y has the value of x after the increment
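
To make the distinction concrete, here is a minimal sketch in C. The function names are purely illustrative and do not come from any library; each one simply wraps one of the constructs discussed above so that its result can be inspected.

```c
/* Illustrative helpers; the names are assumptions made for this sketch. */

/* Returns the value that y receives from "y = x++". */
int value_of_post_increment(int x)
{
    int y = x++;   /* y receives the old value; x is incremented afterwards */
    return y;
}

/* Returns the value that y receives from "y = ++x". */
int value_of_pre_increment(int x)
{
    int y = ++x;   /* x is incremented first; y receives the new value */
    return y;
}

/* Applies x = x + 1, x += 1, x++ and ++x in turn and returns the final x. */
int apply_each_increment_form(int x)
{
    x = x + 1;
    x += 1;
    x++;
    ++x;
    return x;      /* each statement has added exactly 1 */
}
```

Starting from x equal to 5, the post-increment helper yields 5 and the pre-increment helper yields 6, while the last function shows that, when the expression value is discarded, every form adds exactly 1.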

Interestingly, the post-increment is, in principle, slightly more “expensive”, as storage needs to be allocated to keep the old value of x. However, a compiler would probably optimize this away. If the storage is allocated when the expression value is not used, then a new compiler is definitely required!

If, instead of being an int, x were a pointer to int, then adding 1 would have the effect of adding 4 to the address it holds (on a 32-bit machine, where an int typically occupies 4 bytes). If this comes as a big surprise, a brush-up on pointer arithmetic is in order.
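
The scaling can be checked with a short sketch like the one below. The function name is an assumption made for illustration. Rather than hard-coding 4, it measures the step in bytes, which should equal sizeof(int) on whatever machine it is compiled for.

```c
#include <stddef.h>

/* Illustrative helper (hypothetical name): reports how many bytes the
   address moves when 1 is added to a pointer to int. */
size_t bytes_advanced_by_int_pointer_increment(void)
{
    int values[2] = {0, 0};
    int *p = &values[0];
    int *q = p + 1;                 /* now points at values[1] */

    /* Converting to char * lets the distance be measured in bytes. */
    return (size_t)((char *)q - (char *)p);
}
```

On a platform with a 4-byte int the function returns 4; in general it returns sizeof(int).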

However, sometimes constructs that appear to be equivalent have very subtle differences…

Probably the simplest thing that you can do in any programming language is assign a value to a variable. So, in C, we might write:

alpha = 99;
beta = 99;
gamma = 99;

Of course, this might be written more compactly like this:

alpha = beta = gamma = 99;

And these are 100% equivalent. Or are they?

Most of the time, these two constructs are entirely equivalent, but there are (at least) four situations when choosing one or the other might make a difference:

First, and most prosaically, the first construct gives each variable its own line, so a comment explaining why it is set to this value can be added where appropriate.

Second, it is always good to write maintainable code. Maybe, at some point in the future, the code might need to be changed so that all three variables are not set to the same value. The first format lends itself more readily to modification.

Third, a substandard compiler might generate code like this for the first construct:

mov r0, #99
mov alpha, r0
mov r0, #99
mov beta, r0
mov r0, #99
mov gamma, r0

The second construct gives the hint that r0 only needs to be loaded once. Again, a better compiler would not need the hint.

Lastly, there is the question of execution order. In the first construct, it is entirely clear that alpha will be assigned first and gamma last. A compiler will interpret the second construct thus:

alpha = (beta = (gamma = 99));

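This means that the assignments are grouped from right to left, so gamma is normally written first and alpha last, the reverse of the first construct. But does that matter? Most of the time, it does not. But if these were device registers, not ordinary variables, it might make a big difference. It is very common for hardware to need set-up values to be loaded in a precise sequence, and the C standard does not strictly pin down the relative order of the individual stores within a single expression.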
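
For the device register case, a minimal sketch of the safer form might look like this. The structure layout and the names device_regs_t and device_setup are hypothetical; on real hardware the structure would be mapped onto an address taken from the device's data sheet. The members are declared volatile so that the compiler must perform every write, and one statement per register keeps the sequence explicit.

```c
#include <stdint.h>

/* Hypothetical register block, assumed for illustration only. */
typedef struct {
    volatile uint32_t alpha;
    volatile uint32_t beta;
    volatile uint32_t gamma;
} device_regs_t;

/* Loads the same set-up value into each register, one statement per
   register, so the order of the writes is stated explicitly. */
void device_setup(device_regs_t *dev, uint32_t value)
{
    dev->alpha = value;   /* written first  */
    dev->beta  = value;   /* written second */
    dev->gamma = value;   /* written last   */
}
```

Because each assignment is a separate statement on a volatile object, the writes happen in the order in which they appear in the source, which is the property that sequence-sensitive hardware needs.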

So, I would say that the multiple-assignments-in-one-statement construct should be avoided.

Overall, although C is a small language, it could be argued that it might be even smaller if it offered fewer ways to do things. The result might be clearer, more maintainable code.
