Home » C++ » Why is a = (a+b) – (b=a) a bad choice for swapping two integers?

Why is a = (a+b) – (b=a) a bad choice for swapping two integers?

Posted by: admin November 30, 2017 Leave a comment

Questions:

I stumbled into this code for swapping two integers without using a temporary variable or the use of bitwise operators.

int main(){

    int a=2,b=3;
    printf("a=%d,b=%d",a,b);
    a=(a+b)-(b=a);
    printf("\na=%d,b=%d",a,b);
    return 0;
}

But I think this code has undefined behavior in the swap statement a = (a+b) - (b=a); as it does not contain any sequence points to determine the order of evaluation.

My question is: Is this an acceptable solution to swap two integers?

Answers:

No. This is not acceptable. This code invokes Undefined behavior. This is because of the operation on b is not defined. In the expression

a=(a+b)-(b=a);  

it is not certain whether b gets modified first or its value gets used in the expression (a+b) because of the lack of the sequence point.
See what standard syas:

C11: 6.5 Expressions:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar
object
, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.84)
1.

Read C-faq- 3.8 and this answer for more detailed explanation of sequence point and undefined behavior.


1. Emphasis is mine.

Questions:
Answers:

My Question is – Is this an acceptable solution to swap two integers?

Acceptable to who? If you’re asking if it is acceptable to me, that would not get past any code review I was in, believe me.

why is a=(a+b)-(b=a) a bad choice for swapping two integers?

For the following reasons:

1) As you note, there is no guarantee in C that it actually does that. It could do anything.

2) Suppose for the sake of argument that it really does swap two integers, as it does in C#. (C# guarantees that side effects happen left-to-right.) The code would still be unacceptable because it is completely not obvious what its meaning is! Code shouldn’t be a bunch of clever tricks. Write code for the person coming after you who has to read and understand it.

3) Again, suppose that it works. The code is still unacceptable because this is just plain false:

I stumbled into this code for swapping two integers without using a Temporary variable or the use of bit-wise operators.

That’s simply false. This trick uses a temporary variable to store the computation of a+b. The variable is generated by the compiler on your behalf and not given a name, but it’s there. If the goal is to eliminate temporaries, this makes it worse, not better! And why would you want to eliminate temporaries in the first place? They’re cheap!

4) This only works for integers. Lots of things need to be swapped other than integers.

In short, spend your time concentrating on writing code that is obviously correct, rather than trying to come up with clever tricks that actually make things worse.

Questions:
Answers:

There are at least two problems with a=(a+b)-(b=a).

One you mention yourself: the lack of sequence points means that the behavior is undefined. As such, anything at all could happen. For example, there is no guarantee of which is evaluated first: a+b or b=a. The compiler may choose to generate code for the assignment first, or do something completely different.

Another problem is the fact that the overflow of signed arithmetic is undefined behavior. If a+b overflows there is no guarantee of the results; even an exception might be thrown.

Questions:
Answers:

Apart from the other answers, about undefined behavior and style, if you write simple code that just uses a temporary variable the compiler can likely trace the values and not actually swap them in the generated code, and just use the swapped values later on in some cases. It cant do that with your code. The compiler is usually better than you at micro optimizations.

So it’s likely your code is slower, harder to understand, and probably unreliable undefined behavior too.

Questions:
Answers:

If you use gcc and -Wall the compiler already warns you

a.c:3:26: warning: operation on ‘b’ may be undefined [-Wsequence-point]

Whether to use such a construct is debatable from an performance point as well. When you look at

void swap1(int *a, int *b)
{
    *a = (*a + *b) - (*b = *a);
}

void swap2(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

and examine the assembly code

swap1:
.LFB0:
    .cfi_startproc
    movl    (%rdi), %edx
    movl    (%rsi), %eax
    movl    %edx, (%rsi)
    movl    %eax, (%rdi)
    ret
    .cfi_endproc

swap2:
.LFB1:
    .cfi_startproc
    movl    (%rdi), %eax
    movl    (%rsi), %edx
    movl    %edx, (%rdi)
    movl    %eax, (%rsi)
    ret
    .cfi_endproc

you can see no benefit for obfuscating the code.


Looking at C++ (g++) code, which does basically the same, but takes move into account

#include <algorithm>

void swap3(int *a, int *b)
{
    std::swap(*a, *b);
}

gives identical assembly output

_Z5swap3PiS_:
.LFB417:
    .cfi_startproc
    movl    (%rdi), %eax
    movl    (%rsi), %edx
    movl    %edx, (%rdi)
    movl    %eax, (%rsi)
    ret
    .cfi_endproc

Taking gcc’s warning into account and seeing no technical gain, I would say, stick with standard techniques. If this ever becomes a bottleneck, you can still investigate, how to improve or avoid this small piece of code.

Questions:
Answers:

The statement:

a=(a+b)-(b=a);  

invokes undefined behavior. The second shall in the quoted paragraph is violated:

(C99, 6.5p2) “Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

Questions:
Answers:

A question was posted back in 2010 with the exact same example.

a = (a+b) - (b=a);

Steve Jessop warns against it:

Behaviour of that code is undefined, by the way. Both a and b are read
and written without an intervening sequence point. For starters, the
compiler would be well within its rights to evaluate b=a before
evaluating a+b.

Here’s an explanation from a question posted in 2012. Note that the sample is not exactly the same because of the lack of parentheses, but the answer is still relevant nonetheless.

In C++, subexpressions in arithmetic expressions do not have temporal
ordering.

a = x + y;

Is x evaluated first, or y? The compiler can choose either, or it can
choose something completely different. The order of evaluation is not
the same thing as operator precedence: operator precedence is strictly
defined, and order of evaluation is only defined to the granularity
that your program has sequence points.

In fact, on some architectures it is possible to emit code that
evaluates both x and y at the same time — for example, VLIW
architectures.

Now for C11 standard quotes from N1570:

Annex J.1/1

It is unspecified behavior when:

— The order
in which subexpressions are evaluated and the order in which side
effects take place, except as specified for the function-call (), &&,
||, ? :, and comma operators (6.5).

— The order in which the operands of an assignment operator are evaluated (6.5.16).

Annex J.2/1

It is undefined behavior when:

— A side effect on a scalar object is unsequenced relative to either a
different side effect on the same scalar object or a value computation
using the value of the same scalar object (6.5).

6.5/1

An expression is a sequence of operators and operands that specifies
computation of a value, or that designates an object or a function, or
that generates side effects, or that performs a combination thereof.
The value computations of the operands of an operator are sequenced
before the value computation of the result of the operator.

6.5/2

If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings.84)

6.5/3

The grouping of operators and operands is indicated by the syntax.85)
Except as specified later, side effects and value computations of
subexpressions are unsequenced.86)

You shouldn’t rely on undefined behavior.

Some alternatives: In C++ you can use

  std::swap(a, b);

XOR-swap:

  a = a^b;
  b = a^b;
  a = a^b;

Questions:
Answers:

The problem is that according to the C++ Standard

Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.

So this expression

a=(a+b)-(b=a);  

has undefined behavioir.

Questions:
Answers:

You can use XOR swap algorithm to prevent any overflow issues and still have a one-liner.

But, since you’ve got a c++ tag, I’d prefer just a simple std::swap(a, b) to make it more easily readable.

Questions:
Answers:

Besides the problems invoked by the other users, this type of swap has another problem: the domain.

What if a+b goes beyond it’s limit? Let’s say we work with number from 0 to 100. 80+ 60 would go out of range.