Efficient way to bit-copy a signed integer to an unsigned integer

Posted by: admin January 9, 2018

Questions:
/* [1] */
int i = -1;
unsigned u = (unsigned)i;

/* [2] */
int i = -1;
unsigned u;
memcpy(&u, &i, sizeof i);

/* [3] */
int i = -1;
unsigned u = *(unsigned *)&i;

In order to bit-copy a signed integer to its unsigned partner, [1] should work on most machines, but as far as I know it is not guaranteed behaviour.

[2] should do exactly what I want, but I want to avoid the overhead of calling a library function.

So how about [3]? Does it efficiently achieve what I intend?

Answers:
[3] is correct in both C and C++ (as of C++14 but not previously); there is no need to use memcpy in this case. (That said, there’s no reason not to use memcpy, as it communicates your intent effectively, is obviously safe, and has zero overhead.)
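As a minimal sketch of the memcpy approach from [2], wrapped in a function: the names bit_copy and bit_copy_demo are illustrative and not from the question, and the self-check assumes a two's complement target. Optimizing compilers typically turn a fixed-size memcpy like this into a plain register move rather than a library call.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (the name is illustrative): copies the object
// representation of an int into an unsigned of the same size.
inline unsigned bit_copy(int i) {
    static_assert(sizeof(unsigned) == sizeof(int),
                  "int and unsigned must have the same size");
    unsigned u;
    std::memcpy(&u, &i, sizeof u);  // byte-wise copy, no value conversion
    return u;
}

// Small self-check, assuming a two's complement target.
inline void bit_copy_demo() {
    assert(bit_copy(0) == 0u);
    assert(bit_copy(-1) == ~0u);  // every bit set
}
```

Calling bit_copy_demo() from main is enough to exercise it.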

C, 6.5 Expressions:

7 – An object shall have its stored value accessed only by an lvalue expression that has one of
the following types: […]

  • a type that is the signed or unsigned type corresponding to the effective type of the
    object, […]

C++, [basic.lval]:

10 – If a program attempts to access the stored value of an object through a glvalue of other than one of the
following types the behavior is undefined: […]

  • a type that is the signed or unsigned type corresponding to the dynamic type of the object, […]

As you can see, the wording in the two standards is very similar and so can be relied upon across the two languages.

Answers:
/* [4] */
union unsigned_integer
{
  int i;
  unsigned u;
};

union unsigned_integer ui;  // the "union" keyword is required in C, optional in C++
ui.i = -1;
// You now have access to ui.u

Warning: As discussed in the comments, this seems to be okay in C but Undefined Behaviour in C++. Since your question has both tags, I’ll leave this here. For more info, check this SO question:

Accessing inactive union member and undefined behavior?

I would then advise using reinterpret_cast in C++:

/* [5] */
int i = -1;
unsigned u = reinterpret_cast<unsigned&>(i);

Answers:
/* [1] */
int i = -1;
unsigned u = (unsigned)i;

↑ This is guaranteed not to produce a bit copy on a sign-and-magnitude or ones’ complement machine, because conversion to unsigned is guaranteed to yield the signed value modulo 2^n, where n is the number of value representation bits in the unsigned type. That is, the conversion is guaranteed to yield the same result as if the signed type used two’s complement representation.
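The modulo rule can be checked directly. The following is only a sketch, with the hypothetical helper to_unsigned standing in for the cast in [1]: because the result is defined by value rather than by bit pattern, -1 always becomes 2^n - 1, the largest unsigned value, whatever the signed representation.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: the value conversion used in snippet [1].
constexpr unsigned to_unsigned(int i) {
    return static_cast<unsigned>(i);
}

// -1 is congruent to 2^n - 1 (mod 2^n), so the result is the maximum value.
static_assert(to_unsigned(-1) == std::numeric_limits<unsigned>::max(),
              "-1 converts to the largest unsigned value");
static_assert(to_unsigned(-2) == std::numeric_limits<unsigned>::max() - 1u,
              "-2 converts to the second-largest unsigned value");

// The same check at run time, for values not known at compile time.
inline void conversion_demo(int i) {
    const unsigned u = to_unsigned(i);
    if (i >= 0) {
        assert(u == static_cast<unsigned>(i));  // non-negative values are unchanged
    } else {
        // For negative i, u + |i| wraps around to exactly 0 (mod 2^n).
        assert(u + (0u - u) == 0u);
        assert(0u - u == 0u - static_cast<unsigned>(i));
    }
}
```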


/* [2] */
int i = -1;
unsigned u;
memcpy(&u, &i, sizeof i);

↑ This would work nicely, because the types are guaranteed to have the same size.


/* [3] */
int i = -1;
unsigned u = *(unsigned *)&i;

↑ This is formally Undefined Behavior in C++11 and earlier, but it’s one of the cases included in the “strict aliasing” clause in the standard, and so it’s probably supported by every extant compiler. Also, it’s an example of what reinterpret_cast is there for. And in C++14 and later the language about undefined behavior has been removed from the section on lvalue-to-rvalue conversion (1).

If I did this I would use the named C++ cast for clarity.

I would however try out what the sometimes look-the-standard-allows-me-to-do-the-impractical-thing compilers have to say about it, in particular g++ with strict aliasing enabled (-fstrict-aliasing, which is on by default at -O2 and above, together with the -Wstrict-aliasing warning), but also clang, since it’s designed as a drop-in replacement for g++.

At least if I planned on the code being used with those compilers and options.


1) [conv.lval], §4.1/1 in both C++11 and C++14.

Answers:

This is from Paragraph 4.7 “Integral Conversions” of document N3797, the latest working draft of the C++14 standard:

If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2^n where n is
the number of bits used to represent the unsigned type). [ Note: In a
two’s complement representation, this conversion is conceptual and
there is no change in the bit pattern (if there is no truncation).
—end note ]

To a first approximation, all computers in the world use two’s complement representation. So [1] is the way to go (unless you are porting C++ to the IBM 7090).
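To see that [1] and [2] agree on a given platform, one can compare the two results for a handful of values. This is only a sketch: the function names and the sample values are illustrative, and the check is expected to pass on a two's complement machine, not on the other representations discussed above.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Hypothetical check: true if the value conversion from [1] gives the same
// result as the memcpy bit copy from [2] for the given value.
inline bool conversion_matches_bit_copy(int i) {
    static_assert(sizeof(unsigned) == sizeof(int),
                  "int and unsigned must have the same size");
    unsigned by_copy;
    std::memcpy(&by_copy, &i, sizeof by_copy);
    return static_cast<unsigned>(i) == by_copy;
}

// Runs the comparison over a few representative values, including the
// extremes of int.
inline bool all_samples_match() {
    const int samples[] = {0, 1, -1, 42, -42,
                           std::numeric_limits<int>::min(),
                           std::numeric_limits<int>::max()};
    for (int s : samples) {
        if (!conversion_matches_bit_copy(s)) {
            return false;
        }
    }
    return true;
}
```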