
Performance of unsigned vs signed integers


Questions:

Is there any performance gain/loss by using unsigned integers over signed integers?

If so, does this go for short and long as well?

Answers:

Division by powers of 2 is faster with unsigned int, because it can be optimized into a single shift instruction. With signed int, it usually requires more machine instructions, because division rounds towards zero, but shifting to the right rounds down. Example:

int foo(int x, unsigned y)
{
    x /= 8;        // signed division: needs a rounding fix-up for negative x
    y /= 8;        // unsigned division: a single right shift
    return x + y;
}

Here is the relevant x part (signed division):

movl 8(%ebp), %eax
leal 7(%eax), %edx
testl %eax, %eax
cmovs %edx, %eax
sarl $3, %eax

And here is the relevant y part (unsigned division):

movl 12(%ebp), %edx
shrl $3, %edx

Answers:

In C++ (and C), signed integer overflow is undefined, whereas unsigned integer overflow is defined to wrap around. Note that in gcc, for example, you can use the -fwrapv flag to make signed overflow defined (as wrap-around).

Undefined signed integer overflow allows the compiler to assume that overflows don’t happen, which may introduce optimization opportunities. See e.g. this blog post for discussion.
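To illustrate the point (this example is not from the original answer, just a common way to demonstrate it): because signed overflow is undefined, the compiler may assume it never happens and fold the first comparison below to a constant, whereas the unsigned version must actually be evaluated because wrap-around is well defined.

bool always_true(int x)      { return x + 1 > x; }  // may be folded to "return true"
bool maybe_false(unsigned x) { return x + 1 > x; }  // false when x == UINT_MAX, so no folding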

Answers:

This will depend on the exact implementation. In most cases, however, there will be no difference. If you really care, you have to try all the variants you are considering and measure performance.
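If you do want to measure, a minimal (and admittedly crude) timing sketch along these lines is often enough to compare variants; the function name and the work done inside the loop are just placeholders, and real measurements need warm-up, repetitions, and care that the optimizer does not remove the loop.

#include <chrono>
#include <cstdint>
#include <iostream>

// Crude micro-benchmark sketch: time the same work done with a signed and
// an unsigned element type and compare the results.
template <typename T>
std::uint64_t sum_of_divisions(T limit)
{
    std::uint64_t sum = 0;
    for (T i = 1; i < limit; ++i)
        sum += static_cast<std::uint64_t>(i / 13);  // arbitrary placeholder work
    return sum;
}

int main()
{
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    volatile std::uint64_t a = sum_of_divisions<int>(100'000'000);
    auto t1 = clock::now();
    volatile std::uint64_t b = sum_of_divisions<unsigned>(100'000'000u);
    auto t2 = clock::now();

    std::cout << "signed:   " << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "unsigned: " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
    (void)a; (void)b;
    return 0;
}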

Answers:

unsigned leads to the same or better performance than signed.
Some examples:

  • Division by a constant that is a power of 2 (see also the answer from FredOverflow)
  • Division by a constant number (for example, my compiler implements division by 13 using 2 asm instructions for unsigned and 6 instructions for signed; see the sketch after this list)
  • Checking whether a number is even (I have no idea why my MS Visual Studio compiler implements it with 4 instructions for signed numbers; gcc does it with 1 instruction, just like in the unsigned case)
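
Here is a rough sketch of the kind of source code behind the division and evenness bullets above (the exact instruction counts quoted depend on the compiler and target; this only shows what is being compared):

// Division by a non-power-of-2 constant: compilers typically turn the
// unsigned version into a multiply-by-reciprocal plus shifts, while the
// signed version needs extra fix-up instructions for negative inputs.
unsigned div13_u(unsigned x) { return x / 13; }
int      div13_s(int x)      { return x / 13; }

// Evenness test: trivial for unsigned; for signed operands some compilers
// emit extra instructions to handle the sign of x % 2.
bool is_even_u(unsigned x) { return x % 2 == 0; }
bool is_even_s(int x)      { return x % 2 == 0; }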

short usually leads to the same or worse performance than int (assuming sizeof(short) < sizeof(int)). Performance degradation happens when you assign the result of an arithmetic operation (which is usually of type int, never short) to a variable of type short, which is stored in a processor register (which is also int-sized). All the conversions from short to int take time and are annoying.
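
For example, arithmetic on short operands is actually performed in int because of the usual integer promotions, so the result has to be narrowed again when it is stored back into a short (a minimal sketch):

#include <type_traits>

void promotion_example()
{
    short a = 1, b = 2;
    // a + b is computed as int (integer promotion), not as short ...
    static_assert(std::is_same<decltype(a + b), int>::value,
                  "short + short yields int");
    // ... so storing the result back into a short needs a narrowing conversion.
    short c = static_cast<short>(a + b);
    (void)c;
}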

Note: some DSPs have fast multiplication instructions for the signed short type; in this specific case short is faster than int.

As for the difference between int and long, I can only guess (I am not familiar with 64-bit architectures). Of course, if int and long have the same size (as on 32-bit platforms), their performance is also the same.


A very important addition, pointed out by several people:

What really matters for most applications is the memory footprint and utilized bandwidth. You should use the smallest necessary integers (short, maybe even signed/unsigned char) for large arrays.

This will give better performance, but the gain is nonlinear (i.e. not by a factor of 2 or 4) and somewhat unpredictable – it depends on cache size and the relationship between calculations and memory transfers in your application.
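
As a rough illustration of the footprint argument (the byte counts in the comments assume the common 1/4/8-byte element sizes):

#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    constexpr std::size_t n = 10'000'000;

    // The same logical data stored with different element widths:
    std::vector<std::int8_t>  tiny(n);    // ~10 MB
    std::vector<std::int32_t> medium(n);  // ~40 MB
    std::vector<std::int64_t> wide(n);    // ~80 MB

    std::cout << tiny.size()   * sizeof(tiny[0])   << " bytes\n"
              << medium.size() * sizeof(medium[0]) << " bytes\n"
              << wide.size()   * sizeof(wide[0])   << " bytes\n";
}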

Answers:

This is pretty much dependent on the specific processor.

On most processors, there are instructions for both signed and unsigned arithmetic, so the difference between using signed and unsigned integers comes down to which one the compiler uses.

If either of the two is faster, it is completely processor-specific, and most likely the difference is minuscule, if it exists at all.

Answers:

The performance difference between signed and unsigned integers is actually more general than the accepted answer suggests. Division of an unsigned integer by any constant can be made faster than division of a signed integer by the same constant, regardless of whether the constant is a power of two. See http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html

At the end of his post, he includes the following section:

A natural question is whether the same optimization could improve signed division; unfortunately it appears that it does not, for two reasons:

The increment of the dividend must become an increase in the magnitude, i.e. increment if n > 0, decrement if n < 0. This introduces an additional expense.

The penalty for an uncooperative divisor is only about half as much in signed division, leaving a smaller window for improvements.

Thus it appears that the round-down algorithm could be made to work in signed division, but will underperform the standard round-up algorithm.

Answers:

In short, don’t bother before the fact. But do bother after.

If you want performance, you have to use the compiler's optimizations, which may work against common sense. One thing to remember is that different compilers compile code differently, and each has its own set of optimizations. If we're talking about g++ and maxing out its optimization level with -Ofast, or at least -O3, in my experience it can compile the long type into code that performs even better than any unsigned type, or even plain int.

This is from my own experience, and my recommendation is to write your full program first and worry about such things only afterwards, when you have the actual code in hand and can compile it with optimizations to try out and pick the types that actually perform best. This is also good general advice about optimizing for performance: write the straightforward version first, try compiling with optimizations, and tweak things to see what works best. You should also try different compilers and choose the one that produces the most performant machine code.

An optimized multi-threaded linear algebra program can easily show a >10x performance difference between finely optimized and unoptimized builds, so this does matter.

Optimizer output contradicts intuition in plenty of cases. For example, I had a case where the difference between a[x] += b and a[x] = b changed program execution time by almost 2x. And no, a[x] = b wasn't the faster one.

Here is, for example, NVIDIA's guidance for programming their GPUs:

Note: As was already the recommended best practice, signed arithmetic should be preferred over unsigned arithmetic wherever possible for best throughput on SMM. The C language standard places more restrictions on overflow behavior for unsigned math, limiting compiler optimization opportunities.

Answers:

Traditionally int is the native integer format of the target hardware platform. Any other integer type may incur performance penalties.

EDIT:

Things are slightly different on modern systems:

  • int may in fact be 32-bit on 64-bit systems, for compatibility reasons. I believe this is the case on Windows systems (a quick sizeof check like the one after this list shows what your platform uses).

  • Modern compilers implicitly promote shorter types to int when performing computations (the usual integer promotions), so arithmetic on short or char operands is done at int width anyway.
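
If you are curious what your own platform uses, a quick check like this will tell you (the comments reflect the common LP64 and LLP64 data models):

#include <iostream>

int main()
{
    // Typical 64-bit Linux/macOS (LP64):  int = 4, long = 8, long long = 8
    // Typical 64-bit Windows (LLP64):     int = 4, long = 4, long long = 8
    std::cout << "short:     " << sizeof(short)     << " bytes\n"
              << "int:       " << sizeof(int)       << " bytes\n"
              << "long:      " << sizeof(long)      << " bytes\n"
              << "long long: " << sizeof(long long) << " bytes\n";
}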

Answers:

IIRC, on x86 signed/unsigned shouldn’t make any difference. Short/long, on the other hand, is a different story, since the amount of data that has to be moved to/from RAM is bigger for longs (other reasons may include cast operations like extending a short to long).

Answers:

An unsigned integer is advantageous in that you can store and treat it as a plain bit pattern, just data with no sign, so multiplication and division by powers of two become easier (faster) through bit-shift operations.
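
A tiny sketch of that idea (note that only multiplication and division by powers of two map directly onto shifts; this example is not from the original answer):

// For unsigned values, multiplying/dividing by a power of two is exactly a
// left/right shift, so the compiler can always use the shift form.
unsigned times8_u(unsigned x) { return x * 8; }  // same as x << 3
unsigned div8_u(unsigned x)   { return x / 8; }  // same as x >> 3
// For signed values the division needs a rounding fix-up for negative inputs,
// as the assembly in the first answer shows.
int div8_s(int x)             { return x / 8; }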

Answers:

It is not only division by powers of 2 that is faster with unsigned types; division by any other value is also faster with unsigned types. If you look at Agner Fog's instruction tables, you'll see that unsigned divisions have similar or better performance than their signed counterparts.

For example, on the AMD K7:

╔═════════════╤══════════╤═════╤═════════╤═══════════════════════╗
║ Instruction │ Operands │ Ops │ Latency │ Reciprocal throughput ║
╠═════════════╪══════════╪═════╪═════════╪═══════════════════════╣
║ DIV         │ r8/m8    │ 32  │ 24      │ 23                    ║
║ DIV         │ r16/m16  │ 47  │ 24      │ 23                    ║
║ DIV         │ r32/m32  │ 79  │ 40      │ 40                    ║
║ IDIV        │ r8       │ 41  │ 17      │ 17                    ║
║ IDIV        │ r16      │ 56  │ 25      │ 25                    ║
║ IDIV        │ r32      │ 88  │ 41      │ 41                    ║
║ IDIV        │ m8       │ 42  │ 17      │ 17                    ║
║ IDIV        │ m16      │ 57  │ 25      │ 25                    ║
║ IDIV        │ m32      │ 89  │ 41      │ 41                    ║
╚═════════════╧══════════╧═════╧═════════╧═══════════════════════╝

The same applies to the Intel Pentium:

╔═════════════╤══════════╤══════════════╗
║ Instruction │ Operands │ Clock cycles ║
╠═════════════╪══════════╪══════════════╣
║ DIV         │ r8/m8    │ 17           ║
║ DIV         │ r16/m16  │ 25           ║
║ DIV         │ r32/m32  │ 41           ║
║ IDIV        │ r8/m8    │ 22           ║
║ IDIV        │ r16/m16  │ 30           ║
║ IDIV        │ r32/m32  │ 46           ║
╚═════════════╧══════════╧══════════════╝

Of course, these are quite ancient, and newer architectures with more transistors might close the gap, but the basics still apply: generally you need more macro-ops, more logic, and more latency to do a signed division.
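
For reference, a runtime (non-constant) divisor is exactly where those instructions appear; something like the following typically compiles to DIV for the unsigned version and IDIV for the signed one (compiler- and target-dependent, of course):

// Division by a value not known at compile time: the compiler cannot use the
// multiply-by-reciprocal trick, so it emits a hardware divide instruction.
unsigned udiv(unsigned a, unsigned b) { return a / b; }  // typically DIV
int      sdiv(int a, int b)           { return a / b; }  // typically IDIV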
