
numba – Why is my Python program running faster on the CPU than on the GPU with CUDA?

Posted by: admin February 24, 2020

Questions:

I was following this tutorial, basically running this code in Python 3.7.4 on Windows 10:

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()

The problem: I get different timings than in the tutorial (which uses a 1050 Ti).

When target is set to ‘cuda’, which means it runs on my GTX 970 (driver 441.41), the program takes ~0.6 seconds.
But when set to ‘parallel’, which uses multiple CPU cores (i5 4690k), it only needs ~0.1 seconds.
Even setting it to ‘cpu’, so it uses only one CPU core, it runs faster than the GPU, at ~0.4 seconds.

So have I configured my GPU incorrectly somehow? As far as I can tell, the CUDA toolkit installation went fine.

Also, when I run the program, Windows Task Manager shows a brief rise in the GPU graph (Image), so it does appear to use the GPU, just perhaps not as efficiently as it could.

I don’t really know what to try here – Google didn’t lead me to a solution, unfortunately. Any suggestions on what is going on? Can I test this in another way? And in case I cannot solve this: are there any good alternatives to CUDA?

Answers: