I was following a tutorial and running this code in Python 3.7.4 on Windows 10:
```python
import numpy as np
from timeit import default_timer as timer
from numba import vectorize

# Element-wise power, compiled for the GPU.
# (Note: the name shadows the built-in pow(), but that's harmless here.)
@vectorize(['float32(float32, float32)'], target='cuda')
def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()
```
The problem: I get different results than the tutorial (which uses a 1050 Ti).
When `target` is set to `'cuda'`, meaning it runs on my GTX 970 (driver 441.41), the program takes ~0.6 seconds.

But when set to `'parallel'`, which uses multiple CPU cores (i5 4690K), it only needs ~0.1 seconds.

Even setting it to `'cpu'`, which uses a single CPU core, it runs faster than the GPU, at ~0.4 seconds.
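For reference, here is how I measured the timings. This is a sketch using a plain NumPy stand-in for the kernel (so it runs without a GPU); the `warmup` call reflects the fact that Numba compiles the function on its first invocation, which would otherwise be included in the measured time:

```python
import numpy as np
from timeit import default_timer as timer

def pow_np(a, b):
    # Plain-NumPy stand-in for the @vectorize kernel; same math.
    return a ** b

def timed_run(func, a, b, warmup=True):
    """Time one call to func(a, b), optionally after a warm-up call.

    For a Numba-compiled function, the warm-up call absorbs the one-time
    JIT compilation cost so the timed call measures only execution.
    """
    if warmup:
        func(a, b)
    start = timer()
    c = func(a, b)
    return c, timer() - start

if __name__ == '__main__':
    a = b = np.random.random(1_000_000).astype(np.float32)
    c, duration = timed_run(pow_np, a, b)
    print(duration)
```

(`pow_np` and `timed_run` are names I made up for this sketch; the real test would decorate the kernel with `@vectorize` as in the code above.)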
So have I perhaps configured my GPU incorrectly somehow? As far as I can tell, the CUDA toolkit installed without any errors.
Also, when I run it, Windows Task Manager shows a brief rise in the GPU graph (image), so it does appear to use the GPU, just perhaps not as efficiently as it could.
I don't really know what else I can do here; Google unfortunately didn't lead me to a solution. Any suggestions as to what is going on? Is there another way I can test this? And in case I can't solve it: are there any good alternatives to CUDA?