numpy – Is it safe to reference SharedMemory instances directly when using the multiprocessing.shared_memory library in Python 3.8?

Posted by: admin, February 24, 2020

Questions:

I started using the multiprocessing.shared_memory library as described in the documentation.

That is, I create a numpy array and write it onto the SharedMemory object in one process:

import multiprocessing.managers
import numpy

a = numpy.array([1, 2, 3, 4, 5, 6])

# The manager owns the shared block and frees it on shutdown.
smm = multiprocessing.managers.SharedMemoryManager()
smm.start()

sm = smm.SharedMemory(size=a.nbytes)

# View the shared block as an array and copy the data in.
b = numpy.ndarray(a.shape, dtype=a.dtype, buffer=sm.buf)
b[:] = a[:]

print(sm.name)
# psm_ad5d71e1

And in another process I refer to the SharedMemory using its name and access the numpy array that way:

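import multiprocessing.shared_memory
import numpy

# Attach to the existing block by name.
sm = multiprocessing.shared_memory.SharedMemory(name="psm_ad5d71e1")
# Shape and dtype must match the array written by the first process.
c = numpy.ndarray((6,), dtype=numpy.int64, buffer=sm.buf)

print(c)
# [1 2 3 4 5 6]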
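
For reference, the same round trip can be sketched in a single process, with explicit cleanup at the end. This is only a minimal sketch under a few assumptions: it uses SharedMemory(create=True) directly instead of the SharedMemoryManager above, and the names writer, reader and result are made up for illustration.

```python
import numpy
from multiprocessing import shared_memory

a = numpy.array([1, 2, 3, 4, 5, 6], dtype=numpy.int64)

# Writer side: allocate a block and copy the array into it.
writer = shared_memory.SharedMemory(create=True, size=a.nbytes)
b = numpy.ndarray(a.shape, dtype=a.dtype, buffer=writer.buf)
b[:] = a

# Reader side: attach by name. Shape and dtype are not stored in the
# block, so they have to match what the writer used.
reader = shared_memory.SharedMemory(name=writer.name)
c = numpy.ndarray(a.shape, dtype=a.dtype, buffer=reader.buf)
result = c.tolist()

# Drop the array views first; close() fails while they still
# reference the buffer. Then release the block.
del b, c
reader.close()
writer.close()
writer.unlink()
```

The explicit dtype on the source array is there so the reader does not have to guess; numpy's default integer type is not int64 on every platform.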

So far so good. But in my actual use case, where the numpy array is far larger (GBs in size), the interpreter keeps dying (it just quits without raising any exception) when I access the numpy array created from the SharedMemory object, that is, at the print(c) line.

After a long time of experimenting I realized that if I passed the SharedMemory object itself instead of its name, everything worked as expected:

import multiprocessing.managers
import numpy
import pickle

a = numpy.array([1, 2, 3, 4, 5, 6])

smm = multiprocessing.managers.SharedMemoryManager()
smm.start()

sm = smm.SharedMemory(size=a.nbytes)

b = numpy.ndarray(a.shape, dtype=a.dtype, buffer=sm.buf)

b[:] = a[:]

# Pickle the SharedMemory object itself rather than passing its name.
with open('sm.pcl', 'wb') as f:
    pickle.dump(sm, f)

And in the other process I do this instead:

import multiprocessing.shared_memory
import numpy
import pickle

# Unpickling rebuilds a SharedMemory object for the same block.
with open('sm.pcl', 'rb') as f:
    sm = pickle.load(f)
c = numpy.ndarray((6,), dtype=numpy.int64, buffer=sm.buf)

print(c)
# [1 2 3 4 5 6]

(In my actual example I don’t pickle the SharedMemory object manually, that’s done automatically as I pass the SharedMemory object in the arguments to the separate process.)

So my question is simply, is this safe? Or am I using the library in an unintended (and unsafe) way?

If I were to guess, it should be safe, as the SharedMemory object is essentially a pointer to the block. So I reckon that referring to the SharedMemory object itself should be the same thing as looking it up by its name attribute with the SharedMemory constructor…
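
One way to probe that guess is to round-trip a SharedMemory object through pickle and check that the copy refers to the same block. The following is only a sketch: it stays in one process for simplicity (in the real setup the unpickling happens in the other process), and the names clone, same_name and payload are made up for illustration.

```python
import pickle
from multiprocessing import shared_memory

# Create a small block and put some recognizable bytes into it.
sm = shared_memory.SharedMemory(create=True, size=64)
sm.buf[:5] = b"hello"

# Round trip through pickle, as happens implicitly when the object
# is passed as an argument to another process.
clone = pickle.loads(pickle.dumps(sm))

# The copy carries the same name and sees the same bytes.
same_name = clone.name == sm.name
payload = bytes(clone.buf[:5])

clone.close()
sm.close()
sm.unlink()
```

If same_name is true and payload holds the bytes written through sm, the unpickled object is attached to the same block as the original. That says nothing about why the name-based lookup fails for very large arrays.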

But I would love to have someone who actually knows comment! Thanks in advance!

Answers: