Lambda Multiprocessing

A Python library I wrote for multiprocessing in AWS Lambda


If you deploy Python code to an AWS Lambda function, the multiprocessing functions in the standard library such as multiprocessing.Pool.map will not work.

For example:

from multiprocessing import Pool
def func(x):
    return x*x
args = [1,2,3]
with Pool() as p:
    result = p.map(func, args)

will give you:

OSError: [Errno 38] Function not implemented

This is because the AWS Lambda execution environment is very bare-bones and has no shared memory device (/dev/shm), which the standard multiprocessing.Pool relies on for its internal locks.

There is a workaround using multiprocessing.Pipe and multiprocessing.Process. Amazon documented it in this blog post. However, that example is tightly coupled to the specific work being done, has little error handling, and is not structured the way you'd expect if you are used to the normal multiprocessing library.
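To make the Pipe-and-Process idea concrete, here is a minimal sketch of how such a workaround can look. It is not the code from Amazon's post, nor the internals of lambda_multiprocessing; the names pipe_map, _worker and square are hypothetical, and it assumes a Linux environment (as in Lambda) where the "fork" start method is available.

```python
import multiprocessing

# Assumption: Linux (as in AWS Lambda), so the "fork" start method exists.
_ctx = multiprocessing.get_context("fork")


def _worker(func, arg, conn):
    """Run func(arg) in the child and send (ok, value) back over the pipe."""
    try:
        conn.send((True, func(arg)))
    except Exception as exc:
        # Ship the exception to the parent instead of dying silently.
        conn.send((False, exc))
    finally:
        conn.close()


def pipe_map(func, args):
    """Hypothetical helper: one process per argument, results in input order."""
    jobs = []
    for arg in args:
        # duplex=False gives (receive-only end, send-only end).
        parent_conn, child_conn = _ctx.Pipe(duplex=False)
        proc = _ctx.Process(target=_worker, args=(func, arg, child_conn))
        proc.start()
        child_conn.close()  # the parent only needs the receiving end
        jobs.append((proc, parent_conn))

    outcomes = []
    for proc, parent_conn in jobs:
        # Receive before join so a large result cannot block the child.
        outcomes.append(parent_conn.recv())
        proc.join()
        parent_conn.close()

    # Re-raise the first child exception in the parent, if any.
    for ok, value in outcomes:
        if not ok:
            raise value
    return [value for _, value in outcomes]


def square(x):
    return x * x
```

Calling pipe_map(square, [1, 2, 3]) plays the role of Pool.map here. The sketch starts one process per argument, which is fine for a handful of items but is not a real pool with a bounded number of workers.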

I have written the lambda_multiprocessing library as a drop-in replacement for multiprocessing.Pool which works in AWS Lambda functions using this workaround.

It is unit tested, handles errors properly, and matches the interface of multiprocessing.Pool.

To install it, run pip install lambda_multiprocessing. Then you can use it just like the normal Pool, for example:

from lambda_multiprocessing import Pool
def func(x):
    return x*x
args = [1,2,3]
with Pool() as p:
    result = p.map(func, args)

If you use moto to unit test your code, you cannot use multiprocessing, because moto is not concurrency-safe. As a workaround, pass 0 as the argument to Pool when unit testing, and None or a positive integer when actually deployed. That way your tests still exercise the Pool interface, but everything runs in the main thread, which keeps moto happy.
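One way to wire that up is to choose the Pool argument from an environment variable. This is only a sketch: the variable name UNIT_TESTING and the functions pool_size and handler are made up for illustration and are not part of lambda_multiprocessing.

```python
import os


def pool_size(env=os.environ):
    """Return 0 (no child processes) under test, None otherwise.

    UNIT_TESTING is a hypothetical variable your test setup would export.
    """
    return 0 if env.get("UNIT_TESTING") == "1" else None


def square(x):
    return x * x


def handler(event, context):
    # Imported here so this module still loads where the library is absent.
    from lambda_multiprocessing import Pool

    # Pool(0) runs everything in the main thread; Pool(None) uses the default.
    with Pool(pool_size()) as p:
        return p.map(square, event["numbers"])
```

With this in place, a test suite would set UNIT_TESTING=1 before importing the handler, and the deployed function would leave it unset.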

To read more, visit the GitHub repository: