========================================================
Multiprocessing in Python: a guided tour with examples
========================================================

:author: Dave Kuhlman
:contact: dkuhlman (at) davekuhlman (dot) org
:address: http://www.davekuhlman.org
:revision: 1.1a
:date: |date|

.. |date| date:: %B %d, %Y

:Copyright: Copyright (c) 2013 Dave Kuhlman.  All Rights Reserved.
    This software is subject to the provisions of the MIT License
    http://www.opensource.org/licenses/mit-license.php.

:Abstract: This document is an introduction, guide, and how-to on
    multiprocessing, parallel processing, and distributed processing
    programming in Python using several different technologies, for
    example, XML-RPC, IPython parallel processing, and Erlang+Erlport.

.. sectnum::

.. contents::


Introduction -- some options and alternatives
===============================================

This document is a survey of several different ways of implementing
multiprocessing systems in Python.  It attempts to provide a small
amount of guidance on when it is appropriate and useful to use these
different approaches, and when not.


Motivation
------------

We all have multi-core machines.  It's easy to imagine a home with
multiple computers and devices of several different kinds connected on
a LAN (local area network) through Ethernet or wireless connections.
Most (soon all) of those devices have multiple cores.  And yet, most of
that power is wasted while many of those cores are idle.

So, why do we all have machines with so many unused cores?  Because
Intel and AMD must compete, and to do so, must give us what appear to
be faster machines.  They can't give us more cycles (per second),
since, if they did, our machines would melt.  So, they give us
additional cores.  The number of transistors goes up, and Moore's law
(technically) holds true, but for most of us, that power is largely
unused and unusable.

Enough ranting ...  The alternatives and options discussed in this
document are all intended to solve that problem.  We have tools that
are looking for uses.  We need to learn how to put them to fuller use
so that next year we can justify buying yet another machine with more
cores to add to our home networks.

My central goal in writing this document is to enable and encourage
more of us to write the software that puts those machines and cores to
work.

And, note Larry Wall's three virtues of programming, in particular:

    "Impatience: The anger you feel when the computer is being lazy.
    This makes you write programs that don't just react to your needs,
    but actually anticipate them.  Or at least pretend to."
    See: http://threevirtues.com/


Alternatives -- an overview
-----------------------------

- XML-RPC -- Remote procedure calls across the network:

  - https://docs.python.org/2/library/xmlrpclib.html#module-xmlrpclib

  - https://docs.python.org/2/library/simplexmlrpcserver.html#module-SimpleXMLRPCServer

  - https://docs.python.org/2/library/docxmlrpcserver.html#module-DocXMLRPCServer

- The ``multiprocessing`` module in the Python standard library --
  https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing

- The ``pp`` (Parallel Python) module -- http://www.parallelpython.com/

- Parallel programming in IPython -- See:
  http://ipython.org/ipython-doc/dev/parallel/index.html

- Parallel processing using a pool of Erlang processes, each of which
  "contains" a Python process

- Message based parallel processing using ZeroMQ and the Python
  interface to ZeroMQ -- See: (1) http://zeromq.org/;
  (2) http://zeromq.org/bindings:python


Our approach in this document
-------------------------------

For each of the above alternatives, I'll try to cover: (1) appropriate
(and inappropriate) uses; (2) possible use cases; (3) some how-to
instruction; and (4) example code.

Notice that we will be paying special attention to one specific
multiprocessing programming pattern.  We want a scheme in which (1)
there are multiple servers; (2) there are multiple clients; and (3) any
client can submit a task (a function call) to be evaluated by any
available server.  You might think of this pattern as using a *pool* of
servers (processes) to which clients can submit (often compute
intensive) function calls.


XML-RPC
=========

XML-RPC is a simple and easy way to get distributed processing.  With
it, you can request that a function be called in a Python process on a
remote machine and that the result be returned to you.

We'll use the modules in the Python standard library.

On the server side, we implement conventional Python functions, and
then register them with an XML-RPC server.  Here is a simple, sample
server::

    #!/usr/bin/env python

    """
    Synopsis: A simple XML-RPC server.
    """

    #import xmlrpclib
    from SimpleXMLRPCServer import SimpleXMLRPCServer
    import inspect


    class Methods(object):
        def multiply(self, x, y):
            return x * y


    def is_even(n):
        """Return True if n is even."""
        return n % 2 == 0


    def is_odd(n):
        """Return True if n is odd."""
        return n % 2 == 1


    def listMethods():
        """Return a list of supported method names."""
        return Supported_methods.keys()


    def methodSignature(method_name):
        """Return the signature of a method."""
        if method_name in Supported_methods:
            func = Supported_methods[method_name]
            return inspect.getargspec(func).args
        else:
            return 'Error.  Function "{}" not supported.'.format(method_name)


    def methodHelp(method_name):
        """Return the doc string for a method."""
        if method_name in Supported_methods:
            func = Supported_methods[method_name]
            return func.__doc__
        else:
            return 'Error.  Function "{}" not supported.'.format(method_name)
Function "{}" not supported.'.format(method_name) Supported_methods = { 'is_even': is_even, 'is_odd': is_odd, 'listMethods': listMethods, 'methodSignature': methodSignature, 'methodHelp': methodHelp, } def start(): node = '192.168.0.7' port = 8000 server = SimpleXMLRPCServer((node, port)) print "Listening on {} at port {} ...".format(node, port) for name, func in Supported_methods.items(): server.register_function(func, name) methods = Methods() multiply = methods.multiply server.register_function(multiply, 'multiply') server.register_function(listMethods, "listMethods") server.serve_forever() def main(): start() if __name__ == '__main__': main() And, on the client side, it's simply a matter of creating a "proxy" and doing what looks like a standard Python function call through that proxy. Here is a simple, sample client:: #!/usr/bin/env python """ Synopsis: A simple XML-RPC client. """ import xmlrpclib def discover_methods(proxy): method_names = proxy.listMethods() for method_name in method_names: sig = proxy.methodSignature(method_name) help = proxy.methodHelp(method_name) print 'Method -- {}'.format(method_name) print ' Signature: {}'.format(sig) print ' Help : {}'.format(help) def request(proxy, ival): ret_ival = str(proxy.is_even(ival)) print "{0} is even: {1}".format(ival, ret_ival) def main(): node = '192.168.0.7' port = 8000 url = "http://{}:{}".format(node, port) proxy = xmlrpclib.ServerProxy(url) print "Requests sent to {} at port {} ...".format(node, port) discover_methods(proxy) for ival in range(10): request(proxy, ival) answer = proxy.multiply(5, 3) print 'multiply answer: {}'.format(answer) if __name__ == '__main__': main() Notes: - If you only want to access this XML-RPC server *only* from the local machine, then you might create the server with the following:: server = SimpleXMLRPCServer(('localhost', 8000)) And, in the client, create the proxy with the following:: proxy = xmlrpclib.ServerProxy(http://localhost:8000) - Notice that in the server, we can expose a method from within a class, also. FYI, I've been able to run the above XML-RPC scripts across my LAN. In fact, I've run the server on one of my desktop machines, and I connect via WiFi from the client on my Android smart phone using QPython. For more information about QPython see: http://qpython.com/. IPython parallel computing ============================ There is documentation here: http://ipython.org/ipython-doc/dev/parallel/index.html One easy way to install Python itself and IPython, SciPy, Numpy, etc. is to install the Anaconda toolkit. You can find out about it here: http://www.continuum.io/ and here https://store.continuum.io/cshop/anaconda/. We'd like to know how to submit tasks for parallel execution. Here is a bit of instruction on how to do it. 1. Create the cluster. Use the ``ipcluster`` executable from the IPython parallel processing. Example:: $ ipcluster start -n 4 2. Create a client and a load balanced view. Example:: client = Client() view = client.load_balanced_view() 2. Submit several tasks. Example:: r1 = view.apply(f1, delay, value1, value2) r2 = view.apply(f1, delay, value1 + 1, value2 + 1) 3. Get the results. Example:: print r1.result, r2.result Here is the code. Example:: from IPython.parallel import Client def test(view, delay, value1, value2): r1 = view.apply(f1, delay, value1, value2) r2 = view.apply(f1, delay, value1 + 1, value2 + 1) r3 = view.apply(f1, delay, value1 + 2, value2 + 2) r4 = view.apply(f1, delay, value1 + 3, value2 + 3) print 'waiting ...' 

FYI, I've been able to run the above XML-RPC scripts across my LAN.  In
fact, I've run the server on one of my desktop machines, and I connect
via WiFi from the client on my Android smart phone using QPython.  For
more information about QPython see: http://qpython.com/.


IPython parallel computing
============================

There is documentation here:
http://ipython.org/ipython-doc/dev/parallel/index.html

One easy way to install Python itself and IPython, SciPy, Numpy, etc.
is to install the Anaconda toolkit.  You can find out about it here:
http://www.continuum.io/ and here:
https://store.continuum.io/cshop/anaconda/.

We'd like to know how to submit tasks for parallel execution.  Here is
a bit of instruction on how to do it.

1. Create the cluster.  Use the ``ipcluster`` executable from IPython's
   parallel processing support.  Example::

       $ ipcluster start -n 4

2. Create a client and a load balanced view.  Example::

       client = Client()
       view = client.load_balanced_view()

3. Submit several tasks.  Example::

       r1 = view.apply(f1, delay, value1, value2)
       r2 = view.apply(f1, delay, value1 + 1, value2 + 1)

4. Get the results.  Example::

       print r1.result, r2.result

Here is the code.  Example::

    from IPython.parallel import Client

    def test(view, delay, value1, value2):
        r1 = view.apply(f1, delay, value1, value2)
        r2 = view.apply(f1, delay, value1 + 1, value2 + 1)
        r3 = view.apply(f1, delay, value1 + 2, value2 + 2)
        r4 = view.apply(f1, delay, value1 + 3, value2 + 3)
        print 'waiting ...'
        return r1.result, r2.result, r3.result, r4.result

    def f1(t, x, y):
        import time
        time.sleep(t)
        r = x + y + 3
        return r

    def main():
        client = Client()
        view = client.load_balanced_view()
        results = test(view, 5, 3, 4)
        print 'results:', results

    if __name__ == '__main__':
        main()

Notes:

- This example asks IPython parallel computing to execute four function
  calls in parallel in four separate processes.

- Because these function calls are executed in separate processes, they
  avoid conflict over Python's GIL (global interpreter lock).

- We started the cluster with the default scheduler scheme, which is
  "least load".  For other schemes, do the following and look for
  "scheme"::

      $ ipcontroller --help

  Also see:
  http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#schedulers


Remote machines and engines
-----------------------------

Submitting jobs to be run on IPython engines on a remote machine turns
out, in some cases at least, to be very easy.  Do the following:

- Start the IPython controller and engines on the remote machine.  For
  example::

      $ ipcluster start -n 4

- Copy your client profile
  ``~/.ipython/profile_default/security/ipcontroller-client.json`` from
  the remote machine to the ``security/`` directory under the profile
  you will be using on the local machine.

- When you create your client, use something like the following::

      client = Client(sshserver='your_user_name@192.168.0.7')

  But change the user name and IP address to that of the remote
  machine.

There is more information on using IPython parallel computing with
remote hosts here:
http://ipython.org/ipython-doc/dev/parallel/parallel_process.html#using-the-ipcontroller-and-ipengine-commands


Decorators for parallel functions
-----------------------------------

You can also create parallel functions by using a Python decorator.
Example::

    from IPython.parallel import Client
    import numpy as np

    client = Client(sshserver='remote_user_name@192.168.0.7')
    dview = client[:]

    @dview.parallel(block=True)
    def parallel_multiply(a, b):
        return a * b

    def main():
        array1 = np.random.random((64, 48))
        for count in range(10):
            result_remote = parallel_multiply(array1, array1)
            print result_remote

    if __name__ == '__main__':
        main()

For more information on IPython remote function decorators, see:
http://ipython.org/ipython-doc/dev/parallel/parallel_multiengine.html#remote-function-decorators
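
Finally, in addition to ``view.apply`` and parallel function
decorators, a load balanced view can map a function across a whole
sequence of inputs, which fits the pool-of-servers pattern described
earlier.  The following is only a minimal sketch, assuming a cluster
has already been started with ``ipcluster start -n 4``; check the
IPython parallel documentation for the exact blocking behavior of
``map`` in your version::

    from IPython.parallel import Client

    def square(x):
        return x * x

    def main():
        client = Client()
        view = client.load_balanced_view()
        # distribute one task per input value across the engines and
        # block until all of the results have arrived
        results = view.map(square, range(16), block=True)
        print 'results:', results

    if __name__ == '__main__':
        main()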

The multiprocessing module from the Python standard library
=============================================================

The Python standard library contains the module ``multiprocessing``.
That module (it's actually a Python package or a library that acts like
a module) contains some reasonable support for creating and running
multiple processes implemented in Python and for communicating between
those processes using ``Queues`` and ``Pipes`` (also in the
``multiprocessing`` module).

You can learn more about that module here:
https://docs.python.org/2/library/multiprocessing.html

Be aware that the ``multiprocessing`` module creates separate
*operating system* processes.  Each one runs in its own memory space;
each one has its own Python interpreter; each one has its own GIL
(global interpreter lock); each one has its own copies of imported
modules; and each module in each of these multiple processes has its
own copies of global variables.

The documentation has examples.  And, here is some sample code that is
a little more complex::

    #!/usr/bin/env python

    """
    synopsis: Example of the use of the Python multiprocessing module.
    usage: python multiprocessing_module_01.py
    """

    import argparse
    import operator
    from multiprocessing import Process, Queue
    import numpy as np
    import py_math_01

    def run_jobs(args):
        """Create several processes, start each one, and collect the results.
        """
        queue01 = Queue()
        queue02 = Queue()
        queue03 = Queue()
        queue04 = Queue()
        m = 4
        n = 3
        process01 = Process(target=f_multiproc, args=(queue01, 'process01', m, n))
        process02 = Process(target=f_multiproc, args=(queue02, 'process02', m, n))
        process03 = Process(target=f_multiproc, args=(queue03, 'process03', m, n))
        process04 = Process(target=f_multiproc, args=(queue04, 'process04', m, n))
        process01.start()
        process02.start()
        process03.start()
        process04.start()
        raw_input('Check for existence of multiple processes, then press Enter')
        process01.join()
        process02.join()
        process03.join()
        process04.join()
        raw_input('Check to see if they disappeared, then press Enter')
        print queue01.get()
        print queue02.get()
        print queue03.get()
        print queue04.get()

    def f_multiproc(queue, processname, m, n):
        seed = reduce(operator.add, [ord(x) for x in processname], 0)
        np.random.seed(seed)
        result = py_math_01.test_01(m, n)
        result1 = result.tolist()
        result2 = 'Process name: {}\n{}\n-----'.format(processname, result1)
        queue.put(result2)

    def main():
        parser = argparse.ArgumentParser(
            description=__doc__,
            formatter_class=argparse.RawDescriptionHelpFormatter,
            )
        args = parser.parse_args()
        run_jobs(args)

    if __name__ == '__main__':
        #import ipdb; ipdb.set_trace()
        main()

The above code does the following:

1. Create a number of processes (instances of class ``Process``) and a
   queue (instance of class ``Queue``) for each one.

2. Start each process.

3. Wait for the user to press Enter.  This gives the user time to check
   to see that separate processes have actually been created.  On
   Linux, I use ``htop`` or ``top`` to view processes.  On MS Windows,
   you should be able to use the ``Task Manager``.

4. Wait for each process to finish by calling ``process.join()``.

5. Get and print the contents that each process put in its queue.

Some benefits of using the ``multiprocessing`` module:

- For CPU intensive computations, you will be able to make use of
  multiple cores and multiple CPUs concurrently.  This is because each
  process has its own Python interpreter with its own GIL (global
  interpreter lock).

- For processes that are I/O bound, you should also be able to get a
  speed-up from concurrency.

- Since each process runs independently in its own memory space, your
  processes will be safer from the possibility of stepping on each
  other's values.  However, the ``multiprocessing`` module provides
  synchronization primitives (for example, class
  ``multiprocessing.Lock``) and a facility for shared memory across
  processes (the ``multiprocessing.Array`` class), if you really want
  that kind of problem.
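
For the pool-of-workers pattern that this document keeps returning to,
the ``multiprocessing`` module also provides the higher-level
``multiprocessing.Pool`` class, which manages a fixed set of worker
processes and distributes work to them for you.  Here is a minimal
sketch; the function and the input values are only illustrative::

    #!/usr/bin/env python

    """
    Synopsis: distribute function calls across a pool of worker processes.
    """

    from multiprocessing import Pool

    def cube(x):
        return x ** 3

    def main():
        # create a pool of 4 worker processes; Pool() with no argument
        # would use the number of CPUs reported by cpu_count()
        pool = Pool(processes=4)
        # map blocks until every input value has been processed
        results = pool.map(cube, range(10))
        print 'results:', results
        pool.close()
        pool.join()

    if __name__ == '__main__':
        main()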

pp -- parallel python
=======================

Information about ``Parallel Python`` is here:
http://www.parallelpython.com/

Here is a description from the ``Parallel Python`` Web site:

    PP is a python module which provides mechanism for parallel
    execution of python code on SMP (systems with multiple processors
    or cores) and clusters (computers connected via network).

    It is light, easy to install and integrate with other python
    software.

    PP is an open source and cross-platform module written in pure
    python.

    Features:

    - Parallel execution of python code on SMP and clusters

    - Easy to understand and implement job-based parallelization
      technique (easy to convert serial application in parallel)

    - Automatic detection of the optimal configuration (by default the
      number of worker processes is set to the number of effective
      processors)

    - Dynamic processors allocation (number of worker processes can be
      changed at run-time)

    - Low overhead for subsequent jobs with the same function
      (transparent caching is implemented to decrease the overhead)

    - Dynamic load balancing (jobs are distributed between processors
      at run-time)

    - Fault-tolerance (if one of the nodes fails tasks are rescheduled
      on others)

    - Auto-discovery of computational resources

    - Dynamic allocation of computational resources (consequence of
      auto-discovery and fault-tolerance)

    - SHA based authentication for network connections

    - Cross-platform portability and interoperability (Windows, Linux,
      Unix, Mac OS X)

    - Cross-architecture portability and interoperability (x86, x86-64,
      etc.)

    - Open source

The examples provided with the distribution work well.  But, the
project does not seem very active.
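
Those bundled examples are the best reference for the API, but, roughly,
a pp job submission looks like the following minimal sketch.  This
assumes ``pp`` is installed; check the pp documentation for the full
``submit`` signature, which also accepts dependent functions and module
names::

    import pp

    def sum_of_squares(n):
        return sum(x * x for x in xrange(n))

    def main():
        # create a job server; by default it starts one worker per core
        job_server = pp.Server()
        # submit one job per input value; each call returns a callable
        # that blocks until its result is ready
        jobs = [job_server.submit(sum_of_squares, (n,))
                for n in (10000, 20000, 30000)]
        for job in jobs:
            print job()
        job_server.print_stats()

    if __name__ == '__main__':
        main()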

ZeroMQ and zmq/pyzmq
======================

Here is a quote:

    ØMQ in a Hundred Words

    ØMQ (also known as ZeroMQ, 0MQ, or zmq) looks like an embeddable
    networking library but acts like a concurrency framework.  It gives
    you sockets that carry atomic messages across various transports
    like in-process, inter-process, TCP, and multicast.  You can
    connect sockets N-to-N with patterns like fan-out, pub-sub, task
    distribution, and request-reply.  It's fast enough to be the fabric
    for clustered products.  Its asynchronous I/O model gives you
    scalable multi-core applications, built as asynchronous
    message-processing tasks.  It has a score of language APIs and runs
    on most operating systems.  ØMQ is from iMatix and is LGPLv3 open
    source.

    [Pieter Hintjens; http://zguide.zeromq.org/page:all]

``pyzmq``, which provides ``zmq``, is the Python bindings for ZeroMQ.

Note that ZeroMQ is underneath IPython parallel.  So, it may be
appropriate to think of IPython parallel computing as a high level
wrapper around ZeroMQ.

There is a good set of examples written in a number of different
languages for ZeroMQ.  To get them, download the ZeroMQ guide
(https://github.com/imatix/zguide.git), then (for us Python
programmers) look in ``zguide/examples/Python``.

In order to use ``pyzmq`` and to run the examples, you will need to
install:

- ZeroMQ -- http://zeromq.org/

- Python bindings for ZeroMQ -- http://zeromq.org/bindings:python

  You can also find it at the Python Package Index:
  https://pypi.python.org/pypi/pyzmq

For my testing with Python, I used the Anaconda Python distribution,
which contains support for ``zmq``.

We should note that with ZeroMQ, our programming is in some sense using
the Actor model, as does Erlang.  This is the Actor model in the sense
that (1) we are creating separate processes which do not share (in
memory) resources and (2) we communicate between those processes by
sending messages and waiting on message queues.

ZeroMQ differs from Erlang, with respect to the Actor model, in the
following ways:

- In Erlang the processes are internal to Erlang; in ZeroMQ the
  different processes are operating system processes.  Therefore, in
  ZeroMQ the processes are heavier weight and slower to start and stop
  than those in Erlang.  However, notice that when we use an Erlang
  port or use Erlport to call Python code from Erlang, we do in fact
  create separate OS processes.

- One way to start those ZeroMQ processes is at the shell (e.g.,
  ``bash``) command line.  I can imagine starting multiple processes
  from within Python code using the ``subprocess`` module or within
  Node.js code using the ``child_process`` module.  But, whereas in
  Erlang, it is reasonable to start up hundreds of concurrent
  processes, with ZeroMQ, we are unlikely to want to do that.

- In Erlang the processes each have an identity, and we send messages
  to a process; in Erlang, the queues are anonymous.  In ZeroMQ, the
  queues have an identity, and we send messages to a queue and receive
  messages from a queue; in ZeroMQ, the processes are anonymous, and,
  in fact, our code often does not even know how many processes have
  connected to the far end of one of our queues.


Hello world server and client
-------------------------------

Here is a "Hello, World" server that uses ``pyzmq``::

    #!/usr/bin/env python

    """
    Hello World server in Python.
    Binds REP socket to tcp://*:5555
    Expects b"Hello" from client, replies with b"World"
    """

    import sys
    import time
    import zmq

    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            sys.exit('usage: python hwserver.py
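
For completeness, the matching "Hello, World" client side of this
request-reply pair looks roughly like the following minimal sketch,
based on the standard REQ/REP pattern from the ZeroMQ guide; the
endpoint matches the server's ``tcp://*:5555`` binding, and the number
of requests is only illustrative::

    #!/usr/bin/env python

    """
    Hello World client in Python.
    Connects a REQ socket to tcp://localhost:5555,
    sends b"Hello", and waits for the reply.
    """

    import zmq

    def main():
        context = zmq.Context()
        socket = context.socket(zmq.REQ)
        socket.connect('tcp://localhost:5555')
        for request_number in range(5):
            socket.send(b'Hello')
            # a REQ socket must receive a reply before it can send again
            reply = socket.recv()
            print 'Received reply {}: {}'.format(request_number, reply)

    if __name__ == '__main__':
        main()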