Multiprocessing Basics in Python
Introduction
Multiprocessing in Python allows you to run multiple processes simultaneously, making your programs faster and more efficient by utilizing multiple CPU cores.
This tutorial covers the basics of Python's multiprocessing module, including how to create processes, communicate between them, and manage their lifecycle.
Parallelism is not about doing more things at once, but about doing things faster.
What is Multiprocessing?
Multiprocessing is a technique to run multiple processes concurrently, each with its own Python interpreter and memory space.
Unlike threading, multiprocessing bypasses Python's Global Interpreter Lock (GIL), enabling true parallel execution on multi-core systems.
- Each process runs independently.
- Processes do not share memory by default.
- Useful for CPU-bound tasks.
The multiprocessing Module
Python provides the multiprocessing module to create and manage processes easily.
This module offers classes and functions to spawn processes, share data, and synchronize execution.
- Process class to create new processes.
- Queue and Pipe for inter-process communication.
- Pool for managing a pool of worker processes.
Creating a Process
You can create a new process by instantiating the Process class and passing a target function to run.
Start the process with the start() method and wait for it to finish with join().
Inter-Process Communication
Since processes have separate memory, they need special mechanisms to exchange data.
multiprocessing provides Queue and Pipe objects for safe communication.
- Queue allows multiple producers and consumers.
- Pipe creates a two-way communication channel.
Example: Basic Multiprocessing
Let's see a simple example that runs two processes concurrently, each printing messages.
Examples
import multiprocessing
import time
def worker(name):
for i in range(3):
print(f'Worker {name} is running iteration {i}')
time.sleep(1)
if __name__ == '__main__':
p1 = multiprocessing.Process(target=worker, args=('A',))
p2 = multiprocessing.Process(target=worker, args=('B',))
p1.start()
p2.start()
p1.join()
p2.join()
print('Both processes finished.')This example creates two processes running the worker function concurrently. Each process prints its name and iteration, demonstrating parallel execution.
Best Practices
- Use multiprocessing for CPU-bound tasks to improve performance.
- Avoid sharing state between processes; use communication primitives instead.
- Always protect the entry point of the program with if __name__ == '__main__' to prevent unintended process spawning.
- Use Pool for managing multiple worker processes efficiently.
Common Mistakes
- Not using if __name__ == '__main__' guard, causing recursive process creation.
- Trying to share mutable objects directly between processes without synchronization.
- Ignoring process termination and not calling join(), leading to orphaned processes.
- Using multiprocessing for I/O-bound tasks where threading might be more appropriate.
Hands-on Exercise
Create a Multiprocessing Counter
Write a Python program that starts 4 processes, each incrementing a shared counter 1000 times using multiprocessing.Value and locks to synchronize access.
Expected output: The final counter value should be 4000 after all processes finish.
Hint: Use multiprocessing.Value for shared memory and multiprocessing.Lock to avoid race conditions.
Interview Questions
What is the difference between threading and multiprocessing in Python?
InterviewThreading runs multiple threads within the same process and memory space but is limited by the Global Interpreter Lock (GIL), which prevents true parallel execution of Python bytecode. Multiprocessing runs separate processes with independent memory, allowing true parallelism on multiple CPU cores.
Why do we need the if __name__ == '__main__' guard when using multiprocessing?
InterviewThis guard prevents the child processes from recursively spawning new processes when the module is imported. It ensures that the multiprocessing code runs only when the script is executed directly.
Summary
Multiprocessing in Python enables parallel execution by running multiple processes independently.
The multiprocessing module provides tools to create processes, communicate between them, and manage their lifecycle.
Proper use of multiprocessing can significantly improve performance for CPU-bound tasks.
FAQ
Can multiprocessing be used on all operating systems?
Yes, the multiprocessing module works on Windows, macOS, and Linux, but process spawning behavior differs slightly between platforms.
Is multiprocessing suitable for I/O-bound tasks?
Multiprocessing can be used for I/O-bound tasks, but threading or asynchronous programming is often more efficient for those scenarios.
How do processes communicate in Python multiprocessing?
Processes communicate using special objects like Queue and Pipe provided by the multiprocessing module, which allow safe data exchange between processes.
