by Matias Guijarro
Simultaneous execution
Potentially interacting tasks
Examples
Concurrency implies multitasking
If only 1 CPU is available, the only way to run multiple tasks is by rapidly switching between them
For I/O, tasks must wait (sleep)
Behind the scenes, the underlying system carries out the I/O operation and wakes the task when it is finished
Traditional multithreading
"Actor model" (from Erlang)
Async or Evented I/O
What most programmers think of when they hear about "concurrent programming"
Independent task running inside a parent program
Simultaneous access to objects
Often a source of unspeakable peril
Threads share all data in your program
Thread scheduling is non-deterministic
Operations often take several steps and might be interrupted mid-stream (non-atomicity)
Thus, access to any kind of shared data is also non-deterministic!
Race conditions
Threads need to be synchronized
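The classic race in miniature: even a bare `counter += 1` is a read, an add and a write, so two threads can interleave between those steps and lose updates. A minimal sketch of guarding a shared counter with `threading.Lock` (the counter and iteration counts are illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # without the lock, "counter += 1" (read, add, write) can be
        # interrupted mid-stream and concurrent updates get lost
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; often less without it
```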
Global Interpreter Lock
Whenever a thread runs, it holds the GIL
The multiprocessing module has been part of standard Python since 2.6
Messaging implies serialization
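To illustrate: anything exchanged between processes must survive a pickle round-trip. A small self-contained sketch (the message contents are made up):

```python
import pickle

# plain data survives a pickle round-trip, so it can be sent
# between processes through a Queue or a Pipe
msg = {"host": "www.w3.org", "paths": ["/TR/REC-html32.html"]}
assert pickle.loads(pickle.dumps(msg)) == msg

# but some objects (lambdas, open sockets, ...) cannot be serialized
try:
    pickle.dumps(lambda x: x)
    could_pickle = True
except (pickle.PicklingError, AttributeError, TypeError):
    could_pickle = False
print(could_pickle)  # False: such objects cannot cross a process boundary
```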
A task is "CPU bound" if it spends most of its time computing, with little I/O
Examples
A task is "I/O bound" if it spends most of its time waiting for I/O
Examples
Most programs are I/O bound
"non-blocking I/O"
permits other processing to continue before I/O operation has completed
more scalable than threads or processes (much less memory consumption)
usually gives excellent response times, latency and CPU usage on I/O bound programs
one single main thread (SPED: Single Process Event Driven)
No silver bullet, though
Recent progress in the Linux kernel since 2.6 (better threading) and 64-bit architectures (more memory) makes it less interesting than before in terms of pure performance
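The idea behind evented I/O, in miniature: mark sockets non-blocking and let select() report which ones are ready, so a single thread never stalls on any one connection. A self-contained sketch using a local socket pair instead of a real server:

```python
import select
import socket

# a connected local socket pair stands in for two network peers
a, b = socket.socketpair()
a.setblocking(False)   # recv() on 'a' will now never block

b.send(b"hello")

# select() reports which sockets are ready, so one thread can watch
# many connections and only touch the ones with pending data
readable, _, _ = select.select([a], [], [], 1.0)
data = a.recv(1024) if a in readable else None
print(data)  # b'hello'

a.close()
b.close()
```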
Files are taken from the W3C web site: http://www.w3.org
files = ("/TR/html401/html40.txt",
         "/TR/2002/REC-xhtml1-20020801/xhtml1.pdf",
         "/TR/REC-html32.html",
         "/TR/2000/REC-DOM-Level-2-Core-20001113/DOM2-Core.txt")
import socket
import threading

class download_task(threading.Thread):
    def __init__(self, host, filepath, proxy="proxy.esrf.fr:3128"):
        threading.Thread.__init__(self)

        self.host = host
        self.filepath = filepath
        self.proxy = proxy
        self.file_contents = ""

    def run(self):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        proxy_host, proxy_port = self.proxy.split(":")
        s.connect((proxy_host, int(proxy_port)))

        s.send("GET http://"+self.host+self.filepath+" HTTP/1.0\r\n\r\n")

        buf = []
        while True:
            data = s.recv(1024)
            if not data:
                break
            buf.append(data)

        s.close()
        self.file_contents = "".join(buf)

tasks = []
for filepath in files:
    tasks.append(download_task("www.w3.org", filepath, "proxy.esrf.fr:3128"))
for task in tasks:
    task.start()
for task in tasks:
    task.join()
Use of the 'asyncore' standard Python module. Callbacks are fired when socket is ready to do a non-blocking read or write operation.
import asyncore
import socket

class download_task(asyncore.dispatcher):
    def __init__(self, host, filepath, proxy="proxy.esrf.fr:3128"):
        asyncore.dispatcher.__init__(self)

        self.host = host
        self.filepath = filepath
        self.proxy = proxy
        self.buffer = []
        self.file_contents = ""
        self.request_sent = False

    def start(self):
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        proxy_host, proxy_port = self.proxy.split(":")
        self.connect((proxy_host, int(proxy_port)))

    def handle_close(self):
        self.close()
        self.file_contents = "".join(self.buffer)

    def handle_read(self):
        data = self.recv(1024)
        self.buffer.append(data)

    def writable(self):
        return not self.request_sent

    def handle_write(self):
        self.send("GET http://"+self.host+self.filepath+" HTTP/1.0\r\n\r\n")
        self.request_sent = True

tasks = []
for filepath in files:
    tasks.append(download_task("www.w3.org", filepath, "proxy.esrf.fr:3128"))
for task in tasks:
    task.start()

asyncore.loop()
Fully asynchronous without callbacks
import gevent
from gevent import monkey
monkey.patch_socket()   # make the standard socket module cooperative
import socket

def download_task(host, filepath, proxy="proxy.esrf.fr:3128"):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    proxy_host, proxy_port = proxy.split(":")
    s.connect((proxy_host, int(proxy_port)))

    s.send("GET http://"+host+filepath+" HTTP/1.0\r\n\r\n")

    buf = []
    while True:
        data = s.recv(1024)
        if not data:
            break
        buf.append(data)

    s.close()
    return "".join(buf)

tasks = []
for filepath in files:
    tasks.append(gevent.spawn(download_task, "www.w3.org", filepath, "proxy.esrf.fr:3128"))
gevent.joinall(tasks)
greenlet provides coroutines to Python via a C extension module
spin-off of Stackless, a version of Python supporting micro-threads
the idea of coroutines comes from a 1963 paper by Melvin Conway
gevent trick #1: yield automatically when doing blocking I/O (or when 'sleeping')
Execution flow example
Greenlet execution is deterministic. No preemption.
Cooperative scheduling
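The same cooperative model can be sketched in pure Python with generators, where each yield is an explicit "switch to someone else" (the task names and the round-robin policy here are illustrative, not gevent's actual scheduler):

```python
def task(name, steps):
    for i in range(steps):
        # each yield is an explicit "I am done for now, run someone else"
        yield "%s step %d" % (name, i)

def run(tasks):
    # round-robin cooperative scheduler: a task runs until it yields
    # and is never preempted in between, so the order is deterministic
    trace = []
    while tasks:
        current = tasks.pop(0)
        try:
            trace.append(next(current))
            tasks.append(current)
        except StopIteration:
            pass
    return trace

trace = run([task("A", 2), task("B", 2)])
print(trace)  # ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```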
Python allows most objects to be modified at runtime, including modules, classes, and even functions
gevent patches blocking system calls in the standard library including those in socket, ssl, threading and select modules to instead behave cooperatively
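The mechanism behind that patching is plain attribute assignment on a module. A toy sketch replacing time.sleep (the recording list is only for illustration; gevent's real patches switch to another greenlet instead):

```python
import time

original_sleep = time.sleep
calls = []

def fake_sleep(seconds):
    # a cooperative version would yield to another greenlet here;
    # we just record the call to show the patch took effect
    calls.append(seconds)

time.sleep = fake_sleep      # module attributes are ordinary, mutable names
time.sleep(5)                # returns immediately instead of blocking
time.sleep = original_sleep  # restore the real implementation

print(calls)  # [5]
```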
Beware