Introduction to gevent

One of the best kept secrets of Python

by Matias Guijarro

ESRF Software Group Lab presentation, 11/02/2013

Presenter Notes

A refresher on Concurrency

Presenter Notes

Concurrency

  • Simultaneous execution

  • Potentially interacting tasks

  • Examples

    • a network server that communicates with several hundred clients all connected at once
    • a big number crunching job that spreads its work across multiple CPUs

Presenter Notes

Multitasking

  • Concurrency implies multitasking

  • If only 1 CPU is available, the only way to run multiple tasks is by rapidly switching between them

Presenter Notes

Parallel processing

  • In the case of multiple CPUs

  • If total number of tasks exceeds the number of CPUs, then each CPU also multitasks

Presenter Notes

Task execution

  • All tasks execute by alternating between CPU processing and I/O handling

  • For I/O, tasks must wait (sleep)

  • Behind the scenes, the underlying system will carry out I/O operation and wake the task when it's finished

Presenter Notes

Overview of Concurrency models

Presenter Notes

Ways to Concurrency

  • Traditional multithreading

    • OS threads
    • Shared memory, locks, etc.
  • "Actor model" (from Erlang)

    • Multiple processes, share nothing
    • Messaging
  • Async or Evented I/O

    • I/O loop + callback chains

Presenter Notes

OS Threads

  • What most programmers think of when they hear about "concurrent programming"

  • Independent task running inside a parent program

    • relatively lightweight
    • share the memory and state of its parent
    • gets its own stack
    • independent flow of execution

Presenter Notes

Programming with threads is hard

Presenter Notes

Really hard

Presenter Notes

Shared memory

  • Tasks may run in the same memory space

  • Simultaneous access to objects

  • Often a source of unspeakable peril

Presenter Notes

Access to Shared Data

  • Threads share all data in your program

  • Thread scheduling is non-deterministic

  • Operations often take several steps and might be interrupted mid-stream (non-atomicity)

  • Thus, access to any kind of shared data is also non-deterministic !

Presenter Notes

Problems with threads

  • Race conditions

    • The corruption of shared data due to thread scheduling is often known as a "race condition"
    • It's quite diabolical--a program may produce slightly different results each time it runs
    • Or it may just flake out mysterioulsy once every two weeks...
  • Threads needs to be synchronized

    • Synchronization primitives: locks, semaphores, mutexes
    • A lot harder than it seems => deadlocks, nasty corner cases

Presenter Notes

Problems with threads with Python

  • Global Interpreter Lock

    • Only one Python thread can execute in the interpreter at once
    • GIL ensures each thread gets exclusive access to entire interpreter
  • Whenever a thread run, it holds the GIL

    • Impossible to have true parallelism
    • Except when having C extensions that release the GIL

Presenter Notes

The Actor Model (Erlang) "in Python"

  • An alternative to threads is to run multiple independent copies of the Python interpreter
    • In separate processes
    • Possibly on different machines
    • Get the interpreters to communicate through "message passing" (IPC)
      • Pipes, FIFOs, memory mapped regions, sockets, etc.

Presenter Notes

Multiprocessing module

  • Multiprocessing module is part of standard Python since 2.6

    • It mirrors Python's threading API
  • Messaging implies serialization

    • convert Python objects to a byte stream
    • done through the standard "pickle" module

Presenter Notes

The third path to Concurrency: Async. or Evented I/O

Presenter Notes

CPU bound tasks

  • A task is "CPU bound" if it spends most of its time with little I/O

  • Examples

    • Crunching big matrices
    • Image processing

Presenter Notes

I/O bound tasks

  • A task is "I/O bound" if it spends most of its time waiting for I/O

  • Examples

    • Reading input from user
    • GUI
    • Networking
    • File processing
  • Most programs are I/O bound

Presenter Notes

Async. or Evented I/O

  • "non-blocking I/O"

  • permits other processing to continue before I/O operation has completed

    • one of the main function of Operating Systems (e.g old "select" sys. call)
  • more scalable than threads or processes (much less memory consumption)

  • usually gives great response time, latency and CPU usage on I/O bound programs

  • one single main thread (SPED: Single Process Event Driven)

    • lots of troubles are avoided, easier to debug

Presenter Notes

Async. or Evented I/O

  • No silver bullet, though

    • a loop has to run to dispatch I/O operations
    • callbacks "spaghetti" make code less readable
  • Recent progress on Linux kernel since 2.6 (better threading) and 64-bits architecture (more memory) make it less interesting than before in term of pure performance

Presenter Notes

What is the best approach ?

Presenter Notes

Depends on the problem :)

Presenter Notes

The C10k problem (D. Kegel, 1999)

  • Comes from web servers world (I/O bound process)
    • How to handle 10.000 simultaneous connections ?

Presenter Notes

The C10k problem (D. Kegel, 1999)

  • Since 1999, solutions have emerged
    • 2002: Nginx, Lighttpd
      • successfully uses the Asynchronous I/O paradigm
      • 1 single thread to serve them all
    • 2003: "Why events are a bad idea (for High-concurrency servers)"
      • paper from Behren, Condit and Brewer (University of California, Berkeley)
      • read the paper! Conclusions:
        • weaknesses of threads (in term of performance) are artifacts of current implementations
        • compilers will evolve to help with thread safety issues
        • threads => classic programming model (no I/O callbacks)

Presenter Notes

Presenter Notes

Presenter Notes

2nd part: gevent

Presenter Notes

What is gevent ?

  • Python concurrency library

  • based around:

  • clean API for a variety of concurrency and network related tasks

Presenter Notes

Talk is cheap - show me the code

  • Download of 4 files over the Internet
    • Using threads
    • Using async I/O
    • Using gevent

Files are taken from the W3C web site: http://www.w3.org

1 files = ("/TR/html401/html40.txt", 
2          "/TR/2002/REC-xhtml1-20020801/xhtml1.pdf",
3          "/TR/REC-html32.html",
4          "/TR/2000/REC-DOM-Level-2-Core-20001113/DOM2-Core.txt")

Presenter Notes

Threaded version

 1 class download_task(threading.Thread):
 2   def __init__(self, host, filepath, proxy="proxy.esrf.fr:3128"):
 3     threading.Thread.__init__(self)
 4 
 5     self.host = host
 6     self.filepath = filepath
 7     self.proxy = proxy
 8     self.file_contents = ""
 9 
10   def run(self):
11     s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
12     proxy_host, proxy_port = self.proxy.split(":")
13     s.connect((proxy_host, int(proxy_port)))
14 
15     s.send("GET http://"+self.host+self.filepath+" HTTP/1.0\r\n\r\n")
16 
17     buf = []
18     while True:
19       data = s.recv(1024)
20       if not data:
21         break
22       buf.append(data)
23 
24     s.close()
25     self.file_contents = "".join(buf)
26 
27 tasks = []
28 for filepath in files:
29   tasks.append(download_task("www.w3.org", filepath, "proxy.esrf.fr:3128"))
30 [task.start() for task in tasks]
31 [task.join() for task in tasks]

Presenter Notes

Asynchronous flavor

Use of the 'asyncore' standard Python module. Callbacks are fired when socket is ready to do a non-blocking read or write operation.

 1 class download_task(asyncore.dispatcher):
 2   def __init__(self, host, filepath, proxy="proxy.esrf.fr:3128"):
 3     asyncore.dispatcher.__init__(self)
 4 
 5     self.host = host
 6     self.filepath = filepath
 7     self.proxy = proxy
 8     self.buffer = []
 9     self.file_contents = ""
10     self.request_sent = False
11 
12   def start(self):
13     self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
14     proxy_host, proxy_port = self.proxy.split(":")
15     self.connect((proxy_host, int(proxy_port)))
16 
17   def handle_close(self):
18     self.close()
19     self.file_contents = "".join(self.buffer)
20 
21   def handle_read(self):
22     data = self.recv(1024)
23     self.buffer.append(data)
24 
25   def writable(self):
26     return not self.request_sent
27 
28   def handle_write(self):
29     self.send("GET http://"+self.host+self.filepath+" HTTP/1.0\r\n\r\n")
30     self.request_sent = True
31 
32 tasks = []
33 for filepath in files:
34   tasks.append(download_task("www.w3.org", filepath, "proxy.esrf.fr:3128"))
35 [task.start() for task in tasks]
36 
37 asyncore.loop()

Presenter Notes

Gevent version

Fully asynchronous without callbacks

 1 def download_task(host, filepath, proxy="proxy.esrf.fr:3128"):
 2   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
 3   proxy_host, proxy_port = proxy.split(":")
 4   self.connect((proxy_host, int(proxy_port)))
 5 
 6   s.send("GET http://"+host+filepath+" HTTP/1.0\r\n\r\n")
 7 
 8   buf = []
 9   while True:
10     data = s.recv(1024)
11     if not data:
12       break
13     buf.append(data)
14 
15   s.close()
16   return "".join(buf)
17 
18 tasks = []
19 for filepath in files:
20   tasks.append(gevent.spawn(download_task, "www.w3.org", filepath, "proxy.esrf.fr:3128"))
21 gevent.joinall(tasks)

Presenter Notes

How does it work ?

Presenter Notes

Presenter Notes

greenlet

  • greenlet provides coroutines to Python via a C extension module

  • spin-off of Stackless, a version of Python supporting micro-threads

    • micro-threads are coroutines + a scheduler
  • the idea of coroutine is from a 1963 paper from Melvin Conway

    • like a normal subroutine, except that it has yielding points instead of a single return exit
    • when yielding, execution goes to another coroutine
  • gevent trick #1: yield automatically when doing blocking I/O (or when 'sleeping')

    • libev provides efficient async. I/O and a scheduler

Presenter Notes

greenlet

Execution flow example

Greenlets execution is deterministic. No preemption.

Cooperative scheduling

Presenter Notes

Presenter Notes

gevent trick #2 to the rescue

Presenter Notes

Monkey-patching

from gevent import monkey; monkey.patch_all()

  • Python allows for most objects to be modified at runtime including modules, classes, and even functions

    • this is generally an astoudingly bad idea!!!
    • nevertheless in extreme situations where a library needs to alter the fundamental behavior of Python itself monkey patches can be used
  • gevent patches blocking system calls in the standard library including those in socket, ssl, threading and select modules to instead behave cooperatively

  • Beware

    • libraries that wrap C libraries
      • it is possible to patch those: greenify (but requires recompiling)
    • disk I/O

Presenter Notes

Presenter Notes

gevent features

  • Greenlet objects (aka tasks)
    • spawn, spawn_later, pause, kill, join...
    • ready, link, get
      • can even be used to implement futures

Presenter Notes

gevent features

Timeouts

1 with gevent.Timeout(5):
2   response = urllib2.urlopen(url)
3   for line in response:
4     print line
5 # raises Timeout exception if not done after 5 seconds

Presenter Notes

gevent features

  • API
    • Event & AsyncResult objects
    • Queues
    • Greenlets Groups & Pool
    • TCP, SSL & WSGI servers
    • Locks and Semaphores (almost never needed)

Presenter Notes

Third part

Presenter Notes

gevent for DAQ and Beamline Control ?

Presenter Notes

gevent advocacy

  • DAQ and BL control is 99% I/O bound tasks

    • talking to devices via Tango, serial line, Ethernet, USB...
    • waiting for user input
    • reading/writing files
  • gevent scheduler + Python = sequencer

  • Already benefits to mxCuBE since May, 2012

    • helped reducing 4k awful lines of code to 1k, cleaner code
    • organization in "tasks" is a perfect fit for Control
  • Allows to concentrate on experiments problems, not concurrency issues

Presenter Notes

TaskUtils module

  • originally written for mxCuBE

  • a small layer on top of gevent, for our business

  • brings "task" decorator and cleanup/error_cleanup context managers

Presenter Notes

Example (with a bit of imagination)

Inspired from MX data collection code; in mxCuBE, safety_shutter, detm, phi, etc. are already gevent-friendly objects. Most methods use the "task" decorator, to be used as such by upper level sequences.

 1 @task
 2 def my_sequence(DET_POS, START, END, N_IMAGES, EXP_T):
 3   with cleanup(safety_shutter.close):
 4     # detm move is slow; don't wait for end of move here
 5     detector_move_task = detm.move(DET_POS, wait=False)
 6 
 7     safety_shutter.open()
 8 
 9     i0_diode.adjust_gain()
10 
11     phi.move(START)
12 
13     detector_move_task.join()
14 
15     pilatus.prepare(N_IMAGES)
16     with error_cleanup(pilatus.reset):
17       oscillation(START, END, EXP_T)
18 
19 my_sequence(timeout = 5)

Presenter Notes

IPython + gevent

  • Scientists and Pythonistas love IPython

    • used as a workbench
    • or like an enhanced REPL
  • Let's mix gevent with IPython

    • just like it is already done for GUI loops

Presenter Notes

IPython + gevent

 1 import gevent
 2 # monkey-patch everything except threads
 3 from gevent import monkey; monkey.patch_all(thread=False)
 4 from IPython.core.interactiveshell import InteractiveShell
 5 from IPython.lib.inputhook import allow_CTRL_C, ignore_CTRL_C, stdin_ready
 6 from IPython.lib.inputhook import InputHookManager
 7 
 8 def create_inputhook_gevent(mgr):
 9     # Create an input hook for running the gevent event loop.
10 
11     ip = InteractiveShell.instance()
12     if hasattr(ip, '_inputhook_gevent'):
13         return ip._inputhook_gevent
14 
15     got_kbdint = [False]
16 
17     def inputhook_gevent():
18         # PyOS_InputHook python hook for Gevent
19         try:
20             ignore_CTRL_C()
21             gevent.sleep(0.01)          ### THIS RUNS THE LOOP FOR 10 ms
22             while not stdin_ready():
23                 gevent.sleep(0.05)      ### THIS RUNS THE LOOP FOR 50 ms
24         except:
25             ignore_CTRL_C()
26             from traceback import print_exc
27             print_exc()
28         finally:
29             allow_CTRL_C()
30         return 0
31 
32     def preprompthook_gevent(ishell):
33         if got_kbdint[0]:
34             mgr.set_inputhook(inputhook_gevent)
35         got_kbdint[0] = False
36 
37     ip._inputhook_gevent = inputhook_gevent
38     ip.set_hook('pre_prompt_hook', preprompthook_gevent)
39 
40     return inputhook_gevent
41 
42 def enable_gevent_hook():
43     mgr = InputHookManager()
44     mgr.set_inputhook(create_inputhook_gevent(mgr))

Presenter Notes

IPython + gevent

Starting IPython using the gevent hook defined on the previous slide

1 #!/usr/bin/python
2 from IPython.frontend.terminal.interactiveshell import TerminalInteractiveShell
3 
4 shell = TerminalInteractiveShell()
5 
6 enable_gevent_hook()
7 
8 shell.mainloop()

Good news: IPython 0.14 will support gevent natively !

Presenter Notes

Presenter Notes

Is there anything missing ?

  • Would be great to have a gevent-friendly Python Tango client

  • Ask forgiveness, not permission

    • Experiment: hapPyTango
      • works fine, but CORBA CDR decoding in Python impacts performance
      • no old-style events (kind of incompatibility between Fnorb and omniORB)
      • no zmq events (pyzmq supports gevent, but not zmq 3.1)
    • Only implements a subset of Tango client library
    • Unofficial

Presenter Notes

Conclusion

  • Building on top of gevent is promising

  • Nothing revolutionary (gevent is just a tool...)

  • Simplicity is the key

    • single threaded process
    • but still good performance
    • easy to debug (compared to multithreaded code)
    • deterministic

Presenter Notes

Thanks for your attention :) Questions ?

Presenter Notes

Presenter Notes