Thread Locals in Python: Mostly easy

Thread locals are an interesting idea: a way to give each thread its own storage, useful for global state that you don't want to share between threads.

The obvious caveat is that threadlocals are still effectively global (for the current thread), and like all global state, should be treated with great suspicion. If you can do without one, you're probably better off.

Let's look at the official Python docs for threading.local. Here is the entry in its entirety:

class threading.local

A class that represents thread-local data. Thread-local data are data whose values are thread specific. To manage thread-local data, just create an instance of local (or a subclass) and store attributes on it:

mydata = threading.local()

mydata.x = 1

The instance’s values will be different for separate threads.

For more details and extensive examples, see the documentation string of the _threading_local module.

Okay, so let's play with that example and see what happens.

>>> import threading
>>>
>>> mydata = threading.local()
>>> mydata.x = 'hello'
>>> class Worker(threading.Thread):
...     def run(self):
...         mydata.x = self.name
...         print mydata.x
... 
>>> w1, w2 = Worker(), Worker()
>>> w1.start(); w2.start(); w1.join(); w2.join()
Thread-1
Thread-2
>>> print mydata.x
hello

Cool! Each thread does indeed set and read its own data in the x attribute.

Mutable threadlocals part 1: oops

Let's try the same exact example again with a different value for x. Something mutable. Say, a dict.

>>> import threading
>>> mydata = threading.local()
>>> mydata.x = {}
>>>
>>> class Worker(threading.Thread):
...     def run(self):
...         mydata.x['message'] = self.name
...         print mydata.x['message']
...
>>> w1, w2 = Worker(), Worker()
>>> w1.start(); w2.start(); w1.join(); w2.join()
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "<stdin>", line 3, in run
AttributeError: 'thread._local' object has no attribute 'x'

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "<stdin>", line 3, in run
AttributeError: 'thread._local' object has no attribute 'x'

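Um. Wat. So mydata.x exists if it's assigned an immutable value, and does not exist if it's assigned a mutable value?

Not quite. Mutability is a red herring. The real difference is that the first Worker assigned mydata.x inside run() before reading it, while this one only reads mydata.x (to get at the dict) and never assigns it. The module-level assignment ran in the main thread, so only the main thread's view of mydata has an x; in every other thread the attribute doesn't exist until that thread sets it.

It's a bit irksome that this is glossed over in the published Python docs and lies here waiting to bite unsuspecting users. Remind me to submit a docs patch...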
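One way to probe what's going on is to repeat the experiment with an immutable value that the worker only reads and never assigns. The sketch below uses Python 3 print syntax, and the names Reader and results are made up for illustration. If the failure were about mutability, the string should be visible in the workers; it isn't, which suggests the issue is which thread did the assigning.

```python
import threading

mydata = threading.local()
mydata.x = 'hello'  # immutable, but assigned only in the thread running this line

results = {}  # hypothetical helper: records what each worker observed


class Reader(threading.Thread):
    def run(self):
        # Read without assigning first, which is what the dict example
        # effectively did when it evaluated mydata.x['message'] = ...
        try:
            results[self.name] = mydata.x
        except AttributeError:
            results[self.name] = 'no attribute x'


readers = [Reader(), Reader()]
for r in readers:
    r.start()
for r in readers:
    r.join()

print(sorted(results.values()))  # both workers hit AttributeError
print(mydata.x)                  # the assigning thread still sees 'hello'
```

So the earlier string example only worked because run() assigned mydata.x before printing it.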

Mutable threadlocals part 2: Subclass threading.local

Here's a way to have a mutable value set up automatically in every thread, based on a close read of the docstring mentioned in the docs: subclass threading.local and create your mutable object inside __init__.

You can see what happens if you do that by reading the _threading_local source: both __new__ and __getattribute__ are defined such that your __init__() gets run for each thread as needed, with locking around it. The Worker class here is the same as above:

>>> import threading
>>> class MyData(threading.local):
...     def __init__(self):
...         self.x = {}
...
>>>
>>> mydata = MyData()
>>>
>>> class Worker(threading.Thread):
...     def run(self):
...         mydata.x['message'] = self.name
...         print mydata.x['message']
...
>>>
>>> w1, w2 = Worker(), Worker()
>>> w1.start(); w2.start(); w1.join(); w2.join()
Thread-1
Thread-2
>>> print mydata.x
{}
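To see the per-thread __init__ behaviour for yourself, you can log which thread each __init__ call runs in. This is a minimal sketch in Python 3 syntax; init_threads, log_lock and seen are illustrative names that aren't part of the example above.

```python
import threading

init_threads = []            # hypothetical log: one entry per __init__ call
log_lock = threading.Lock()  # guards the log, since several threads append to it


class MyData(threading.local):
    def __init__(self):
        with log_lock:
            init_threads.append(threading.current_thread().name)
        self.x = {}


mydata = MyData()  # __init__ runs once here, in the creating thread

seen = {}  # what each worker found in its own dict


class Worker(threading.Thread):
    def run(self):
        # The first attribute access in this thread triggers __init__ again,
        # so self.x is a fresh dict that belongs to this thread only.
        mydata.x['message'] = self.name
        seen[self.name] = dict(mydata.x)


workers = [Worker(), Worker()]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(len(init_threads))  # 3: the creating thread plus the two workers
print(mydata.x)           # {} in the creating thread, untouched by the workers
```

Each worker gets its own dict, and the creating thread's dict stays empty, which matches the {} printed in the transcript above.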

What if you wanted to create a mutable value explicitly per thread, instead of inside __init__()?

That's actually easier: create the value inside code that runs per thread rather than at module level, and assign it there. No local subclass needed. E.g., in this case, inside the run method:

>>> mydata = threading.local()
>>>
>>> class Worker(threading.Thread):
...     def run(self):
...         mydata.x = {'message': self.name}
...         print mydata.x['message']
... 
>>> w1, w2 = Worker(), Worker()
>>> w1.start(); w2.start(); w1.join(); w2.join()
Thread-17
Thread-18
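If the code that touches mydata.x can't know whether the current thread has set it up yet, one possible variation is to create the value lazily on first use. This is only a sketch in Python 3 syntax; get_x and seen are hypothetical names, not anything threading provides.

```python
import threading

mydata = threading.local()


def get_x():
    """Hypothetical helper: return this thread's dict, creating it on first use."""
    try:
        return mydata.x
    except AttributeError:
        # This thread has never assigned x, so give it a dict of its own.
        mydata.x = {}
        return mydata.x


seen = {}  # what each worker read back from its own dict


class Worker(threading.Thread):
    def run(self):
        get_x()['message'] = self.name
        seen[self.name] = get_x()['message']


workers = [Worker(), Worker()]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(seen)     # each worker read back its own name
print(get_x())  # the calling thread gets its own, still empty, dict
```

This does roughly the same job as the __init__ subclass, at the cost of having to go through the helper everywhere.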