Fighting Entropy like Sisyphus!
revised for PyGotham 2014
Hi. I'm Paul Winkler and I'm here to talk about object-oriented software design.
I work here in NY at Percolate. There are a bunch of us here today and yes, we are hiring... We're a PyGotham gold sponsor as well! We have a booth in the food area, so come talk to us if you're looking for work or just curious what we do.
There are some fairly beginner-level OO design tips here; there's also some waxing philosophical, so I hope the more experienced people won't be bored.
I'm going to talk about 3 things ... (slide)
Some of you know all of this. Some of you know more than me about all of this. Some of you may disagree with me about this. Some of you may know none of this.
First, to get a sense of who's here today - show of hands:
Who here would call themselves pretty experienced at OO design?
Who here is just starting at OO programming or design?
Anybody who learned Python specifically so they could do Django or Flask?
(The interesting thing about that for my purposes is that I've noticed it's possible to do quite a bit of productive work in Django without ever doing much OO design. Which is actually pretty cool. But it might leave you wondering what all this OO design fuss is about.)
In my career: Most software I've worked on is developed using some agile approach and deployed on the web frequently. There are no big releases, just a constant stream of improvements.
Q: How many people work in similar situations?
In this world, it's taken as gospel truth that the old big-design-up-front approaches are bad because you don't know enough up front to predict what you need to design, so you waste time and you design the wrong thing. I believe this is largely true, though I reject the extreme of thinking you should start without doing any preliminary design work at all.
Instead we have embraced as a best practice the mantra: refactor constantly. Do the simplest thing and always be improving your design and all will be well.
Result: Big ball of mud!
(source: CBS via Baltimore Sun)
Unfortunately, I'm going to argue, we often do a poor job of this. And a good design fails to emerge. Instead, we often get the proverbial "Big Ball of Mud" design. And this happens not through any one bad decision but through a series of decisions that in isolation make good sense, but taken together add up to an overly complex software design.
And I'm going to say up front that there's no grand solution here. The solution is vigilance and being aware of the pitfalls.
(this title is from Camus)
Hence, Sisyphus. We are never going to be done pushing the design rock up the hill. Or the kitten up the slide. Eternal vigilance is the price of, not just liberty, but also agile design.
Disclaimer: I have not read Camus. I can use the google.
If that doesn't appeal to you, you might be in the wrong line of work... or need an attitude change. Savor the little victories. Always be learning.
For today, focusing on overuse of inheritance.
This talk could go on forever so I'm picking on my favorite target. Inheritance. Or more specifically, overuse of inheritance for things that can be done more flexibly and more simply in other ways.
Hard to untangle.
Things we do by default as we incrementally improve a system. These are all often highly expedient and often make things worse.
OO 101: Over-inheritance falls out of any language with inheritance.
Easiest path to D.R.Y.: Add more base classes!
Alternatives may not be as intuitive or obvious.
We continue to overuse inheritance because it's a path of very low resistance. And once we have an existing system that uses inheritance, it's very difficult - perhaps prohibitively so - to stop doing that. Once you pop, you can't stop!
Zope 2 in a nutshell:
Confession: Hi, my name is Paul, and I'm a recovering Zope 2 programmer.
Perhaps this makes me overly sensitive?
Zope, for the young folks in the audience, was a web development framework that was very big in the Python world around 10-15 years ago. Internally it used multiple inheritance very very heavily.
Here's part of the inheritance tree of the ironically named SimpleItem. Nearly everything you did in Zope 2 involved inheriting from this class.
Easy things were usually easy. The hard things it made convenient were easy. Anything else was rough going.
So, people with my history are typically very suspicious of big inheritance graphs. Not coincidentally, the guy that replied to this tweet of mine is also a recovering Zope 2 programmer.
I'm going to show a simple contrived example, and a real-world example of the kinds of problems I'm talking about.
I'm going to show you why they're problems, show you 3 or 4 common symptoms of overuse.
And what should we do instead?
I'm going to show you an alternative you may have heard of. How many people have heard the phrase "Favor composition over inheritance"? How many have not?
I'm going to briefly walk you through actually doing it.
Your client just wants a freakin' shark with lasers.
    class SharkWithLasers(Shark, LaserMixin):
        def attack(self, target):
            self.shoot(target)
            self.eat(target)
This is easy, right?
But now we want an orca with nunchaku.
Factor out commonalities into more base classes...
Every concept we add makes more and more classes.
But even if we stop here forever, it's already bad, because...
"Often we get the feeling of riding a yoyo when we try to understand one [of] these message trees." -- Taenzer, Ganti, and Podar, 1989
With inheritance, when you see a method being called, and you want to understand what's going on, you have to mentally envision the inheritance graph and figure out which class defines the version that's actually getting called.
Since subclasses can call methods defined in superclasses, and superclasses can also call methods that are overridden or even only defined in subclasses, you have to go hunting by bouncing up and down through the inheritance tree looking for these method definitions.
State - instance state, typically attribute assignments - is even worse, because it can change on literally any line.
Multiple inheritance makes it even more fun - it's not like being a yo-yo, it's like being a pinball and bouncing all over the place. You have to reconstruct Python's method resolution order in your head, or find a tool to do it for you.
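To make the "find a tool" option concrete, here is a minimal sketch that asks Python for the resolution order instead of reconstructing it by hand. The `Animal` base and the `defining_class` helper are assumptions added for illustration, and the methods return strings rather than printing so the result is easy to inspect.

```python
class Animal(object):
    def attack(self, target):
        return "generic attack on %s" % target


class Shark(Animal):
    def eat(self, target):
        return "chomp! delicious %s" % target


class LaserMixin(object):
    def shoot(self, target):
        return "pew! pew! at %s" % target


class SharkWithLasers(Shark, LaserMixin):
    def attack(self, target):
        return [self.shoot(target), self.eat(target)]


# __mro__ is the exact order in which Python searches classes for attributes.
mro_names = [cls.__name__ for cls in SharkWithLasers.__mro__]
print(mro_names)


def defining_class(cls, method_name):
    """Hypothetical helper: name of the first class in the MRO defining method_name."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass.__name__
    return None


for name in ("attack", "eat", "shoot"):
    print(name, "is defined in", defining_class(SharkWithLasers, name))
```

With five classes this is trivial. The point is that the same two lines still work when the tree has dozens of classes and your head does not.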
    class SharkWithLasers(SharkBase, LaserMixin):
        def attack(self, target):
            self.shoot(target)
            self.eat(target)
It starts innocuously enough...
Okay, easy in that example.
    class Shark(object):
        def eat(self, target):
            print("chomp! delicious %s" % target)

    class LaserMixin(object):
        def shoot(self, target):
            print("pew! pew! at %s" % target)
Not so much when there are dozens of classes.
Who is "self"?
Put another way: It's interesting to ask yourself in each method definition, what kind of object do I mean when I say "self"?
You don't know if it currently means a shark, or a base Animal, or a thing with lasers, or a base Weapon, or a thing with armor. You have to look all over, with only the names to give you clues.
ArmoredSharkWithLasers will have methods related to sharks, lasers, and armor.
Those are not conceptually related at all.
More classes + more methods = more yo-yo
"Has-a" or "Uses-a" relationships, instead of "Is-a".
Underlying principle in "Design Patterns" (aka "Gang of Four" book)
Now we get back to this phrase we mentioned before.
    class Shark(object):
        def __init__(self, weapon):
            self.weapon = weapon

        def eat(self, target):
            print("chomp! delicious %s" % target)

        def attack(self, target):
            self.weapon.attack(target)
            self.eat(target)

    shark_with_laser = Shark(weapon=Laser())
    def attack(self, target):
        self.weapon.attack(target)
        #    ^^^^^^ A clue!
        self.eat(target)
        # Still have to look, but the tree is smaller.
These would have been hard to do without special case hacks and/or yet more classes:
    mystery_shark = Shark(
        weapon=get_random_weapon())
    armed_to_the_teeth = Shark(
        weapon=WeaponCollection(Lasers(), Grenades()))
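Here is a rough, runnable sketch of how those two sharks could work. The slides do not show `WeaponCollection` or `get_random_weapon`, so their bodies here are guesses, as are the `Laser` and `Grenade` classes. The methods return lists of strings instead of printing, purely so the result can be checked.

```python
import random


class Laser(object):
    def attack(self, target):
        return ["pew! pew! at %s" % target]


class Grenade(object):
    def attack(self, target):
        return ["boom! at %s" % target]


class WeaponCollection(object):
    """Assumed implementation: looks like one weapon, fans out to several."""

    def __init__(self, *weapons):
        self.weapons = weapons

    def attack(self, target):
        events = []
        for weapon in self.weapons:
            events.extend(weapon.attack(target))
        return events


def get_random_weapon(rng=random):
    """Assumed implementation: pick any weapon class and instantiate it."""
    return rng.choice([Laser, Grenade])()


class Shark(object):
    def __init__(self, weapon):
        self.weapon = weapon

    def eat(self, target):
        return ["chomp! delicious %s" % target]

    def attack(self, target):
        # Shark only needs "something with an attack() method".
        return self.weapon.attack(target) + self.eat(target)


mystery_shark = Shark(weapon=get_random_weapon())
armed_to_the_teeth = Shark(weapon=WeaponCollection(Laser(), Grenade()))
print(armed_to_the_teeth.attack("surfer"))
```

Notice that `Shark` did not change at all to support either case. With inheritance, each combination would have needed its own class.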
Yes, it's a bad made-up design that nobody would ever do.
One day I was working on some REST API endpoints at my job.
Names of classes changed to protect the innocent. But this was generated from real code from a real production system.
Existing inheritance hierarchy tends to encourage more inheritance, because it's easier than puzzling out how to do without it. This is what I meant by "once you pop, you can't stop."
Here I factored out methods I needed to re-use into two new base classes.
Let's refactor SharkWithArmor!
    class Shark(Animal):
        def receive_hit(self, damage):
            self.health -= damage
            if self.health <= 0:
                self.die()

    class ArmorMixin(object):
        def receive_hit(self, damage):
            self.armor_health -= damage
            if self.armor_health < 0:
                super(ArmorMixin, self).receive_hit(-self.armor_health)
                self.armor_health = 0

    class SharkWithArmor(ArmorMixin, Shark):
        pass
One nice thing about this design: the Shark class knows nothing about armor. All you have to do is put the base classes of SharkWithArmor in the right order, and receive_hit() will do the right thing.
One not so nice thing: ArmorMixin depends on super().receive_hit() yet has no base classes of its own. It must implicitly be mixed into something that provides that method, and nothing in the code documents that requirement.
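To see how quietly that hidden dependency can bite, here is a small sketch. The `Animal` internals, the starting `health` of 10, and the `armor_health` default of 5 are all assumptions filled in so the code runs; `WrongOrder` and `JustArmor` are hypothetical misuse cases.

```python
class Animal(object):
    def __init__(self, health=10):  # assumed starting health
        self.health = health
        self.alive = True

    def die(self):
        self.alive = False


class Shark(Animal):
    def receive_hit(self, damage):
        self.health -= damage
        if self.health <= 0:
            self.die()


class ArmorMixin(object):
    armor_health = 5  # assumed default; the slides never initialize it

    def receive_hit(self, damage):
        self.armor_health -= damage
        if self.armor_health < 0:
            super(ArmorMixin, self).receive_hit(-self.armor_health)
            self.armor_health = 0


class SharkWithArmor(ArmorMixin, Shark):
    """Right order: armor absorbs first, overflow reaches the shark."""


class WrongOrder(Shark, ArmorMixin):
    """Bases swapped: Shark.receive_hit wins and the armor is never used."""


class JustArmor(ArmorMixin):
    """Nothing after ArmorMixin in the MRO provides receive_hit."""


good = SharkWithArmor()
good.receive_hit(7)
print(good.armor_health, good.health)

bad = WrongOrder()
bad.receive_hit(7)
print(bad.armor_health, bad.health)

lonely = JustArmor()
lonely.receive_hit(3)  # fine while the armor holds
try:
    lonely.receive_hit(3)  # armor breaks, super() finds nobody to call
except AttributeError as exc:
    print("AttributeError:", exc)
```

Neither misuse fails at class definition time. One silently ignores the armor, and the other only blows up once the armor runs out.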
    class Armored(object):
        def __init__(self, wearer):
            self.wearer = wearer

        def receive_hit(self, damage):
            self.armor_health -= damage
            if self.armor_health < 0:
                self.wearer.receive_hit(-self.armor_health)
                self.armor_health = 0

        def __getattr__(self, name):
            # Or explicitly proxy all others if desired.
            return getattr(self.wearer, name)

    shark_with_armor = Armored(wearer=Shark())
This might look a little backwards at first. The armor has the wearer, rather than the wearer having the armor.
This is so we can maintain the nice property we had before, where the Shark class doesn't have to know about armor. Nothing knows about the armor except the armor itself... and the invocation that constructs it.
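For anyone who wants to run the wrapper, here is one way to fill in the parts the slide leaves out. The `armor_health` constructor argument and its default of 5, the shark's starting `health` of 10, and the `alive` flag are assumptions, not part of the original design.

```python
class Shark(object):
    def __init__(self, health=10):  # assumed starting health
        self.health = health
        self.alive = True

    def die(self):
        self.alive = False

    def receive_hit(self, damage):
        self.health -= damage
        if self.health <= 0:
            self.die()

    def eat(self, target):
        return "chomp! delicious %s" % target


class Armored(object):
    def __init__(self, wearer, armor_health=5):  # assumed parameter
        self.wearer = wearer
        self.armor_health = armor_health

    def receive_hit(self, damage):
        self.armor_health -= damage
        if self.armor_health < 0:
            # Pass only the overflow on to whoever is wearing the armor.
            self.wearer.receive_hit(-self.armor_health)
            self.armor_health = 0

    def __getattr__(self, name):
        # Only called when normal lookup fails, so wearer and
        # armor_health are found on the wrapper itself.
        return getattr(self.wearer, name)


shark_with_armor = Armored(wearer=Shark())
shark_with_armor.receive_hit(7)
print(shark_with_armor.armor_health, shark_with_armor.health)
print(shark_with_armor.eat("seal"))
```

One caveat with this kind of proxy: `__getattr__` only forwards reads. Assigning `shark_with_armor.health = 3` would set an attribute on the wrapper and leave the wearer untouched, so explicit proxying may be safer if callers need to write.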
Shark has and uses laser, rather than is laser.
    class Shark(object):
        ...
        def attack(self, target):
            self.weapon.attack(target)
            self.eat(target)

    shark_with_laser = Shark(weapon=Laser())
How do we get here?
Earlier we suggested that this was a better design for sharks with lasers. How do we get from the inheritance-based code to this delegation-based code? When there's a huge pile of other classes in the tree and we want to do it gradually?
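One possible intermediate step, which is not necessarily the route the sample repo takes, is to move the behavior into the delegation-based classes first and keep the old class name alive as a thin compatibility shim. In this sketch the `Laser` class and the shim's details are assumptions, and methods return lists of strings so old and new behavior can be compared directly.

```python
class Laser(object):
    def attack(self, target):
        return ["pew! pew! at %s" % target]


class Shark(object):
    """New-style shark: has a weapon instead of being one."""

    def __init__(self, weapon):
        self.weapon = weapon

    def eat(self, target):
        return ["chomp! delicious %s" % target]

    def attack(self, target):
        return self.weapon.attack(target) + self.eat(target)


class SharkWithLasers(Shark):
    """Temporary shim: old name and old methods, delegation inside."""

    def __init__(self):
        super(SharkWithLasers, self).__init__(weapon=Laser())

    def shoot(self, target):
        # Old public method, kept for callers that still use it.
        return self.weapon.attack(target)


old_style = SharkWithLasers()
new_style = Shark(weapon=Laser())
print(old_style.attack("seal") == new_style.attack("seal"))
```

Existing callers keep working while new code constructs `Shark(weapon=...)` directly. Once nothing refers to the shim any more, it can be deleted along with its branch of the tree.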
Example refactoring of Sharks/Orcas/Nunchucks/Lasers:
Important: Tests before refactoring!
You need solid test coverage. If you don't have it, do that first. This is mandatory.
Filling out your test suite and getting decent coverage is more important to the success of your project than redoing your design. You could add tests and never redo the design and you'd be a hell of a lot better off than when you started.
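As a rough idea of what "tests first" can look like, here is a sketch of a characterization test that pins down what the inheritance-based shark does today, before anything is moved. It assumes Python 3 (for `contextlib.redirect_stdout`); the test class and the `run_tests` helper are hypothetical names.

```python
import io
import unittest
from contextlib import redirect_stdout


class Shark(object):
    def eat(self, target):
        print("chomp! delicious %s" % target)


class LaserMixin(object):
    def shoot(self, target):
        print("pew! pew! at %s" % target)


class SharkWithLasers(Shark, LaserMixin):
    def attack(self, target):
        self.shoot(target)
        self.eat(target)


class SharkWithLasersTest(unittest.TestCase):
    def test_attack_shoots_then_eats(self):
        # Capture what the current code prints, in order.
        out = io.StringIO()
        with redirect_stdout(out):
            SharkWithLasers().attack("seal")
        self.assertEqual(
            out.getvalue(),
            "pew! pew! at seal\nchomp! delicious seal\n",
        )


def run_tests():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharkWithLasersTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run_tests()
print("tests run:", result.testsRun, "ok:", result.wasSuccessful())
```

The test says nothing about how the classes are arranged, only about observable behavior. That is what lets it keep passing while the design underneath changes.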
The sample repo starts and ends with 100% line coverage.
For more (references and some more rambling):
Cats-on-a-slide gif: found at http://thisconjecture.com/2014/02/15/the-myth-of-sisyphus-a-touch-of-silly-and-a-great-animation-of-the-story/ original provenance unclear.
Nunchucks designed by Simon Henrotte (public domain)
Question for audience: does everybody know what a mixin is? In Python?
(If not: A mixin is a class designed not to be used by itself, but by inheriting from it to add some behavior to your class. Get more behavior by inheriting from more mixins. In some languages, e.g. Ruby, this means something a bit more formal, but in Python it's just an informal idea of, here's a class you can inherit from if you want its behavior.)
Some characteristics of nice mixins:
Symptom: Reuse is tied very tightly to the inheritance tree and is very hard to refactor away from that tree.
Symptom: As that tree grows, you don't have a yo-yo problem anymore, you have a pinball problem - bouncing all over the place.
Simple example that does not suck: unittest.TestCase. The setUp() and tearDown() are expected to be overridden.
So template method is certainly not inherently bad, it's useful and good.
Some code smells to watch out for: