Here's a typical loop that checks each item for some condition and returns as soon as one passes:

    def list_contains_urls(maybe_urls):
        for url in maybe_urls:
            if url.startswith('http'):
                return True
        return False

Can we do that more concisely? Sure.

    def list_contains_urls(maybe_urls):
        return any(u for u in maybe_urls if u.startswith('http'))

This, I'd argue, is still pretty clear: any() is a builtin, and everybody should know their builtins.
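If it helps to see why any() behaves like the loop above, the Python documentation describes it as roughly equivalent to the function below. This is only a sketch for illustration: `my_any` is a hypothetical name chosen so it doesn't shadow the builtin, and the real any() is implemented in C.

```python
def my_any(iterable):
    # Roughly equivalent to the builtin any(), per the Python docs:
    # walk the iterable and stop at the first truthy element.
    for element in iterable:
        if element:
            return True
    return False


def list_contains_urls(maybe_urls):
    # Same generator-expression version as above, using the sketch instead.
    return my_any(u for u in maybe_urls if u.startswith('http'))


print(list_contains_urls(['ftp://example.com', 'http://example.com']))  # True
print(list_contains_urls(['ftp://example.com']))                        # False
```

The body is the original for loop with the condition pulled out, which is why the generator-expression version keeps the loop's early exit.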

This version would still work, but would be less efficient:

    def list_contains_urls(maybe_urls):
        return any([u for u in maybe_urls if u.startswith('http')])

What's the difference? In the first version we used a generator expression; in the second, a list comprehension. Why is the second version worse? Because the list comprehension wastes memory building a list we don't actually need, and wastes work calling startswith() on every member of the input even though we only care about the first one that matches. any() itself still stops at the first true element, but by then the whole list has already been built. The generator expression does neither: it produces values only as any() asks for them, so it short-circuits just like the original for loop version.
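One way to see the wasted work is to count how many times the test actually runs. The sketch below wraps the startswith() check in a hypothetical helper, `looks_like_url`, that bumps a counter on every call; `count_checks` and the sample list of one URL followed by 999 non-URLs are likewise made up for illustration.

```python
def count_checks(use_generator, urls):
    checks = 0

    def looks_like_url(s):
        # Hypothetical helper: count each call, then do the real test.
        nonlocal checks
        checks += 1
        return s.startswith('http')

    if use_generator:
        any(u for u in urls if looks_like_url(u))    # generator expression
    else:
        any([u for u in urls if looks_like_url(u)])  # list comprehension
    return checks


urls = ['http://example.com'] + ['not a url'] * 999
print(count_checks(True, urls))   # 1
print(count_checks(False, urls))  # 1000
```

With the match at the front, the generator expression runs the test once, while the list comprehension runs it for every item before any() even starts looking.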

Confused? Consider this generator expression:

    >>> ints = (x for x in range(-10, 10))
    >>> ints
    <generator object <genexpr> at 0x214ceb0>
    >>> any(x for x in ints if x > 0)
    True
    >>> next(ints)
    2

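What happened there? any() stopped pulling values from the generator as soon as it found one that passed the test, so the generator never ran to the end; it is simply paused where any() left it. The first value greater than 0 is 1, so that is the last value the generator produced. That's why, when we then ask the generator for its next value, we get 2.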
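To watch the short-circuiting directly, we can swap the generator expression for a small generator function that records every value it hands out. `recording_range` is a hypothetical helper written only for this illustration; it yields the same values as the range in the session above.

```python
def recording_range(start, stop, produced):
    # Yield start..stop-1, appending each value to `produced` as it is handed out.
    for x in range(start, stop):
        produced.append(x)
        yield x


produced = []
ints = recording_range(-10, 10, produced)

print(any(x for x in ints if x > 0))  # True
print(produced[-1])                   # 1: nothing after 1 has been generated yet
print(next(ints))                     # 2
print(list(ints))                     # [3, 4, 5, 6, 7, 8, 9]
```

Because the generator is paused rather than exhausted, a second pass over the same generator only sees the leftover values.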

