Here's a typical loop that checks each item for some condition and returns the first time one passes:
def list_contains_urls(maybe_urls):
    for url in maybe_urls:
        if url.startswith('http'):
            return True
    return False
Can we do that more concisely? Sure.
def list_contains_urls(maybe_urls):
    return any(u for u in maybe_urls if u.startswith('http'))
This, I'd argue, is still pretty clear. any() is a builtin, and everybody
should know their builtins.
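As a quick sanity check, here is a small sketch that runs the loop and the any() version side by side. The `_loop` and `_any` suffixes and the sample lists are made up for illustration; they are not part of the original functions.

```python
def list_contains_urls_loop(maybe_urls):
    # Explicit loop: return as soon as one item looks like a URL.
    for url in maybe_urls:
        if url.startswith('http'):
            return True
    return False


def list_contains_urls_any(maybe_urls):
    # Same check, written with any() and a generator expression.
    return any(u for u in maybe_urls if u.startswith('http'))


# Hypothetical sample inputs, chosen only for illustration.
samples = [
    ['foo', 'http://example.com', 'bar'],
    ['foo', 'bar'],
    [],
]

results = [
    (list_contains_urls_loop(s), list_contains_urls_any(s))
    for s in samples
]
print(results)  # [(True, True), (False, False), (False, False)]
```

Note that any() on an empty input returns False, which matches the loop falling through to its final return.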
This version would still work, but would be less efficient:
def list_contains_urls(maybe_urls):
    return any([u for u in maybe_urls if u.startswith('http')])
What's the difference? This last version uses a list comprehension; the one
before it uses a generator expression. Why is the list comprehension version worse?
Because the list comprehension wastes memory building a list we don't actually
need, and wastes work computing every member of this list even though we only
care about the first one that matches the conditional. The generator
expression version does neither; it short-circuits just like the original for
loop version.
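One way to see the difference is to count how many items actually get tested. The sketch below assumes a hypothetical helper, looks_like_url(), that logs each string it is asked about; the helper and the sample data are illustrative only.

```python
calls = []


def looks_like_url(s):
    # Record every string we are asked to test, then do the real check.
    calls.append(s)
    return s.startswith('http')


# Hypothetical sample data: the match is the second of five items.
maybe_urls = ['foo', 'http://example.com', 'bar', 'baz', 'qux']

# Generator expression: any() stops pulling items at the first match.
any(u for u in maybe_urls if looks_like_url(u))
lazy_calls = len(calls)

calls.clear()  # reset the log before the second run

# List comprehension: the whole list is built before any() sees it.
any([u for u in maybe_urls if looks_like_url(u)])
eager_calls = len(calls)

print(lazy_calls, eager_calls)  # 2 5
```

With the generator expression, only the items up to and including the first match are tested; with the list comprehension, every item is.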
Confused? Consider this generator expression:
>>> ints = (x for x in range(-10, 10))
>>> ints
<generator object <genexpr> at 0x214ceb0>
>>> any(x for x in ints if x > 0)
True
>>> next(ints)
2
What happened there? any() stopped consuming the generator as soon as it found
a matching value, and a generator only advances when something consumes it.
The value that tested true must have been 1. That's why, when we ask the
generator for its next value, we get 2.
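To make the leftover state visible in one go, the following sketch (written for Python 3, and not taken from the session above) drains whatever any() left behind into a list. The names found and leftover are chosen here for illustration.

```python
ints = (x for x in range(-10, 10))

# any() pulls values from ints until the first one greater than zero (1).
found = any(x for x in ints if x > 0)

# Whatever any() did not consume is still waiting in the generator.
leftover = list(ints)
print(found, leftover)  # True [2, 3, 4, 5, 6, 7, 8, 9]
```

Everything up to and including 1 has been consumed, and the rest was never computed until list() asked for it.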