---
title: "Python 101: iterators, generators, coroutines"
date: 2019-12-28
description: "A step-by-step guide to understanding iterators, generators, and coroutines in Python, building from first principles."
tags: [python]
---

In this post I'm going to be talking about what a generator is and how it compares to a coroutine, but to understand these two concepts (generators and coroutines) we'll need to take a step back and understand the underlying concept of an Iterator.

Each section leads onto the next, so it's best to read this post in the order the sections are defined. Unless you're already familiar with earlier segments and prefer to jump ahead.

## Summary

The summary of everything we'll be discussing below is this:

- Iterators let you iterate over your own custom object.
- Generators are built upon Iterators (they reduce boilerplate).
- Generator Expressions are even more concise Generators †
- Coroutines _are_ Generators, but their `yield` accepts values.
- Coroutines can pause and resume execution (great for concurrency).

> [!INFO]
> † think [comprehensions](https://gist.github.com/e5310d1082b0ff8307e39b71a6f9bae5).

## Iterators

According to the official [Python glossary](https://docs.python.org/3.7/glossary.html#term-iterator), an 'iterator' is...

> [!CITE]
> An object representing a stream of data.

### Why use Iterators?

An interator is useful because it enables any custom object to be iterated over using the standard Python `for-in` syntax. This is ultimately how the internal list and dictionary types work, and how they allow `for-in` to iterate over them.

More importantly, an iterator (as we'll discover) is very memory efficient and means there is only ever one element being handled at once. Thus you could have an iterator object that provides an infinite sequence of elements and you'll never find your program exhausting its memory allocation.

### Iterator Implementation

An iterator is (typically) an object that implements both the `__iter__` and `__next__` 'dunder' methods, although the `__next__` method doesn't _have_ to be defined as part of the same object as where `__iter__` is defined. Let me clarify...

An 'iterator' is really just a container of some data. This 'container' must have an `__iter__` method which, according to the [protocol documentation](https://docs.python.org/3.7/library/stdtypes.html#iterator.__iter__), should return an iterator object (i.e. something that has the `__next__` method). It's the `__next__` method that moves forward through the relevant collection of data.

So you could design a single class that contains both the `__iter__` and `__next__` methods (like I demonstrate below), or you might want to have the `__next__` method defined as part of a separate class (it's up to you and whatever you feel works best for your project).

> [!INFO]
> the Python docs for [`collections.abc`](https://docs.python.org/3.7/library/collections.abc.html#collections.abc.Iterator) highlight the other 'protocols' that Python has and the various methods they require (see an [earlier post of mine](/posts/design-python/#interfaces-protocols-and-abstract-methods) that discusses protocols + abstract classes in detail). If you're unfamiliar with 'dunder' methods, then I'll refer you to an excellent post: [a guide to magic methods](https://rszalski.github.io/magicmethods/).

By implementing these two methods it enables Python to iterate over a 'collection'. It doesn't matter what the collection is, as long as the iterator object defines the behaviour that lets Python know how to iterate over it.

### Iterator Example

Below is a contrived example that shows how to create such an object. In this example we pass in a list of strings to a class constructor and the class implements the relevant methods that allow `for-in` to iterate over that collection of data:

```
class Foo:
    def __init__(self, collection):
        self.collection = collection
        self.index = 0

    def __iter__(self):
        """
        we return self so the 'iterator object' 
        is the Foo class instance itself,
        
        but we could have returned a new instance 
        of a completely different class, so long as
        that other class had __next__ defined on it.
        """
        return self

    def __next__(self):
        """
        this method is handling state and informing
        the container of the iterator where we are
        currently pointing to within our data collection.
        """
        if self.index > len(self.collection)-1:
            raise StopIteration

        value = self.collection[self.index]
        self.index += 1

        return value

# we are now able to loop over our custom Foo class!
for element in Foo(["a", "b", "c"]):
    print(element)
```

> [!IMPORTANT]
> raising the `StopIteration` exception is a requirement for implementing an iterator correctly.

With this example implementation, we can also iterate over our `Foo` class _manually_, using the `iter` and `next` functions, like so:

```
foo = Foo(["a", "b", "c"])
iterator = iter(foo)

next(iterator)  # 'a'
next(iterator)  # 'b'
next(iterator)  # 'c'
```

> [!INFO]
> `iter(foo)` is the same as `foo.__iter__()`, while `next(iterator)` is the same as `iterator.__next__()` -- so these functions are basic syntactic sugar provided by the standard library that helps make our code look nicer.

This type of iterator is referred to as a 'class-based iterator' and isn't the only way to implement an _iterable_ object. [Generators](#generators) and [Generator Expressions](#generator-expressions) (see the following sections) are other ways of iterating over an object in a memory efficient way.

We can also _realize_ the full collection by using the `list` function, like so:

```
iterator = Foo(["a", "b", "c"])
list(iterator)  # ["a", "b", "c"]
```

> [!WARNING]
> be careful doing this, because if the iterator is yielding an unbounded number of elements, then this will exhaust your application's memory!

## Generators

According to the official [Python documentation](https://docs.python.org/3.7/library/stdtypes.html#generator-types), a 'generator' provides...

> [!CITE]
> A convenient way to implement the iterator protocol. If a container object's `__iter__()` method is implemented as a generator, it will automatically return an iterator object.

### Why use Generators?

They offer nice syntax sugar around creating a simple Iterator, but also help reduce the boilerplate code necessary to make something iterable.

A Generator can help reduce the code boilerplate associated with a 'class-based' iterator because they're designed to handle the 'state management' logic you would otherwise have to write yourself.

### Generator Implementation

A Generator is a function that returns a 'generator iterator', so it acts similar to how `__iter__` works (remember it returns an iterator).

In fact a Generator is a subclass of an Iterator. The generator function itself should utilize a `yield` statement to return control back to the caller of the generator function.

The caller can then advance the generator iterator by using either the `for-in` statement or `next` function (as we saw earlier with the 'class-based' Iterator examples), which again highlights how generators are indeed a subclass of an Iterator.

When a generator 'yields' it actually pauses the function at that point in time and returns a value. Calling `next` (or as part of a `for-in`) will move the function forward, where it will either complete the generator function or stop at the next `yield` declaration within the generator function.

### Generator Example

The following example prints `a`, then `b`, finally `c`:

```
def generator():
  yield "a"
  yield "b"
  yield "c"

for v in generator():
     print(v)
```

If we used the `next()` function instead then we would do something like the following:

```
gen = generator()
next(gen)  # a
next(gen)  # b
next(gen)  # c
next(gen)  # raises StopIteration
```

Notice that this has greatly reduced our code boilerplate compared to the custom 'class-based' Iterator we created earlier, as there is no need to define the `__iter__` nor `__next__` methods on a class instance (nor manage any state ourselves). We simple call `yield`!

If our use case is simple enough, then Generators are the way to go. Otherwise we might need a custom 'class-based' Iterator if we have very specific logic we need to execute.

Remember, Iterators (and by extension Generators) are very memory efficient and thus we could have a generator that yields an unbounded number of elements like so:

```
def unbounded_generator():
    while True:
        yield "some value"

gen = unbounded_generator()

next(gen)  # some value
next(gen)  # some value
next(gen)  # some value
next(gen)  # some value
next(gen)  # ...
```

So, as mentioned earlier, be careful when using `list()` over a generator function (see below example), as that will realize the entire collection and could exhaust your application memory.

```
def generator():
  yield "a"
  yield "b"
  yield "c"

gen = generator()
list(gen)  # [a, b, c]
```

### Generator Expressions

According to the official [PEP 289 document](https://www.python.org/dev/peps/pep-0289/) for generator expressions...

> [!CITE]
> Generator expressions are a high-performance, memory–efficient generalization of list comprehensions and generators.

In essence they are a way of creating a generator using a syntax very similar to [list comprehensions](https://gist.github.com/e5310d1082b0ff8307e39b71a6f9bae5).

Below is an example of a generator function that will print `"foo"` five times:

```
def generator(limit):
    for i in range(limit):
        yield "foo"

for v in generator(5):
    print(v)
```

Now here is is the same thing as a generator expression:

```
for v in ("foo" for i in range(5)):
    print(v)
```

The syntax for a generator expression is also very similar to those used by comprehensions, except that instead of the boundary/delimeter characters being `[]` or `{}`, we use `()`:

```
(expression for item in collection if condition)
```

> [!INFO]
> so although not demonstrated, you can also 'filter' yielded values due to the support for "if" conditions.

### Nested Generators (i.e. `yield from`)

Python 3.3 provided the `yield from` statement, which offered some basic syntactic sugar around dealing with nested generators.

Let's see an example of what we would have to do if we didn't have `yield from`:

```
def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    for v in bar():
        yield v
    for v in baz():
        yield v

for v in foo():
    print(v)
```

Notice how (inside the `foo` generator function) we have two separate `for-in` loops, one for each nested generator.

Now look at what this becomes when using `yield from`:

```
def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    yield from bar()
    yield from baz()

for v in foo():
    print(v)
```

OK so not exactly a ground breaking feature, but if you were ever confused by `yield from` you now know that it's a simple facade over the `for-in` syntax.

Although it's worth pointing out that if we didn't have `yield from` we still could have reworked our original code using the `itertool` module's `chain()` function, like so:

```
from itertools import chain

def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    for v in chain(bar(), baz()):
        yield v

for v in foo():
    print(v)
```

> [!INFO]
> refer to [PEP 380](https://www.python.org/dev/peps/pep-0380/) for more details on `yield from` and the rationale for its inclusion in the Python language.

## Coroutines

Coroutines (as far as Python is concerned) have historically been designed to be an extension to [Generators](#generators).

> [!CITE]
> Coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed. -- [Wikipedia](https://en.wikipedia.org/wiki/Coroutine)

### Why use Coroutines?

Because coroutines can pause and resume execution context, they're well suited to conconcurrent processing, as they enable the program to determine when to 'context switch' from one point of the code to another.

This is why coroutines are commonly used when dealing with concepts such as an [event loop](/posts/python-asyncio/#event-loop) (which Python's `asyncio` is built upon).

### Coroutines Implementation

Generators use the `yield` keyword to return a value at some point in time within a function, but with coroutines the `yield` directive can _also_ be used on the right-hand side of an `=` operator to signify it will _accept a value_ at that point in time.

### Coroutines Example

Below is an example of a coroutine. Remember! a coroutine is still a generator and so you'll see our example uses features that are related to generators (such as `yield` and the `next()` function):

> [!TIP]
> refer to the code comments for extra clarity.

```
def foo():
    """
    notice we use yield in both the 
    traditional generator sense and
    also in the coroutine sense.
    """
    msg = yield  # coroutine feature
    yield msg    # generator feature

coro = foo()

# because a coroutine is a generator
# we need to advance the returned generator
# to the first yield within the generator function
next(coro)

# the .send() syntax is specific to a coroutine
# this sends "bar" to the first yield 
# so the msg variable will be assigned that value
result = coro.send("bar")

# because our coroutine also yields the msg variable
# it means we can print that value
print(result)  # bar
```

> [!INFO]
> `coro` is an identifier commonly used to refer to a coroutine. For more information on other available coroutine methods, please refer to the [documentation](https://docs.python.org/3.8/reference/datamodel.html#coroutines).

Below is an example of a coroutine using `yield` to return a value to the caller prior to the value _received_ via a caller using the `.send()` method:

```
def foo():
    msg = yield "beep"
    yield msg

coro = foo()

print(next(coro))  # beep

result = coro.send("bar")

print(result)  # bar
```

You can see in the above example that when we moved the generator coroutine to the first `yield` statement (using `next(coro)`), that the value `"beep"` was returned for us to `print`.

### Asyncio: generator based coroutines

When the `asyncio` module was first released it didn't support the `async`/`await` syntax, so when it was introduced, to ensure any legacy code that had a function that needed to be run concurrently (i.e. awaited) would have to use an `asyncio.coroutine` decorator function to allow it to be compatible with the new `async`/`await` syntax.

> [!INFO]
> refer to [the documentation](https://docs.python.org/3.8/library/asyncio-task.html#generator-based-coroutines) for information on this deprecated (as of Python 3.10) feature, as well as some other functions like `asyncio.iscoroutine` that are specific to generator based coroutines.

The original generator based coroutines meant any `asyncio` based code would have used `yield from` to await on [Futures](/posts/python-asyncio/#futures) and other coroutines.

The following example demonstrates how to use both the new `async` coroutines with legacy generator based coroutines:

```
@asyncio.coroutine
def old_style_coroutine():
    yield from asyncio.sleep(1)

async def main():
    await old_style_coroutine()
```

### Asyncio: new async coroutines

Coroutines created with `async def` are implemented using the more recent `__await__` dunder method (see [documentation here](https://docs.python.org/3.8/reference/datamodel.html#coroutines)), while generator based coroutines are using a legacy 'generator' based implementation.

### Types of Coroutines

This has led to the term 'coroutine' meaning multiple things in different contexts. We now have:

- **simple coroutines**: traditional generator coroutine (no async io).
- **generator coroutines**: async io using legacy `asyncio` implementation.
- **native coroutines**: async io using latest `async`/`await` implementation.

### Miscellaneous

There are a couple of interesting decorator functions provided by Python that can be a bit confusing, due to these functions appearing to have overlapping functionality.

They don't overlap, but do appear to be used together:

- `types.coroutine`: converts generator function into a coroutine.
- `asyncio.coroutine`: abstraction ensuring `asyncio` compatibility.

> [!INFO]
> as we'll see in a moment, `asyncio.coroutine` actually calls `types.coroutine`. You should ideally use the former when dealing with `asyncio` code.

More specifically, if we look at the implementation of the [`asyncio.coroutine` code](https://github.com/python/cpython/blob/master/Lib/asyncio/coroutines.py#L105) we can see:

1. If decorated function is already a coroutine, then just return it.
1. If decorated function is a generator, then convert it to a coroutine (using `types.coroutine`).
1. Otherwise wrap the decorated function such that when it's converted to a coroutine it'll await any resulting awaitable value.

What's interesting about `types.coroutine` is that if your decorated function were to remove any reference to a `yield`, then the function will be executed immediately rather than returning a generator. See [this](https://stackoverflow.com/a/49477233) Stack Overflow answer for more information as to where that behaviour was noticed.
