Big Generator

Submitted by Mile23 on Fri, 09/05/2014 - 20:29

Note that as my deadlines loom, I am unable to comb over every line of code in this article. Please forgive any un-runnable code and let me know which it is so I can address it. Thanks.

Note also that the title of this article is a reference to the 'Big Generator' album by Yes, released in 1987. It was the less-successful followup to their total smash hit '90125'. It was all I could think of.

You can learn more about it here: http://youtu.be/Qc8ZNeBG0XI

Spoilers

I don't like generators very much, for the same reason I don't like anonymous functions very much: It's very easy to write untestable, and thus unmaintainable, generators.

Add in language features such as send() and throw() and you get a whole world of smelly code that will be very difficult to test and maintain.

WTH is a Generator?

You know what an array is? Well there's Iterators and they act like arrays, only they're objects. And you can make a function that acts like an Iterator object, and that's a Generator.

OK then. We're done.

Wait, not done? OK. Let's wind it all the way back to the beginning. Which is something Generators can't do, by the way.

Remedial PHP 1: The Array

Here's an array:

$array = array('foo', 'bar', 'baz');

You can loop over the array with foreach():

foreach ($array as $item) {
  echo 'item: ' . $item . PHP_EOL;
}

// item: foo
// item: bar
// item: baz

You can also put the array into array functions:

echo reset($array);
// foo

echo end($array);
// baz

And others: current(), next(), prev(), and so forth.

There's plenty more about arrays, but the point here is: You can access array values through such functions as current().
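
To make that concrete, here's a small sketch that walks an array using only the internal-pointer functions instead of foreach. The $seen variable is just something I made up to collect the output.

```php
<?php
$array = array('foo', 'bar', 'baz');

$seen = array();
// reset() moves the internal pointer back to the first element.
reset($array);
// key() returns NULL once the pointer has moved past the last element.
while (key($array) !== NULL) {
  $seen[] = key($array) . ': ' . current($array);
  // next() advances the internal pointer by one element.
  next($array);
}

echo implode(', ', $seen);
// 0: foo, 1: bar, 2: baz
```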

Remedial PHP 2: The Iterator

\Iterator allows objects to act like arrays.

It's an interface provided by PHP v.5.0 and later.

It enforces the existence of five methods: current(), key(), next(), rewind(), and valid(). This lets constructs like foreach iterate over whatever those methods return.

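class FakeArray implements \Iterator {
  private $things = array('foo', 'bar', 'baz');
  private $position = 0;
  // On PHP 8.1 and later, add return types to these methods to avoid deprecation notices.
  public function current() {
    return $this->things[$this->position];
  }
  public function key() { return $this->position; }
  public function next() { ++$this->position; }
  public function rewind() { $this->position = 0; }
  public function valid() { return isset($this->things[$this->position]); }
}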

$iterator = new FakeArray();

foreach($iterator as $item) {
  echo $item;
}

An Iterator can be thought of as a one-directional list with a cursor, which you move with next() and read with current().

// Advance to the next item.
$iterator->next();
// Get the item we advanced to.
echo $iterator->current();

Iterators can only go forwards, other than to rewind(). That is, there's no previous().

The big point here: Iterators are objects that you call methods on, much like you pass arrays to functions.
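
If you're curious what foreach is doing with those methods, here's a rough sketch of the same loop written out by hand. I'm using \ArrayIterator, a ready-made \Iterator that ships with PHP's SPL extension, so the snippet stands on its own. The $lines array is only there to collect output.

```php
<?php
// \ArrayIterator wraps a plain array in an object that implements \Iterator.
$iterator = new \ArrayIterator(array('a' => 'foo', 'b' => 'bar'));

$lines = array();
// Roughly what foreach ($iterator as $key => $item) does under the hood:
// rewind once, then check valid(), read key()/current(), and call next().
for ($iterator->rewind(); $iterator->valid(); $iterator->next()) {
  $lines[] = $iterator->key() . ' => ' . $iterator->current();
}

echo implode(', ', $lines);
// a => foo, b => bar
```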

Remedial PHP 3: Anonymous Functions are Objects

This is an anonymous function:

function () { return 'yay!'; }

We can make a reference to our anonymous function:

$foof = function () { return 'yay!'; };

We can then execute our function:

echo $foof();

Our anonymous function is stored internally to PHP as a \Closure object:

echo get_class($foof);
// Closure

This illustrates that a function can be implemented as an object by PHP.

Normal functions don't work this way. That is, they aren't converted to objects in any way we can know about.

But generators do work this way.

Non-Remedial PHP: The Generator, Level 1

What's A Generator?

\Generator is a class internal to PHP. It implements \Iterator, so it's bound to act like one. Generators can't do any random access, and rewind() only works before the generator has moved past its first yield; after that it throws an exception. They only move forward.
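
Here's a quick sketch of how rewind() behaves on a generator, as far as I can tell from the PHP manual: it's harmless while the generator is still sitting at its first yield, and it throws once the generator has moved on. The function name one_two_three() is made up for the example.

```php
<?php
function one_two_three() {
  yield 1;
  yield 2;
  yield 3;
}

$g = one_two_three();

// Still at the first yield, so rewind() is allowed.
$g->rewind();
$first = $g->current();

// Move past the first yield.
$g->next();

try {
  $g->rewind();
  $rewound = TRUE;
}
catch (\Exception $e) {
  // PHP refuses to rewind a generator that has already advanced.
  $rewound = FALSE;
}

var_dump($first);
// int(1)
var_dump($rewound);
// bool(false)
```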

You can't explicitly create a Generator.

// Phail!
$generator = new \Generator();

You create a generator by creating a function. PHP figures out that your function is really a generator, and calling it hands you back a Generator object instead of running the body.

How does it know?

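  1. You don't return a value. (You can say return;, but not return 23;. That's the PHP 5.5 rule; PHP 7.0 and later allow return 23; and hand the value to the caller through getReturn().)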
  2. You have a yield operator.
  3. That's it.

Ridiculously Useless Sample Generator Function:

function generator_function() {
  for ($i=0; $i<9000; ++$i) {
    yield $i;
  }
  return;
}

If you wanted something to iterate 9000 times, and you were running PHP earlier than v.5.5, you might have had to say something like this:

foreach(range(0,8999) as $count) {
  echo $count;
}

This is bad because a) it's a very stupid example, but also b) you're creating a 9000-element array, which fills up memory.

So we replace it with our generator function:

foreach(generator_function() as $count) {
  echo $count;
}

Now we only have the $i index in memory, instead of a giant array.

This is often cited as the raison d'être for the generator as a concept: You can avoid large memory-gobbling arrays.
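
If you want to see the difference for yourself, here's a rough sketch that measures memory with memory_get_usage() before and after building each thing. count_up() is a made-up helper for this example, and the exact byte counts will vary by PHP version and platform, so I'm not going to quote any numbers.

```php
<?php
// Hypothetical helper for this sketch: yields 0 through $limit - 1, one at a time.
function count_up($limit) {
  for ($i = 0; $i < $limit; ++$i) {
    yield $i;
  }
}

// Build the whole array up front and see what it costs.
$before = memory_get_usage();
$array = range(0, 99999);
$array_bytes = memory_get_usage() - $before;

// Create the generator; no values have been produced yet.
$before = memory_get_usage();
$generator = count_up(100000);
$generator_bytes = memory_get_usage() - $before;

echo 'array: ' . $array_bytes . ' bytes, generator: ' . $generator_bytes . ' bytes' . PHP_EOL;

// Both still give us the same values to work with.
$array_sum = array_sum($array);
$generator_sum = 0;
foreach ($generator as $value) {
  $generator_sum += $value;
}
```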

Wait, What? Please Yield An Explanation

If you're used to functions returning a single value and then going out of scope, you are justifiably puzzled.

Here's a diagram:

function generator() {
  yield 1; // <-- Control goes back to caller here.
  // Control returns here from caller.
  yield 2; // <-- Control goes back to caller again.
}

In a generator, on the first iteration the function body runs until it gets to a yield command.

The state of the function call is suspended at that point and the yielded value is given to the caller.

On the next iteration, code flow comes back to the generator function just after the yield call.

Control flows until it gets to another yield command, or a return, or until the function ends.

And keeps on going like that.
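
A sketch that makes the back-and-forth visible: both the generator and the caller announce themselves with echo, so the output shows who has control at each step. chatty_generator() is a name I made up for the example.

```php
<?php
function chatty_generator() {
  echo "generator: started\n";
  yield 1;
  echo "generator: resumed after first yield\n";
  yield 2;
  echo "generator: resumed after second yield, finishing\n";
}

foreach (chatty_generator() as $value) {
  echo "caller: got $value\n";
}
// generator: started
// caller: got 1
// generator: resumed after first yield
// caller: got 2
// generator: resumed after second yield, finishing
```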

If We're Emulating Arrays, What About Keys For Our Values?

E-zee. Just use the => syntax you already know from array literals.

function generator() {
  yield 'key' => 'value';
}

foreach(generator() as $key=>$value) {
  echo $key;
  echo $value;
}

This means that generators are functions that can return two values.

If I can, I'd like to pause a moment here, and repeat that. This is a little bit of an Alice In Wonderland type moment, and I'd like to draw your attention to its strangeness:

A generator is a function that can return two values.

That's weird, but it will get weirder.
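
One more detail on keys, as a sketch: if you don't give a key, PHP hands out integer keys on its own, much like it does for arrays. You can mix the two styles. mixed_keys() is a made-up name, and var_export() is only there so the output shows which keys are strings.

```php
<?php
function mixed_keys() {
  yield 'first';             // No key given: PHP assigns 0.
  yield 'name' => 'second';  // Explicit string key.
  yield 'third';             // Next automatic integer key: 1.
}

$pairs = array();
foreach (mixed_keys() as $key => $value) {
  $pairs[] = var_export($key, TRUE) . ' => ' . $value;
}

echo implode(', ', $pairs);
// 0 => first, 'name' => second, 1 => third
```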

Generators Are Functions, Right?

We can pass arguments to generators:

function generator_function($offset) {
  for ($i=0;$i<10;++$i) {
    yield $offset + $i;
  }
}

We can get a reference to a generator just like we can with anonymous functions, and use that instead. Like this:

$g = generator_function(23);

foreach($g as $item) {
  echo $item;
}
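
Worth noticing: every call to the generator function gives you a separate object with its own state. Here's a sketch with two of them running side by side. I've renamed the function to offset_generator() so the snippet stands alone.

```php
<?php
function offset_generator($offset) {
  for ($i = 0; $i < 10; ++$i) {
    yield $offset + $i;
  }
}

$a = offset_generator(23);
$b = offset_generator(100);

// Advance only the first generator, twice.
$a->next();
$a->next();

echo $a->current() . PHP_EOL;
// 25
echo $b->current() . PHP_EOL;
// 100
```

Advancing $a does nothing to $b, because each one keeps its own copy of $i and $offset.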

Anonymous Generators

And now that we've learned that, we can make the leap to declaring anonymous generators:

$gen = function() {
  for ($i=0; $i<3000; ++$i) {
    yield $i;
  }
};

// Call the closure to get the Generator.
foreach ($gen() as $count) {
  echo $count;
}

Note that we have to call the anonymous function, as in $gen(). The variable itself holds a \Closure, and PHP only hands back a Generator when that closure is called.

// Phail! The closure is never called, so there's no Generator to iterate.
foreach( function() { yield $something; } as $whatever) {}
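
For what it's worth, PHP 7.0 and later let you call a closure in place by wrapping it in parentheses, which gets you an anonymous generator without the variable. A minimal sketch, assuming you're on PHP 7.0 or newer (it's a parse error on 5.5):

```php
<?php
$items = array();
// The outer parentheses wrap the closure; the trailing () calls it,
// and that call is what produces the Generator for foreach to consume.
foreach ((function () { yield 'a'; yield 'b'; })() as $item) {
  $items[] = $item;
}

echo implode(', ', $items);
// a, b
```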

Generator Level 2: Object-Oriented Functions

I told you it was weird, right? So when we get a reference to the generator function, we are getting a reference to a \Generator object.

$g = generator();
echo get_class($g);
// Generator

Internally, the generator function becomes a \Generator object, much like anonymous functions become \Closure objects.

A generator differs from other functions in that the state of the function code is preserved at the yield operator, rather than going out of scope after return.

So I can say this:

// Our generator function...
function generator_function() {
  for ($i=1; $i<=10; ++$i) {
    yield $i;
  }
}

// Instantiate a Generator.
$generator = generator_function();

echo $generator->current();
// 1
$generator->next();
echo $generator->current();
// 2

// etc.

As long as $generator is in scope, we can call \Iterator methods on it. We could even put this generator in a global scope and have different areas of code call those methods.

Is that a bad idea? Probably. But we could.

So it's a function and then it's an object and it lives in a scope. Which is weird.

send()

Speaking of weird... I mean really weird...

If we look at the PHP documentation for \Generator, we see that it includes two methods beyond the \Iterator ones we've been using: send() and throw().

send() lets us send a value into the generator.

That's right, the yield command is bi-directional.

So the deal is, you can make a generator that looks like this:

function sendable_generator() {
  $value = 0;
  for($i=0;$i<10;++$i) {
    $value = (yield $value + $i);
  }
}

Do you see the assignment made after yield is called?

We have to place yield inside some parentheses so that PHP isn't confused about precedence. (That's the rule in PHP 5.5; PHP 7 dropped the requirement.) Or really, I'd wager it was designed that way so that we wouldn't be confused about precedence.

What happens there? yield sends a value back and control leaves the generator. Then when control comes back in from a send(), the sent value is placed in $value.

From the outside it looks like this:

$g = sendable_generator();
echo $g->send(23);
// 24

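Note that, in this case, the send() 'wastes' an iteration: the generator hadn't started yet, so PHP first ran it to the first yield (whose value, 0, we never see), then sent 23 in as the result of that yield, and then execution continued until the next yield, which gave us 24.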
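
If you don't want to lose that first value, one approach is to read current() before you send anything. Here's a sketch using the same sendable_generator(); the $first, $second, and $third variables are just there to hold results.

```php
<?php
function sendable_generator() {
  $value = 0;
  for ($i = 0; $i < 10; ++$i) {
    $value = (yield $value + $i);
  }
}

$g = sendable_generator();

// Read the first yielded value before sending anything.
$first = $g->current();
// 0, because $value (0) + $i (0)

// send(23) makes the paused yield evaluate to 23, then runs to the next yield.
$second = $g->send(23);
// 24, because $value (23) + $i (1)

// A plain next() resumes the generator with NULL as the result of yield.
$g->next();
$third = $g->current();
// 2, because $value (NULL, treated as 0) + $i (2)
```

That last step is a good reason to be careful about mixing next() and send() on the same generator: whatever you sent earlier gets clobbered by NULL.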

throw()

There's also that throw() method. It's similar to send() except it throws an exception into the generator instead of poking back a value.

If you want to signal your generator in some way other than sending back a special value, you can make it explode from the inside with an exception.
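
Here's a sketch of what happens when the generator does nothing about it: the exception comes straight back out of the throw() call, and the generator is finished afterwards. fragile_generator() is a made-up name, and I picked \RuntimeException arbitrarily.

```php
<?php
function fragile_generator() {
  yield 1;
  yield 2;  // Never reached in this sketch.
}

$g = fragile_generator();
// Run to the first yield.
$g->current();

try {
  // The exception is raised inside the generator, at the paused yield.
  $g->throw(new \RuntimeException('stop'));
  $message = 'no exception';
}
catch (\RuntimeException $e) {
  // Nothing inside the generator caught it, so it lands back here.
  $message = 'caller caught: ' . $e->getMessage();
}

echo $message . PHP_EOL;
// caller caught: stop
var_dump($g->valid());
// bool(false)
```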

You can, of course, catch this exception and do some special behavior, maybe like this:

function generator() {
  for($i=0;$i<10;++$i) {
    // Wrap in a try structure.
    try {
      yield $i;
    }
    // Receive special signal from caller.
    catch (\Exception $e) {
      yield 'boom';
    }
  }
}

$g = generator();

echo $g->current();
// 0
$g->next();
echo $g->throw(new \Exception());
// boom
$g->next();
echo $g->current();
// 2

Stupid Generator Tricks

Kill me now.

function gen(\Iterator $g = NULL) {
  if ($g) {
    yield $g->current();
  }
  else {
    yield 23;
  }
}

$g1 = gen();
$g2 = gen($g1);
foreach ($g2 as $item) {
  echo $item;
}

I eagerly await death.

function digits_of_pi() {
  $timestamp = time();
  $still_working_on_it = TRUE;
  while ($still_working_on_it) {
    if (time() > $timestamp + 600) {
      yield NULL;
      $timestamp = time();
    }
    else {
      // $next_digit is a placeholder; the actual computation is left out.
      yield $next_digit;
    }
  }
}

$pi = digits_of_pi();

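while(TRUE) {
  $digit = $pi->current();
  // Compare against NULL explicitly, since the digit 0 is falsy.
  if ($digit !== NULL) {
    echo $digit;
  }
  else {
    do_some_other_thing();
  }
  // Without this, we'd read the same value forever.
  $pi->next();
}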