Note that as my deadlines loom, I am unable to comb over every line of code in this article. Please forgive any un-runnable code and let me know which it is so I can address it. Thanks.
Note also that the title of this article is a reference to the 'Big Generator' album by Yes, released in 1987. It was the less-successful followup to their total smash hit '90125'. It was all I could think of.
You can learn more about it here: http://youtu.be/Qc8ZNeBG0XI
Spoilers
I don't like generators very much, for the same reason I don't like anonymous functions very much: It's very easy to write untestable, and thus unmaintainable generators.
Add in language features such as send() and throw() and you get a whole world of smelly code that will be very difficult to test and maintain.
WTH is a Generator?
You know what an array is? Well there's Iterators and they act like arrays, only they're objects. And you can make a function that acts like an Iterator object, and that's a Generator.
OK then. We're done.
Wait, not done? OK. Let's wind it all the way back to the beginning. Which is something Generators can't do, by the way.
Remedial PHP 1: The Array
Here's an array:
$array = array('foo', 'bar', 'baz');
You can loop over the array with foreach():
foreach ($array as $item) {
echo 'item: ' . $item;
}
// item: foo
// item: bar
// item: baz
You can also put the array into array functions:
echo reset($array);
// foo
echo end($array);
// baz
And others: current(), next(), prev(), and so forth.
There's plenty more about arrays, but the point here is: You can access array values through such functions as current().
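To make the pointer functions concrete, here's a small sketch that walks the same $array by hand. The variable names ($first, $second and so on) are only for illustration.

```php
<?php
$array = array('foo', 'bar', 'baz');

// Every array carries an internal pointer; these functions move or read it.
$first  = reset($array);   // Move to the first element: 'foo'
$second = next($array);    // Advance one step: 'bar'
$same   = current($array); // Read without moving: still 'bar'
$back   = prev($array);    // Step back one: 'foo'
$last   = end($array);     // Jump to the last element: 'baz'

echo implode(', ', array($first, $second, $same, $back, $last)), "\n";
// foo, bar, bar, foo, baz
```

Note that arrays can go backwards with prev(). Hang on to that thought.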
Remedial PHP 2: The Iterator
\Iterator allows objects to act like arrays.
It's an interface provided by PHP v.5.0 and later.
It enforces the existence of methods such as current() and next(). This lets constructs like foreach() iterate over whatever those methods return.
class FakeArray implements \Iterator {
public function current() {
return $the_current_thing;
}
// More interface implementation here...
}
$iterator = new FakeArray();
foreach($iterator as $item) {
echo $item;
}
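For the curious, here's roughly what a complete implementation could look like. This is a sketch, not the one true way: the $items and $position properties and the constructor are my own assumptions, and the typed signatures assume PHP 8.0 or later.

```php
<?php
// A hypothetical, filled-in FakeArray. Signatures assume PHP 8.0+.
class FakeArray implements \Iterator {
  private array $items;
  private int $position = 0;

  public function __construct(array $items) {
    // Re-index so positions run 0, 1, 2...
    $this->items = array_values($items);
  }

  public function current(): mixed {
    return $this->items[$this->position];
  }

  public function key(): mixed {
    return $this->position;
  }

  public function next(): void {
    ++$this->position;
  }

  public function rewind(): void {
    $this->position = 0;
  }

  public function valid(): bool {
    // foreach() stops as soon as this returns FALSE.
    return $this->position < count($this->items);
  }
}

$seen = array();
foreach (new FakeArray(array('foo', 'bar', 'baz')) as $key => $item) {
  $seen[$key] = $item;
  echo $key . ': ' . $item . "\n";
}
```

That's five methods and some state just to walk a list, which is worth remembering when generators show up.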
An Iterator can be thought of as a one-directional list which you step through with next() and read with current().
// Advance to the next item.
$iterator->next();
// Get the item we advanced to.
echo $iterator->current();
Iterators can only go forwards, other than to rewind(). That is, there's no previous().
The big point here: Iterators are objects that you call methods on, much like you pass arrays to functions.
Remedial PHP 3: Anonymous Functions are Objects
This is an anonymous function:
function () { return 'yay!'; }
We can make a reference to our anonymous function:
$foof = function () { return 'yay!'; };
We can then execute our function:
echo $foof();
Our anonymous function is stored internally to PHP as a \Closure object:
echo get_class($foof);
// Closure
This illustrates that a function can be implemented as an object by PHP.
Normal functions don't work this way. That is, they aren't converted to objects in any way we can know about.
But generators do work this way.
Non-Remedial PHP: The Generator, Level 1
What's A Generator?
\Generator is a class internal to PHP. It implements \Iterator, so it's bound to act like one. Generators can't do any random access, and even rewind() throws an exception once the generator has moved past its first yield. They only move forward.
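Here's a small sketch of that forward-only behavior. The three_numbers() function is made up for the demonstration, and it uses yield, which gets a proper explanation further down.

```php
<?php
function three_numbers() {
  yield 1;
  yield 2;
  yield 3;
}

$g = three_numbers();
$g->rewind();            // Fine: nothing has been consumed yet.
$first = $g->current();  // 1

$g->next();              // Now we're past the first yield.

$rewound = TRUE;
try {
  $g->rewind();
}
catch (\Exception $e) {
  // PHP refuses to rewind a generator that has already advanced.
  $rewound = FALSE;
}

echo $rewound ? 'rewound' : 'no rewinding for you', "\n";
```

The same thing bites you if you try to foreach() over one generator object twice.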
You can't explicitly create a Generator.
// Phail!
$generator = new \Generator();
You create a generator by creating a function. PHP figures out that your function is really a generator, and calling it hands you back a Generator object instead of running the function body.
How does it know?
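- You don't return a value. (You can say return;, but not return 23;. At least not in PHP 5.5: since PHP 7.0 a generator may return a value, which the caller reads with getReturn().)
- You have a yield operator.
- That's it.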
Ridiculously Useless Sample Generator Function:
function generator_function() {
for ($i=0; $i<9000; ++$i) {
yield $i;
}
return;
}
If you wanted something to iterate 9000 times, and you were running PHP earlier than v.5.5, you might have had to say something like this:
foreach(range(0,8999) as $count) {
echo $count;
}
This is bad because a) it's a very stupid example, but also b) You're creating a 9000-element array, which fills up memory.
So we replace it with our generator function:
foreach(generator_function() as $count) {
echo $count;
}
Now we only have the $i index in memory, instead of a giant array.
This is often cited as the raison d'etre for the generator as a concept: You can avoid large memory-gobbling arrays.
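If you'd like to see the difference rather than take it on faith, here's a rough sketch using memory_get_usage(). The count_up() function is a stand-in for our generator, and the exact byte counts will vary by PHP version and platform, so treat the numbers as a ballpark.

```php
<?php
function count_up($limit) {
  for ($i = 0; $i < $limit; ++$i) {
    yield $i;
  }
}

// Measure the cost of building the whole array up front.
$before = memory_get_usage();
$array = range(0, 8999);
$array_bytes = memory_get_usage() - $before;

// Measure the cost of creating the generator object.
$before = memory_get_usage();
$generator = count_up(9000);
$generator_bytes = memory_get_usage() - $before;

echo 'array: ' . $array_bytes . " bytes\n";
echo 'generator: ' . $generator_bytes . " bytes\n";
```

The generator's cost stays about the same whether you count to 9000 or 9 million, because it never holds more than one value at a time.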
Wait, What? Please Yield An Explanation
If you're used to functions returning a single value and then going out of scope, you are justifiably puzzled.
Here's a diagram:
function generator() {
yield 1; // <-- Control goes back to caller here.
// Control returns here from caller.
yield 2; // <-- Control goes back to caller again.
}
In a generator, on the first iteration the function runs until it gets to a yield command.
The state of the function call is suspended at that point and the yielded value is given to the caller.
On the next iteration, code flow comes back to the generator function just after the yield call.
Control flows until it gets to another yield command, or a return, or until the function ends.
And keeps on going like that.
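To watch the ping-pong happen, here's a sketch where both sides write to a shared log. The noisy_generator() function and the $log array are invented for this illustration.

```php
<?php
function noisy_generator(array &$log) {
  $log[] = 'generator: started';
  yield 1;
  $log[] = 'generator: resumed after first yield';
  yield 2;
  $log[] = 'generator: resumed after second yield';
}

$log = array();
foreach (noisy_generator($log) as $value) {
  $log[] = 'caller: got ' . $value;
}

echo implode("\n", $log), "\n";
// generator: started
// caller: got 1
// generator: resumed after first yield
// caller: got 2
// generator: resumed after second yield
```

Notice that the generator's lines and the caller's lines interleave. Neither side runs to completion before the other gets a turn.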
If We're Emulating Arrays, What About Keys For Our Values?
E-zee. Just use array key-value assignment operators.
function generator() {
yield 'key' => 'value';
}
foreach(generator() as $key=>$value) {
echo $key;
echo $value;
}
This means that generators are functions that can return two values.
If I can, I'd like to pause a moment here, and repeat that. This is a little bit of an Alice In Wonderland type moment, and I'd like to draw your attention to its strangeness:
A generator is a function that can return two values.
That's weird, but it will get weirder.
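What if you don't supply a key? Here's a quick sketch comparing the two cases, using iterator_to_array() to collect the results. The auto_keys() and explicit_keys() names are made up for the example.

```php
<?php
function auto_keys() {
  // No key given: PHP numbers the values 0, 1, 2... like an array would.
  yield 'foo';
  yield 'bar';
}

function explicit_keys() {
  yield 'first' => 'foo';
  yield 'second' => 'bar';
}

$auto = iterator_to_array(auto_keys());
$explicit = iterator_to_array(explicit_keys());

print_r($auto);     // keys 0 and 1
print_r($explicit); // keys 'first' and 'second'
```

So a generator always hands back a key and a value. You just don't always get to see the key.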
Generators Are Functions, Right?
We can pass arguments to generators:
function generator_function($offset) {
for ($i=0;$i<10;++$i) {
yield $offset + $i;
}
}
We can get a reference to a generator just like we can with anonymous functions, and use that instead. Like this:
$g = generator_function(23);
foreach($g as $item) {
echo $item;
}
Anonymous Generators
And now that we've learned that, we can make the leap to declaring anonymous generators:
$gen = function() {
for ($i=0; $i<3000; ++$i) {
yield $i;
}
};
// Call the closure to get the generator.
foreach ($gen() as $count) {
echo $count;
}
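Note that we have to call the anonymous function to get a generator. Until then it's only a \Closure, and foreach() has nothing useful to iterate. In PHP 5.x that means assigning it to a variable first, so we have something to call.
// Phail! This hands foreach() a \Closure, not a \Generator.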
foreach( function() { yield $something; } as $whatever) {}
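Here's a small sketch of the difference between the anonymous function and the generator it produces. The $is_closure, $is_generator and $counts variables are only there to show what's what.

```php
<?php
$gen = function () {
  for ($i = 0; $i < 3; ++$i) {
    yield $i;
  }
};

// $gen itself is only a Closure...
$is_closure = $gen instanceof \Closure;

// ...and calling it is what produces the Generator.
$is_generator = $gen() instanceof \Generator;

$counts = array();
foreach ($gen() as $count) {
  $counts[] = $count;
}

echo implode(', ', $counts), "\n";
// 0, 1, 2
```

Each call to $gen() makes a fresh generator, so you can loop as many times as you like, as long as you call it again each time.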
Generator Level 2: Object-Oriented Functions
I told you it was weird, right? So when we call the generator function, we are getting a reference to a \Generator object.
$g = generator();
echo get_class($g);
// Generator
Internally, the generator function becomes a \Generator object, much like anonymous functions become \Closure objects.
A generator differs from other functions in that the state of the function code is preserved at the yield operator, rather than going out of scope after return.
So I can say this:
// Our generator function...
function generator_function() {
for ($i=1; $i<=10; ++$i) {
yield $i;
}
}
// Instantiate a Generator.
$generator = generator_function();
echo $generator->current();
// 1
$generator->next();
echo $generator->current();
// 2
// etc.
As long as $generator is in scope, we can call \Iterator methods on it. We could even put this generator in a global scope and have different areas of code call those methods.
Is that a bad idea? Probably. But we could.
So it's a function and then it's an object and it lives in a scope. Which is weird.
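Since it's an object with \Iterator methods, we can drive the whole thing by hand. Here's a sketch with a made-up one_to_three() generator, doing roughly what foreach() does on our behalf.

```php
<?php
function one_to_three() {
  for ($i = 1; $i <= 3; ++$i) {
    yield $i;
  }
}

$generator = one_to_three();
$seen = array();

// valid() stays TRUE for as long as the generator has a value to offer.
while ($generator->valid()) {
  $seen[] = $generator->key() . ' => ' . $generator->current();
  $generator->next();
}

// Once the function body has finished, current() just returns NULL.
$after_end = $generator->current();

echo implode("\n", $seen), "\n";
// 0 => 1
// 1 => 2
// 2 => 3
```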
send()
Speaking of weird... I mean really weird...
If we look at the PHP specifications for \Generator, we see that it includes two more methods: send() and throw().
send() lets us send a value into the generator.
That's right, the yield command is bi-directional.
So the deal is, you can make a generator that looks like this:
function sendable_generator() {
$value = 0;
for($i=0;$i<10;++$i) {
$value = (yield $value + $i);
}
}
Do you see the assignment made after yield is called?
We have to place yield inside some parentheses so that PHP isn't confused about precedence. (PHP 7.0 and later no longer insist on them.) Or really, I'd wager it was designed that way so that we wouldn't be confused about precedence.
What happens there? yield sends a value back and control leaves the generator. Then when control comes back in from a send(), the sent value is placed in $value.
From the outside it looks like this:
$g = sendable_generator();
echo $g->send(23);
// 24
Note that, in this case, the send() 'wastes' an iteration: the generator ran to its first yield (the 0 we never saw), PHP sent the value back in, and then execution continued until the next yield.
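One way to avoid losing that first value is to read current() before the first send(). Here's a sketch with a hypothetical running_total() generator that adds up whatever you send it.

```php
<?php
function running_total() {
  $total = 0;
  while (TRUE) {
    // Hand the total out; whatever the caller send()s comes back in.
    $amount = (yield $total);
    $total += $amount;
  }
}

$g = running_total();

// Runs the generator up to its first yield, so nothing gets skipped.
$start = $g->current(); // 0

// Each send() resumes the generator and returns the next yielded value.
$a = $g->send(5);       // 5
$b = $g->send(10);      // 15

echo $start . ', ' . $a . ', ' . $b . "\n";
// 0, 5, 15
```

Now we have a function with hidden state that changes depending on what the caller feeds it, and in what order. Have fun testing that.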
throw()
There's also that throw() method. It's similar to send() except it throws an exception into the generator instead of poking back a value.
If you want to signal your generator in some way other than sending back a special value, you can make it explode from the inside with an exception.
You can, of course, catch this exception and do some special behavior, maybe like this:
function generator() {
for($i=0;$i<10;++$i) {
// Wrap in a try structure.
try {
yield $i;
}
// Receive special signal from caller.
catch (\Exception $e) {
yield 'boom';
}
}
}
$g = generator();
echo $g->current();
// 0
$g->next();
echo $g->throw(new \Exception());
// boom
$g->next();
echo $g->current();
// 2
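Another use, sketched here under my own assumptions: throwing an exception in as a 'please stop' signal. The StopCounting class and the counter() function are hypothetical, invented just for this example.

```php
<?php
// Hypothetical exception class, used purely as a "stop" signal.
class StopCounting extends \Exception {}

function counter() {
  $i = 0;
  try {
    while (TRUE) {
      yield $i++;
    }
  }
  catch (StopCounting $e) {
    // Leave the loop; the generator finishes without yielding again.
    return;
  }
}

$g = counter();
$first = $g->current();                  // 0
$g->next();
$second = $g->current();                 // 1

// throw() returns the next yielded value, or NULL if there isn't one.
$result = $g->throw(new StopCounting()); // NULL
$finished = !$g->valid();                // TRUE

echo $finished ? 'generator is done' : 'still going', "\n";
```

If the generator doesn't catch the exception, it comes flying right back out of throw() at the caller.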
Stupid Generator Tricks
Kill me now.
function gen(\Iterator $g = NULL) {
if ($g) {
yield $g->current();
}
else {
yield 23;
}
}
$g1 = gen();
$g2 = gen($g1);
foreach ($g2 as $item) {
echo $item;
}
I eagerly await death.
function digits_of_pi() {
$timestamp = time();
$still_working_on_it = TRUE;
while ($still_working_on_it) {
if (time() > $timestamp + 600) {
yield NULL;
$timestamp = time();
}
else {
// $next_digit would come from the actual pi calculation, which isn't shown.
yield $next_digit;
}
}
}
$pi = digits_of_pi();
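while(TRUE) {
$digit = $pi->current();
// Compare against NULL explicitly, since 0 is a perfectly good digit.
if ($digit !== NULL) {
echo $digit;
}
else {
do_some_other_thing();
}
// Advance the generator, or we'd read the same value forever.
$pi->next();
}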