Speeding up database calls with PDO and iterators

Image source: Wikipedia.org
When you review lots of code, you often wonder why things were written the way they were. Especially when making expensive calls to a database, I still see things that could and should be improved.

No framework development

When working with a framework, mostly these database calls are optimized for the developer and abstract the complex logic to improve and optimize the retrieval and usage of data. But then developers need to build something without a framework and end up using the basics of PHP in a sub-optimal way.

$pdo = new \PDO(
    $config['db']['dsn'],
    $config['db']['username'],
    $config['db']['password']
);

$sql = 'SELECT * FROM `gen_contact` ORDER BY `contact_modified` DESC';

$stmt = $pdo->prepare($sql);
$stmt->execute();
$data = $stmt->fetchAll(\PDO::FETCH_OBJ);

echo 'Getting the contacts that changed the last 3 months' . PHP_EOL;
foreach ($data as $row) {
    $dt = new \DateTime('2015-04-01 00:00:00');
    if ($dt->format('Y-m-d') . '00:00:00' < $row->contact_modified) {
        echo sprintf(
            '%s (%s)| modified %s',
            $row->contact_name,
            $row->contact_email,
            $row->contact_modified
        ) . PHP_EOL;
    }
}
Above example code is a most common way to retrieve data. At first sight, this code is clean and looks good, but looking more closely you will discover a couple of points to improve.

  • Above code is not reusable, so whenever you need a similar functionality you're stuck with duplicating existing code.
  • Even though you're fetching an object with $stmt->fetchAll(\PDO::FETCH_OBJ); you still face the issue you're using an array of objects which will consume too much memory when fetching lots of data.
  • Filtering is done within the routine, which also means that if you have other filtering conditions you need to modify existing logic, making it hard for maintenance and expanding functionality.

Iterators

Most of the modern frameworks are using Iterators for their data retrieval, because they're fast and reusable. But also they allow other Iterators to filter and modify the retrieved results. Building an application without a framework still gives you the option to use Iterators as they're part of PHP since Version 5.0.0 Beta 2.

So lets assume you continue to use PDO for your data retrieval, we can choose between two options:
  1. Use PDOStatement::fetchAll() to retrieve all data in a single go
  2. Use PDOSTatement::fetch() to retrieve a single row per iteration
Even though the first option seems really tempting, I prefer to use option two as it allows me to create a single Iterator to do the retrieval for me without limiting myself to options required to make the query (and thus making it reusable for any kind of retrievals).

<?php

/**
 * Class DbRowIterator
 *
 * File: Iterator/DbRowIterator.php
 */
class DbRowIterator implements Iterator
{
    /** @var \PDOStatement $pdoStatement The PDO Statement to execute */
    protected $pdoStatement;
    /** @var int $key The cursor pointer */
    protected $key;
    /** @var  bool|\stdClass The resultset for a single row */
    protected $result;
    /** @var  bool $valid Flag indicating there's a valid resource or not */
    protected $valid;

    public function __construct(\PDOStatement $PDOStatement)
    {
        $this->pdoStatement = $PDOStatement;
    }

    /**
     * @inheritDoc
     */
    public function current()
    {
        return $this->result;
    }

    /**
     * @inheritDoc
     */
    public function next()
    {
        $this->key++;
        $this->result = $this->pdoStatement->fetch(
            \PDO::FETCH_OBJ, 
            \PDO::FETCH_ORI_ABS, 
            $this->key
        );
        if (false === $this->result) {
            $this->valid = false;
            return null;
        }
    }

    /**
     * @inheritDoc
     */
    public function key()
    {
        return $this->key;
    }

    /**
     * @inheritDoc
     */
    public function valid()
    {
        return $this->valid;
    }

    /**
     * @inheritDoc
     */
    public function rewind()
    {
        $this->key = 0;
    }
}
Above Iterator is just implementing the PHP Iterator interface, but in our example this is more than enough to achieve our goal.

As you can see, we implement the logic for data retrieval in the "next" loop, as this is our forward retrieval sequence. Take note of the second and third argument of PDOSTatement::fetch() statement: with the second argument we can control the cursor in our data retrieval, the third argument is to position the cursor for this data retrieval which was set scrollable outside the Iterator.

<?php
class LastPeriodIterator extends FilterIterator
{
    protected $period;

    public function __construct(\Iterator $iterator, $period = 'last week')
    {
        parent::__construct($iterator);
        $this->period = $period;
    }
    public function accept()
    {
        if (!$this->getInnerIterator()->valid()) {
            return false;
        }
        $row = $this->getInnerIterator()->current();
        $dt = new \DateTime($this->period);
        if ($dt->format('Y-m-d') . '00:00:00' < $row->contact_modified) {
            return true;
        }
        return false;
    }
}
For filtering our data, we can now extend the SPL FilterIterator that will allow us to attach our filtering immediately to our DbRowIterator, making it extendable and reusable immediately.

Changing our initial data retrieval code into code that will use both of our Iterators is now very simple:

$pdo = new \PDO(
    $config['db']['dsn'],
    $config['db']['username'],
    $config['db']['password']
);

$sql = 'SELECT * FROM `gen_contact` ORDER BY `contact_modified` DESC';
$stmt = $pdo->prepare($sql, [\PDO::ATTR_CURSOR => \PDO::CURSOR_SCROLL]);
$stmt->execute();

$data = new DbRowIterator($stmt);
echo 'Getting the contacts that changed the last 3 months' . PHP_EOL;
$lastPeriod = new LastPeriodIterator($data, '2015-04-01 00:00:00');
foreach ($lastPeriod as $row) {
    echo sprintf(
        '%s (%s)| modified %s',
        $row->contact_name,
        $row->contact_email,
        $row->contact_modified
    ) . PHP_EOL;
}
Please pay attention to $pdo->prepare($sql, [\PDO::ATTR_CURSOR => \PDO::CURSOR_SCROLL]); as we now need to ensure the cursor of dataretrieval is now scrollable so we can use row by row control.

Benchmarking

I know that all this requires a bit of "extra" work and you might wonder why you should invest this "more work" as the foreach-loop was working as well. Let me show you with a benchmark between the two:

Foreach loop

  • Data fetching time for 63992 of 250000 records: 2.14 seconds
  • Data processing time for 63992 of 250000 records: 7.11 seconds
  • Total time for 63992 of 250000 records: 9.25 seconds
  • Memory consumption for 63992 of 250000 records: 217.75MB

Iterator loop

  • Data fetching time for 63992 of 250000 records: 0.92 seconds
  • Data processing time for 63992 of 250000 records: 5.57 seconds
  • Total time for 63992 of 250000 records: 6.49 seconds
  • Memory consumption for 63992 of 250000 records: 0.25MB

Result of this benchmark

  • Data retrieval is faster with Iterators
  • Data processing is faster with Iterators
  • Memory consumption is enormously better with Iterators
Benchmark executed with MySQL 5.5.43 and PHP 5.5.26 on Ubuntu 12.04 LTS (virtual machine). Other versions of PHP, Mysql or OS might give you different results. 250000 records generated using fzaninotto/Faker.

Conclusion

Using simple Iterators in your PHP code you can speed up the data retrieval and processing, but the most important thing that this benchmark shows you is that Iterators will save a ton of memory.

Sidenote

Iterators are more effective for processing large amounts of data. For small amounts of data (aproximately under 5000 entries) Iterators might be even slower than using arrays, but you will still be winning on memory though.

Comments

  1. Hi - thanks for this. I would never have thought to try this method. However, there is a discussion of StackOverflow which suggests it is reinventing the wheel:
    https://stackoverflow.com/questions/39410722/creating-pdo-iterator
    Do have any thoughts on what is said there?
    Thanks again.

    ReplyDelete
    Replies
    1. I looked at the comments on StackOverflow and I agree partially with "Your Common Sense", but again that's looking at it from one point of view. His PDO resource is amazing and a good reference if you want to learn more about using PDO to the full extend.

      My use case was that I needed to process many records and I didn't wanted to clutter the memory by pulling all data into memory, therefor I switched over to an Iterator instead of using a Traversable return type. Secondly I needed to filter based on user interaction and didn't want to make a detour to the database again as the FilterIterator classes provide a powerful way of doing this to the same extend as PDO. The only benefit of filtering with PDO instead of DB is that in most engines the query is not cacheable if you add filters in your query. And since with an Iterator I'm looking at a single record at a time, I don't really care if it's the full query or not. The fact I can pull it straight out of the database cache is my little win.

      I don't think my article is a complete hoax as he describes it, but I do believe it leaves plenty of room for improvement as any piece of code is only 1% finished.

      Delete
  2. Anonymous9/11/17 15:58

    Reply to your article -> https://stackoverflow.com/a/39410972/5441700

    ReplyDelete
  3. Anonymous2/12/17 16:14

    The comparison is a bit weird, because in the first example you're moving all columns into PHP, whereas in the second example you're iterating over them one by one.
    Could you please also include the following use case in your comparison? (Sorry for the bad formatting...)

    $pdo = new \PDO(
    $config['db']['dsn'],
    $config['db']['username'],
    $config['db']['password']
    );

    $sql = 'SELECT * FROM `gen_contact` ORDER BY `contact_modified` DESC';

    $stmt = $pdo->prepare($sql);
    $stmt->execute();

    echo 'Getting the contacts that changed the last 3 months' . PHP_EOL;
    while (($data = $stmt->fetch(\PDO::FETCH_OBJ))) {
    $dt = new \DateTime('2015-04-01 00:00:00');
    if ($dt->format('Y-m-d') . '00:00:00' < $row->contact_modified) {
    echo sprintf(
    '%s (%s)| modified %s',
    $row->contact_name,
    $row->contact_email,
    $row->contact_modified
    ) . PHP_EOL;
    }
    }

    ReplyDelete
    Replies
    1. The first example I posted was a "common" use case you can find on many blogs and web sites showcasing the usage of PDO.

      The reason I'm iterating over each record one by one is I can still process the data (often a lot faster) without filling up my memory space. PHP arrays are great for data sets under ~5000 entries (depending on the complexity of their structures), but once you go over that limit you inevitably run in situations that you're exhausting your memory or the processing of the data takes too much time.

      Using an Iterator allows me to let the DB do the thing it knows best: retrieving the data and offer it as a stream. With PHP we "Iterate" over that stream, one record at a time which gives us superb speeds and less memory consumption. When you're processing 500K records while 10K users are all requesting the same data sets, you know exactly what I mean. The approach I've given here is only for performance purposes.

      Delete
  4. It is an informative post.

    ReplyDelete

Post a Comment

Popular Posts