Pages

2015/07/26

Speeding up database calls with PDO and iterators

Image source: Wikipedia.org
When you review lots of code, you often wonder why things were written the way they were. Especially when making expensive calls to a database, I still see things that could and should be improved.

No framework development

When working with a framework, mostly these database calls are optimized for the developer and abstract the complex logic to improve and optimize the retrieval and usage of data. But then developers need to build something without a framework and end up using the basics of PHP in a sub-optimal way.

$pdo = new \PDO(
    $config['db']['dsn'],
    $config['db']['username'],
    $config['db']['password']
);

$sql = 'SELECT * FROM `gen_contact` ORDER BY `contact_modified` DESC';

$stmt = $pdo->prepare($sql);
$stmt->execute();
$data = $stmt->fetchAll(\PDO::FETCH_OBJ);

echo 'Getting the contacts that changed the last 3 months' . PHP_EOL;
foreach ($data as $row) {
    $dt = new \DateTime('2015-04-01 00:00:00');
    if ($dt->format('Y-m-d') . '00:00:00' < $row->contact_modified) {
        echo sprintf(
            '%s (%s)| modified %s',
            $row->contact_name,
            $row->contact_email,
            $row->contact_modified
        ) . PHP_EOL;
    }
}
Above example code is a most common way to retrieve data. At first sight, this code is clean and looks good, but looking more closely you will discover a couple of points to improve.

  • Above code is not reusable, so whenever you need a similar functionality you're stuck with duplicating existing code.
  • Even though you're fetching an object with $stmt->fetchAll(\PDO::FETCH_OBJ); you still face the issue you're using an array of objects which will consume too much memory when fetching lots of data.
  • Filtering is done within the routine, which also means that if you have other filtering conditions you need to modify existing logic, making it hard for maintenance and expanding functionality.

Iterators

Most of the modern frameworks are using Iterators for their data retrieval, because they're fast and reusable. But also they allow other Iterators to filter and modify the retrieved results. Building an application without a framework still gives you the option to use Iterators as they're part of PHP since Version 5.0.0 Beta 2.

So lets assume you continue to use PDO for your data retrieval, we can choose between two options:
  1. Use PDOStatement::fetchAll() to retrieve all data in a single go
  2. Use PDOSTatement::fetch() to retrieve a single row per iteration
Even though the first option seems really tempting, I prefer to use option two as it allows me to create a single Iterator to do the retrieval for me without limiting myself to options required to make the query (and thus making it reusable for any kind of retrievals).

<?php

/**
 * Class DbRowIterator
 *
 * File: Iterator/DbRowIterator.php
 */
class DbRowIterator implements Iterator
{
    /** @var \PDOStatement $pdoStatement The PDO Statement to execute */
    protected $pdoStatement;
    /** @var int $key The cursor pointer */
    protected $key;
    /** @var  bool|\stdClass The resultset for a single row */
    protected $result;
    /** @var  bool $valid Flag indicating there's a valid resource or not */
    protected $valid;

    public function __construct(\PDOStatement $PDOStatement)
    {
        $this->pdoStatement = $PDOStatement;
    }

    /**
     * @inheritDoc
     */
    public function current()
    {
        return $this->result;
    }

    /**
     * @inheritDoc
     */
    public function next()
    {
        $this->key++;
        $this->result = $this->pdoStatement->fetch(
            \PDO::FETCH_OBJ, 
            \PDO::FETCH_ORI_ABS, 
            $this->key
        );
        if (false === $this->result) {
            $this->valid = false;
            return null;
        }
    }

    /**
     * @inheritDoc
     */
    public function key()
    {
        return $this->key;
    }

    /**
     * @inheritDoc
     */
    public function valid()
    {
        return $this->valid;
    }

    /**
     * @inheritDoc
     */
    public function rewind()
    {
        $this->key = 0;
    }
}
Above Iterator is just implementing the PHP Iterator interface, but in our example this is more than enough to achieve our goal.

As you can see, we implement the logic for data retrieval in the "next" loop, as this is our forward retrieval sequence. Take note of the second and third argument of PDOSTatement::fetch() statement: with the second argument we can control the cursor in our data retrieval, the third argument is to position the cursor for this data retrieval which was set scrollable outside the Iterator.

<?php
class LastPeriodIterator extends FilterIterator
{
    protected $period;

    public function __construct(\Iterator $iterator, $period = 'last week')
    {
        parent::__construct($iterator);
        $this->period = $period;
    }
    public function accept()
    {
        if (!$this->getInnerIterator()->valid()) {
            return false;
        }
        $row = $this->getInnerIterator()->current();
        $dt = new \DateTime($this->period);
        if ($dt->format('Y-m-d') . '00:00:00' < $row->contact_modified) {
            return true;
        }
        return false;
    }
}
For filtering our data, we can now extend the SPL FilterIterator that will allow us to attach our filtering immediately to our DbRowIterator, making it extendable and reusable immediately.

Changing our initial data retrieval code into code that will use both of our Iterators is now very simple:

$pdo = new \PDO(
    $config['db']['dsn'],
    $config['db']['username'],
    $config['db']['password']
);

$sql = 'SELECT * FROM `gen_contact` ORDER BY `contact_modified` DESC';
$stmt = $pdo->prepare($sql, [\PDO::ATTR_CURSOR => \PDO::CURSOR_SCROLL]);
$stmt->execute();

$data = new DbRowIterator($stmt);
echo 'Getting the contacts that changed the last 3 months' . PHP_EOL;
$lastPeriod = new LastPeriodIterator($data, '2015-04-01 00:00:00');
foreach ($lastPeriod as $row) {
    echo sprintf(
        '%s (%s)| modified %s',
        $row->contact_name,
        $row->contact_email,
        $row->contact_modified
    ) . PHP_EOL;
}
Please pay attention to $pdo->prepare($sql, [\PDO::ATTR_CURSOR => \PDO::CURSOR_SCROLL]); as we now need to ensure the cursor of dataretrieval is now scrollable so we can use row by row control.

Benchmarking

I know that all this requires a bit of "extra" work and you might wonder why you should invest this "more work" as the foreach-loop was working as well. Let me show you with a benchmark between the two:

Foreach loop

  • Data fetching time for 63992 of 250000 records: 2.14 seconds
  • Data processing time for 63992 of 250000 records: 7.11 seconds
  • Total time for 63992 of 250000 records: 9.25 seconds
  • Memory consumption for 63992 of 250000 records: 217.75MB

Iterator loop

  • Data fetching time for 63992 of 250000 records: 0.92 seconds
  • Data processing time for 63992 of 250000 records: 5.57 seconds
  • Total time for 63992 of 250000 records: 6.49 seconds
  • Memory consumption for 63992 of 250000 records: 0.25MB

Result of this benchmark

  • Data retrieval is faster with Iterators
  • Data processing is faster with Iterators
  • Memory consumption is enormously better with Iterators
Benchmark executed with MySQL 5.5.43 and PHP 5.5.26 on Ubuntu 12.04 LTS (virtual machine). Other versions of PHP, Mysql or OS might give you different results. 250000 records generated using fzaninotto/Faker.

Conclusion

Using simple Iterators in your PHP code you can speed up the data retrieval and processing, but the most important thing that this benchmark shows you is that Iterators will save a ton of memory.

Sidenote

Iterators are more effective for processing large amounts of data. For small amounts of data (aproximately under 5000 entries) Iterators might be even slower than using arrays, but you will still be winning on memory though.

No comments:

Post a Comment