How to implement Asynchronous background execution with PHP


Post by Shane » Tue Feb 23, 2010 10:46 am

If you've ever worked with threads, and work queues in particular, you know how convenient they can be.
Have some demanding work that needs to be done but no time to do it yourself? No problem: put it on the work queue and continue with whatever you were doing; some other thread will come along and do the dirty work for you.

Consider the following scenario. An action or input on a web page triggers something that might take a (very) long time to execute. If it runs during the browser session it not only annoys the user, who has to wait for the page to load, but might also hit a timeout that interrupts the processing. How do we solve this?

I’ll use the term job to indicate some work that needs to be executed; in practice this is an isolated PHP function which takes an unspecified time to run.

Some possible solutions
  • Store job information somewhere (like a SQL database) and have a separate script, in PHP or some other language, running on the host, either continuously or from crontab. The obvious flaw is that this requires additional administrative maintenance to keep the script running, and if executed from crontab it adds an extra delay before the job is executed.
  • Use pcntl_fork from inside the (web page) PHP script and execute the job function in the child. The problem is that this forks an entire Apache (or whatever web server you’re using) process, which can cause all sorts of problems.
  • Some other solution I didn’t think of?
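For comparison, the crontab variant from the first bullet would amount to something like this (the script path is just an example):

Code: Select all

```
* * * * * /usr/local/bin/php /var/www/run_jobs.php
```

Here run_jobs.php would poll the jobs table once per minute, which is exactly the extra delay mentioned above.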

My solution ended up being a mix of the two above. It consists of three major parts and, a bit simplified, works like this
  1. Job information is stored in a SQL database.
  2. An external CLI-based worker PHP-script is launched from inside the web page script using proc_open.
  3. The worker script, which is running in an isolated PHP process calls pcntl_fork to create a duplicate of itself. The parent returns directly to the web page script.
  4. The child process of the worker script now runs free of any webserver process and can now retrieve job information from the SQL database. When all jobs are executed it kills itself.
Here is a nice block diagram that shows the interaction of the components
bgphp.png
And voilà, we have asynchronous background execution of (almost) arbitrary PHP functions. There are a few problems left to solve; we’ll tackle those further down.

Requirements
  • PHP 5 built with CLI support
  • The PHP module pcntl
  • The classes SQLConnector and Prefs published in earlier posts.
SQL database
First, we’ll need some database support. I’ve used MySQL, but any database should work fine. One simple table is required, it’s called jobs and looks like this
Field  Type                 Null  Key  Default  Extra
jid    bigint(20) unsigned  NO    PRI  NULL     auto_increment
data   text                 YES        NULL
The column jid is an arbitrary job id and the column data contains job information (more on what exactly job information is will be covered further down). You can create this table with the following SQL command

Code: Select all

CREATE TABLE jobs (jid bigint UNSIGNED AUTO_INCREMENT,
    data text, PRIMARY KEY (jid))
PHP implementation
The implementation consists of three files; we’ll focus on the class jobs first. This leads us to what exactly the data column should contain. Since the job (a PHP function) will execute in a new context, the worker must be able to bring in PHP files so that required functions and classes can be resolved. So, in addition to the actual job function and an opaque argument, we also need to store a list of PHP files to include at execution time.

The complete data structure stored in the data column looks like this.

Code: Select all

$data = array(
    /* Version identifier of job structure */
    'version' => 1,
    /* Array of PHP files to include at execution time */
    'include' => array(),
    /* Name of actual job function */
    'callback' => '',
    /* Opaque argument passed to callback function */
    'args' => null,
); 
A dedicated class called jobs manages jobs and allows enqueuing and dequeuing. For clarity, only the function prototypes are shown here; the full source code can be found at the bottom of the page.

Code: Select all

class jobs {
    private $m_con; /* SQL Connector */
    const jobversion = '1'; /* Job structure version */
 
    private $m_data = array(
        'version' => self::jobversion,
        'include' => array(),
        'callback' => '',
        'args' => null,
    );
 
    /*
     * Enqueue a job for execution
     *  cb - Name of PHP function to execute
     *  args - Arguments to pass to 'cb'
     *  incs - Array of PHP files to include before execution
     */
    public function enqueue($cb, $args, $incs);
 
    /*
     * Return the job identifier (integer) of
     * next available job or -1 if no new jobs are
     * available.
     */
    public function nextJob();
 
    /*
     * Dequeue job with id 'jid' for execution,
     * if no such job is available null is returned.
     *
     * An array with the following keys is returned:
     *  (jid, include, callback, args)
     */
    public function dequeue($jid);
 
    /*
     * Attempt to launch worker process if not already running
     */
    private function startWorker();
} 
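The enqueue and dequeue routines boil down to plain SQL inserts and deletes on the jobs table. As a sketch of how they could look (my own illustration using PDO rather than the SQLConnector class from the earlier posts; the helper names jobs_pack and jobs_unpack are made up here):

Code: Select all

```php
<?php
/* Pack job information into the string stored in the 'data' column */
function jobs_pack($cb, $args, $incs) {
    return serialize(array(
        'version'  => 1,
        'include'  => $incs,
        'callback' => $cb,
        'args'     => $args,
    ));
}

/* Unpack the 'data' column back into a job array, null on failure */
function jobs_unpack($data) {
    $job = @unserialize($data);
    if ($job === false || $job['version'] != 1)
        return null;
    return $job;
}

/* Enqueue: insert the packed job, return its jid */
function jobs_enqueue($pdo, $cb, $args, $incs) {
    $stmt = $pdo->prepare('INSERT INTO jobs (data) VALUES (?)');
    $stmt->execute(array(jobs_pack($cb, $args, $incs)));
    return $pdo->lastInsertId();
}

/* Dequeue: fetch the row, remove it, return the unpacked job */
function jobs_dequeue($pdo, $jid) {
    $stmt = $pdo->prepare('SELECT data FROM jobs WHERE jid = ?');
    $stmt->execute(array($jid));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false)
        return null;
    $pdo->prepare('DELETE FROM jobs WHERE jid = ?')->execute(array($jid));
    $job = jobs_unpack($row['data']);
    if ($job !== null)
        $job['jid'] = $jid;
    return $job;
}
```

The real code in the zip may differ in details, but the idea is the same: serialize the job structure into one text column and let the database hand out jids via AUTO_INCREMENT.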
The callback function ‘cb’ will be called like this: cb(args). Before this call, all files listed in the array incs will be included using require_once.
The functions enqueue, dequeue and nextJob are nothing special; they simply do normal SQL manipulation (inserting and removing jobs from the table). However, the routine startWorker might deserve some attention (it’s called internally from enqueue)

Code: Select all

/* Should be set to the path of the CLI PHP binary */
define("PHP_PATH",      "/usr/local/bin/php");
 
private function startWorker() {
    /*
     * Attempt to read the preference
     * 'worker_pid' to check if the worker is
     * already running.
     */
    $prefs = new Prefs($this->m_con, 'jobs');
    $pid = $prefs->worker_pid;
    if ($pid != null)
        return;
 
    /* Get our working directory */
    $cwd = getcwd();
 
    /*
     * Construct a command such as the following
     *  /usr/local/bin/php /working/path/worker.php
     */
    $cmd = escapeshellcmd(PHP_PATH ." $cwd/worker.php");
    $desc = array();
 
    /* Execute the command and wait for it to finish */
    $proc = proc_open($cmd, $desc, $pipes, NULL, NULL);
    proc_close($proc);
} 
The startWorker function is executed in the context of a web page (from inside the Apache PHP module for example) and as you can see it doesn’t call pcntl_fork directly. Technically it still forks, otherwise it wouldn’t be able to execute another process. But we let the PHP module deal with that mess.

This leads us to worker.php
File: worker.php

Code: Select all

require_once('jobs.inc');
require_once('prefs.inc');
require_once('db_mysql.inc');
define("PHP_PATH",      "/usr/local/bin/php");
 
if (!isset($argv)) {
    exit("Don't call me from a browser");
}
 
/*
 * Fork our self and let the parent return directly to startWorker
 */
$pid = pcntl_fork();
if ($pid == -1) {
   die("could not fork");
} else if ($pid) {
    /*
     * This is the parent process. Record the child pid so that we
     * don't launch more processes than wanted.
     */
    $con = new MySQLConnector('localhost', 'user', 'pwd', 'db');
    $prefs = new Prefs($con, 'jobs');
    $prefs->worker_pid = $pid;
    $prefs->Flush();
    /*
     * Exit the script, it runs in a separate
     * PHP process NOT inside Apache
     */
    exit;
}
 
$con = new MySQLConnector('localhost', 'user', 'pwd', 'db');
$jobs = new jobs($con);
 
$cwd = getcwd();
if ($cwd === false)
    exit("can't get working directory");
 
/*
 * Loop over all available jobs
 */
$pjid = -1;
for (;;) {
    $jid = $jobs->nextJob();
    if ($jid == -1 || $jid == $pjid)
        break;
 
    /*
     * Execute each job in a clean environment
     */
    $cmd = escapeshellcmd(PHP_PATH . " $cwd/job_execute.php $jid");
    $desc = array();
    $proc = proc_open($cmd, $desc, $pipes);
    proc_close($proc);
 
    $pjid = $jid;
}
 
/* Tell the world that we aren't executing any more */
$prefs = new Prefs($con, 'jobs');
$prefs->worker_pid = null;
$prefs->Flush(); 
The first thing that happens is that worker.php forks itself and lets the parent return directly to the caller, which in this case is startWorker. This allows startWorker, and thus enqueue, to finish, and the calling PHP script can resume whatever it was doing (creating a web page etc.).

Also note that worker.php doesn’t execute the job functions directly; instead it hands the jid number to a second script called job_execute.php. There is a good reason for this: since we must include PHP files (with require_once) to be able to execute the job function, the namespace of the worker would become contaminated quite fast, and with that comes the risk of name collisions. By letting each job execute in a totally clean environment, it can include only the files needed for its execution and thus avoid any name collisions.

job_execute.php takes a job id (jid) as argument, dequeues the job, includes all required files, calls the job function and then terminates.

File: job_execute.php

Code: Select all

require_once('jobs.inc');
require_once('db_mysql.inc');
 
if (!isset($argv)) {
    exit("Don't call me from a browser");
}
 
if (!isset($argv[1])) {
    exit("Job missing");
}
$jid = $argv[1];
 
$con = new MySQLConnector('localhost', 'user', 'pwd', 'db');
$jobs = new jobs($con);
$job = $jobs->dequeue($jid);
if ($job == null)
    exit("Job dequeue failed");
 
/*
 * Include required files
 */
foreach ($job['include'] as $inc) {
    require_once($inc);
}
 
/* Execute the job handler */
$cb = $job['callback'];
$cb($job['args']); 
Okay, that should be all. Are you still with me?

Using it

After all this… how does one use it? First of all we need one or more job functions that we would like to have executed in the background. The prototype for such a function looks like this

Code: Select all

function myBgFunc($args) { ... }
Where args can be anything you like (as long as PHP allows it). Put the function inside a file and include (preferably with require_once) any other files that you might need. For example like this (not the best example, but anyway):

File: myfunc.php

Code: Select all

<?php
/* $args is an integer in this case */
function test1($args) {
    $j = 1;
    for ($i = 0; $i < $args; $i++) {
       $j *= 1.00001;
    }
    /* Report result somewhere */
}
?>
Now, from your run-of-the-mill PHP script, create a jobs object and enqueue the function test1.

Code: Select all

require_once('db_mysql.inc');
require_once('jobs.inc');
$con = new MySQLConnector('localhost', 'user', 'pwd', 'db');
$jobs = new jobs($con);
/* Remember, "function", "argument", "files needed" */
$jobs->enqueue("test1", rand() * 1000, array('myfunc.php'));
print "Job enqueued"; 
Even if test1 takes ages to execute, the enqueue call will finish in a fraction of a second and report back directly to the web browser. You can enqueue any number of functions; they will be executed on a first-come, first-served basis.

Improvements
I’ve tested this a bit and it seems to work; however, I haven’t put it through a real-world test yet. With some additional locking it should be possible to run multiple instances of worker.php to take advantage of SMP systems.
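One possible locking scheme (an assumption on my part; it requires a storage engine with row locks, such as InnoDB) is to let each worker wrap the fetch in a transaction with SELECT ... FOR UPDATE, so a second worker blocks on the locked row until the first one has committed its DELETE:

Code: Select all

```sql
START TRANSACTION;
-- A concurrent worker blocks here until the lock is released
SELECT jid, data FROM jobs ORDER BY jid ASC LIMIT 1 FOR UPDATE;
-- ... execute the job, then remove it using the jid fetched above ...
DELETE FROM jobs WHERE jid = ?;
COMMIT;
```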

There is no built-in mechanism to know when a job has finished. The job function will need to report its results to a SQL database for example. If browser feedback is needed an AJAX solution could poll the database for the execution status.
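As a sketch of what such reporting could look like (the table name job_status is made up for illustration): give each job a status row that the job function updates as its last step, and let the AJAX handler poll it.

Code: Select all

```sql
CREATE TABLE job_status (jid bigint UNSIGNED NOT NULL,
    done tinyint NOT NULL DEFAULT 0,
    result text, PRIMARY KEY (jid));
-- Last step of the job function:
UPDATE job_status SET done = 1, result = '...' WHERE jid = ?;
-- What the AJAX poller runs:
SELECT done, result FROM job_status WHERE jid = ?;
```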

Files
The following files with complete source code are included in the zip file.
jobs.inc
worker.php
job_execute.php
prefs.inc
db_iface.inc
db_mysql.inc
static.inc
files.zip
Courtesy of Fredrik Lindberg