Tag Archives: php

Benchmarking Asynchronous PHP vs NodeJS Properly

This article was jointly written by Samuel Reed, a freelance JavaScript developer, and Kevin Ohashi, Founder of Review Signal. NodeJS testing was handled by Samuel Reed and PHP by Kevin Ohashi on the same VPS.

I recently read Phil Sturgeon's Benchmarking Codswallop: NodeJS v PHP and was interested to see that PHP performed pretty well compared to NodeJS once it was run asynchronously.

My first reaction was to share the article with my good friend Samuel Reed who is a JavaScript and NodeJS lover to see what his thoughts were. He instantly wrote back to me "Not that it matters, but that bench was completely broken node-side, we have an absurd connection pooling default."

So we decided to run the test ourselves and he would fix the NodeJS settings. I would follow Phil's instructions for the PHP side as best as possible.

The Server

We used the same VPS. It was a 512MB droplet from DigitalOcean in NY2.

PHP

If you are wondering how to get ReactPHP running with Composer, simply do the following once you're in the folder with all the files:

curl -sS https://getcomposer.org/installer | php

php composer.phar install

It's not in Phil's instructions, but that's the missing part he alludes to.
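For reference, the install command above reads a composer.json file in the same folder. A minimal one pulling in the ReactPHP metapackage might look like the following (the loose version constraint is an assumption here; Phil's repo pins its own):

{
    "require": {
        "react/react": "*"
    }
}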

Blocking PHP Results:

real 3m21.114s
user 0m8.217s
sys 0m0.496s

Non-Blocking PHP (ReactPHP) Results:

real 0m11.971s
user 0m10.897s
sys 0m0.704s

The PHP results were fairly similar to Phil's. That's reassuring, because we changed nothing on the PHP side and were simply trying to replicate his experiment.

NodeJS

I used two versions of Node for the test - the latest stable tag at 0.10.22, and the latest unstable, at 0.11.8.

Around version 0.4, Node added connection pooling and a concept called an Agent. An agent can be added to an individual http request, or to all requests on the server. For example, if I had a free or limited account on some API, I might not want to hit it with more than 10 simultaneous connections at a time. In that case, I would assign an Agent to the request and set its `maxSockets` property to 10.
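A minimal sketch of that scenario on Node 0.10 (api.example.com and /data are hypothetical placeholders):

var http = require('http');

// Dedicated agent: requests using it share at most 10 sockets at a time.
var agent = new http.Agent();
agent.maxSockets = 10;

http.get({ host: 'api.example.com', path: '/data', agent: agent }, function (res) {
    res.resume(); // drain the response; we only care about the connection limit here
});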

Unfortunately, the `globalAgent`, which runs for all http requests without an explicit Agent (which is what you do about 99% of the time), has a limit of 5. That's right - just 5. One of Node's biggest selling points is speed, yet outgoing requests are completely hamstrung by this absurd opt-out connection pooling scheme that most people miss. It's not like these are expensive database connections - these are outgoing http requests, and all of them share the same 5-connection limit.

As a general rule, setting `http.globalAgent.maxSockets=Infinity` is what you want. It's one of the first things I do in any program that makes outgoing requests.
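In code, that looks like the following (https has its own global agent, so set both if you make https requests):

// Run once, near the top of the program, before any outgoing requests are made.
var http = require('http');
var https = require('https');
http.globalAgent.maxSockets = Infinity;
https.globalAgent.maxSockets = Infinity;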

Of course, I'm not the first person to rant about this; not by a long shot. From famed NodeJS developer substack:

This module disables a lot of infuriating things about core http that WILL cause bugs in your application if you think of http as just another kind of stream:

http requests have a default idle timeout of 2 minutes. This is terrible if you just want to pipe together a bunch of persistent backend processes over http.

There is a default connection pool of 5 requests. If you have 5 or more extant http requests, any additional requests will HANG for NO GOOD REASON.

hyperquest turns these annoyances off so you can just pretend that core http is a fancier version of tcp and not the horrible monstrosity that it actually is.

I have it on good authority that these annoyances will be fixed in node 0.12.

Luckily, this behavior was fixed about four months ago, and the last few unstable Node releases contain the fix. The next stable release, Node 0.12, is around the corner and should make a big difference in many users' applications.

NodeJS Results

v0.10.22 (default maxSockets=5)

real 0m30.691s
user 0m10.089s
sys 0m0.628s

v0.10.22 (maxSockets=64, mentioned in the follow-up article)

real 0m11.299s
user 0m9.973s
sys 0m0.768s

v0.10.22 (maxSockets=Infinity, this is what most servers should be running)

real 0m10.339s
user 0m9.269s
sys 0m0.624s

v0.11.8 (default is maxSockets=Infinity)

real 0m7.480s
user 0m6.372s
sys 0m0.680s

Node v0.11.x brings with it a host of http optimizations that make it even faster than v0.10.22 with pooling disabled.

Conclusion

At this point, as Phil Sturgeon mentioned, this is more of a network test than anything else, although the latest Node is about 60% faster in this test than ReactPHP. Of course, when choosing a framework, one must consider much more than raw speed in singular benchmarks, but they can be entertaining.

We thought it best, given the popularity of the original post, to set the record straight about how it would perform if the NodeJS code were corrected. In general, if you're running Node v0.10 or below, set `http.globalAgent.maxSockets=Infinity` (and the same for https, if you use it), unless you absolutely have a need for connection pooling.

Long Running Processes in PHP

Here at Review Signal, I use a lot of PHP code and one of the challenges is getting PHP to run for long periods of time.

Here are two sample problems that I deal with at Review Signal that require PHP processes to run for long periods of time or indefinitely:

  1. Data processing – Every night Review Signal crunches millions of pieces of data to update our rankings.
  2. Twitter Streaming API data – This requires a constant connection to the Twitter API to receive messages as they are posted.

The Tools

One of the best things as of PHP 5 is the CLI (Command Line Interface) SAPI. PHP CLI lets you run scripts directly from the command line and has no built-in time limit. All the pains of set_time_limit() and playing with php.ini disappear.
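You can check this yourself: under the CLI SAPI, max_execution_time defaults to 0, meaning no limit:

php -r 'echo ini_get("max_execution_time"), "\n";'
# prints 0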

If you're going to be working from the command line, you're probably going to need to learn a little bit of bash scripting.

Finally, we will use cron (crontab / cron jobs).

SSH vs Cron Jobs

Running something from an SSH session is different from setting up a cron job to run it for you. An SSH session is a good place to test scripts and run one-off processes, while a cron job is the right way to run a script you want executed regularly.

If I write this line into my SSH session

php myscript.php

It will execute myscript.php. However, my terminal will be locked up until it completes.

You can get around this by pressing ctrl+z (which pauses execution) and then typing 'bg' (which resumes the process in the background).

For longer running processes, this can be nice, but if you lose your SSH session, it will terminate execution.

You can get around this by using the nohup (no hangup) command.

nohup php myscript.php

nohup allows execution to continue even if you lose the session. So if you use nohup and then background the process, it will finish executing regardless of your SSH session's status.

All of this only matters if you are running things manually from the command line. If you are running scripts regularly via cron jobs, you do not need to worry about these issues: the server itself executes them, so SSH terminal sessions don't matter.

Update: A few readers reminded me that you can add an ampersand (&) to the end of a command to background it immediately. This avoids the ctrl+z, bg dance.

nohup php myscript.php &

Sometimes you make a mistake and run a process without nohup but want it to continue running even if your SSH session disconnects. I've run scripts late at night thinking they would be quick, only to find out they took a lot longer than expected and I had to go home. This trick detaches the script from your shell so it won't terminate when the SSH session ends; a recap of the sequence follows the list.

  1. ctrl+z to stop (pause) the program and get back to the shell
  2. bg to run it in the background
  3. disown -h [job-spec], where [job-spec] is the job number (like %1 for the first running job; find yours with the jobs command), so that the job isn't killed when the terminal closes

Credit to user Node on StackOverflow
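Put together, the rescue sequence looks something like this (assuming the script is job %1):

# press ctrl+z to suspend the foreground script
bg            # resume the suspended job in the background
jobs          # list jobs to confirm its number, e.g. [1]
disown -h %1  # detach job 1 so closing the terminal won't kill it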

Data Processing with PHP

Since I run this script regularly, I create a bash script which is executed by a cron job.

Example bash script which actually runs the PHP script:

#!/bin/sh
php /path/to/script.php

Example cron job:

0 23 * * * /bin/sh /path/to/bashscript.sh

If you don't know where to put the line above, type 'crontab -e' to edit your cron table and save it. The five fields in 0 23 * * * tell cron to run at minute 0 of hour 23 (11pm), every day of the month, every month, and every day of the week.

So now we have a basic script which will run every night at 11pm. It doesn't matter how long it takes to execute; it will simply start at that time every night and run until it's finished.

Twitter Streaming API

The second problem is more interesting because the PHP script needs to be running constantly to collect data; I want it running all the time. So I have a PHP script (thanks to the Phirehose library) which keeps an open connection to the Twitter API, but I can't rely on it always running. The server may restart, the script may error out, or other problems may occur.

So my solution has been to create a bash script that makes sure the process is running, and starts it if it isn't.

#!/bin/sh
 
ps aux | grep '[m]yScript.php'
if [ $? -ne 0 ]
then
    php /path/to/myScript.php
fi

Line by line explanation:

#!/bin/sh

We start with the shebang line, which tells the system which shell should run the script.

ps aux | grep '[m]yScript.php'

The process list is piped (|) to grep, which searches for '[m]yScript.php'. The [m] bracket expression keeps grep from matching itself: grep's own entry in the process list contains myScript.php in its command line, so without the brackets the search would always find a result.

if [ $? -ne 0 ]

This checks the exit status of the last command. grep returns non-zero when it finds no match, so the condition is true only when [m]yScript.php was not found in the process list.

then
    php /path/to/myScript.php
fi

These lines execute only if our PHP script isn't found running: they start it. The conditional is then terminated with fi.

Now, we create a cron job that executes the script above:

* * * * * /bin/sh runsForever.sh

So now we have a system that checks every minute to see if myScript.php is running. If it isn't running, it starts it.

Conclusion

You will notice the Twitter streaming setup is just a more advanced version of the data processing one. Both of my live versions have a lot more going on, but that is beyond the scope of this article. If you are interested in extending them, logging is a good first step. What I've learned from years of hands-on practice is that this setup can and does work: I've run PHP processes for many months on this configuration.