Here at Review Signal, I use a lot of PHP code and one of the challenges is getting PHP to run for long periods of time.
Here are two sample problems that I deal with at Review Signal that require PHP processes to run for long periods of time or indefinitely:
- Data processing – Every night Review Signal crunches millions of pieces of data to update our rankings.
- Twitter Streaming API data – This requires a constant connection to the Twitter API to receive messages as they are posted on Twitter.
The Tools
One of the best things to come with PHP 5 is the CLI (Command Line Interface). PHP CLI allows you to run scripts directly from the command line and has no built-in time limit. All the pains of set_time_limit() and playing with php.ini disappear.
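For example, a script like this will happily run forever under CLI (a trivial sketch; the loop body stands in for real work):

<?php
// forever.php - run with: php forever.php
// Under CLI, max_execution_time defaults to 0 (no limit),
// so this loop can run indefinitely.
while (true) {
    // ... do one unit of work here ...
    sleep(60); // wait a minute between iterations
}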
If you're going to be working from the command line, you're probably going to need to learn a little bit of bash scripting.
Finally, we will use cron (crontab / cron jobs).
SSH vs Cron Jobs
I need to explain that running something from an SSH session is different from setting up a cron job to run it for you. An SSH session is a good place to test scripts and run one-time processes, while a cron job is the right way to set up a script you want to run regularly.
If I write this line into my SSH session
php myscript.php
It will execute myscript.php. However, my terminal will be locked up until it completes.
You can get around this by pressing ctrl+z (which pauses the execution) and then typing 'bg' (which backgrounds the process).
For longer running processes, this can be nice, but if you lose your SSH session, it will terminate execution.
You can get around this by using the nohup (no hangup) command.
nohup php myscript.php
nohup allows execution to continue even if you lose the session. So if you use nohup and then background the process, it will finish executing regardless of your SSH session's status.
All of this only matters if you are running things manually from the command line. If you are running scripts regularly via cron jobs, you don't need to worry about these issues: the server itself executes them, so SSH terminal sessions don't matter.
Update: A few readers reminded me that you can add an ampersand (&) to the end of a command to background it immediately. This avoids having to ctrl+z, bg.
nohup php myscript.php &
Sometimes, you make a mistake and run a process without nohup but want it to continue running even if your SSH session disconnects. I've run scripts late at night thinking they would be quick, only to find out they took a lot longer than expected and I had to go home. This trick allows you to run the script as a daemon, so it won't terminate when the SSH session ends (a sample session follows the steps below):
- ctrl+z to stop (pause) the program and get back to the shell
- bg to run it in the background
- disown -h [job-spec], where [job-spec] is the job number (like %1 for the first running job; find yours with the jobs command), so that the job isn't killed when the terminal closes
Credit to user Node on StackOverflow
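Put together, the whole sequence looks something like this (a sample bash session; job numbers and output formatting vary slightly between shells):

$ php myscript.php
^Z
[1]+  Stopped                 php myscript.php
$ bg
[1]+ php myscript.php &
$ jobs
[1]+  Running                 php myscript.php &
$ disown -h %1
$ exit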
Data Processing with PHP
Since I run this script regularly, I create a bash script which is executed by a cron job.
Example bash script which actually runs the PHP script:
#!/bin/sh
php /path/to/script.php
Example cron job:
0 23 * * * /bin/sh /path/to/bashscript.sh

If you don't know where to put the line above, type 'crontab -e' to edit your cron table and save it. The 0 23 * * * tells cron to run the command at minute 0 of hour 23 (11pm), on every day of the month, every month, and every day of the week.
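For reference, a standard crontab entry has five time fields followed by the command:

# minute  hour  day-of-month  month  day-of-week  command
  0       23    *             *      *            /bin/sh /path/to/bashscript.sh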
So now we have a basic script which will run every night at 11pm. It doesn't matter how long it takes to execute; it will simply start at that time every night and run until it's finished.
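For context, the PHP side is just an ordinary CLI script. A stripped-down sketch might look like the following (fetchPendingData() and crunch() are hypothetical placeholders for whatever your own data layer does):

<?php
// script.php - simplified sketch of a nightly processing job.
// fetchPendingData() and crunch() are hypothetical stand-ins,
// not part of any real library.
function fetchPendingData() {
    return array(); // imagine this pulls the night's records from a database
}

function crunch($record) {
    // imagine this updates rankings, aggregates stats, etc.
}

foreach (fetchPendingData() as $record) {
    crunch($record);
}
echo 'Finished at ' . date('Y-m-d H:i:s') . "\n";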
Twitter Streaming API
The second problem is more interesting because the PHP script needs to be running to collect data, and I want it running all the time. So I have a PHP script (thank you to the Phirehose library) which keeps an open connection to the Twitter API, but I can't rely on it to always be running. The server may restart, the script may error out, or other problems could occur.
So my solution has been to create a bash script to make sure the process is running. And if it isn't running, run it.
#!/bin/sh
ps aux | grep '[m]yScript.php'
if [ $? -ne 0 ]
then
    php /path/to/myScript.php
fi
Line by line explanation:
#!/bin/sh
So we start with our path to the shell.
ps aux | grep '[m]yScript.php'
The process list is piped (|) to grep, which searches it for '[m]yScript.php'. I use the [m] bracket in the pattern so the search doesn't match itself: grep itself runs as a process with myScript.php in its command line, so without the brackets the search would always find a result.
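To see the difference, compare the two searches while myScript.php is not actually running (a hypothetical session, output trimmed):

$ ps aux | grep myScript.php
user   4321  0.0  0.0  grep myScript.php
$ ps aux | grep '[m]yScript.php'
$

The pattern [m]yScript.php still matches the text myScript.php in a real process entry, but grep's own command line contains the literal brackets, so it no longer matches itself.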
if [ $? -ne 0 ]
This checks the last command's return value. grep exits with a non-zero status when it finds no match, so this condition is true only if our search of the process list for [m]yScript.php came up empty.
then
    php /path/to/myScript.php
fi
These lines are executed if our PHP script isn't found running. They start the script, and the conditional is then terminated with fi.
Now, we create a cron job that executes the script above:
* * * * * /bin/sh runsForever.sh
So now we have a system that checks every minute to see if myScript.php is running. If it isn't running, it starts it.
Conclusion
You will notice the Twitter streaming setup is just a more advanced version of the data-processing one. Both of the live versions have a lot more going on than what's shown here, but that is beyond the scope of this article. What I've learned from years of hands-on practice is that this setup can and does work: I've run PHP processes for many months on this configuration. If you are interested in extending these scripts, you may want to look into logging as a first step.
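A simple starting point is redirecting the script's output to a log file when launching it (the path here is only an example):

nohup php myScript.php >> /var/log/myscript.log 2>&1 &

The >> appends stdout to the log, and 2>&1 sends stderr to the same place, so errors that would otherwise disappear with the terminal end up somewhere you can read them later.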