Long Running Processes in PHP

Here at Review Signal, I write a lot of PHP, and one of the challenges is getting PHP to run for long periods of time.

Here are two sample problems I deal with at Review Signal that require PHP processes to run for long periods of time or indefinitely:

  1. Data processing – every night Review Signal crunches millions of pieces of data to update our rankings.
  2. Twitter Streaming API data – this requires a constant connection to the Twitter API to receive messages as they are posted.

The Tools

One of the best additions as of PHP 5 is the CLI (Command Line Interface). PHP CLI allows you to run scripts directly from the command line, and it has no built-in time limit: max_execution_time defaults to 0 (unlimited) under the CLI. All the pain of set_time_limit() and playing with php.ini disappears.
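
You can check that default for yourself; under the CLI this one-liner prints 0:

php -r 'echo ini_get("max_execution_time"), PHP_EOL;'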

If you're going to be working from the command line, you're probably going to need to learn a little bit of bash scripting.

Finally, we will use cron (crontab / cron jobs).

SSH vs Cron Jobs

Running something from an SSH session is different from setting up a cron job to run it for you. An SSH session is a good place to test scripts and run one-time processes, while a cron job is the right way to run a script on a regular schedule.

If I write this line into my SSH session

php myscript.php

It will execute myscript.php. However, my terminal will be locked up until it completes.

You can get around this by pressing ctrl+z (which pauses the execution) and then typing 'bg' (which backgrounds the process).
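
As a rough sketch of the sequence (the jobs command simply lists your background jobs, so you can confirm the script resumed):

php myscript.php
# press ctrl+z to suspend it, then:
bg
jobs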

For longer-running processes this can be nice, but if you lose your SSH session, execution will be terminated.

You can get around this by using the nohup (no hangup) command.

nohup php myscript.php

nohup allows execution to continue even if you lose the session. So if you use nohup and then background the process, it will finish executing regardless of your SSH session's status.

All of this only matters if you are running things manually from the command line. If you are running scripts regularly via cron jobs, you do not need to worry about these issues: the server itself executes them, so SSH terminal sessions don't matter.

Update: A few readers reminded me that you can add an ampersand (&) to the end of a command to background it immediately. This avoids having to ctrl+z, bg.

nohup php myscript.php &
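
By default, nohup writes the script's output to nohup.out in the working directory. If you would rather capture it (and any errors) in a specific file, you can redirect it yourself; the log path here is just an example:

nohup php myscript.php > /path/to/myscript.log 2>&1 &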

Sometimes, you make a mistake and run a process without nohup but want it to continue running even if your SSH session disconnects. I've run scripts late at night thinking they would be quick, only to find out they took a lot longer than expected and I had to go home. This trick allows you to run the script as a daemon, so it won't terminate when the SSH session ends.

  1. ctrl+z to stop (pause) the program and get back to the shell
  2. bg to run it in the background
  3. disown -h [job-spec] where [job-spec] is the job number (like %1 for the first running job; find out your number with the jobs command) so that the job isn't killed when the terminal closes (the full sequence is sketched below)

Credit to user Node on Stack Overflow.
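
Put together, rescuing an already-running script looks something like this, assuming it is job %1 (check with jobs):

# ctrl+z to suspend the foreground script, then:
bg %1
disown -h %1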

Data Processing with PHP

Since I run this script regularly, I create a bash script which is executed by a cron job.

Example bash script which actually runs the PHP script:

#!/bin/sh
php /path/to/script.php

Example cron job:

0 23 * * * /bin/sh /path/to/bashscript.sh

If you don't know where to put the line above, type 'crontab -e' to edit your cron table and save it. Note that cron has no seconds field; the five fields are minute, hour, day of month, month, and day of week. So 0 23 * * * tells it to run at minute 0 of hour 23 (11pm), every day of the month, every month, any day of the week.
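
For reference, here is the field layout, along with a hypothetical second entry that would run every five minutes:

# minute (0-59)  hour (0-23)  day of month (1-31)  month (1-12)  day of week (0-6, Sunday = 0)
0 23 * * * /bin/sh /path/to/bashscript.sh
*/5 * * * * /bin/sh /path/to/bashscript.sh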

So now we have a basic script which will run every night at 11pm. It doesn't matter how long it takes to execute; it will simply start every night at that time and run until it's finished.
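
If you want to confirm cron actually fired the job, many Debian/Ubuntu systems log cron activity to syslog (the exact log location varies by distribution):

grep CRON /var/log/syslog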

Twitter Streaming API

The second problem is more interesting because the PHP script needs to be running constantly to collect data. I want it running all the time. So I have a PHP script (thank you to the Phirehose library) which keeps an open connection to the Twitter API, but I can't rely on it to always be running. The server may restart, the script may error out, or other problems could occur.

So my solution has been to create a bash script to make sure the process is running. And if it isn't running, run it.

#!/bin/sh
 
ps aux | grep '[m]yScript.php'
if [ $? -ne 0 ]
then
    php /path/to/myScript.php
fi

Line by line explanation:

#!/bin/sh

So we start with the shebang line, which tells the system to use /bin/sh to interpret the script.

ps aux | grep '[m]yScript.php'

The process list is piped (|) to grep, which searches it for '[m]yScript.php'. I use the [m] character class so the pattern doesn't match itself: grep's own process has the search pattern in its command line, so a plain grep 'myScript.php' would always find a result. With the brackets, grep's command line contains '[m]yScript.php', which the pattern does not match.
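
You can see the self-match problem directly:

ps aux | grep 'myScript.php'     # matches grep's own process, even when the script isn't running
ps aux | grep '[m]yScript.php'   # matches only a real myScript.php process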

if [ $? -ne 0 ]

This checks the last command's exit status: $? holds the return value of the previous command, and grep exits non-zero when it finds no match. So this condition is true if nothing in our process list matched [m]yScript.php.

then
    php /path/to/myScript.php
fi

These lines are executed if our PHP script isn't found running: they start it. The conditional is then terminated with fi.

Now, we create a cron job that executes the script above:

* * * * * /bin/sh /path/to/runsForever.sh

So now we have a system that checks every minute to see if myScript.php is running. If it isn't running, it starts it.
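
In practice, you may also want the watchdog to silence the grep output, log whatever the PHP script prints, and detach the process so cron isn't left holding it. Here is a sketch combining the pieces from earlier; the log path is just an example:

#!/bin/sh
# start myScript.php only if it isn't already running
if ! ps aux | grep '[m]yScript.php' > /dev/null
then
    nohup php /path/to/myScript.php >> /path/to/myScript.log 2>&1 &
fi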

Conclusion

You will notice the Twitter streaming script is just a more advanced version of the data processing setup. Both of my live versions have a lot more going on, but that is beyond the scope of this article. If you are interested in extending them, logging is a good first step. What I've learned from years of hands-on practice is that this setup can and does work: I've run PHP processes for many months on this configuration.

Kevin Ohashi is the geek-in-charge at Review Signal. He is passionate about making data meaningful for consumers. Kevin is based in Washington, DC.





23 thoughts on “Long Running Processes in PHP”

  1. Fabien

    Why do you create a bash script to run a PHP script, then create a cron job to run that bash script? Why not just cron the PHP script?

    1. Kevin Ohashi (Post author)

      It’s a lot easier to run the PHP script directly if you are only using that one line. However, I wrote it with a bash script because it builds into the next example (Twitter API). My actual bash scripts have a lot more going on but they felt out of scope for this article.

      So yes, if you are going to just execute a script, easier to call it directly. However, I was trying to give people a basis to start writing bash scripts to control their long running processes.

      I hope that makes sense.

  2. Evert

    If you're going to start a process off the CLI and want to push it to the background, rather than using nohup or &, you can also just run it in a tmux or screen session.

    That way you can go back later and still read the output.

    1. Kevin Ohashi (Post author)

      This is absolutely true. However, you can actually read the output from nohup. nohup defaults output to nohup.out. So 'tail -f nohup.out' would let you see the output, and you can view that anytime.

      1. Sam

        Nohup is cool for simple jobs. On servers where I need to quickly run a few processes at once, tmux is the king – create a session, split your terminal, run the processes & exit. And you can quickly & easily reconnect to the tmux session and see output from all the processes at once.

        Of course, tmux / screen is a huge pain to use if you don’t use it all the time, so this is in my bookmarks bar: http://www.dayid.org/os/notes/tm.html

  3. Adam

    I just wanted to throw out how I handle long running processes like your twitter stream api job.

    I have a PHP job which watches for messages in an Amazon SQS queue. It runs on an Ubuntu 12.04 server. I made it into an Upstart job. Upstart can be told to restart the process if it dies.

    In /etc/init I created myJob.conf


    description "SQS Queue Browser"
    author "Me"

    start on startup
    stop on shutdown
    respawn

    exec /usr/bin/php /path/to/myJob.php > /var/log/myJob.log

    It works like a champ with the added benefit that I can easily start/restart/stop the process as needed by simply doing sudo service myJob restart.

  4. Ettore

    The big problem with long-running processes is that PHP was made to "die". In other words, it isn't so good at running a PHP script for a long period of time.
    For example, the garbage collection implementation in PHP is buggy, because it wasn't designed for scripts that run over a long time.

    OK, you can kill the process and then restart it, but I think that isn't a beautiful solution.
    I use a different solution: the main loop isn't in the PHP script but in the batch file.

    For example:

    while true; do
        php script.php
    done

    With this method, PHP executes only one transaction per run. You don't have problems with memory leaks or unexpected crashes.

    What do you think about this?
    Thanks 😉

    1. Kevin Ohashi (Post author)

      I guess it depends on the use case. I've run PHP scripts for half a year continuously on a micro instance at Amazon to test it out. It collected 20GB of data, and the first thing that ran out was space on my RDS instance. I didn't even have my restart script as described here in that case. So PHP can definitely run for a long time. You definitely need to be considerate of how you write a script; if it's poorly written and leaking memory, you will run into problems eventually. Restarting isn't a beautiful solution, but it's one that may work (depending on the situation).

      Your bash controller is a fine idea if you can make it work with whatever your goal is. Firing off events from bash and letting PHP die regularly can't hurt. However, don't fool yourself into thinking a PHP script can't crash just because it's run from a loop. Things can and sometimes do go wrong; if that PHP process hangs for some reason, I think bash will wait forever? It's about finding a balance of systems to make sure it's continuously running. Some of the more robust solutions suggested, like supervisord, are probably your best bet longer term.

    2. Derak

      Changes to garbage collection in PHP 5.3 fixed this issue.
      Before that, PHP did simple reference counting to decide when to remove an object. It leaked memory like a sieve.

  5. mtorres

    To connect to my SSH server, I always use a screen session. This allows me to attach to a previous bash session, even one from a previous SSH connection! And you can have as many screen sessions as you want, then attach to one or the other. It's really great: you can let a process run, close the SSH connection, SSH into the machine the next day, re-attach to the screen session, and there you are, as if you were never disconnected at all.

  6. Stelian Mocanita

    #1 Don't rely on PHP for long-running processes; the PHP engine is built to die (to be read: terminate an execution thread). You will eventually end up with a lot of memory issues.
    #2 If you need to do so, use supervisord. It does exactly that for you, as in it keeps your daemons running. It has a web API and all that, so go: supervisord.org

  7. kowach

    Run PHP scripts as shell scripts by adding this to the beginning:

    #!/usr/bin/php
    <?php
    echo "My shell script\n";

    Make the file executable:
    chmod 744 myscript.php

    Then add a cron entry:
    0 8 * * * /home/me/myscript.php

    Or use "screen" for background running.

  8. Dave

    At the end of your crontab line, it's usually a good idea to add this:

    >> /var/log/yourapp/yourservice.`date +\%Y-\%m-\%d`.log

    Those are backticks, in case WP messes it up. This gives you the output of your script in a log file, based on dates.

  9. Boy

    Following your guide on the Twitter Streaming API, I used your crontab and it worked. But the processes keep adding up. I mean, after a while when I run

    ps aux | grep '[m]yScript.php'

    I will see many copies of my script running. Here is the modified script I am using:

    #!/bin/sh
    ps aux | grep '[t]w_core.php'
    if [ $? -ne 0 ]; then php /var/www/disqover/core/tw_core.php; fi

    Do you have any idea why this happens? Thank you!

    1. Kevin Ohashi (Post author)

      I am really not sure why it would run multiple copies. Have you tried running the bash script multiple times from a fresh install to see if it’s really failing to match?

      If yes, then try ps aux | grep '[t]w_core.php' and see if it's actually matching or not. It's a short script; you should be able to figure out quickly whether it's behaving properly.
