Category Archives: Technical Thursday

Is your site patched against Heartbleed? (CVE-2014-0160)

I had the 'fun' experience of patching against this vulnerability today. Unfortunately, when I rebooted one of my primary servers, it failed to come back up, which caused two hours of downtime. Apologies to anyone who couldn't access this site.

If you're wondering whether you are vulnerable, check your site with an online Heartbleed test tool.

As for the patching itself, I only did it manually on some Ubuntu 12.04 systems, and it was fairly simple. Just run:

apt-get update && apt-get upgrade

That takes care of the packages, but remember that any running service linked against OpenSSL keeps the old library in memory until it is restarted, so restart those services (or reboot). If you want to learn more, go to Heartbleed.com.
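
If you want to verify that the fix took, one quick check (a sketch; the exact version strings vary by distribution) is to look at the OpenSSL build date, since Ubuntu's patched packages were rebuilt on April 7, 2014 or later:

# A "built on" date of Apr 7 2014 or later indicates the patched
# package on Ubuntu 12.04; an older date means you are still vulnerable.
openssl version -a | grep -i "built on"

# Restart services that link against libssl so they load the fixed
# library (service names here are examples; adjust for your stack).
sudo service apache2 restart
sudo service nginx restart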

Comic courtesy of XKCD.

Reverse Proxy and Cache Server with Nginx

This is one of the ways I improve performance here at Review Signal: I run an nginx reverse proxy and cache in front of the Apache server. Apache can be slow and has no built-in cache for a lot of the static content we serve, so I put nginx in front to cache that content and serve it directly. This improves the performance of my servers, gets content to users faster, and also helps under high load. If you want to see how it performs, I've included a screenshot from Blitz.io at the bottom showing how well Review Signal holds up with 500 concurrent users.

The Nginx config (http, server):

http {
  proxy_redirect off;
  proxy_set_header Host $host;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

  # caching options
  proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my-cache:8m max_size=1000m inactive=600m;
  proxy_temp_path /var/cache/tmp;

  server {
    listen 80;
    server_name subdomain.example.com;
    access_log /var/log/nginx/subdomain.access.log;
    error_log /var/log/nginx/subdomain.error.log;

    location / {
      proxy_pass http://localhost:3000/subdomain;
    }
  }

  server {
    listen 80;
    server_name example.com;
    access_log /var/log/nginx/example.access.log;
    error_log /var/log/nginx/example.error.log;

    location / {
      proxy_pass http://localhost:3000/;
      proxy_cache my-cache;
      proxy_cache_valid 200 302 60m;
      proxy_cache_valid 404 1m;
    }
  }
}

Please note that only the parts relevant to this article (the http and server blocks) are included. A real configuration definitely needs more directives, both before the http block and inside it; I didn't include those parts because they can vary a huge amount. See the Nginx documentation for more details and example configurations.

Configuration Walkthrough:

  proxy_redirect off;
  proxy_set_header Host $host;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

proxy_redirect off tells nginx not to rewrite the Location headers in responses coming back from the backend; the actual forwarding happens later with proxy_pass. The headers we set let the backend server see proper request information. Without X-Real-IP/X-Forwarded-For, the backend would simply see your reverse proxy server's IP address on every request.
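
If you want to watch these headers arrive at the backend, one quick trick (a sketch; it assumes the nc utility is installed and nothing else is currently bound to port 3000) is to stand in for the backend with a one-shot listener and send a request through the proxy:

# Terminal 1: listen on the backend port and dump whatever the proxy
# sends, headers included (some netcat versions need 'nc -l -p 3000').
nc -l 3000

# Terminal 2: request a page through nginx; the nc window should show
# Host, X-Real-IP and X-Forwarded-For headers added by the proxy
# (curl will report an empty reply, which is expected here).
curl http://subdomain.example.com/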

  proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my-cache:8m max_size=1000m inactive=600m;
  proxy_temp_path /var/cache/tmp;

The first line tells nginx where to save cache data (the path), the directory structure of the cache (levels), the name of the cache and the size of the shared memory zone that holds its keys (keys_zone, here 8 MB), the maximum size of the cached data on disk (max_size), and how long an item may go unaccessed before it is removed (inactive). The second line tells nginx where to save the temporary files it uses while building the cache.
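
One practical note: the cache and temp directories must exist and be writable by the nginx worker user, or nginx will log errors and skip caching. A minimal setup sketch (www-data is the Debian/Ubuntu default worker user; check the user directive in your nginx.conf):

# Create the cache and temp directories used in the config above
sudo mkdir -p /var/cache/nginx /var/cache/tmp

# Make them writable by the nginx worker user
sudo chown -R www-data:www-data /var/cache/nginx /var/cache/tmp

# Test the configuration and reload
sudo nginx -t && sudo service nginx reload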

server {
  listen 80;
  server_name subdomain.example.com;
  access_log /var/log/nginx/subdomain.access.log;
  error_log /var/log/nginx/subdomain.error.log;

  location / {
    proxy_pass http://localhost:3000/subdomain;
  }
}

This server block creates a reverse proxy to localhost:3000. It doesn't do any caching; it simply forwards all requests between nginx and localhost:3000. It listens for subdomain.example.com, and any request to it (/) is passed to localhost:3000/subdomain.
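
A handy way to test a block like this before DNS is set up (a sketch; the loopback address assumes you are running curl on the server itself) is to send the Host header by hand:

# Pick this server block via the Host header without touching DNS
curl -v -H 'Host: subdomain.example.com' http://127.0.0.1/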

server {
  listen 80;
  server_name example.com;
  access_log /var/log/nginx/example.access.log;
  error_log /var/log/nginx/example.error.log;

  location / {
    proxy_pass http://localhost:3000/;
    proxy_cache my-cache;
    proxy_cache_valid 200 302 60m;
    proxy_cache_valid 404 1m;
  }
}

This server block is a reverse proxy and cache. It responds to any request for example.com and forwards all requests to localhost:3000. It also uses the cache zone called my-cache (notice this matches the keys_zone setting in proxy_cache_path). proxy_cache_valid defines which HTTP status codes are cached and for how long: in this example, 200 (OK) and 302 (FOUND) responses are cached for 60 minutes, and 404 (NOT FOUND) responses are cached for 1 minute.
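
To confirm that responses are actually being cached, one rough check (a sketch; it assumes example.com resolves to this server, and the paths match the config above) is to request a page a couple of times and watch the cache directory fill up:

# The first request populates the cache; repeats should be served from it
curl -s -o /dev/null http://example.com/
curl -s -o /dev/null http://example.com/

# Cached entries appear as hashed files under the levels=1:2 tree
sudo ls -R /var/cache/nginx

For a more precise check, nginx can expose the $upstream_cache_status variable through an add_header directive, which reports HIT or MISS on each response.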

Conclusion

Setting up a reverse proxy isn't too difficult, although getting it to play nicely with your application can occasionally be complicated. You can empty the cache manually by deleting the contents of the cache folder, which often helps fix issues. Nginx is fairly smart about what it caches: POST requests are never served from the cache, but GET and HEAD requests are cached by default. This setup works great when you serve a lot of static content; I run it in front of almost everything here at Review Signal, including our blog. It's easy enough to configure different caching levels for different parts of your application. And if you're ever in doubt, test the application directly, then through the reverse proxy without caching, and only then turn on caching.
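
Emptying the cache by hand is a one-liner; nginx rebuilds the entries on the next requests (the path matches the proxy_cache_path above):

# Wipe all cached entries; nginx repopulates the cache as traffic comes in
sudo rm -rf /var/cache/nginx/*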

As promised, here is what Review Signal's performance looks like when rushing with Blitz.io from 1-500 concurrent users with this nginx setup. It responds to around 300 concurrent requests without going over 100ms response time.

[Screenshot: Blitz.io rush of Review Signal from 1-500 concurrent users]

Long Running Processes in PHP

Here at Review Signal, I use a lot of PHP code, and one of the challenges is getting PHP to run for long periods of time.

Here are two sample problems that I deal with at Review Signal that require PHP processes to run for long periods of time or indefinitely:

  1. Data processing – Every night Review Signal crunches millions of pieces of data to update our rankings.
  2. Twitter Streaming API data – this requires a constant connection to the Twitter API to receive messages as they are posted on Twitter

The Tools

One of the best things about modern PHP is the CLI (Command Line Interface). PHP CLI lets you run scripts directly from the command line and has no built-in time limit, so all the pains of set_time_limit() and fiddling with php.ini disappear.

If you're going to be working from the command line, you're probably going to need to learn a little bit of bash scripting.

Finally, we will use cron (crontab / cron jobs).

SSH vs Cron Jobs

I need to explain that running something from an SSH session is different from setting up a cron job to run it for you. An SSH session is a good place to test scripts and run one-time processes, while a cron job is the right way to run a script on a regular schedule.

If I type this line into my SSH session:

php myscript.php

It will execute myscript.php. However, my terminal will be locked up until it completes.

You can get around this by pressing ctrl+z (pauses the execution) and then typing 'bg' (backgrounds the process).

For longer running processes, this can be nice, but if you lose your SSH session, it will terminate execution.

You can get around this by using the nohup (no hangup) command.

nohup php myscript.php

nohup allows execution to continue even if you lose the session. So if you use nohup and then background the process, it will finish executing regardless of your SSH session's status.

All of this only matters if you are running things manually from the command line. If you are running scripts on a schedule with cron jobs, you do not need to worry about these issues: the server itself executes them, so SSH terminal sessions don't matter.

Update: A few readers reminded me that you can add an ampersand (&) to the end of a command to background it immediately. This avoids having to ctrl+z and then bg.

nohup php myscript.php &
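
One detail worth knowing: by default, nohup appends the script's output to a file called nohup.out in the current directory. If you don't want that, redirect the output yourself (the log path below is just an example):

# Discard all output
nohup php myscript.php > /dev/null 2>&1 &

# ...or send it to a log file instead of nohup.out
nohup php myscript.php > /var/log/myscript.log 2>&1 &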

Sometimes you make a mistake and run a process without nohup, but want it to keep running even if your SSH session disconnects. I've run scripts late at night thinking they would be quick, only to find out they took a lot longer than expected and I had to go home. This trick detaches the script from your terminal, so it won't terminate when the SSH session ends:

  1. ctrl+z to stop (pause) the program and get back to the shell
  2. bg to run it in the background
  3. disown -h [job-spec], where [job-spec] is the job number (like %1 for the first background job; find yours with the jobs command), so that the job isn't killed when the terminal closes (see the example session below)

Credit to user Node on StackOverflow
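
Put together, rescuing an already-running script looks something like this (the job number %1 assumes it is your only background job):

$ php myscript.php     # taking much longer than expected...
^Z                     # ctrl+z pauses it and returns you to the shell
[1]+  Stopped          php myscript.php
$ bg                   # resume it in the background
$ disown -h %1         # detach it so the terminal hangup won't kill it
$ jobs                 # confirm it is still running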

Data Processing with PHP

Since I run this script regularly, I create a bash script which is executed by a cron job.

Example bash script which actually runs the PHP script:

#!/bin/sh
php /path/to/script.php

Example cron job:

0 23 * * * /bin/sh /path/to/bashscript.sh

If you don't know where to put the line above, type 'crontab -e' to edit your cron table and save it. The five fields, 0 23 * * *, are the minute, hour, day of month, month, and day of week, so this entry runs at minute 0 of hour 23 (11pm) every day.

So now we have a basic script which will run every night at 11pm. It doesn't matter how long it takes to execute; it simply starts every night at that time and runs until it's finished.
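
If you want a record of each night's run, redirecting the job's output is a one-line change (the log path is just an example; pick somewhere the cron user can write):

# Append stdout and stderr from each nightly run to a log file
0 23 * * * /bin/sh /path/to/bashscript.sh >> /var/log/nightly-crunch.log 2>&1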

Twitter Streaming API

The second problem is more interesting because the PHP script needs to be running to collect data, and I want it running all the time. I have a PHP script (thank you to the Phirehose library) that keeps an open connection to the Twitter API, but I can't rely on it to always be running: the server may restart, the script may error out, or other problems could occur.

So my solution has been to create a bash script that makes sure the process is running, and starts it if it isn't.

#!/bin/sh

ps aux | grep '[m]yScript.php' > /dev/null
if [ $? -ne 0 ]
then
    php /path/to/myScript.php
fi

Line by line explanation:

#!/bin/sh

We start with the shebang line, which gives the path to the shell that should run the script.

ps aux | grep '[m]yScript.php' > /dev/null

The process list is piped (|) to grep, which searches it for '[m]yScript.php'. The [m] bracket expression stops grep from matching itself: grep's own command line also contains myScript.php and shows up in ps, so a plain search would always find a result. The > /dev/null discards the matched line, since all we care about here is grep's exit status.

if [ $? -ne 0 ]

This checks the previous command's exit status. grep returns a non-zero status when it finds no match, which here means our script is not in the process list.

then
    php /path/to/myScript.php
fi

These lines are executed if our PHP script isn't found running: they start it. The conditional is then closed with fi.

Now, we create a cron job that executes the script above:

* * * * * /bin/sh /path/to/runsForever.sh

So now we have a system that checks every minute to see if myScript.php is running. If it isn't running, it starts it.

Conclusion

You will notice the Twitter streaming setup is just a more advanced version of the data processing one. Both of my live versions have a lot more going on, but that is beyond the scope of this article; if you are interested in extending them, logging is a good first step. What I've learned from years of hands-on practice is that this setup can and does work. I've run PHP processes for many months on this configuration.


Interested in seeing which web hosting companies people love (and hate!)? Click here and find out how your web host stacks up.