
40 Million hits a day on WordPress using a $10 VPS

I recently tested many of the biggest names in managed WordPress hosting in my article Managed WordPress Hosting Performance Benchmarks. (Update: 2016 WordPress Hosting Performance Benchmarks) I am preparing to do a second round of testing with double the number of companies on board. But some of us like to set up servers ourselves (or are cheap).

Given a reasonable VPS, what sort of performance can we get out of it?

The benchmark to beat, based on a previous take on this question, was 10 million hits a day as measured by Blitz.io.

I decided to test this from the ground up: start with the most basic configuration and gradually try to improve it.

All tests were performed on a $10/month 1GB RAM DigitalOcean VPS running Ubuntu 14.04 x64. All code and documentation are also available on GitHub.

LAMP Stack

Based on my previous experience benchmarking WordPress, I didn't have high hopes for this test. Last time I crashed MySQL almost instantly. This time I ramped Blitz up much more slowly, from 1 to 50 users. The performance wasn't impressive: response times started climbing almost immediately and continued to get worse. No surprises.

[Chart: Blitz results for the default LAMP stack]

The LAMP stack setup script is available on GitHub. Download full Blitz results from LAMP Stack (PDF).

LAMP + PHP5-FPM

The next thing I tried was PHP-FPM (FastCGI Process Manager). It gave slightly better performance, with response times just under 200ms faster at 50 users. But the graph looks pretty similar: response times still climb quickly as the number of users goes up. Not a great improvement.

[Chart: Blitz results for LAMP + PHP5-FPM]

The LAMP + PHP5-FPM setup script is available on GitHub. Download full Blitz results from LAMP+PHP5-FPM (PDF).

Nginx + PHP-FPM (aka LEMP Stack)

Maybe the problem is Apache? I tried Nginx next. What happened? I got worse performance than the default LAMP stack (wtf?). Everyone said Nginx was faster. Turns out it's not magically faster than Apache (and appears worse out of the box).

[Chart: Blitz results for LEMP + PHP-FPM]

The LEMP + PHP-FPM setup script is available on GitHub. Download full Blitz results from LEMP+PHP-FPM (PDF).

Microcaching

I've written about creating a reverse proxy and cache in Nginx before. But since Nginx is already set up as my web server, I don't need a reverse proxy this time. Nginx has fastcgi_cache, which lets us cache responses from FastCGI processes (PHP). So I applied the same technique here and the results were staggering: the response time dropped to 20ms (+/- 2ms) and it scaled from 1 to 1,000 concurrent users.

"This rush generated 28,924 successful hits in 60 seconds and we transferred 218.86 MB of data in and out of your app. The average hit rate of 482/second translates to about 41,650,560 hits/day."

All that with only 2 errors (connection timeouts).

[Chart: Blitz results for LEMP + PHP-FPM + microcaching]

The LEMP + PHP-FPM + microcaching setup script is available on GitHub. Download full Blitz results from LEMP+PHP-FPM + microcaching (PDF).

Microcaching Config Walkthrough

We start with the standard installs:

apt-get update
apt-get -y install nginx
apt-get -y install mysql-server mysql-client
apt-get install -y php5-mysql php5-fpm php5-gd php5-cli

This gets us Nginx, MySQL and PHP-FPM.
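
WordPress itself still needs to be downloaded and unpacked into the web root. Doing that by hand looks roughly like this, assuming the default Ubuntu 14.04 Nginx web root of /usr/share/nginx/html:

# grab the latest WordPress release and drop it into the Nginx web root
cd /tmp
wget https://wordpress.org/latest.tar.gz
tar -xzf latest.tar.gz
cp -a wordpress/. /usr/share/nginx/html/
chown -R www-data:www-data /usr/share/nginx/html

From there the usual wp-config.php and database setup applies, which isn't covered here.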

Next we need to tweak some PHP-FPM settings. I am using one-liners to edit /etc/php5/fpm/php.ini and /etc/php5/fpm/pool.d/www.conf to uncomment and change a few settings (setting cgi.fix_pathinfo=0 and uncommenting the listen.owner, listen.group and listen.mode settings).

sed -i "s/^;cgi.fix_pathinfo=1/cgi.fix_pathinfo=0/" /etc/php5/fpm/php.ini
sed -i "s/^;listen.owner = www-data/listen.owner = www-data/" /etc/php5/fpm/pool.d/www.conf
sed -i "s/^;listen.group = www-data/listen.group = www-data/" /etc/php5/fpm/pool.d/www.conf
sed -i "s/^;listen.mode = 0660/listen.mode = 0660/" /etc/php5/fpm/pool.d/www.conf

Now we make sure to create a folder for our cache:

mkdir /usr/share/nginx/cache

We will need it in our Nginx configs. In our /etc/nginx/sites-available/default config we add this location block inside our server {} settings. We also make sure to add index.php to our index directive and set our server_name to a domain or IP (a sketch of those surrounding server {} settings follows the location block below).

location ~ \.php$ {
		try_files $uri =404;
		fastcgi_split_path_info ^(.+\.php)(/.+)$;
		fastcgi_cache  microcache;
		fastcgi_cache_key $scheme$host$request_uri$request_method;
		fastcgi_cache_valid 200 301 302 30s;
		fastcgi_cache_use_stale updating error timeout invalid_header http_500;
		fastcgi_pass_header Set-Cookie;
		fastcgi_pass_header Cookie;
		fastcgi_ignore_headers Cache-Control Expires Set-Cookie;
		fastcgi_pass unix:/var/run/php5-fpm.sock;
		fastcgi_index index.php;
		include fastcgi_params;
}
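
For reference, after those two changes the top of the default server {} block ends up looking roughly like this (example.com is a placeholder for your own domain or IP; the rest is the stock Ubuntu config):

server {
	listen 80 default_server;

	root /usr/share/nginx/html;
	# index.php added so WordPress is served by default
	index index.php index.html index.htm;

	# set to your domain or IP
	server_name example.com;

	location / {
		try_files $uri $uri/ =404;
	}

	# the location ~ \.php$ block shown above goes here
}

One common extra tweak, not described here, is changing the location / fallback to try_files $uri $uri/ /index.php?$args; so WordPress pretty permalinks resolve.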

Then we move on to our /etc/nginx/nginx.conf and make a few changes, like increasing our worker_connections. We also add this line in our http {} block before including our other configs:

fastcgi_cache_path /usr/share/nginx/cache/fcgi levels=1:2 keys_zone=microcache:10m max_size=1024m inactive=1h;

This creates our fastcgi_cache.
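
In context, the relevant pieces of nginx.conf end up looking something like this (the worker_connections value is just an illustrative bump from the 768 default; the other stock directives inside http {} are omitted):

user www-data;
worker_processes 4;

events {
	# raised from the Ubuntu default of 768
	worker_connections 4096;
}

http {
	# define the microcache zone before the site configs are included
	fastcgi_cache_path /usr/share/nginx/cache/fcgi levels=1:2 keys_zone=microcache:10m max_size=1024m inactive=1h;

	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;
}

Restart Nginx afterwards (service nginx restart) so it picks up the cache zone and the new site config.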

All of these edits are done in somewhat ugly one-liners in the script (if someone has a cleaner way of doing this, please share!); I've cleaned them up and provided the full files for comparison.

Go Big or Go Home

Since Nginx didn't seem to blink when I hit it with 1,000 users, I wondered how high it would really go. So I ran a test from 1 to 3,000 users, and guess what?

"This rush generated 95,116 successful hits in 60 seconds and we transferred 808.68 MB of data in and out of your app. The average hit rate of 1,585/second translates to about 136,967,040 hits/day."

The problem was I started getting errors: "4.74% of the users during this rush experienced timeouts or errors!" But it peaked at an astonishing 2,642 users per second. I watched my processes during the test and saw all 4 Nginx workers maxing out the CPU (25% each). I think I hit the limit of what a 1GB, single-core VPS can handle. This setup was a champ though. I'm not sure what caused the big spike (perhaps a cache refresh), but if you want to roll your own WordPress VPS and serve a lot of static content, this template should be a pretty good starting point.
[Chart: Blitz results for the 3,000-user test]

Download the full results of the 3,000-user Blitz test (PDF)

Conclusion

There are definitely a lot of improvements that can be made to this config. It doesn't optimize anything that doesn't hit the cache (which will be any dynamic content, most often logged-in users). It doesn't address security at all. It doesn't do a lot of things. If you aren't comfortable editing PHP, Nginx and other Linux configs/settings and are running an important website, you should probably go with a managed WordPress company. If you really need performance and can't manage it yourself, take a look at our Managed WordPress Hosting Performance Benchmarks. If you just want a good web hosting company, take a look at our web hosting reviews and comparison table.

All code and documentation are available on GitHub.

Thanks and Credits:

The title was inspired by Ewan Leith's post 10 Million hits a day on WordPress using a $15 server. Ewan built a server that handled 250 users/second without issue using Varnish, Nginx, PHP-APC, and W3 Total Cache.

A special thanks goes to A Small Orange, who let me test multiple iterations of their LEMP stack, and especially to Ryan MacDonald at ASO, who spent a lot of time talking WordPress performance with me.

Bias, Negativity, Sentiment and Review Signal

Photo Credit: _Abhi_

People are more likely to express negative sentiments or give negative reviews than they are positive ones.

I hear this in almost every discussion about Review Signal and how it works. There are certainly lots of studies to back it up. One major study concluded that bad is stronger than good. One company found people were 26% more likely to share bad experiences. There is plenty of research in the area of Negativity Bias for curious readers.

Doesn't that create problems for review sites?

The general response I have to this question is no. It doesn't matter if there is a negativity bias when comparing between companies because it's a relative comparison. No company, at least not at the start, has an unfair advantage in terms of what their customers will say about them.

Negativity bias may kick in later, when customers have had bad experiences and want to keep sharing that information with everyone and anyone despite changes in the company. Negative inertia, or the stickiness of negative opinion, is a real thing. Overcoming that is something Review Signal doesn't have any mechanism to deal with beyond simply counting every person's opinion once. This controls it on an individual level, but not on a systemic level if a company has really strong negative brand associations.

What if a company experiences a disaster, e.g. a major outage, does that make it hard to recover in the ratings?

This was a nuanced question that I hadn't heard before and credit goes to Reddit user PlaviVal for asking.

Luckily, major outages are rare events. They are fascinating to observe from a data perspective. The most recent and largest outage was the EIG (BlueHost, HostGator, JustHost, HostMonster) outage in August 2013. If we look at the actual impact of the event, I have a chart available here.

When I looked at the EIG hosts post-outage, there really hadn't been a marked improvement in their ratings. Review Signal's company profiles have a Trends tab for every company, which graphs ratings on a per-month basis to show how a company has done over the past 12 months.

[Charts: BlueHost and HostGator monthly rating trends, May 2014]

There is definitely some variance, but poor ratings post-outage seem quite common. It's hard to make an argument that these companies have recovered to their previous status and are simply being held back by major outcries that occurred during the outage.

The only other company with a major outage I can track in the data is GoDaddy. GoDaddy has had numerous negative events in its timeline since we started tracking it: the elephant killing scandal, SOPA, DNS outages and multiple Super Bowl events.

[Chart: GoDaddy rating trend, August 2012 - July 2013]

[Chart: GoDaddy rating trend, June 2013 - May 2014]

There are clear dips for events such as the September 2012 DNS outage and the Super Bowl in February. Their overall rating is 46% right now and the trend is slightly up. But they seem to hover around 45-50% historically and maintain that despite the dips from bad events. There is arguably some room for them to be rated higher depending on the time frame you think is fair, but we're talking a couple of percent at most.

What about outages affecting multiple companies, e.g. resellers or infrastructure providers like Amazon that others host on top of? Are all the affected companies treated equally?

No. Just because there is an outage with a big provider that services multiple providers doesn't mean that all the providers will be treated identically. The customer reaction may be heavily influenced by the behavior of the provider they are actually using.

Let's say there is an outage in Data Center X (DC X). It hosts Host A and Host B. DC X has an outage lasting 4 hours. Host A tells customers 'sorry, it's all DC X's fault' and Host B tells customers 'We're sorry, our DC X provider is having issues; to make up for the downtime, your entire month's bill is free because we didn't meet our 99.99% uptime guarantee.' Even though Host A and Host B had identical technical issues, I imagine the responses from customers would be different. I've definitely experienced great customer service which changed my opinion of a company dramatically based on how they handled a shitty situation. I think the same applies here.

Customer opinions are definitely shaped by internal and external factors. The ranking system here at Review Signal definitely isn't perfect and has room for improvement. That said, right now, our rankings don't seem to be showing any huge signs of weakness in the algorithms despite the potential for issues like the ones talked about here to arise.

Going forward, the biggest challenge is going to be creating a decay function. How much is a review today worth versus a review from the past? At some point, a review of a certain age just isn't as informative as a recent one, and that's a problem I'm going to have to address and figure out. For now, it's on the radar but doesn't seem like a major issue yet.

Introducing Windows Azure

I am happy to announce that we've added the Windows Azure hosting platform to Review Signal today. Azure is definitely a big player in the cloud server market. It was also a very notable absence in our listings. Now that we've added Azure and Amazon AWS in the past few months, our cloud listings for IaaS providers look a lot more complete.

Windows Azure comes in with a 70% overall rating, which is quite respectable and puts it right next to RackSpace in the rankings, although its support score is a lot lower at an underwhelming 56%.

Want to see the full cloud provider rankings? Visit our complete rankings and click on the Cloud tab.

