For the past week, around 400 customers have been affected by outages or degraded service on one of our clusters. This cluster is under what’s known as a Slowloris attack, where the attacker opens many connections and keeps them open by sending requests very slowly, so bad connections end up mixed in with good ones and are hard to tell apart. It’s a slower form of a DDoS (Distributed Denial of Service) attack.
Symptoms of this are: either the site becomes completely inaccessible, or some parts of the website (such as /wp-admin/) become inaccessible.
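To make the "good connections mixed up with bad connections" idea a bit more concrete, here’s a rough sketch in Python of how you might separate the two. This is not our actual tooling: the Connection record, the suspicious_ips function, and both thresholds are made up for illustration. It assumes you can already get, for each open connection, the client IP, how long it has been open, and whether a full request has arrived yet.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical record for one open connection (not a real server API)."""
    client_ip: str
    seconds_open: float
    request_complete: bool


def suspicious_ips(connections, min_age=30.0, max_per_ip=20):
    """Return IPs holding more than max_per_ip stalled connections.

    A connection counts as stalled when it has been open for at least
    min_age seconds without ever delivering a complete request.
    Both thresholds are illustrative guesses, not tuned values.
    """
    stalled = Counter(
        c.client_ip
        for c in connections
        if not c.request_complete and c.seconds_open >= min_age
    )
    return sorted(ip for ip, count in stalled.items() if count > max_per_ip)


# 25 long-lived, never-finished connections from one address,
# plus a few quick, normal requests from another.
sample = (
    [Connection("203.0.113.5", 120.0, False)] * 25
    + [Connection("198.51.100.7", 0.4, True)] * 3
)
print(suspicious_ips(sample))  # ['203.0.113.5']
```

The tricky part in a real attack is that each individual slow connection looks like a visitor on a bad network, so counting per IP like this is only a starting point.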
We’re working very hard to mitigate this attack with our upstream network provider, Rackspace. I’m really, really sorry that we don’t have a very good solution in place yet, but we should have one in place by the end of the day today. After going through tons of logs, doing tons of packet sniffing, and understanding exactly what’s going on, we think we have a solution that will prevent this from making your site inaccessible.
To give you an idea, we’re seeing all sorts of spikes like this:
These aren’t bringing the servers down, just choking the outbound traffic, which makes it look like the site is down.
Coincidentally, our friends up in Austin (WPEngine) have been dealing with an opposite kind of DDoS attack, where they’re getting flooded as well. We’ve been working behind the scenes, exchanging IP addresses and symptoms, to get a better handle on things.
Once we all get a better handle on things and service is reliable again, we’ll be visiting with Rackspace and their network engineers to figure out ways to prevent and diagnose these attacks faster.
One thing we’ve already done is order a load balancer from F5. We’ll be switching from the Cloud Load Balancers over to this device, as it’s much more capable than the Zeus load balancers that we were using.