Every now and then a bunch of really slow HTTP clients decide to suck down pages off my web site. This is bad because when enough of them do this, it dramatically lowers the number of free Apache processes available to handle requests from the rest of the world. I don't know if it's some lame DDoS attack or just really slow clients.
In years gone by, I know that lingerd was a solution to this problem. But there doesn't appear to be much activity around it these days. In fact, the lack of a lingerd package in Debian (there is an old unofficial package) suggests that there are better methods.
I've been using mod_limitipconn to partly deal with the problem, but I need to keep that number high enough that it doesn't penalize normal browsers. That makes it a sort of half-assed solution.
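For reference, a minimal mod_limitipconn setup looks something like this (the per-IP cap of 10 is just an illustrative number, not what I actually run):

    # mod_limitipconn needs mod_status with ExtendedStatus on
    # so it can see what each connection is doing.
    ExtendedStatus On

    <IfModule mod_limitipconn.c>
        <Location />
            # Cap simultaneous connections per client IP (example value).
            # Set it too low and normal multi-connection browsers get 503s.
            MaxConnPerIP 10
        </Location>
    </IfModule>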
It occurs to me that I could put Squid in front of Apache, but that seems a little heavyweight. Or maybe my impressions of Squid are skewed.
Anyway, I'm looking for ideas or pointers to the obvious thing I've missed.
Ideas?
Posted by jzawodn at February 04, 2007 07:35 AM
Of course I'd recommend Perlbal, but I'm biased.
It can optionally buffer a response in memory (up until a certain size), slowly spoon-feeding it to clients, keeping backend Apaches free. Along with dozens of other neat tricks.
Ah, I completely forgot about perlbal.
Hmm. Given that it's Perl, how can I go wrong? :-)
I just purposely typed a nonsense URL to get a 404 to confirm something:
Apache/1.3.34 Server at jeremy.zawodny.com Port 80
Have you thought about updating to Apache 2.0, er, I mean, 2.2? One of the huge changes from 1.3 to 2.0 was better threading. There's probably a reason why lingerd is being abandoned....
I'll upgrade Apache if it really solves problems I have, but I'm not a fan of upgrading "just because."
So far this is the first indication of anything Apache 2.x has to offer that might be interesting to me. Does the threading play well with mod_php?
Apache2 with a threaded MPM does not play well with mod_php. If you check the Debian docs for mod_php you'll see that they tried to build it thread-safe and gave up - I think that says something right there.
What I do: run Apache2 with a threaded MPM, and hook up php via mod_fastcgi
Apache 2 has different MPMs - or ways of handling multiple requests. The Apache 1.3 one is called 'prefork', and works fine with mod_php - however, you won't get much benefit, if any.
The solution is to set up PHP to run in a separate process with FastCGI, and have Apache run multithreaded ('worker' MPM).
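A rough sketch of that setup with mod_fastcgi, assuming a small wrapper script that exec's the php-cgi binary (the paths and process count here are just placeholders):

    # Apache 2 built with the worker MPM; PHP runs in its own
    # FastCGI processes instead of inside the Apache children.
    LoadModule fastcgi_module modules/mod_fastcgi.so

    # php-wrapper is a hypothetical shell script that exec's php-cgi.
    FastCgiServer /usr/local/apache2/fcgi-bin/php-wrapper -processes 4
    ScriptAlias /fcgi-bin/ /usr/local/apache2/fcgi-bin/

    # Hand .php requests to the wrapper (needs mod_actions).
    AddHandler php-fastcgi .php
    Action php-fastcgi /fcgi-bin/php-wrapper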
You could have a look at varnish, http://varnish.projects.linpro.no/
Seconding perlbal; Brad may be biased, but he's still right -- it's an amazing piece of software. I've heard similar good things about nginx (http://nginx.net/), but have no personal experience with it.
We usually use Pound as a reverse proxy. Really stable, we have a few hundred K visitors a day and no problems whatsoever.
Prefer it to perlbal.
Jordan, using Apache2 that way just turns it into a proxy for dynamic requests which is exactly what the Squid/Pound approach does, and I'd argue Squid or Pound were designed to do exactly that and they do a better job of it.
Varnish looks good.
About Apache 2 and PHP, be careful: it depends on the libraries. For example, the MapServer PHP library isn't thread safe.
http://brunovernay.wordpress.com/2006/04/11/connection-pooling-with-php-a-growing-problem/
I'd recommend looking at alternatives to Apache. Look at lighttpd with FastCGI. I don't know what platform your blog uses, but WordPress was _much_ faster and more scalable using lighttpd. Drupal absolutely rocks on it.
I've had limited success with Apache 2 + mod_php threaded. It worked for some things, but even compiled in thread-safe mode with no PHP modules that have known threading problems, it still crashed on me sometimes. I switched back to prefork and surprisingly got better performance and stability. The application has nearly 950 XML-RPC calls hitting at the exact same time every minute, which is why I considered the threaded version to start with. Even with just the XML and MySQL extensions installed, mod_php still wasn't happy in threaded mode.
Donald
Bruno, that's why you need to use the FastCGI interface to separate PHP and MySQL, to avoid the thread-safety problem.
First of all, you can look at the presentation by Yahoo's Radwin about how Apache is used inside Yahoo. The secret is: let Apache skip the clean shutdown and do an unclean close on the socket (the SO_LINGER option), and of course, increase the write buffer size to 256k or even a meg. Then the response lives inside the kernel buffer and doesn't hold up an Apache child.
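If I'm remembering the buffer part right, that's just Apache's SendBufferSize directive (the value below is only an example); the lingering-close behavior is more of a source-level tweak than a config setting:

    # Ask for a bigger TCP send buffer per connection, so a finished
    # response can sit in the kernel instead of tying up an Apache child.
    # 256k here is just an example value.
    SendBufferSize 262144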
There's always lighttpd with fastcgi too.. ;-)
I'd second nginx. We use it and it's great.
http://blog.fastmail.fm/?p=592
The author (Igor Sysoev) is also very responsive to feedback and suggestions, and there are regular new releases.
Squid is so passé as a reverse proxy. nginx can handle lots of slow clients with aplomb.
Also, keep an eye out for lighttpd 1.5. Lots of neat features including async-stat.
Definitely use a reverse proxy. At delicious we used to use Pound, and then migrated to Perlbal plus some magic for throttling.
An Apache with mod_rewrite and the mod_proxy modules and not much else is pretty good for a "buffering proxy" if you want to stick to something familiar (for small things I often do). If you want something fancier, then I'd second Brad's recommendation of perlbal.
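Something along these lines, assuming an Apache 2 front end and the real Apache moved to port 8080 on the same box:

    # Front-end Apache: little more than mod_rewrite + mod_proxy.
    LoadModule rewrite_module    modules/mod_rewrite.so
    LoadModule proxy_module      modules/mod_proxy.so
    LoadModule proxy_http_module modules/mod_proxy_http.so

    RewriteEngine On
    # Hand everything to the back-end Apache listening on 8080.
    RewriteRule ^/(.*)$ http://127.0.0.1:8080/$1 [P,L]
    ProxyPassReverse / http://127.0.0.1:8080/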
If you want to do caching in the "buffer layer", then hold your breath for a little while and then look at Varnish. (It's close, but it hasn't been a good fit for any of my applications yet.)
- ask
A few years ago I used Squid from the opposite direction and was impressed with how much it helped. We had a few of our schools hooked up to a 56k ISDN line, much too slow for a whole building, but we managed. I saw the worst hits when we had a whole class in the lab at the same time (generally going to the same pages). So I put a Linux box in each building and put Squid on it.
It helped so much. All the default start pages were then cached. If a class all went to the same place it was extremely fast. Even if they all pulled down the same video it was ultra fast. These were crappy little Pentium 90-150 boxes and they increased our performance so much I couldn't believe it! I think I just left the cache refresh set at 10 or 15 minutes and it was great. I would have preferred shorter, since I wanted news and weather to be more up to date.
For me this took the load off the connection to the outside world and let me use my 10MB network inside the building (it might even have been token ring, what was that? 2MB?).
Although in your case, it still has to serve all those slow clients and push the data outside the network, so I'm not sure how well that works for you.
Set up two HTTP daemons.
One should be mpm_prefork for your php and other legacy stuff.
The other should be mpm_worker configured for 5-10k concurrent connections and serving static files. It uses threads, so if you're on decent hardware with NPTL on a libc6 > 2.3.4 (which is common nowadays) you can handle this just fine.
You can also do this on very little memory.
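A worker config for that second daemon might look roughly like this; every number here is a placeholder to tune for your own hardware and memory:

    <IfModule mpm_worker_module>
        ServerLimit          100
        ThreadLimit          100
        ThreadsPerChild      100
        # MaxClients should be <= ServerLimit * ThreadsPerChild.
        MaxClients           10000
        StartServers         10
        MinSpareThreads      100
        MaxSpareThreads      1000
        MaxRequestsPerChild  0
    </IfModule>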
Squid is OK but it might have problems scaling that high.
While Pound and Perlbal are good alternatives, you could use mod_proxy_balancer in Apache 2.2 to handle your load balancing. This is what we do....
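For example, something like this (the back-end addresses are made up):

    # Apache 2.2 front end balancing across two back-end servers.
    # Needs mod_proxy, mod_proxy_http, and mod_proxy_balancer loaded.
    <Proxy balancer://backend>
        BalancerMember http://10.0.0.1:8080
        BalancerMember http://10.0.0.2:8080
    </Proxy>

    ProxyPass        / balancer://backend/
    ProxyPassReverse / http://10.0.0.1:8080/
    ProxyPassReverse / http://10.0.0.2:8080/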
Kevin
I'll go for one of those single-threaded web servers (lighttpd or nginx + PHP on FastCGI) as well, if you don't have anything that depends on Apache. They completely eliminate the slow client issue, since an extra connection does not fork out another process. Moreover, you can tune the HTTP 1.1 keep-alive all the way up (1-2 minutes for example) to help out those slow clients.
Pound is a fine reverse proxy, but I believe it doesn't buffer enough response data to free Apache / PHP processes early:
http://www.apsis.ch/pound/pound_list/archive/2004/2004-10/1098534864000/index_html?fullMode=1
Perlbal does this explicitly as Brad noted. It's also fast, flexible, and has the nice feature of allowing on-the-fly reconfiguration.
nginx rules for that. We use it on a multimillion-user webmail service here and it rocks.
Slashdot's still using lingerd (which passes data to pound). What's wrong with lingerd? It still works fine.
Squid is too big, kind of a Titanic, for that kind of problem. I'm using the Oops caching proxy for that. And, of course, all the static content is moved to thttpd with some throttling options.
Just about any asynchronous/event-driven server -- whether plain old server, reverse proxy, whatever -- will do the trick; the real problem is that Apache's process model is fundamentally prone to this problem. It's trivial to DOS almost any Apache server (which is why many big sites put things like NetCaches in front of their servers).
Things to look for include a) stability b) security c) kernel integration a la epoll or kqueue.
There are lots of options, but because of (a) and (b) I'd personally avoid a lot of the newer ones (e.g., nginx, varnish, lighttpd) for the time being; while they're fine and very promising products, they're still under rapid evolution and haven't faced the fire that things like Squid and other old-timers have seen.
I know that's not cool with the hip Web 2.0 kids, but whatever...
See also: http://www.mnot.net/blog/2006/08/21/caching_performance
(Jeremy - ping me internally for more details if you like)
Taking a step back, I'd be curious as to what the browser types are. Are they simply robots indexing your site? If so, there are ways to direct traffic based on the client information in most (if not all) load balancers, and I have to imagine there are ways to throttle the requests within Apache.
regards.
JohnH
Count me as another vote for a minimal mod_rewrite/mod_proxy apache server. This has worked very well for me in the past on single server setups.
Jeremy,
Out of interest, how do you know you have slow HTTP clients hitting your site?
Al.
If you are still using 1.3, lingerd works just fine (and that is what I am doing in a couple of cases).
Somewhere in my bag of tricks is a copy of Pound that I added caching to; you might want to see if anyone has made their copy available for others to use.
While it might be overkill for you, I'll throw in my two cents that Squid is quite fine with scaling, and that it's not passe.
While I'd like to see Varnish have a full feature set, I don't actually know any other caching reverse proxy that works well enough for me to use at work.
-allspaw
p.s. we use it at Flickr, which has some images to serve.