System Design: Combining HAProxy, nginx, Varnish and more into the big picture


This comes from a question posted on Stack Overflow: Ordering: 1. nginx 2. varnish 3. haproxy 4. webserver?

I’ve seen people recommend combining all of these in a flow, but they seem to have lots of overlapping features so I’d like to dig in to why you might want to pass through 3 different programs before hitting your actual web server.

My answer explains what these applications are for, how they fit together in the big picture, and when they shine. [Original answer on ServerFault]

Foreword

As of 2016: things are evolving, all servers are getting better, they all support SSL, and the web is more amazing than ever.

Unless stated otherwise, the following is targeted toward professionals in businesses and start-ups, supporting thousands to millions of users.

These tools and architectures require a lot of users/hardware/money. You can try them in a home lab or to run a blog, but that doesn’t make much sense.

As a general rule, remember that you want to keep it simple. Every middleware appended is another critical piece of middleware to maintain. Perfection is not achieved when there is nothing to add but when there is nothing left to remove.

Some Common and Interesting Deployments

HAProxy (balancing) + nginx (php application + caching)

The webserver is nginx running php. When nginx is already there it might as well handle the caching and redirections.

HAProxy ---> nginx-php
        ---> nginx-php
        ---> nginx-php
        ---> nginx-php
        ---> nginx-php


HAProxy (balancing) + Varnish (caching) + Tomcat (Java application)

HAProxy can redirect to Varnish based on the request URI (*.jpg *.css *.js).

HAProxy ---> tomcat
        ---> tomcat
        ---> tomcat
        ---> varnish
        ---> varnish


HAProxy (balancing) + nginx (SSL termination + caching) + webserver (application)

HAProxy ---> nginx:443 -> webserver:8080
        ---> nginx:443 -> webserver:8080
        ---> nginx:443 -> webserver:8080
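The URI-based split between Varnish and Tomcat could be sketched in haproxy.cfg along these lines (all backend names, addresses, and ports here are illustrative assumptions, not from the original):

```
# haproxy.cfg sketch: send static assets to Varnish, the rest to Tomcat
frontend www
    bind *:80
    mode http
    acl is_static path_end .jpg .png .css .js
    use_backend varnish if is_static
    default_backend tomcat

backend varnish
    mode http
    server varnish1 10.0.0.21:6081 check
    server varnish2 10.0.0.22:6081 check

backend tomcat
    mode http
    balance roundrobin
    server tomcat1 10.0.0.11:8080 check
    server tomcat2 10.0.0.12:8080 check
    server tomcat3 10.0.0.13:8080 check
```

The `acl` + `use_backend` pair is what implements the "redirect based on the request URI" described above.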

Middleware

HAProxy: THE load balancer

Main Features:

  • Load balancing (TCP, HTTP, HTTPS)
  • Multiple algorithms (round robin, source IP, headers)
  • Session persistence
  • SSL termination
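A minimal sketch of these features in a single haproxy.cfg (certificate path, addresses, and ports are assumptions for illustration):

```
# haproxy.cfg sketch: HTTPS load balancing with SSL termination
# and cookie-based session persistence
frontend www
    bind *:80
    bind *:443 ssl crt /etc/haproxy/site.pem   # SSL termination
    mode http
    default_backend app

backend app
    mode http
    balance roundrobin                  # one of the available algorithms
    cookie SRV insert indirect nocache  # session persistence
    server app1 10.0.0.11:8080 cookie app1 check
    server app2 10.0.0.12:8080 cookie app2 check
```

One file, a handful of lines: that is the whole load balancer.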

Similar Alternatives: nginx (multi-purpose web-server configurable as a load balancer)

Different Alternatives: Cloud (Amazon ELB, Google load balancer), Hardware (F5, Fortinet, Citrix NetScaler), Other & worldwide (DNS, anycast, CloudFlare)

What does HAProxy do and when do you HAVE TO use it?

Whenever you need load balancing, HAProxy is the go-to solution.

Except when you want very cheap OR quick & dirty OR you don’t have the skills available, then you may use an ELB 😀

Except when you’re in banking/government/similar, required to use your own datacenter with hard requirements (dedicated infrastructure, dependable failover, two layers of firewall, auditing, an SLA paying x% per minute of downtime, all in one), then you may put two F5s on top of the rack containing your 30 application servers.

Except when you want to go past 100k HTTP(S) connections [and multiple sites], then you MUST have multiple HAProxy instances with a layer of [global] load balancing in front of them (CloudFlare, DNS, anycast). Theoretically, the global balancer could talk straight to the webservers, allowing you to ditch HAProxy. Usually, however, you SHOULD keep HAProxy(s) as the public entry point(s) to your datacenter and tune advanced options to balance fairly across hosts and minimize variance.

Personal Opinion: A small, contained, open source project, entirely dedicated to ONE TRUE PURPOSE. Among the easiest to configure (ONE file), and among the most useful and most reliable open source software I have ever come across.

Nginx: Apache that doesn’t suck

Main Features:

  • WebServer HTTP or HTTPS
  • Run applications in CGI/PHP/some other
  • URL redirection/rewriting
  • Access control
  • HTTP Headers manipulation
  • Caching
  • Reverse Proxy

Similar Alternatives: Apache, Lighttpd, Tomcat, Gunicorn…

Apache was the de-facto web server, also known as a giant clusterfuck of dozens of modules and thousands of lines of httpd.conf on top of a broken request-processing architecture. nginx redid all of that with fewer modules, (slightly) simpler configuration, and a better core architecture.

What does nginx do and when do you HAVE TO use it?

A webserver is intended to run applications. When your application is developed to run on nginx, you already have nginx and you may as well use all its features.

Except when your application is not intended to run on nginx and nginx is nowhere to be found in your stack (Java shop, anyone?), then there is little point in nginx. The webserver features are likely to exist in your current webserver, and the other tasks are better handled by the appropriate dedicated tool (HAProxy/Varnish/CDN).

Except when your webserver/application is lacking features, hard to configure, and/or you’d rather die than look at it (Gunicorn, anyone?), then you may put an nginx in front (i.e. locally on each node) to perform URL rewriting, send 301 redirections, enforce access control, provide SSL encryption, and edit HTTP headers on-the-fly. [These are the features expected from a webserver]
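A sketch of such a local nginx front (the upstream port, certificate paths, and domain are assumptions for illustration):

```
# nginx sketch: local reverse proxy in front of an application server
server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/site.crt;
    ssl_certificate_key /etc/nginx/site.key;

    # 301 redirection
    location /old-path {
        return 301 /new-path;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;   # e.g. a local Gunicorn
        # HTTP header manipulation on the way through
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```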

Varnish: THE caching server

Main Features:

  • Caching
  • Advanced Caching
  • Fine Grained Caching
  • Caching

Similar Alternatives: nginx (multi-purpose web-server configurable as a caching server)

Different Alternatives: CDN (Akamai, Amazon CloudFront, CloudFlare), Hardware (F5, Fortinet, Citrix NetScaler)

What does Varnish do and when do you HAVE TO use it?

It does caching, only caching. It’s usually not worth the effort and it’s a waste of time. Try a CDN instead. Be aware that caching is the last thing you should care about when running a website.

Except when you’re running a website exclusively about pictures or videos, then you should look into CDNs thoroughly and think about caching seriously.

Except when you’re forced to use your own hardware in your own datacenter (CDN ain’t an option) and your webservers are terrible at delivering static files (adding more webservers ain’t helping) then Varnish is the last resort.

Except when you have a site with mostly-static-yet-complex-dynamically-generated-content (see the following paragraphs) then Varnish can save a lot of processing power on your webservers.

Static caching is overrated in 2016

Caching is almost configuration free, money free, and time free. Just subscribe to CloudFlare, or CloudFront, or Akamai, or MaxCDN. The time it takes me to write this line is longer than the time it takes to set up caching, AND the beer I am holding in my hand is more expensive than the median CloudFlare subscription.

All these services work out of the box for static *.css *.js *.png and more. In fact, they mostly honour the Cache-Control directive in the HTTP header. The first step of caching is to configure your webservers to send proper cache directives. Doesn’t matter what CDN, what Varnish, what browser is in the middle.
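That first step could be a few lines in the webserver config. A sketch in nginx (the file extensions and max-age are assumptions, not a recommendation):

```
# nginx sketch: send Cache-Control for static assets so that any
# CDN, proxy, or browser downstream can cache them
location ~* \.(css|js|png|jpg|gif)$ {
    add_header Cache-Control "public, max-age=86400";  # cache for one day
}
```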

Performance Considerations

Varnish was created at a time when the average web server was choking to serve a cat picture on a blog. Nowadays a single instance of the average modern multi-threaded asynchronous buzzword-driven webserver can reliably deliver kittens to an entire country. Courtesy of sendfile().

I did some quick performance testing for the last project I worked on. A single Tomcat instance could serve 21 000 to 33 000 static files per second over HTTP (testing files from 20 B to 12 kB with varying HTTP/client connection counts). The sustained outbound traffic is beyond 2.4 Gb/s. Production will only have 1 Gb/s interfaces. Can’t do better than the hardware, no point in even trying Varnish.

Caching Complex Changing Dynamic Content

CDNs and caching servers usually ignore URLs with parameters like ?article=1843, they ignore any request with session cookies or authenticated users, and they ignore most MIME types, including the application/json from /api/article/1843/info. There are configuration options available, but they are usually not fine grained, rather “all or nothing”.

Varnish can have custom complex rules (see VCL) to define what is cacheable and what is not. These rules can cache specific content by URI, headers, current user session cookie, MIME type, and content ALL TOGETHER. That can save a lot of processing power on webservers for some very specific load patterns. That’s when Varnish is handy and AWESOME.
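As a sketch of what such fine-grained rules look like in VCL (Varnish 4 syntax; the URL pattern, cookie name, and TTL are assumptions for illustration):

```
vcl 4.0;

# Cache the generated JSON of the article API, but only for
# visitors without a session cookie (i.e. anonymous users).
sub vcl_recv {
    if (req.url ~ "^/api/article/" && req.http.Cookie !~ "session") {
        unset req.http.Cookie;   # cookies would otherwise prevent caching
        return (hash);           # look this request up in the cache
    }
}

sub vcl_backend_response {
    if (bereq.url ~ "^/api/article/") {
        set beresp.ttl = 60s;    # keep generated JSON for one minute
    }
}
```

This combines URI, cookie, and freshness rules in one place, which is exactly what generic CDN knobs cannot do.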

Conclusion

It took me a while to understand all these pieces, when to use them and how they fit together. Hope this can help you.

3 thoughts on “System Design: Combining HAProxy, nginx, Varnish and more into the big picture”

  1. Great article, lots of interesting things. I built something similar with Apache2, Varnish, HAProxy and CloudFlare for the content part, and to run MariaDB very reliably I used Galera 10.1, distributed all servers, and am now using 31 data centers. Took a lot of time, but it was worth all the energy.

    Yesterday I installed Zabbix to be able to monitor the HAProxy load balancers, MariaDB Galera clusters and many more services. Really a nice tool, and probably much easier than Cacti.


  2. Nice article. It clarifies all my questions but one. How are sticky sessions implemented in all this? If I have a Java WAR file in Tomcat and need to hit the same Tomcat VM again, which one of these 3 pieces of software will help me, and how would the flow be?
    HAproxy -> NginX -> Tomcat


    • Session persistence is not needed. Applications should store their state in a database.

      If there is really no choice, HAProxy can do session persistence by source ip or by a cookie.
      nginx only had persistence in the paid edition last I checked. I don’t recommend it.

