Configuring timeouts in HAProxy


This comes from a question posted on stack overflow: By what criteria do you tune timeouts in HA Proxy config?

When configuring HA Proxy, how do you decide what values to assign to the timeouts? I’ve read a half dozen samples in various blogs, and everyone uses different timeouts and no one discusses why.

My originally answer was posted on ServerFault.

Foreword

I’ve been tuning HAProxy for a while and done a lot of performance testing on it. From 100 HTTP requests/s to 50 000 HTTP requests/s.

The first advice is to enable the statistics page on HAProxy. You NEED monitoring, no exception. You will also need fine tuning if you intend to go past 10 000 requests/s.

Timeouts are a confusing beast because they have a huge range of possible values, most of them having no observable difference. I have yet to see something fails because of a number 5% lower or 5% higher. 10000 vs 11000 milliseconds, who cares? Probably not your system.

Configuration

I cannot in good conscience give a couple of numbers as ‘best timeouts ever for everyone’.

What I can tell instead is the MOST aggressive timeouts which are always acceptable for HTTP(S) load balancing. If you encounter lower than these, it’s time to reconfigure your load balancer.

timeout connect 5000
timeout check 5000
timeout client 30000
timeout server 30000

timeout client:

The inactivity timeout applies when the client is expected to acknowledge or
send data. In HTTP mode, this timeout is particularly important to consider
during the first phase, when the client sends the request, and during the
response while it is reading data sent by the server.

Read: This is the maximum time to receive HTTP request headers from the client.

3G/4G/56k/satellite can be slow at times. Still, they should be able to send HTTP headers in a few seconds, NOT 30.

If someone has a connection so bad that it needs more than 30s to request a page (then more than 10*30s to request the 10 embedded images/CSS/JS), I believe it is acceptable to reject him.

timeout server:

The inactivity timeout applies when the server is expected to acknowledge or
send data. In HTTP mode, this timeout is particularly important to consider
during the first phase of the server’s response, when it has to send the
headers, as it directly represents the server’s processing time for the
request. To find out what value to put there, it’s often good to start with
what would be considered as unacceptable response times, then check the logs
to observe the response time distribution, and adjust the value accordingly.

Read: This is the maximum time to receive HTTP response headers from the server (after it received the full client request). Basically, this is the processing time from your servers, before it starts sending the response.

If your server is so slow that it requires more than 30s to start giving an answer, then I believe it is acceptable to consider it dead.

Special Case: Some RARE services doing very heavy processing might take a full minute or more to give an answer. This timeout may need to be increased a lot for this specific usage. (Note: This is likely to be a case of bad design, use an async style communication or don’t use HTTP at all.)

timeout connect

Set the maximum time to wait for a connection attempt to a server to succeed.

Read: The maximum time a server has to accept a TCP connection.

Servers are in the same LAN as HAProxy so it should be fast. Give it at least 5 seconds because that’s how long it may take when anything unexpected happens (a lost TCP packet to retransmit, a server forking a new process to take the new requests, spike in traffic).

Special Case: When servers are in a different LAN or over an unreliable link. This timeout may need to be increased a lot. (Note: This is likely to be a case of bad architecture.)

timeout check

Set additional check timeout, but only after a connection has been already
established.

Set additional check timeout, but only after a connection has been already
If set, haproxy uses min(“timeout connect”, “inter”) as a connect timeout
for check and “timeout check” as an additional read timeout. The “min” is
used so that people running with very long “timeout connect” (eg. those
who needed this due to the queue or tarpit) do not slow down their checks.
(Please also note that there is no valid reason to have such long connect
timeouts, because “timeout queue” and “timeout tarpit” can always be used
to avoid that).

Read: When performing a healthcheck, the server has timeout connect to accept the connection then timeout check to give the response.

All servers MUST have a HTTP(S) health check configured. That’s the only way for the load balancer to know whether a server is available. The healthcheck is a simple /isalive page always answering OK.

Give this timeout at least 5 seconds because that’s how long it may take when anything unexpected happens (a lost TCP packet to retransmit, a server forking a new process to take the new requests, spike in traffic).

War Story: A lot of people wrongly believe that the server can always answer this simple page in 3 ms. They set an aggressive timeout (< 2000ms) with aggressive failover (2 failed checks = server dead). I have seen entire websites going down because of that. Typically there is a slight spike in traffic, backend servers get slower, the healthchecks are delayed… until suddenly they all timeout together, HAProxy thinks ALL servers died at once and the entire site goes down.

Conclusion

Hope you understand timeouts better now.

Lessons Learned #0: The HAProxy statistics page is your best friend for monitoring connections, timeouts and everything.

Lessons Learned #1: Timeouts aren’t that important.

Lessons Learned #2: Be gentle on timeout configuration (especially timeout check and timeout connect). There has never been any issue because of “slightly too long timeout” but there are regular cases of “too short timeout” that put entire websites down.

Advertisements

Post a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s