Varnishing all my troubles away
Troubleshooting varnish backends; fixing SELinux issues

TL;DR

  • Varnish routes incoming web traffic to a port on a backend server
  • Page shows: “Error 503 Backend fetch failed”
  • The problem was the SELinux setup
  • Actually investigate and fix the SELinux issue rather than turning SELinux off

Varnish?

For our research work at UCL, we host a bunch of different web sites, web services and applications that run on a bunch of different ports on a bunch of different backend machines (and virtual machines). All requests arrive on a single IP, and we use varnish to sit on the frontline (port 80) and make sense of the incoming traffic.

Varnish is a web accelerator – it sits in front of whatever is actually generating the content for your web pages and caches whatever content it deems safe to cache. The next time someone requests that same page, the content is served from the cache (fast) rather than going off and generating content from the backend (slow). So it speeds up your web pages and generally reduces load on your backend databases and applications.
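To make the caching idea concrete, here is a toy Python sketch of a cache sitting in front of a slow backend. It is purely illustrative and assumes a fixed time-to-live for every page. It is not how varnish itself is implemented, and varnish's rules for what is safe to cache are far more involved. The names TTLCache and backend_calls are made up for this example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy cache in front of a slow backend (illustrative only).

    Every response is cached for a fixed `ttl` seconds; real varnish
    decides cacheability and lifetime per response.
    """

    def __init__(self, backend: Callable[[str], str], ttl: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend            # the slow content generator
        self.ttl = ttl                    # seconds a cached copy stays fresh
        self.clock = clock                # injectable so it can be tested
        self.backend_calls = 0            # how often we had to go to the backend
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)

    def get(self, url: str) -> str:
        now = self.clock()
        cached = self._store.get(url)
        if cached is not None and cached[0] > now:
            return cached[1]              # hit: served from the cache (fast)
        self.backend_calls += 1           # miss or expired: ask the backend (slow)
        body = self.backend(url)
        self._store[url] = (now + self.ttl, body)
        return body


if __name__ == "__main__":
    cache = TTLCache(lambda url: "<html>" + url + "</html>")
    cache.get("/index.html")
    cache.get("/index.html")
    print(cache.backend_calls)  # 1: the second request was a cache hit
```

The injectable clock is only there so the expiry behaviour can be checked without actually waiting.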

This is all great, but varnish also provides a really simple and flexible tool for routing traffic to different backends, which is actually the point of this post.

What’s the problem?

I eventually managed to get round to moving our frontline varnish server from a decaying machine running CentOS 4(!) to a brand new VM running CentOS 7. This allowed varnish to be upgraded from version 2.0 to version 4.1, which required a few minor adjustments, but nothing too crazy.

I did get stuck with one app that wasn’t working: the following varnish config was meant to direct traffic through to a backend application listening on port 5001 on a backend server.

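vcl 4.0;

# The application server, listening on port 5001
backend myapp_server_5001 {
  .host = "123.456.789.123";
  .port = "5001";
}

sub vcl_recv {
  # Send requests for this host name to the backend above...
  if ( req.http.host == "myapp.domain.com" ) {
    set req.backend_hint = myapp_server_5001;
    # ...and bypass the cache entirely for them
    return (pass);
  }
}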
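The routing side boils down to a lookup from the request's Host header to a backend host and port. As a rough Python analogy of what the vcl_recv rule does, this sketch assumes a simple dict as the routing table. ROUTES, Backend and pick_backend are hypothetical names, and the second route is invented purely to show more than one entry. A return value of None stands in for "no rule matched", where varnish would carry on with its default behaviour.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Backend:
    host: str
    port: int


# Host header -> backend; the first entry mirrors the VCL above, the
# second is a hypothetical extra route just to illustrate the idea.
ROUTES: Dict[str, Backend] = {
    "myapp.domain.com": Backend("123.456.789.123", 5001),
    "otherapp.domain.com": Backend("backend2.example", 8080),  # hypothetical
}


def pick_backend(host_header: str) -> Optional[Backend]:
    """Return the backend for a Host header, or None if no rule matches.

    Host names are case-insensitive and clients may append ":port",
    so both are normalised away before the lookup.
    """
    host = host_header.split(":", 1)[0].strip().lower()
    return ROUTES.get(host)
```

The normalisation is a choice made for this sketch. The VCL rule above simply compares the header string as given.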

Checking the web page “myapp.domain.com” just gave me the standard Varnish error:

Error 503 Backend fetch failed

It looked like varnish couldn’t contact the backend server. However, I could log in to the same server that varnish was running on and fetch the web page just fine.

$ ssh varnishserver
$ wget http://myapp_server:5001/
Connecting to 123.456.678.123:5001... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5294 (5.2K) [text/html]
Saving to: ‘index.html’
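The wget test essentially asks "can this machine open a TCP connection to port 5001 on the backend?". Here is a small Python sketch of that check, assuming Python 3 is available on the varnish server. can_connect is a made-up helper name.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Like the wget test, this only shows that the port is reachable from
    the process running this code, with that process's own permissions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False
```

For example, `can_connect("myapp_server", 5001)` (using the same placeholder host as the wget call) should return True in the situation described here. Like wget, though, it runs under your own account rather than as varnishd, so a True result does not mean varnishd is allowed to make the same connection.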

So…

  • The application was running on the backend server
  • I could retrieve the content from the varnish server directly (wget)
  • I couldn’t retrieve this content through varnish
  • Lots of other varnish routing rules were working

GIYF

Googling around suggested that the problem might be security settings in SELinux, which led me to a nice blog post about how to get varnish to play nicely with SELinux.

In my experience, SELinux generates an incredibly strong SEP field: my general practice has been to switch SELinux to permissive mode and rely on our main firewall (SEP) to deal with security issues. This isn’t as terrible as it sounds (our IT team were okay with it), but it’s not great.

With this being a genuinely front-facing server, I figured I should actually do the right thing and learn how to get SELinux working properly. Turns out it really wasn’t that hard.

Is my problem related to SELinux?

Easiest way to find out:

$ ssh varnishserver
$ sudo grep varnish /var/log/audit/audit.log

This showed a bunch of output like:

type=AVC msg=audit(1478175339.950:37802): avc: denied { name_connect } for pid=9111 comm="varnishd" dest=5001 scontext=system_u:system_r:varnishd_t:s0 tcontext=system_u:object_r:commplex_link_port_t:s0 tclass=tcp_socket

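So, yes – my problem does seem to be related to SELinux. The denial is for the varnishd process (running in the varnishd_t domain) trying to connect to port 5001, which SELinux labels commplex_link_port_t. That also explains why my wget test worked: my login shell is not confined by the varnishd policy, so only varnish itself was being blocked.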
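Each AVC line is a run of key=value fields. The interesting ones here are the denied permission (name_connect), the process (comm), the destination port (dest) and the source and target SELinux contexts. As a rough sketch, assuming Python 3 and the single-line format shown above, these fields can be pulled out like this. parse_avc, selinux_type and denials_for are hypothetical helpers, not part of any SELinux tooling.

```python
import re
from typing import Dict, Iterable, Iterator

# key=value pairs; a value is either "double-quoted" or runs to the next space
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')
# the denied permission(s) sit between braces: avc: denied { name_connect }
_PERMS = re.compile(r'denied\s+\{\s*([^}]*?)\s*\}')


def parse_avc(line: str) -> Dict[str, str]:
    """Split one AVC denial line from audit.log into a dict of its fields."""
    fields = {key: value.strip('"') for key, value in _FIELD.findall(line)}
    match = _PERMS.search(line)
    if match:
        fields["denied"] = match.group(1)
    return fields


def selinux_type(context: str) -> str:
    """Return the type from a user:role:type:level context string."""
    return context.split(":")[2]


def denials_for(lines: Iterable[str], comm: str = "varnishd") -> Iterator[Dict[str, str]]:
    """Yield parsed AVC denials whose comm field is exactly `comm`."""
    for line in lines:
        if "avc:" in line and "denied" in line:
            fields = parse_avc(line)
            if fields.get("comm") == comm:
                yield fields
```

Filtering on the exact comm field is a little tighter than grepping for "varnish" anywhere in the line. Run as root (the audit log is not world-readable), something like `for d in denials_for(open("/var/log/audit/audit.log")): print(d["denied"], d["dest"], selinux_type(d["tcontext"]))` would print `name_connect 5001 commplex_link_port_t` for the line above.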

How do I fix my SELinux problem (without just turning the whole thing off)?

Turns out the clever people on the interwebz have written a tool, audit2allow, to help troubleshoot exactly this kind of thing. This can be installed through the setroubleshoot package (which kind of makes sense).

$ sudo yum install setroubleshoot

This tool can translate the raw audit log entries into more useful messages:

$ sudo grep varnishd /var/log/audit/audit.log | audit2allow -w

Which provides messages like:

type=AVC msg=audit(1478177584.127:38275): avc: denied { name_connect } for pid=9118 comm="varnishd" dest=5001 scontext=system_u:system_r:varnishd_t:s0 tcontext=system_u:object_r:commplex_link_port_t:s0 tclass=tcp_socket
 Was caused by:
 The boolean varnishd_connect_any was set incorrectly. 
 Description:
 Allow varnishd to connect any

Allow access by executing:
 # setsebool -P varnishd_connect_any 1

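Now of course I read up on exactly what this command will do before executing it (no, really): it switches on the varnishd_connect_any boolean, letting varnishd connect to any TCP port rather than only the ones its policy normally allows, and -P makes the change persistent so it survives a reboot.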

$ sudo setsebool -P varnishd_connect_any 1
$ sudo systemctl restart varnish
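It is worth checking the boolean's state before and after flipping it. getsebool prints one line per boolean in the form `varnishd_connect_any --> off`. Here is a minimal Python sketch that parses that output, assuming Python 3.5+ and that it runs on the SELinux host itself. parse_getsebool and boolean_is_on are hypothetical helper names.

```python
import subprocess
from typing import Dict


def parse_getsebool(output: str) -> Dict[str, bool]:
    """Parse getsebool lines such as 'varnishd_connect_any --> off'."""
    state: Dict[str, bool] = {}
    for line in output.splitlines():
        name, sep, value = line.partition("-->")
        if sep:  # skip anything that isn't a "name --> value" line
            state[name.strip()] = value.strip() == "on"
    return state


def boolean_is_on(name: str) -> bool:
    """Query one SELinux boolean via getsebool (needs an SELinux-enabled host)."""
    result = subprocess.run(["getsebool", name], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_getsebool(result.stdout)[name]
```

`boolean_is_on("varnishd_connect_any")` should go from False to True once the setsebool command has run. Running `getsebool varnishd_connect_any` in the shell gives the same answer. The parser only helps if you want to fold the check into a script.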

Sorted.

Now I just need to add all this to the Puppet configuration…

Author: Ian Sillitoe

I am responsible for the technical aspect of CATH. This generally involves maintaining and developing both the front-end interfaces (internal and external web pages and webservices) and back-end code library and databases.
