haproxy tricks

Friday, 2 Sep 2022 Tags: haproxy http

haproxy is one of my favourite *nix tools. I use it in all sorts of odd places: shuffling HTTP traffic from place to place, splicing TCP in all sorts of ways, debugging network protocols, and obviously, as a high performance, high availability proxy daemon.

Today’s post lists some adventures with haproxy’s new JSON and lua functionality. You’ll need haproxy 2.8+ (the open source flavour), which is already a great LTS release.

peers

haproxy has the ability to form clusters, and share some state between nodes. This is somewhat confusingly documented, so here’s a copy-pasta section to get started.

Note that this works just fine with a single node, and makes using stick tables easier, allowing reload without losing table state.

The IP addresses need to be private ones, and ports obviously open. This should be secured with TLS, but I’ll deal with that later.

# a new stanza by itself
peers cluster
    # the *name* here should either be the FQDN of the host, or you
    # can start haproxy with `-L <name>` to force the name. In this
    # example, we are using `-L local` for our main node.
    peer local 10.0.0.1:445
    peer other 10.0.0.2:445
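    # initialise cluster stick tables to track abusive IPs, accounts and
    # user agents (abuse_ua is needed by the track-sc2 rule in api_be)
    table abuse_ip type ipv6   size 1m  expire 15s  store http_req_rate(15s),bytes_in_rate(15s)
    table abuse_id type string size 1m  expire 15s  store http_req_rate(15s),bytes_in_rate(15s)
    table abuse_ua type string size 1m  expire 15s  store http_req_rate(15s),bytes_in_rate(15s)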

That’s it. Restart both nodes, and once we start using our abuse tables, the nodes will automagically keep each other up to date. The precise mechanics aren’t clear to me yet, but there’s no fancy sharing of counters and blocks here. I have assumed that the nodes keep each other’s maximum values in an LWW (last-write-wins) setup.
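For the single-node case mentioned above, a minimal sketch might look like this. The loopback address and port 10000 are assumptions for illustration, and the peer name still has to match the `-L` name (or the hostname). On reload, the old process connects to the new one through this local peer entry and hands over the table contents.

```haproxy
# single-node sketch: only the local peer is listed
# (assumed address and port; start haproxy with `-L local`)
peers cluster
    peer local 127.0.0.1:10000
    table abuse_ip type ipv6   size 1m  expire 15s  store http_req_rate(15s),bytes_in_rate(15s)
```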

stats and logs

While this isn’t essential, it’s very useful to see what error codes are returned, to query haproxy’s runtime state directly, and have the basic haproxy web ui to dig around in. Let’s enable all that.

# re-use the existing global section
global
    # use UNIX syslog FTW
    log-tag haproxy
    log 127.0.0.1:514 len 65535 format rfc5424 daemon
    # enable runtime API via unix socket
    stats socket /var/run/haproxy.sock level admin

# make the usual stats page available
frontend stats
    mode http
    bind 127.0.0.1:444
    http-request use-service prometheus-exporter if { path /metrics }
    stats enable
    stats uri /stats
    stats refresh 7s
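One thing worth keeping in mind: the `log` line in `global` only defines where logs go. Each proxy still has to opt in with `log global`, which is usually done once in a `defaults` section. A minimal sketch, where the timeout values are placeholders picked for illustration and not recommendations:

```haproxy
defaults
    mode http
    # inherit the log target declared in the global section
    log global
    # log full HTTP details, including the status code returned
    option httplog
    # placeholder timeouts, adjust to taste
    timeout connect 5s
    timeout client  30s
    timeout server  30s
```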

a basic front end and fake backend

Yes, you read that right. A fake backend. We’ll use this during testing; it makes it a lot easier to write unit tests for your haproxy services.

frontend awesome_fe
    # receives traffic from clients
    # plaintext (no encryption yet on IPv4 and IPv6)
    bind            :::8000     v4v6
    acl             is_api      path_beg    /api/
    use_backend     api_be      if is_api   METH_POST
    default_backend static_be

backend static_be
.if defined(CI)
    http-request    return status 200 content-type "text/html"
.else
    server web_01   10.0.1.1:80 check observe layer7
    server web_02   10.0.1.2:80 check observe layer7
    server web_03   10.0.1.3:80 check observe layer7
.endif

backend api_be
    # set ACLs for abusers
    # make XFF available in entire transaction
    http-request    set-var(txn.forwarded)          req.hdr_ip(x-forwarded-for)
    http-request    set-var(txn.useragent)          req.hdr(user-agent),xxh64,hex,lower
    # extract site id, via parsing JSON fields in order of stupidity
    http-request    set-var(txn.site,ifnotset)  req.body,json_query('$.domain[0]')
    http-request    set-var(txn.site,ifnotset)  req.body,json_query('$.domain')
    http-request    set-var(txn.site,ifnotset)  req.body,json_query('$.d')
    # drop blocked sites immediately
    # https://www.haproxy.com/blog/introduction-to-haproxy-maps/
    # track denied sites in same table as for rate-limited sites
    # for tracking, the HTTP response code is different & is logged
    acl             denied_site var(txn.site),map(deny.map) -m found
    http-request    track-sc1 var(txn.site)  table cluster/abuse_id if denied_site
    http-request    deny deny_status 402 content-type "text/plain" lf-string "contact plausible support for site id=%[var(txn.site)]" if denied_site
    # calculate rate limiting after unceremoniously dumping blocked-sites
    # stick table inbound tracking by each var
    http-request    track-sc0 var(txn.forwarded) table cluster/abuse_ip
    http-request    track-sc1 var(txn.site)  table cluster/abuse_id
    http-request    track-sc2 var(txn.useragent) table cluster/abuse_ua
    # deny with a 429 any site sending more than 5000 req / 15 seconds
    # (sc1 is tracked in cluster/abuse_id, which has a 15 second expiry)
    http-request    deny deny_status 429 content-type "text/plain" lf-string "contact plausible support for site id=%[var(txn.site)]" if { sc_http_req_rate(1,cluster/abuse_id) gt 5000 }
    # build composite header to allow caddy to balance more fairly
    http-request    set-header X-Plausible-LB "fwd=%[var(txn.forwarded)] uah=%[var(txn.useragent)] id=%[var(txn.site)]"
    # provide limited test capability in CI, if env var is set, then
    # send header details back in body for inspection.
.if defined(CI)
    http-request    return status 202 content-type "text/plain" lf-string "fwd=%[var(txn.forwarded)] uah=%[var(txn.useragent)] id=%[var(txn.site)]"
.else
    server caddy_api localhost:80
.endif
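The `req.body` fetches in `api_be` can only see what haproxy has already buffered. The usual way to guarantee that is `option http-buffer-request`, so a sketch of the extra line (an addition for illustration, not part of the config above) would be:

```haproxy
backend api_be
    # wait for the request body before evaluating http-request rules,
    # so that req.body and json_query have something to parse
    option http-buffer-request
```

Bodies larger than a single buffer are only partially visible to these rules, which is fine for small JSON payloads but worth remembering for anything bigger.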

blocking from a map

    acl             denied_site var(txn.site),map(deny.map) -m found
    http-request    track-sc1 var(txn.site)  table cluster/abuse_id if denied_site
    http-request    deny deny_status 402 content-type "text/plain" lf-string "contact support for site id=%[var(txn.site)]" if denied_site
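For reference, here is a sketch of what the map file might contain and how it is referenced. The entries and the absolute path are hypothetical; a map file is simply one key per line followed by a value.

```haproxy
# deny.map (hypothetical entries): one key per line, then a value.
# With `-m found` only the presence of the key matters, the value is ignored.
#
#   blocked.example     1
#   spammy.example      1

backend api_be
    # same ACL as above, with an assumed absolute path to the map file
    acl             denied_site var(txn.site),map(/etc/haproxy/deny.map) -m found
```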

running a simple lua app

# append lua-load to your global section
global
  lua-load hello.lua

# use in your front or back ends
frontend hello_fe
  acl           hello       path            /hello
  http-request  use-service lua.hello       if hello

-- hello.lua
core.Alert("lua: hello loaded");
core.Debug("lua: hello loaded")

core.register_service("hello", "http", function(applet)
   local response = "Hello World !"
   applet:set_status(200)
   applet:add_header("content-length", string.len(response))
   applet:add_header("content-type", "text/plain")
   applet:start_response()
   applet:send(response)
end)

a lua fetcher

Add a new lua-load fetch.lua line to your config.

-- fetch.lua
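function site(txn)

  local payload = txn.req:dup()
  if not payload then
    return nil
  end
  core.Debug("lua: " .. payload .. "\n")

  -- parses string value for customer site from POST body
  --   '{..., "domain":"dummy.site", ...}'
  -- Lua patterns have no alternation (|) and use %s rather than \s,
  -- so try the "domain" key first and fall back to "d"
  local v = string.match(payload, '"domain"%s*:%s*"([^"]+)"')
         or string.match(payload, '"d"%s*:%s*"([^"]+)"')
  if v then
    core.Info("site: " .. v)
  end
  return v
end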

-- return fetcher
core.register_fetches("site", site)
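Fetches registered from lua are exposed in the config with a `lua.` prefix, so the fetcher can be used anywhere a normal sample fetch is accepted. A small sketch, assuming it stands in for the `json_query` lines in `api_be` (that placement is my assumption, not something tested here):

```haproxy
global
    lua-load fetch.lua

backend api_be
    # call the lua fetch registered as "site" and keep the result
    # for the rest of the transaction
    http-request    set-var(txn.site)   lua.site
```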

making async outbound connections

-- https://www.haproxy.com/blog/5-ways-to-extend-haproxy-with-lua/
-- https://www.haproxy.com/blog/announcing-haproxy-2-5/
-- https://www.haproxy.com/blog/announcing-haproxy-2-6/
-- https://github.com/zareenc/haproxy-lua-examples/blob/master/lua_scripts/background_thread.lua
core.Alert("lua: loaded");

local function fetch()
    local httpclient = core.httpclient()
    while true do
        core.Debug("fetch: starting\n")
        local response = httpclient:get{
            url="http://127.0.0.1:5984/",
            timeout=3}
        core.Debug("Status: ".. response.status .. ", Reason : " .. response.reason ..
            ", Len:" .. string.len(response.body) .. "\n")
        core.Debug("fetch: sleeping\n")
        core.msleep(10000)
    end
end

core.register_task(fetch)

fetching stick table data

local httpclient = core.httpclient()
local response = httpclient:get{
  url="http://v2/containers/json",
  dst="unix@/tmp/docker.sock"}
