Nginx direct cachefile hosting, or I found a hammer, get me a nail

2018-02-01

Let's say you have an API that serves JSON. Some of the responses don't change very often. Perhaps something like user data:

{ "uid": 12345, "addresses": [ { "state": "SC", "city": "Mayberry", "street": "1234 Lalaberry Lane", "zip": 12345 } ], "lname": "Smith", "fname": "John" }

It's an obvious case where caching could help. Perhaps you stick the data in memcached and write it out directly from your app.

This means you're still hitting application code. Wouldn't it be nice if you could have nginx write the cached data back to the client as if it were a static file? This is possible using a ramdisk and nginx's `try_files`.

Start with this little Mojolicious app:

```perl
#!perl
use v5.20;
use warnings;
use Mojolicious::Lite;
use File::Spec::Functions 'catfile';
use Cpanel::JSON::XS 'encode_json';

use constant CACHE_FILE_PATH => 'html';
use constant CACHE_DATASTRUCTURE => {
    uid       => 12345,
    fname     => 'John',
    lname     => 'Smith',
    addresses => [{
        street => '1234 Lalaberry Lane',
        city   => 'Mayberry',
        state  => 'SC',
        zip    => 12345,
    }],
};
use constant CACHE_JSON => encode_json( CACHE_DATASTRUCTURE );

get '/ramdisk/*' => sub {
    my ($c) = @_;

    # Simulate the cache-miss request being very slow
    sleep 5;

    my $url_path = $c->req->url->path;

    my $path = catfile( CACHE_FILE_PATH, $url_path );
    # TODO SECURITY ensure $path is actually under the absolute path to
    # CACHE_FILE_PATH, cleaning up any '..' or other path miscreants
    open( my $out, '>', $path ) or die "Can't write to $path: $!\n";
    print $out CACHE_JSON;
    close $out;

    $c->render(
        data   => CACHE_JSON,
        format => 'json',
    );
};

get '/direct/*' => sub {
    my ($c) = @_;
    $c->render(
        data   => CACHE_JSON,
        format => 'json',
    );
};

app->start;
```

This provides two paths to the same JSON. The first one, `/ramdisk/*`, will write the JSON to a path we specify under our nginx root. It has a deliberate `sleep 5` call, which simulates the first request being very slow. The second, `/direct/*`, is for benchmarking. It dumps some pre-encoded JSON back to the client, which gives us an upper limit on how fast we could go if we pulled that data out of memcached or something.

(If you use this code for anything, do note the security warning. The code as written here could allow an attacker to overwrite arbitrary files. You need to ensure the place you're writing is underneath the subdirectory you expect. I didn't want to clutter up the example too much with details, so this is left as an exercise to the reader.)
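For reference, here's a minimal sketch of that check. The `safe_cache_path` helper is hypothetical (it's not part of the app above), and it ignores symlink tricks, which matter less on a tmpfs you control:

```perl
use File::Spec::Functions qw(canonpath catfile splitdir);

# Hypothetical helper: refuse any client path that could climb out of $base.
# canonpath() cleans up '.' and doubled slashes but deliberately leaves '..'
# alone, so any '..' still present after splitting is an escape attempt.
sub safe_cache_path {
    my ($base, $url_path) = @_;
    my $clean = canonpath( $url_path );
    die "Path escapes cache dir: $url_path\n"
        if grep { $_ eq '..' } splitdir( $clean );
    return catfile( $base, $clean );
}
```

You'd then build `$path` with `safe_cache_path( CACHE_FILE_PATH, $url_path )` instead of calling `catfile` directly.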

Save it as `mojo.pl` in a directory like this:

```
$ ls
html  mojo.pl
```

The `html` dir will be the place where nginx serves its static files. First, create `html/ramdisk` to act as a mount point:
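```
$ mkdir -p html/ramdisk
```

Then mount a ramdisk there: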

```
$ sudo mount -t tmpfs -o size=10M,mode=0777 tmpfs html/ramdisk/
```

This will give you a 10MB ramdisk writable by all users. When the mojo app above is called with `/ramdisk/foo`, it will write the JSON to this ramdisk and return it.
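As a quick sanity check, `df` should now report a 10M tmpfs at that path:

```
$ df -h html/ramdisk
```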

Now for the nginx config. Using `try_files`, we first check if the URI is directly available. If so, nginx will return it verbatim. If not, we have it proxy to our mojo app.

```nginx
worker_processes 10;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    sendfile   on;
    tcp_nopush on;

    keepalive_timeout 65;

    server {
        listen      8001;
        server_name localhost;

        root html;

        location /ramdisk {
            default_type application/json;
            try_files $uri $uri/ @cache_build;
        }

        location @cache_build {
            proxy_pass http://localhost:8002;
        }
    }
}
```

Start this up and call `http://localhost:8001/ramdisk/foo`. If the file hasn't been created yet, the `sleep` from earlier will force the response to take about 5 seconds. Once the file is created, the response should be nearly instant.
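For completeness, here's one way to run the backend, using Mojolicious's stock `daemon` command. The app has to listen on port 8002 to match the `proxy_pass` line, with nginx already up on 8001:

```
$ perl mojo.pl daemon -l 'http://*:8002' &   # backend on 8002
$ curl http://localhost:8001/ramdisk/foo     # goes through nginx
```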

How "instant"? Very instant. Here's the result from `ab` of calling this 100,000 times, with 100 concurrent requests (all on localhost):

```
Concurrency Level:      100
Time taken for tests:   4.629 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      37300000 bytes
HTML transferred:       13400000 bytes
Requests per second:    21604.02 [#/sec] (mean)
Time per request:       4.629 [ms] (mean)
Time per request:       0.046 [ms] (mean, across all concurrent requests)
Transfer rate:          7869.43 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   0.5      2       6
Processing:     1    3   0.6      3      10
Waiting:        0    2   0.6      2      10
Total:          2    5   0.6      5      13

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      5
  90%      5
  95%      5
  98%      6
  99%      7
 100%     13 (longest request)
```

And the results from calling the mojo app with `/direct/foo`:

```
Concurrency Level:      100
Time taken for tests:   87.616 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      28500000 bytes
HTML transferred:       13400000 bytes
Requests per second:    1141.34 [#/sec] (mean)
Time per request:       87.616 [ms] (mean)
Time per request:       0.876 [ms] (mean, across all concurrent requests)
Transfer rate:          317.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0      17
Processing:     8   88  32.7     88     174
Waiting:        8   87  32.7     88     174
Total:          8   88  32.7     88     174

Percentage of the requests served within a certain time (ms)
  50%     88
  66%    101
  75%    111
  80%    117
  90%    132
  95%    142
  98%    152
  99%    157
 100%    174 (longest request)
```

We took the median response time from 88ms down to just 5ms. This is on an Intel Core i7-6500 @ 2.5GHz.

If you're wondering, I didn't see any benefit to using `sendfile` or `tcp_nopush` in nginx. This may be because I'm doing everything over localhost.

What I like even more is that you don't need any special tools to manipulate the cache. Unix provides everything you need. Want to see the contents? `cat [file]`. Want to clear a cache file? `rm [file]`. Want to set a local override? `EDITOR-OF-CHOICE [file]`.
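For instance, with the layout above:

```
$ cat html/ramdisk/foo   # see the cached JSON
$ rm html/ramdisk/foo    # evict it; the next request rebuilds it
$ vi html/ramdisk/foo    # hand-edit a local override
```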

Now to go find a use for this.