Nginx direct cachefile hosting, or I found a hammer, get me a nail
2018-02-01
Let’s say you had an API that served JSON. Some of the responses don’t change very often. Perhaps something like user data:
{
    "uid": 12345,
    "addresses": [
        {
            "state": "SC",
            "city": "Mayberry",
            "street": "1234 Lalaberry Lane",
            "zip": 12345
        }
    ],
    "lname": "Smith",
    "fname": "John"
}
It’s an obvious case where caching could help. Perhaps you stick the data in memcached and write it out directly from your app.
This means you’re still hitting application code. Wouldn’t it be nice if you could have nginx write the cached data back to the client as if it were a static file? This is possible using a ramdisk and nginx’s try_files.
Start with this little Mojolicious app:
#!perl
use v5.20;
use warnings;
use Mojolicious::Lite;
use File::Spec::Functions 'catfile';
use Cpanel::JSON::XS 'encode_json';

use constant CACHE_FILE_PATH => 'html';
use constant CACHE_DATASTRUCTURE => {
    uid => 12345,
    fname => 'John',
    lname => 'Smith',
    addresses => [{
        street => '1234 Lalaberry Lane',
        city => 'Mayberry',
        state => 'SC',
        zip => 12345,
    }],
};
use constant CACHE_JSON => encode_json( CACHE_DATASTRUCTURE );

# Slow path: build the response, write it under the nginx root, return it
get '/ramdisk/*' => sub {
    my ($c) = @_;

    # Simulate an expensive first request
    sleep 5;

    my $url_path = $c->req->url->path;
    my $path = catfile( CACHE_FILE_PATH, $url_path );
    # TODO SECURITY ensure $path is actually under the absolute path to
    # CACHE_FILE_PATH, cleaning up any '..' or other path miscreants

    open( my $out, '>', $path )
        or die "Can't write to $path: $!\n";
    print $out CACHE_JSON;
    close $out;

    $c->render(
        data => CACHE_JSON,
        format => 'json',
    );
};

# Benchmark path: return the pre-encoded JSON straight from the app
get '/direct/*' => sub {
    my ($c) = @_;
    $c->render(
        data => CACHE_JSON,
        format => 'json',
    );
};

app->start;
This provides two paths to the same JSON. The first one, /ramdisk/*, will write the JSON to a path we specify under our nginx root. This has a deliberate sleep 5 call, which simulates the first request being very slow. The second, /direct/*, is for benchmarking. It dumps some pre-encoded JSON back to the client, which gives us an upper limit on how fast we could go if we pulled that data out of memcached or something.
(If you use this code for anything, do note the security warning. The code as written here could allow an attacker to overwrite arbitrary files. You need to ensure the place you’re writing is underneath the subdirectory you expect. I didn’t want to clutter up the example too much with details, so this is left as an exercise to the reader.)
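Here is one way that exercise could go, as a minimal sketch rather than a vetted implementation. The helper name safe_cache_path is hypothetical, and the whitelist of allowed characters is an assumption you would adjust to your own URL scheme. It accepts only plain path segments, then resolves symlinks on the directories that already exist and checks that the target's parent is still inside the cache root.

```perl
use strict;
use warnings;
use Cwd 'realpath';
use File::Basename 'dirname';
use File::Spec::Functions 'catfile';

# Hypothetical helper (not from the original app): map a request path to a
# file path under $cache_root, or return nothing if the path looks unsafe.
sub safe_cache_path {
    my ($cache_root, $url_path) = @_;

    # Allow only plain segments made of a small set of characters. This
    # rejects '.', '..', NUL bytes, backslashes and percent-encoding.
    my @segments = grep { length } split m{/}, $url_path;
    return unless @segments;
    for my $segment (@segments) {
        return if $segment =~ /\A\.\.?\z/;
        return unless $segment =~ /\A[A-Za-z0-9._-]+\z/;
    }

    my $path = catfile( $cache_root, @segments );

    # The parent directory must already exist (the ramdisk mount point, for
    # example). Resolve symlinks and confirm it is still under the root.
    my $dir = dirname( $path );
    return unless -d $dir;
    my $root   = realpath( $cache_root );
    my $parent = realpath( $dir );
    return unless defined $root && defined $parent;
    return unless $parent eq $root || index( $parent, "$root/" ) == 0;

    return $path;
}
```

In the /ramdisk/* handler, the catfile call would become a call to safe_cache_path( CACHE_FILE_PATH, $url_path ), and the handler would return an error response instead of writing when nothing comes back.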
Save it as mojo.pl in a directory like this:
$ ls
html
mojo.pl
The html dir will be the place where nginx serves its static files. Create html/ramdisk and then mount a ramdisk there:
$ sudo mount -t tmpfs -o size=10M,mode=0777 tmpfs html/ramdisk/
This will give you a 10MB ramdisk writable by all users. When the mojo app above is called with /ramdisk/foo, it will write the JSON to this ramdisk and return it.
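One thing to watch for: opening the final path with '>' truncates it before the data is written, so a request arriving at the wrong moment could, in principle, be served an empty or partial file. A common way around this is to write to a temp file in the same directory and rename() it into place. The sketch below assumes that approach; write_cache_file_atomically and the .tmp- prefix are hypothetical names, not part of the original app.

```perl
use strict;
use warnings;
use File::Basename 'dirname';
use File::Temp ();

# Hypothetical helper: replace $path with $content in one step, so a reader
# sees either the old file or the complete new one.
sub write_cache_file_atomically {
    my ($path, $content) = @_;

    # Create the temp file next to the target so rename() stays on one
    # filesystem. UNLINK => 0 stops File::Temp from deleting it for us.
    my $tmp = File::Temp->new(
        DIR      => dirname( $path ),
        TEMPLATE => '.tmp-XXXXXX',
        UNLINK   => 0,
    );
    my $tmp_name = $tmp->filename;

    print {$tmp} $content
        or die "Can't write to $tmp_name: $!\n";
    close $tmp
        or die "Can't close $tmp_name: $!\n";

    # File::Temp creates files with mode 0600. Widen it so a web server
    # running as a different user can read the result.
    chmod 0644, $tmp_name
        or die "Can't chmod $tmp_name: $!\n";

    rename $tmp_name, $path
        or die "Can't rename $tmp_name to $path: $!\n";
    return;
}
```

In the app, this would stand in for the open/print/close sequence, called as write_cache_file_atomically( $path, CACHE_JSON ).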
Now for the nginx config. Using try_files, we first check if the URI is directly available. If so, nginx will return it verbatim. If not, we have it proxy to our mojo app.
worker_processes 10;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    tcp_nopush on;
    keepalive_timeout 65;

    server {
        listen 8001;
        server_name localhost;
        root html;

        location /ramdisk {
            default_type application/json;
            try_files $uri $uri/ @cache_build;
        }

        location @cache_build {
            proxy_pass http://localhost:8002;
        }
    }
}
Start nginx with this config, run the mojo app listening on port 8002 (where proxy_pass points), and call http://localhost:8001/ramdisk/foo. If the file hasn’t been created yet, the sleep from earlier will make the response take about 5 seconds. Once the file exists, the response should be nearly instant.
How “instant”? Very instant. Here’s the result from ab of calling this 100,000 times with 100 concurrent requests (ab’s -n 100000 -c 100), all on localhost:
Concurrency Level: 100
Time taken for tests: 4.629 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 37300000 bytes
HTML transferred: 13400000 bytes
Requests per second: 21604.02 [#/sec] (mean)
Time per request: 4.629 [ms] (mean)
Time per request: 0.046 [ms] (mean, across all concurrent requests)
Transfer rate: 7869.43 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 0.5 2 6
Processing: 1 3 0.6 3 10
Waiting: 0 2 0.6 2 10
Total: 2 5 0.6 5 13
Percentage of the requests served within a certain time (ms)
50% 5
66% 5
75% 5
80% 5
90% 5
95% 5
98% 6
99% 7
100% 13 (longest request)
And the results from calling the mojo app with /direct/foo:
Concurrency Level: 100
Time taken for tests: 87.616 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 28500000 bytes
HTML transferred: 13400000 bytes
Requests per second: 1141.34 [#/sec] (mean)
Time per request: 87.616 [ms] (mean)
Time per request: 0.876 [ms] (mean, across all concurrent requests)
Transfer rate: 317.66 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 17
Processing: 8 88 32.7 88 174
Waiting: 8 87 32.7 88 174
Total: 8 88 32.7 88 174
Percentage of the requests served within a certain time (ms)
50% 88
66% 101
75% 111
80% 117
90% 132
95% 142
98% 152
99% 157
100% 174 (longest request)
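We took the mean time per request from 88ms down to just 5ms at 100 concurrent requests, and throughput went from about 1,141 to about 21,604 requests per second, roughly 19 times as many. This is on an Intel Core i7-6500 @ 2.5GHz.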
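To put a number on the improvement, the ratios can be worked out from the figures ab reported. This is only arithmetic on the two runs above. The hash names and the ratio helper are made up for the illustration.

```perl
use strict;
use warnings;

# Figures copied from the two ab runs above.
my %ramdisk = ( requests_per_sec => 21604.02, mean_ms => 4.629 );
my %direct  = ( requests_per_sec => 1141.34,  mean_ms => 87.616 );

# Divide the first measurement by the second.
sub ratio {
    my ($numerator, $denominator) = @_;
    return $numerator / $denominator;
}

printf "Throughput: %.1fx the requests per second\n",
    ratio( $ramdisk{requests_per_sec}, $direct{requests_per_sec} );
printf "Mean time per request: %.1fx shorter\n",
    ratio( $direct{mean_ms}, $ramdisk{mean_ms} );
```

Both ratios come out at about 18.9. They are two views of the same measurement, since both runs made the same number of requests at the same concurrency.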
If you’re wondering, I didn’t see any benefit to using sendfile or tcp_nopush in nginx. This may be because I’m doing everything over localhost.
What I like even more is that you don’t need any special tools to manipulate the cache. Unix provides everything you need. Want to see the contents? cat [file]. Want to clear a cache file? rm [file]. Want to set a local override? EDITOR-OF-CHOICE [file].
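Since the cache is just files, expiry can be a few lines of script too. Here is a minimal sketch, assuming you want to drop entries that haven’t been rewritten within some time window. The name expire_cache_files and the ten-minute threshold in the usage line are hypothetical choices, not something the setup above requires.

```perl
use strict;
use warnings;
use File::Find 'find';

# Hypothetical helper: delete regular files under $dir whose modification
# time is at least $max_age_seconds in the past. Returns the removed paths.
sub expire_cache_files {
    my ($dir, $max_age_seconds) = @_;
    my $cutoff = time - $max_age_seconds;
    my @removed;

    find(
        {
            no_chdir => 1,    # $_ holds the full path of each entry
            wanted   => sub {
                return unless -f $_;
                my $mtime = ( stat _ )[9];    # reuse the stat from -f
                return if $mtime > $cutoff;
                push @removed, $_ if unlink $_;
            },
        },
        $dir,
    );

    return @removed;
}
```

Run from cron or a timer, something like expire_cache_files( 'html/ramdisk', 600 ) would clear entries older than ten minutes, and the next request for each one would fall through to the app and rebuild the file.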
Now to go find a use for this.