Delving the depths of computing,
hoping not to get eaten by a wumpus

By Timm Murray

Why Gopher is Awful

2013-10-27


With Overbite recently making the rounds on Reddit /r/programming and Hacker News, I thought it was time to chime in with some thoughts on Gopher, and why it lost to HTTP for good reason. Despite claims to the contrary, the only reason it’s being floated in some circles really is nostalgia.

If you go looking through my CPAN directory, you will notice Gopher::Server and Apache::GopherHandler. The first was a server implementation of the Gopher protocol, and the second glued that into Apache2.

I don’t consider this to be a complete waste of time. I learned how to use Apache2’s protocol handlers (yes, Apache2 is decoupled enough that it can implement other protocols inside mod_perl). Many years ago, I used it as sample code for a job interview and I was praised for its quality.

(Sidenote: as a minor point of criticism, I was also told by the interviewer to never put “fix later” in a comment. You can put “fix after this other project is done” or “fix by 10/23/20xx”. If you put “later”, it’ll never get done. I didn’t take that job, but I’ve tried to follow that since.)

Gopher has some interesting ideas. Its structure forces a menu hierarchy between servers, and allows clients to present that hierarchy in any way they see fit. This could be a simple text-based menu, but it could be some kind of node diagram where the user navigates entirely by touching entities.
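To make the "present the hierarchy any way you like" point concrete, here is a minimal sketch of parsing a Gopher menu into structured items, assuming the basic RFC 1436 line layout (one type character, then tab-separated display string, selector, host, and port, with a lone period ending the menu). The names `MenuItem`, `parse_menu`, and `SAMPLE`, and the example.org hosts, are made up for illustration and are not taken from Gopher::Server.

```python
from typing import NamedTuple


class MenuItem(NamedTuple):
    item_type: str  # single character, e.g. "0" = text file, "1" = submenu
    display: str    # what the user sees
    selector: str   # opaque string sent back to the server to fetch the item
    host: str
    port: int


def parse_menu(raw: str) -> list:
    """Parse a raw Gopher menu into a list of MenuItem tuples."""
    items = []
    for line in raw.split("\r\n"):
        if line == ".":  # a lone period terminates the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4 or not fields[3].isdigit():
            continue  # skip malformed lines rather than guessing
        display, selector, host, port = fields[:4]
        items.append(MenuItem(line[0], display, selector, host, int(port)))
    return items


# Hypothetical menu: two local entries and one pointing at another server.
SAMPLE = (
    "1Software archive\t/software\tgopher.example.org\t70\r\n"
    "0About this server\t/about.txt\tgopher.example.org\t70\r\n"
    "1Mirror elsewhere\t/\tmirror.example.net\t70\r\n"
    ".\r\n"
)

if __name__ == "__main__":
    for item in parse_menu(SAMPLE):
        print(item)
```

Because every entry carries its own host and port, a client that has this list can draw it as a text menu, a tree, or a node graph spanning servers without the server having any say in the layout.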

Both HTTP and Gopher have design flaws. If we roll back to HTTP/0.9, we see:

Of these, only the last one is still an issue in HTTP/1.1, and it’s a relatively minor point. You’d maybe want to have the server version and the Server header in there (again, like what SMTP servers do), but it’s not that important. Response codes were added for both success and failure. “Content-Length” and “Content-Type” headers were added. “Keep-Alive” was added to keep the connection open for making multiple requests (further improved by Google’s SPDY).

EDIT 2013/12/14: After thinking about it for a while, the lack of an initial server header is more important than I thought. It’s not so much optimizing for TCP use, but rather for authentication. By sending a bit of randomly-selected data in that initial connect, the client can use that data in an encrypted password scheme to protect against certain cryptographic attacks, such as replay attacks.
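As a rough sketch of the idea in that edit, the server can send a random value on connect, and the client can prove knowledge of the password by returning a keyed hash of that value instead of the password itself. This is only an illustration under stated assumptions: the function names are hypothetical, HMAC-SHA256 is one reasonable choice rather than anything HTTP or Gopher specifies, and a real scheme would also need to store the secret safely on the server.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: random data sent to the client when it connects."""
    return secrets.token_bytes(16)


def client_response(password: bytes, nonce: bytes) -> str:
    """Client side: keyed hash of the nonce; the password never goes on the wire."""
    return hmac.new(password, nonce, hashlib.sha256).hexdigest()


def server_verify(password: bytes, nonce: bytes, response: str) -> bool:
    """Server side: recompute the expected value and compare in constant time."""
    expected = hmac.new(password, nonce, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    password = b"hunter2"  # hypothetical shared secret

    # First connection: the response matches the nonce the server issued.
    nonce_a = make_challenge()
    captured = client_response(password, nonce_a)
    print(server_verify(password, nonce_a, captured))

    # Replay: an eavesdropper resends the captured response on a new
    # connection, but the server has issued a different nonce.
    nonce_b = make_challenge()
    print(server_verify(password, nonce_b, captured))
```

The replayed response fails because it is tied to the first connection's nonce, which is exactly what a protocol with no server-first message cannot offer.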

Now let’s look at Gopher’s problems:

Gopher+ adds the possibility for MIME types (like HTTP’s Content-Type header) and a few error codes (still nowhere near HTTP/1.1’s rich set of codes, but at least it’s something). Using the “$” command in selectors gives a view with ballpark estimates of document length, but it isn’t meant to be an exact measure for transfer, just a nice thing to display to users. [EDIT 2013/12/14: There is a length field specified in section 2.3 of the Gopher+ protocol for data transfer.] There are still no checksums, the protocol is still inefficient over TCP, and it has no provisions to help caching.

Giving Gopher the benefit of Gopher+ extensions is being generous. The extensions were specified in July 1993. Mosaic 1.0 was released in November of that year, and quickly became all the rage. Mosaic could function as a Gopher client, but it also was the first HTTP/HTML browser that worked. Just as people were starting to implement Gopher+, everyone decided to move to HTTP. Gopher+ has been on the back burner ever since.

Whereas the fixes to HTTP that happened in versions 1.0 and 1.1 are now widespread, the Gopher+ fixes never went anywhere, not even (as far as I can tell) within the Gopher Revival team. Even if they had, Gopher+ is still badly flawed for the reasons above.

The Gopher Revival people make a big deal about how Gopher is “resource lite”. This is only true because it’s intentionally hobbled. HTTP gives you the choice to have a complex web site. A valid, minimal HTTP/1.1 header is only a few dozen bytes more than a Gopher selector. We have huge server farms for HTTP because we choose to have complex web applications. If we wanted to serve mostly-static content over HTTP, we could run it on extremely minimal hardware, too. (I can’t find the link at the moment, but an HTTP server running on an old Amiga once survived the Slashdot Effect just fine.) For that matter, Gopher’s lack of caching provisions and inefficient TCP usage actually increase its bandwidth usage compared to running modern HTTP for equivalent content.
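The "few dozen bytes" figure is easy to check. The sketch below builds a Gopher request (the selector followed by CRLF) and the smallest valid HTTP/1.1 request for the same path (a request line plus the mandatory Host header) and compares their sizes. The selector and host name are placeholders, and `request_sizes` is a hypothetical helper name.

```python
def request_sizes(selector: str, host: str) -> tuple:
    """Return (gopher_bytes, http_bytes) for fetching the same document."""
    # Gopher: the client sends just the selector, terminated by CRLF.
    gopher_request = f"{selector}\r\n".encode("ascii")

    # HTTP/1.1: request line, the one required header (Host), blank line.
    http_request = (
        f"GET {selector} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")

    return len(gopher_request), len(http_request)


if __name__ == "__main__":
    gopher_len, http_len = request_sizes("/docs/readme.txt", "example.org")
    print(f"Gopher request:   {gopher_len} bytes")
    print(f"HTTP/1.1 request: {http_len} bytes")
    print(f"Overhead:         {http_len - gopher_len} bytes")
```

With these placeholder values the difference is 34 bytes: 23 bytes of fixed framing plus the length of the host name. Real browsers send far more than this, but that is a choice made by the client, not something the protocol demands.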

The combination of HTTP and HTML won for a reason. Gopher is awful and way behind what HTTP now gives us. I see no reason to bother fixing it.



Copyright © 2024 Timm Murray
CC BY-NC

Opinions expressed are solely my own and do not express the views or opinions of my employer.