Why Gopher is Awful
2013-10-27
With Overbite recently making the rounds on Reddit /r/programming and Hacker News, I thought it was time to chime in with some thoughts on Gopher, and why it lost to HTTP for good reason. Despite claims to the contrary, the only real reason it’s being floated in some circles is nostalgia.
If you go looking through my CPAN directory, you will notice Gopher::Server and Apache::GopherHandler. The first was a server implementation of the Gopher protocol, and the second glued that into Apache2.
I don’t consider this to be a complete waste of time. I learned how to use Apache2’s protocol handlers (yes, Apache2 is decoupled enough that it can implement other protocols inside mod_perl). Many years ago, I used it as sample code for a job interview and I was praised for its quality.
(Sidenote: as a minor point of criticism, I was also told by the interviewer to never put “fix later” in a comment. You can put “fix after this other project is done” or “fix by 10/23/20xx”. If you put “later”, it’ll never get done. I didn’t take that job, but I’ve tried to follow that since.)
Gopher has some interesting ideas. Its structure forces a menu hierarchy between servers, and allows clients to present that hierarchy in any way they see fit. This could be a simple text-based menu, but it could be some kind of node diagram where the user navigates entirely by touching entities.
Both HTTP and Gopher have design flaws. If we roll back to HTTP/0.9, we see:
- Errors are returned as documents rather than numeric codes
- No length header or end-of-transmission character; the server just closes the connection when it’s done
- No indication of the type of document being sent
- Connections are transient, being closed at the end of each request, which makes poor use of the TCP sliding window
- No provision for checking the status of a document to see if it changed since the last time it was cached
- Server doesn’t send any header when the request is initiated (due to the TCP three-way handshake, the server can send some initial data for free; you see this in SMTP’s server connection header, for instance)
Of these, only the last one is still an issue in HTTP/1.1, and it’s a relatively minor point: you might want the server to announce its software and version up front (again, like SMTP servers do), but it’s not that important. Response codes were added for both success and failure. “Content-Length” and “Content-Type” headers were added. “Keep-Alive” was added to keep the connection open for making multiple requests (further improved by Google’s SPDY).
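To make those fixes concrete, here is a minimal sketch of what a client can read off an HTTP/1.x response before it touches the body. The `parse_response_head` helper and the sample response are made up for illustration; a real client would use a proper HTTP library.

```python
def parse_response_head(raw: bytes):
    """Split a raw HTTP/1.x response into (status code, headers, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like "HTTP/1.1 200 OK": a numeric code, not an error document.
    _version, code, *_reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# Hypothetical response, invented for this example.
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

status, headers, body = parse_response_head(SAMPLE)
# With a length header the client knows when the body is complete,
# instead of waiting for the server to close the connection.
complete = len(body) == int(headers["content-length"])
```

The status code, the document type, and the exact body length are all known up front, which is what HTTP/0.9 lacked.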
EDIT 2013/12/14: After thinking about it for a while, the lack of an initial server header is more important than I thought. It’s not so much optimizing for TCP use, but rather for authentication. By sending a bit of randomly-selected data in that initial connect, the client can use that data in an encrypted password scheme to protect against certain cryptographic attacks, such as replay attacks.
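A rough sketch of the idea, assuming an HMAC-SHA256 challenge-response with a shared password. The scheme, the function names, and the password are my own illustration here, not something either protocol specifies.

```python
import hashlib
import hmac
import secrets


def server_greeting() -> bytes:
    """Random data the server would send in its initial connection banner."""
    return secrets.token_bytes(16)


def client_response(password: bytes, nonce: bytes) -> str:
    """The client proves it knows the password without sending the password itself."""
    return hmac.new(password, nonce, hashlib.sha256).hexdigest()


def server_verify(password: bytes, nonce: bytes, response: str) -> bool:
    expected = hmac.new(password, nonce, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)


password = b"hunter2"  # hypothetical shared secret

# First connection: the server sends a nonce, the client answers it.
nonce_1 = server_greeting()
answer = client_response(password, nonce_1)
ok_first = server_verify(password, nonce_1, answer)

# An eavesdropper replays the captured answer on a new connection,
# but the server has issued a fresh nonce, so the old answer no longer matches.
nonce_2 = server_greeting()
ok_replay = server_verify(password, nonce_2, answer)
```

Because the nonce changes on every connection, a captured response is useless later. That only works if the server gets to speak first, or if you spend an extra round trip asking for the nonce.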
Now let’s look at Gopher’s problems:
- Server doesn’t send any header when the request is initiated
- Types are specified in the menu, but only as a single ASCII character, which limits the number of possible types
- Menu entries and text files end with “.<CR><LF>” to indicate that it’s done (similar to SMTP), but binary files are ended by closing the connection. There isn’t even a checksum header to verify that nothing got screwed up.
- There’s a menu type identifier ‘g’ for gif files, and ‘I’ for all other image types (note that this is before the gif patents became a big mess)
- No error codes
- Closes the connection at the end of each request rather than holding it open, which makes poor use of the TCP sliding window
- No provision for checking the status of a document to see if it changed since the last time it was cached
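For reference, this is roughly what a client has to work with when it reads a menu. The sketch below parses the basic tab-separated menu format with its lone “.” terminator; the `MenuItem` and `parse_menu` names, the host, and the sample entries are invented for the example.

```python
from typing import List, NamedTuple


class MenuItem(NamedTuple):
    item_type: str  # the single ASCII type character
    display: str
    selector: str
    host: str
    port: int


def parse_menu(raw: bytes) -> List[MenuItem]:
    """Parse a Gopher menu: type char, then display/selector/host/port split by tabs."""
    items = []
    for line in raw.decode("latin-1").split("\r\n"):
        if line == ".":  # a lone dot marks the end of the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:
            continue  # skip malformed entries
        display, selector, host, port = fields[:4]
        items.append(MenuItem(line[0], display, selector, host, int(port)))
    return items


# Hypothetical menu, invented for this example.
SAMPLE = (
    b"0About this server\t/about.txt\tgopher.example.org\t70\r\n"
    b"1Software archive\t/software\tgopher.example.org\t70\r\n"
    b"gLogo\t/logo.gif\tgopher.example.org\t70\r\n"
    b".\r\n"
)

items = parse_menu(SAMPLE)
```

The menu carries no status code, no length, and no modification date. The one type character is all the client learns about an item, and the transfer has to complete before it can find out whether the item exists at all.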
Gopher+ adds the possibility for MIME types (like HTTP’s Content-Type header) and a few error codes (still nowhere near HTTP/1.1’s rich set of codes, but at least it’s something). Using the “$” command in selectors gives a view with ballpark estimates of document length, but it isn’t meant to be an exact measure for transfer, just a nice thing to display to users. [EDIT 2013/12/14: There is a length field specified in section 2.3 of the Gopher+ protocol for data transfer.] There are still no checksums, it is still inefficient over TCP, and it has no provisions to help caching.
Giving Gopher the benefit of Gopher+ extensions is being generous. The extensions were specified in July 1993. Mosaic 1.0 was released in November of that year, and quickly became all the rage. Mosaic could function as a Gopher client, but it also was the first HTTP/HTML browser that worked. Just as people were starting to implement Gopher+, everyone decided to move to HTTP. Gopher+ has been on the back burner ever since.
Whereas the fixes to HTTP that happened in versions 1.0 and 1.1 are now widespread, the Gopher+ fixes never went anywhere. Not even (as far as I can tell) within the Gopher Revival team. Even if they had been adopted, Gopher+ is still badly flawed for the reasons above.
The Gopher Revival people make a big deal about how Gopher is “resource lite”. This is only true because it’s intentionally hobbled. HTTP gives you the choice to have a complex web site. A valid, minimal HTTP/1.1 header is only a few dozen bytes more than a Gopher selector. We have huge server farms for HTTP because we choose to have complex web applications. If we wanted to serve mostly-static content over HTTP, we could run it on extremely minimal hardware, too. (I can’t find the link at the moment, but an HTTP server running on an old Amiga once survived the Slashdot Effect just fine.) For that matter, the lack of caching provisions and inefficient TCP usage actually increase Gopher’s bandwidth usage compared to running modern HTTP for equivalent content.
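The request overhead is easy to check. Here is a small sketch that builds both requests for the same document and counts the bytes; the path and host name are placeholders.

```python
def gopher_request(selector: str) -> bytes:
    """A Gopher request is just the selector followed by CRLF."""
    return selector.encode("ascii") + b"\r\n"


def http11_request(path: str, host: str) -> bytes:
    """A minimal valid HTTP/1.1 request: the request line plus a Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


# Placeholder path and host, chosen only for the comparison.
g = gopher_request("/about.txt")
h = http11_request("/about.txt", "example.org")
overhead = len(h) - len(g)
```

For this host name the difference comes out to 34 bytes, and it only grows with the length of the host name, not with the path.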
The combination of HTTP and HTML won for a reason. Gopher is awful and way behind what HTTP now gives us. I see no reason to bother fixing it.