We need a distributed social networking protocol…Could Opera Unite be a key?

(Written July 2007)

The digital dark ages are already a reality for a lot of people who grew up with hosted e-mail services like CompuServe and AOL. A lot of those users had no choice but to accept the loss of all their received and sent e-mail when they unsubscribed, the service went under, or their account was deleted for inactivity. Mark Pilgrim wrote about the challenge of long-term data preservation without open formats and source code:

Data readable by only one application is a big risk factor, because the application won’t be around forever. If that application only runs on one operating system, that’s even worse, because the operating system won’t be around forever either. If that operating system only runs on one hardware platform, that’s even worse still. No hardware lasts forever, and you may eventually need to resort to emulating the hardware in software. Emulation is the ultimate fallback. But if any or all of those layers are closed, emulation may be costly or even impossible. And if any of the layers are DRM-encumbered, emulating them may be illegal.

Most social network users don’t keep a copy of their data in any format, so how can we expect to preserve it? Will MySpace be around for 5 years? 20 years? People have already declared Friendster dead; all your testimonials and contact info for old friends could be gone any month now.

The next killer social networking application shouldn’t be another Friendster or MySpace, but rather an open standard allowing us to create and manage our own social data. And it is “our” data. Points of contact with old friends we’ve managed to track down, new friends made from shared interests, anecdotes and testimonials we’ve written for friends and loved ones, snapshots of our interests and personalities. Only by keeping this information in an open format, available for us to back up, can we expect it to survive.

Let’s say that MySpace suddenly had an export feature. How much would it need to include to be meaningful in 50 years? Obviously you’d want your profile, pics, videos, and blog posts; your inbox and sent mail; probably comments you’ve made on friends’ profiles and blog posts. How much of your friends’ data would you want?

October 2009: We’re still not there. Google Wave will vastly improve the situation (at least by providing a permanent record for IM), but the real goal here is something trivially easy to install that lets users host their own personal and networking data. Big web providers could still carve out a business by caching copies of user data (to save bandwidth, or for backup) and concentrating on indexing, searching, and providing apps like those for Facebook.

When Opera released Unite (basically a webserver in the browser), I wasn’t sure what they’d get out of it, or what the use case was, but actually this is the perfect platform on which to build a distributed social network app. The default storage location of all your data would be on your computer, easily backed up at any time.

Next best thing: SocialSafe, a Facebook backup tool. For three bucks you’d be able to show your kids how their parents met, and what they were like then.

Smallest valid HTML documents

HTML4

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title></title>
<p>

HTML5

<!DOCTYPE html>

Smallest “useful” HTML5 document

<!DOCTYPE html>
<link rel=stylesheet href=site.css>
<script src=site.js></script>
<title>Page Title</title>
<h1>Heading</h1>
<p>Content...

Check ’em if you want. To avoid problems in IE you might want an opening body tag, but you don’t need a closing one!

Chrome Frame

The idea of a plugin that replaces one browser’s rendering engine with another’s has been floating around for years.

Google is going to give this crazy idea a shot with Chrome Frame. The idea is that an IE6+ user gets bugged that a site requires the Chrome Frame plugin. After she installs it, web pages can request that they be rendered by Chrome’s advanced layout and Javascript engines (enabling canvas, SVG, CSS animations and lots of other HTML5 goodies).
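For reference, opting a page in looks pretty simple; here’s a rough PHP sketch, assuming the X-UA-Compatible “chrome=1” signal Google has described (an equivalent <meta http-equiv> tag should also work):

<?php
// Sketch: ask Chrome Frame (if the visitor has installed it) to render this page.
// The opt-in signal is an X-UA-Compatible header or meta tag set to "chrome=1".
header('X-UA-Compatible: chrome=1');
?>
<!DOCTYPE html>
<title>Page Title</title>
<p>IE users with the plugin get Chrome's rendering and script engines here.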

In the short run this could be a support nightmare for Microsoft and IT departments until the plugin is stable. Supposedly Google won’t take over the network stack or any of the browser chrome (!), so it’ll just feel a little strange. On some pages right-clicking items may bring up different menus (Google will probably want to normalize this) and, notably, a few of IE8’s fancy new context-menu features won’t be available. Javascript guru Alex Russell is on the project, so I have some faith that the integration will make sense.

This is the ultimate F.U. from Google to Microsoft. “We’re chewing away at Office, Windows in general, and now we’ll decide when IE gets new features.” Of course it’s also a shot over the bow of other browsers; lag behind Chrome feature-wise at your own risk.

Will this take off? There are, of course, hordes of IE users who can’t install plugins, and some who won’t, but web developers will be pushing this hard.

Javascript files don’t auto-update

On a panel of four Javascript library developers at Ajax Experience 2008, a question came up about how their libraries use browser detection. When John Resig suggested that libraries should strive for full feature detection (hardly used at all at the time) instead of browser/object detection, the other developers reacted like he was crazy. They mentioned the cases where this just isn’t possible, but none of them mentioned the huge, very good reason to do this whenever possible: pages using these libraries are deployed every day, and many will never be maintained. Yes, when a new browser changes behavior the libraries can quickly update their codebases, but many deployed pages will never get those updates.

Note: there are autoloaders that may keep a page on the latest library version, but this doesn’t guarantee a stable library API over time. Feature detection (mostly) does.

The Quickening of Facebook

If you’ve used Facebook in Opera and Firefox, you might have noticed that Facebook is dramatically faster in FF, but this has nothing to do with FF’s speed. For FF and IE users, Facebook uses a client-side architecture called “Quickening” that basically turns a few popular pages into full AJAX applications that stay loaded in the browser for a long time. All transitions between “quickened” pages are done through AJAX calls, and a cache system makes sure pages displayed from cache are updated based on changes from the server (e.g. comments others made, ad rotation) or the client (e.g. comments you made).

While other sites have certainly done this before, the complexity of Facebook’s apps and the level of optimization performed are staggering. The system continuously monitors page performance and resource usage, and re-optimizes resources like JS/CSS/sprite images so that as few bytes as possible are sent and received.

Video presentation goodness: Velocity 09: David Wei and Changhao Jiang, “Frontend Performance Engineering in Facebook”

PDF readers: Help us read.

PDF articles are notorious usability disasters, with the worst being multi-column documents that require you to constantly scroll in opposite directions as you move through the columns. PDF readers should let us draw a simple path through the document (maybe zoomed out) to outline the flow of text through the article (better yet, the reader could try to guess this for us). Once the path is set, the PDF reader could either:

A) tie the scrollwheel linearly to the text flow, so that simply scrolling down shifts the document to the next column to read. Maybe grey out the columns not actively being read.

or B) rearrange the columns to make the document single-column.

Reading PDFs shouldn’t be so painful.

I should mention that regular multi-column web pages have the same issues on mobile browsers. There’s a good case for using media queries to switch small-viewport devices to single-column layouts, and generally for keeping all long articles single-column.

You Probably Don’t Need ETag

(updated 3/4 to include the “Serving from clusters” case)

As I see more server scripts implementing conditional GET (a good thing), I also see the tendency to use a hash of the content for the ETag header value. While this doesn’t break anything, it often needlessly reduces the performance of the system.

ETag is often misunderstood to function as a cache key, i.e., as if two URLs that return the same ETag could share a cache entry in the browser. This is not the case: an ETag only validates the cached content of a single URL. Think of the cache key as (URL + ETag); both must match for a conditional GET to be answered with a 304.

It follows that, if you have a unique URL and can send a Last-Modified header (e.g. based on mtime), you don’t need ETag at all. The older HTTP/1.0 Last-Modified/If-Modified-Since mechanism works just fine for implementing conditional GETs and will save you a bit of bandwidth. Opening and hashing content to create or validate an ETag is just a waste of resources and bandwidth.
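To illustrate, here’s a bare-bones PHP sketch (not the Minify class; the file name and max-age are just examples):

<?php
// Serve a static file using only Last-Modified for conditional GETs.
$path  = 'site.js';
$mtime = filemtime($path);

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
header('Cache-Control: public, max-age=1800');

// If the client's cached copy is at least as new as the file, skip the body.
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    && strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Content-Type: application/x-javascript');
readfile($path);

No hashing, no ETag, and the client still gets 304s once it has a fresh copy.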

When you actually need ETag

There are only a few situations where Last-Modified won’t suffice.

Multiple versions of a single URL

Let’s say a page outputs different content for logged-in users, and you want to allow conditional GETs for each version. In this case, the ETag needs to change with auth status, and, in fact, you should assume different users might share a browser, so you’d want to embed something user-specific in the ETag as well. E.g., ETag = mtime + userId.

In the case above, make sure to mark private pages with “private” in the Cache-Control header, so any user-specific content will not be kept in shared proxy caches.
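A rough PHP sketch of the above (the session key and getContentMtime() are made-up stand-ins for however your app tracks users and content changes):

<?php
// The validator varies with both the content's age and the user,
// since different users may share the same browser.
session_start();
$userId = isset($_SESSION['userId']) ? $_SESSION['userId'] : 'anon'; // hypothetical key
$mtime  = getContentMtime();  // hypothetical: however you track the last change

$etag = '"' . $mtime . '-' . $userId . '"';

header('Cache-Control: private'); // keep user-specific content out of shared caches
header('ETag: ' . $etag);         // sent with both 200 and 304 responses

if (isset($_SERVER['HTTP_IF_NONE_MATCH'])
    && strpos($_SERVER['HTTP_IF_NONE_MATCH'], $etag) !== false) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

// ...otherwise render the page for this user.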

No modification time available

If there’s no way to get (or guess) a Last-Modified time, you’ll have to use ETag if you want to allow conditional GETs at all. You can generate it by hashing the content (or using any function that changes when the content changes).
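E.g., something like this sketch (renderPage() is a made-up stand-in for however you produce the content):

<?php
// No usable mtime, so derive the validator from the content itself.
$content = renderPage();   // hypothetical content-generating function
$etag    = '"' . md5($content) . '"';

header('ETag: ' . $etag);  // included with both 200 and 304 responses

if (isset($_SERVER['HTTP_IF_NONE_MATCH'])
    && strpos($_SERVER['HTTP_IF_NONE_MATCH'], $etag) !== false) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

echo $content;

The catch is that you still had to generate and hash the content just to learn the client already has it, which is part of why Last-Modified is preferable whenever you can get it.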

Serving from clusters

If you serve files from multiple servers, it’s possible that file timestamps could differ, causing the Last-Modified dates sent out to shift and needless 200 responses when a client hits a different server. Basically, if you can’t trust your mtimes to stay in sync (I don’t know how often this is an issue), it may be better to place a hash of the content in the ETag.

In any case where you use ETag, when handling a conditional GET request (whose If-None-Match header may contain multiple ETag values), it’s not sufficient to return the 304 status code alone; you must also include the ETag header for the particular content you want the client to use. Most software I’ve seen at least gets this right.

I got this wrong, too.

While writing this article I realized that my own PHP conditional GET class, used in Minify, has no way to disable unnecessary ETags (when the last modified time is known).

Safari Cache-Control:must-revalidate bug

Update Apr 8 2009: Apparently this bug existed in previous Safari versions (at least back to 3.1), i.e. including “must-revalidate” in Cache-Control means Expires and max-age will both be ignored by Safari. Here’s the Apple bug, if you happen to be an employee. I created an Apple account but still couldn’t find it.


Short version: Safari 4 beta incorrectly interprets the Cache-Control “must-revalidate” directive and re-requests the file each time despite freshness info sent via “max-age”.

Long version

When a server sends Cache-Control with “must-revalidate”, this tells clients/caches that, after it expires, the cached resource should not be used without first checking with the server (sending a conditional GET). From the spec (my emphasis):

When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server. [HTTP/1.1]

In other words, while the cache is fresh, there’s no need to revalidate.

E.g. I serve this Javascript file with the header “Cache-Control: public, max-age=1800, must-revalidate”. This tells clients: “consider this fresh for 30 minutes, then check back before using it again”. Unless you force a refresh, the browser won’t re-request the file for 30 minutes, after which it will send a conditional GET request. If the file hasn’t changed, the server returns a “304 Not Modified” and the same Cache-Control header. This tells the browser to keep using the cache for another half hour and check back again. (Yes, half an hour is too short; I’ll change this later.)

Well, Safari 4 beta re-requests the file each time it needs it, seemingly ignoring the “max-age” directive. At least it’s sending a conditional GET, but it’s still a waste of bandwidth and time.

Minify getting out there

Interest in Minify seems to be picking up:

Any ideas on how to make it better?