A Zend Framework App in a Single File

Most PHP micro-frameworks I’ve reviewed have some major cons: incomplete namespacing of functions/classes/global vars; doing too much/little; being under-tested; and the worst: forcing a unique (and usually under-documented) code structure that will make it difficult to “graduate” an app into a more full-featured framework. It also seems silly to rely on “micro-optimized” code if performance isn’t your number one goal and you have well-tested libraries sitting around.

So over the weekend, as a proof of concept (and mostly to get better acquainted with ZF’s MVC implementation), I built some minimal classes allowing one to implement a “quick and dirty” Zend Framework MVC app in a single script, with the resulting abomination being relatively easy to convert to a full app. The README shows how such a script might look. Continue reading  

Elgg Core Proposal: New Table “entity_list_items”

[This proposal has been superseded with ElggCollection.]

As of Elgg 1.7.4 there’s no way to specify a specific list or ordering of entities for use in an elgg_get_entities query. E.g. one might want to implement:

  • A “featured items” list for a group or user widget. On a group page this could be implemented in “left” and “right” lists for better display control
  • “Sticky” entities in discussion lists, or any other object lists
  • A “favorites” list for each user

Continue reading  

MVC: M != the Database

Great article on the misunderstood scope of the “Model” in the Model-View-Controller architecture. The takehome: Models are commonly thought of as wrappers for database access/stored objects, but application state and business logic need to go in them, too. Otherwise you get bloated controller and/or views that clumsily try to take care of these concerns.

Earlier today I was building a multi-step wizard app—no DB storage—and found my controllers getting messy with session and state handling code. The solution was a Wizard class that encapsulated the steps through the forms and the maintenance/validation of the collected data, but until this article I might not have thought of it as a model.

Zend Framework kinda, sorta has a hack for making a wizard using “subforms” and what makes this an awkward construction is that a view helper is not the best place to maintain the state info that a wizard requires. A “Zend_Wizard” class would need to encapsulate several forms; requirements and behaviors for moving forward and backward through the forms; and methods to inc/dec the step state, fetching the active form injected with session data. Only a single controller would be needed with 2 next/back methods, rather than methods for each step.

In the Elgg plugin ecosystem, there’s very little guidance/emphasis on creating models, so writers rarely encapsulate business logic at all. Instead it becomes spread out amongst controllers and—worse—views. Since views can easily be overridden depending on the order of plugins in a list, business logic can easily be mangled by logic from another plugin. That said, the view system does allow mods to mix behaviors in ways that would be difficult were views to be tied more closely to individual plugin models; a few exemplary mods—that store state and validation in a model rather than the controller/views—might help positively influence the culture.

Bookmarklet and PHP to prevent Shibboleth-related Firefox Lockouts

Reason this might be useful.

/*
 * Remove all _shibstate cookies if there are too many of them. This usually
 * occurs due to Firefox session restores. Unfortunately we don't know which is
 * the active state cookie, so we have to delete them all, but this is a lessor
 * crime than locking the user out with server errors.
 *
 * In an app a good time to call this is when a user is not logged in or has an
 * expired app session. This way we can cleanup their cookies before forwarding
 * them to the shib login process. Also after logout you'll want to call this
 * with parameter 0 to always remove them.
 *
 * @param int $allowableStateCookies if the number of _shibstate cookies
 * exceeds this, they will all be removed.
 */
function Shibboleth_preventFirefoxLockout($allowableStateCookies = 10)
{
    $stateKeys = array();
    foreach ($_COOKIE as $key => $val) {
        if (0 === strpos($key, '_shibstate')) {
            $stateKeys[] = $key;
        }
    }
    if (count($stateKeys) > $allowableStateCookies) {
        foreach ($stateKeys as $key) {
            setcookie($key, '', time() - 3600, '/');
        }
    }
}

Here’s a bookmarklet that essentially does the same thing: Fix Shibboleth Lockout

Uh-Oh: Firefox’s Unique Session Cookie Behavior

By now, Opera’s invention of restoring tabs automatically is available in most browsers, but unlike every other browser, Firefox’s restored tabs retain session cookies for the domains of the saved tabs Firefox restores all session cookies as if the browser were never closed. This is handy in some ways, but dangerous in others:

It’s fooling web developers by breaking a very old and widely-known convention. Since Netscape’s original spec (around 1994) a cookie with an empty/missing expires was to be discarded “when the user’s session ends” (later clarified as “when the user agent exits” in RFC2109), and thousands of prominent web pages describe “session cookies” this way.

A common session design pattern uses a persistent cookie to establish low-level identity info and a session cookie for full authentication. Developers may not know that their full auth period may be lasting days or weeks, including trips to insecure wifi spots, browsing by multiple users, etc.

It’s fooling users. No one thinks of a single browsing “session” as encompassing several days of browser usage just because the same tabs were open, and users frequently read that they need to simply exit their browser to ensure their session is ended.

Recommendations

  • Be aware that Firefox session cookies can linger for days, despite the user having closed their browser.
  • Manage session timeouts on the server-side and/or via HMAC-signed timestamp values in the cookie contents (don’t let the client decide how long a session should last).
  • If you can, include secure in the cookie header. Firefox does not restore HTTPS session cookies. Realize that in later FF versions, “secure” cookies also are restored.
  • If you give out session cookies with unique names, have your application clean these up when they’re no longer needed. If you don’t, your Firefox users could suffer from…

Cookie Accumulation Torment

This annoying situation occurs when Firefox gains so many local cookies that the web server begins to deny all your requests. Deleting some or all these cookies is the only way to fix the issue because—yay—the problem session cookies persist across browser, and even OS, restarts.

Big Shibboleth Implications

If your Shibboleth-authenticating app maintains its own session, make sure that the “sign out” function searches for and deletes the local Shibboleth cookies (or that the SP sets only “secure” cookies). Otherwise this could happen:

  1. Jane “signs out”, closes Firefox, and lends her computer to Sally.
  2. Sally opens Firefox and clicks “sign in”.
  3. Sally is instantly authenticated into Jane’s account!

Jane’s application session was over, but Firefox allowed her Shibboleth session to live on.

Also, since Shibboleth gives out uniquely-named session cookies (prepended with _shibstate), failing to clean these up will lead Firefox users to the aforementioned torment. If the user has an app open all day every day, count on her gaining at least one cookie per day.

Elgg, ElggChat, and Greener HTTP Polling

At my new job, we maintain a site powered by Elgg, the PHP-based social networking platform. I’m enjoying getting to know the system and the development community, but my biggest criticisms are related to plugins.

On the basis of “keeping the core light”, almost all functionality is outsourced to plugins, and you’ll need lots of them. Venturing beyond the “core” plugins—generally solid, but often providing just enough functionality to leave you wanting—is scary because generally you’re tying 3rd-party code into the event system running every request on the site. Nontrivial plugins have to provide a lot of their own infrastructure and this seems to make it more likely that you’ll run into conflict bugs with other plugins. With Elgg being a small-ish project, non-core plugins tend to end up not well-maintained, which makes the notion of upgrading to the latest Elgg version a bit scary when there have been API changes. Then there’s the matter of determining in what order your many plugins sit in the chain; order can mean subtle differences in processing and you just have to shift things around hoping to not break something while fixing something else. Those are my initial impressions anyway, and no doubt many other open source systems relying heavily on plugins have these problems. There’s a lot of great rope to hang yourself with.

Jeroen Dalsem’s ElggChat seems to be the slickest chat mod for Elgg. Its UI more or less mirrors Facebook’s chat, making it instantly usable. It’s a nice piece of work. Now for the bad news (as of version 0.4.5):

  • Every tab of every logged in user polls the server every 5 or 10 seconds. This isn’t a design flaw—all web chat clients must poll or use some form of comet (to which common PHP environments are not well-suited)—but other factors make ElggChat’s polling worse than it needs to be:
  • Each poll action looks up all the user’s friends and existing chat sessions and messages and returns all of that in every response. If the user had 20 friends, a table containing all 20 of them would be generated and returned every 5 seconds. The visible UI would also become unwieldy if not unusable.
  • The poll actions don’t use Elgg’s “action token” system (added in 1.6 to prevent CSRFs). This isn’t much of a security flaw, but in Elgg 1.6 it fills your httpd logs with “WARNING: Action elggchat/poll was called without an action token…” If you average 20 logged in users browsing the site, that’s 172,800 long, useless error log entries (a sea obscuring errors you want to see) per day. Double that if you’re polling at 5 seconds.
  • The recent Elgg 1.7 makes the action tokens mandatory so the mod won’t work at all if you’ve upgraded.
  • Dalsem hasn’t updated it for 80 days, I can’t find any public repo of the code (to see if he’s working on it), and he doesn’t  respond to commenters wondering about its future.

The thought of branching and fixing this myself is not attractive at the moment, for a few reasons (one of which being our site would arguably be better served by a system not depending on the Elgg backend, since we have content in other systems, too), but here are some thoughts on it.

Adding the action token is obviously the low hanging fruit. I believe I read Facebook loads the friends and status list only every 3 minutes, which seems reasonable. That would cut most of the poll actions down to simply maintaining chat sessions. Facebook’s solution to the friends online UI seems reasonable: show only those active, not offline users.

“Greener” Polling

Setting aside the ideal of comet connections, one of the worst aspects of polling is the added server load of firing up server-side code and your database for each of those extra (and mostly useless) requests. A much lighter mechanism would be to maintain a simple message queue via a single flat file, accessible via HTTP, for each client. The client would simply poll the file with a conditional XHR GET request and the httpd would handle this with minimal overhead, returning minimal 304 headers when appropriate.

In its simplest form, the poll file would just be an alerting mechanism: To “alert” a client you simply place a new timestamp in its poll file. On the next poll the client will see the timestamp change and immediately make another XHR request to fetch the new data from the server-side script.

Integrating this with ElggChat

In ElggChat, clicking a user creates a unique “chatsession” (I’m calling this “CID”) on the server, and each message sent is destined for a particular CID. This makes each tab in the UI like a miniature “room”, with the ability to host multiple users. You can always open a separate room to have a side conversation, even with the same user.

In the new model, before returning the CID to the sender, you’d update the poll files of both the sender and recipient, adding the CID to each. When the files are modified, you really need to keep only a subset of recent messages for each CID. Just enough to restore the chat context when the user browses to a new page. The advantage is, all the work of maintaining the chat sessions and queues is only done when posts are sent, never during the many poll requests.

Since these poll files would all be sitting in public directories, their filenames would need to contain an unguessable string associated with each user.

Miško Hevery Programming Talks

Miško Hevery gave several presentations at Google last year that are worth checking out, I think even if you’re familiar with Dependency Injections and unit testing. They cover the ways that global state can sneak into applications, how undeclared dependencies make classes harder to test and reuse, and how DI in general eases a lot of pain. He’s just a good teacher and the examples are clear and made me want to attack a lot of old code. Hig blog also covers a lot of the same topics.

Hacking a 3rd party script for bookmarklet fun

A few weeks ago I created a simple bookmarklet that loads del.icio.us’s PlayTagger script into the current page. This post covers how some problems with this script were worked through.

Too late

The first challenge was that PlayTagger was designed to initialize itself (let’s call this method “init“) on window.onload: If a user fired the bookmarklet after window.onload (99% of the time), playtagger.js would load but init would’ve missed its chance to be called. This means I had to call init manually, but since script elements load asynchronously, I had to wait until init actually existed in the global scope to call it. This was fairly easily accomplished by attaching my code to the new script element’s “load” event (and using some proprietary “readyState” junk for IE).

Too early

If the page takes a long time to load, it’s possible the user will fire the bookmarklet before window.onload. One of two things will occur:

If it’s fired before the DOM is even “ready”, the bookmarklet throws an error when it tries to append the script element. I could use one of the standard “DOMready” routines to run the bookmarklet code a little later, but this case is rare enough to be not worth the effort to support; by the time the user can see there are mp3s on the page, the DOM is usually ready.

Assuming the DOM is ready, playtagger.js gets loaded via a new script element, the bookmarklet fires init, but then, thanks to playtagger’s built-in event attachment, init is called a second time on window.onload, producing a second “play” button per mp3 link. Harmless, but not good enough.

Preventing the 2nd init call

It would be nice if you could sniff whether or not window.onload has fired, but this doesn’t seem to be possible. Maybe via IE junk. Any ideas for a standards based way to tell?

My only hope seemed to be to somehow disable init after manually calling it. The first try was to just redefine init to a null function after calling it:

init();
init = function () {};

I figured out that redefining init would not help here due to the way it’s attached to window.onload:

// simplified
var addLoadEvent = function(f) {
    var old = window.onload;
    window.onload = function() {
        if (old) { old(); }
        f();
    };
};
addLoadEvent(init);

What’s important to notice here is that init is passed to addLoadEvent as f and window.onload is redefined as a new function, capturing f in the closure. So now f holds init‘s original code (because functions are first-class in Javascript), and f, not the global init, is what is really executed at window.onload. As f is private (hidden by the closure), I can’t overwrite it.

Disabling init from the inside by “breaking” Javascript

The second thing I tried was to break init‘s code from the inside. The first thing init does is loop over the NodeList returned by document.getElementsByTagName('a'), so if I could get that function to return an empty array, that would kill init‘s functionality. Because Javascript is brilliantly flexible I can do just that:

// cache for safe keeping
document.gebtn_ = document.getElementsByTagName;
// "break" the native function
document.getElementsByTagName = function(tag) {
    if (tag != 'a') return document.gebtn_(a);

    // called with 'a' (probably from init)
    // "repair" this function for future use
    document.getElementsByTagName = document.gebtn_;
    // return init-busting empty array
    return [];
};

Simplest solution

While the code above works pretty well, I thought of a simpler, more elegant solution: just rewrite window.onload to what it was before playtagger.js was loaded.

And with that here is the final unpacked bookmarklet code:

javascript:(function () {
    if (window.Delicious && (Delicious.Mp3 || window.Mp3))
        return;
    var d = document
        ,s = d.createElement('script')
        ,wo = window.onload
        ,go = function () {
            Delicious.Mp3.go();
            window.onload = wo || null;
        }
    ;
    s.src = 'http://images.del.icio.us/static/js/playtagger.js';
    if (null === s.onreadystatechange) 
        s.onreadystatechange = function () {
            if (s.readyState == 'complete')
                go();
        };
    else 
        s.onload = go;
    d.body.appendChild(s);
})();