String Subtypes for Safer Web Programming

Valid HTML markup involves several different contexts and escaping rules, yet many APIs give no precise indication of which context their string return values are escaped for, or how strings should be escaped before being passed in (let’s not even get into character encoding). Most programming languages only have a single String type, so there’s a strong urge to document function with @param string and/or @return string and move on to other work, but this is rarely sufficient information.

Look at the documentation for WordPress’s get_the_title:

Returns

(string) 
Post title. …

If the title is Stan "The Man" & Capt. <Awesome>, will & and < be escaped? Will the quotes be escaped? “string” leaves these important questions unanswered. This isn’t meant to slight WordPress’s documentation team (they at least frequently give you example code from which you can guess the escaping model); the problem is endemic to web software.

So for better web security—and developer sanity—I think we need a shared vocabulary of string subtypes which can supply this missing metadata at least via mention or annotation in the documentation (if not via actual types).

Proposed Subtypes and Content Models

A basic set of four might help quite a bit. Each should have its own URL to explain its content model in detail, and how it should be handled:

Unescaped
Arbitrary characters not escaped for HTML in any way, possibly including nulls/control characters. If a string’s subtype is not explicit, for safety it should be assumed to contain this content.
Markup
Well-formed HTML markup matching the serialization of a DocumentFragment
TaglessMarkup
Markup containing no literal less-than sign (U+003C) characters (e.g. for output inside title/textarea elements)
AttrValue
TaglessMarkup containing no literal apostrophe (U+0027) or quotation mark (U+0022) characters, for output as a single/double-quoted attribute value

What would these really give us?

These subtypes cannot make promises about what they contain, but are rather for making explicit what they should contain. It’s still up to developers to correctly handle input, character encoding, filtering, and string operations to fulfill those contracts.

The work left to do is to define how these subtypes should be handled and in what contexts they can be output as-is, and what escaping needs to be applied in other contexts.

Obvious Limitations

For the sake of simplicity, these subtypes shouldn’t attempt to address notions of input filtering or whether a string should be considered “clean”, “tainted”, “unsafe”, etc. A type/annotation convention like this should be used to assist—not replace—experienced developers practicing secure coding methods.

RotURL: Rot13 for URLs

RotURL is a simple substitution cipher for encoding/obscuring URLs embedded in other URLs (e.g. in a querystring). Also, common chars that need to be escaped (:/?=&%#) are mapped to infrequently used capital letters, so this generally yields shorter querystrings, too.

/**
 * Rot35 with URL/urlencode-friendly mappings. To avoid increasing size during
 * urlencode(), commonly encoded chars are mapped to more rarely used chars.
 */
function rotUrl($url) {
    return strtr($url,
        './-:?=&%# ZQXJKVWPY abcdefghijklmnopqrstuvwxyz123456789ABCDEFGHILMNORSTU',
        'ZQXJKVWPY ./-:?=&%# 123456789ABCDEFGHILMNORSTUabcdefghijklmnopqrstuvwxyz');
}

rotUrl('https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=Base64#foo')
    == '8MMGLJQQ5EZR9B9G5491ZFI7QRQ9E45SZG8GKM9MC5VxG5391CPcjx51I38WL51I38Vk1L5fdY6FF';
rotUrl(rotUrl($anyUrl)) = $anyUrl;

You could save a few more bytes by encoding the schema (e.g. “h” for http://, “H” for https://). Since your end encoding has to be URL-safe, there’s not much you can do beyond this to compress a URL embedded in a URL.

Validate Private Page Bookmarklet

ValidatePrivatePage <– validates in current window

ValidatePrivatePage <– validates in new window (your pop-up blocker may complain)

If you need to validate the markup of a page that’s not public (e.g. on localhost), you can now use this bookmarklet to auto-submit the current page source to the validator (instead of viewing source, copying, opening the validator, pasting in, and pressing “check”).

Note: this gets the page source making an XMLHTTPRequest to the current URL, so it does not get interpreted by the browser; i.e. this is NOT based on innerHTML(). If the request made returns a different page (e.g. you were logged out in the meantime), that page’s source will be sent to the validator. Not much can be done about that. I once wrote a crusty PHP4 class/bookmarklet combo that helped do this, but thanks to the standardization of XMLHTTPRequest, this is easy in JS now. You should also thank W3C for allowing cross-domain POSTs to the validator :)

NetBeans Love & Hate

For those cases where you have to work on remote code, NetBeans‘ remote project functionality seems to put it ahead of other PHP IDEs. It pulls down a tree of files and uploads files that you save. Having a local copy allows it to offer its full code comprehension, auto-complete, and great rename refactoring for “remote” code. In contrast Eclipse allows you to open remote files using Remote System Explorer, but you only get PHP syntax highlighting, not the excellent PDT.

But NetBeans is not all smiles and sunshine. Continue reading  

Helping Netbeans/PhpStorm with Autocomplete/Code-hinting

Where Netbeans can’t guess the type/existence of a local variable, you can tell it in a multiline comment:

/* @var $varName TypeName */

After this comment (and as long as TypeName is defined in your project/project’s include path), when you start to type $varName, Netbeans will offer to autocomplete it, and will offer TypeName method/property suggestions. If you rename the variable with Ctrl+r (rename refactoring), Netbeans will change the comment, too.

I usually forget this syntax because type comes first in @param declarations.

Update: PhpStorm supports a similar syntax, but reversing the type and variable name:

/* @var TypeName $varName */

A Case Against Google+

Google+ will fit some people really well, and is certainly bringing some fresh ideas to the table to keep Facebook on its toes. That said, here’s why I kinda hope it doesn’t take off, and why I’m seriously considering leaving the party early.

  1. There were compelling reasons to abandon Friendster and MySpace at their peaks; frustrating performance, bugs, spam, bad UIs, visual nonsense, etc. Facebook seems to be scaling quite gracefully and they seem to constantly improve rather than frustrate.
  2. My main issue with Facebook would be privacy, but I find it highly unlikely that, in the long run,  Google+ or any other ad-supported social network will be a better steward of our personal information. That train goes in one direction.
  3. Establishing yet another silo of social network identities will cost us a huge amount of collective time.
  4. Since Google+ “circles” nearly remove all social cost from forming weak relationships (“just dump them in acquaintances”) we could be talking about a lot more time spent managing relationships. Circles may be the killer feature that we later really regret embracing.
  5. Prominent Google+ notifications appear at the top of all Google tools, and I couldn’t find a way to hide them without, say, keeping a separate account. With social networks being brilliant delivery mechanisms for dopamine; offering that hit while in Gmail, Docs, Calendar, and Search is going to be a disaster for a lot of people’s productivity.

Bookmarklet: Horizontally invert HTML5 videos

My demands for “reverse” glasses have gone unserved, but I made a bookmarklet that provides the same effect: “flopping” a video horizontally.

  1. Install the SwitchStance bookmarklet, for which you’ll need a modern browser that supports CSS transforms on video elements.
  2. Opt-in to YouTube’s HTML5 trial
  3. Load up any video without ads (here’s one of Matt Hensley skating)
  4. While the video plays, click the bookmarklet.

The video will mirror and you’ll see Hensley, a regular-footed skater, now skating goofy foot (in “switch stance“). Or you can get Paul McCartney to play guitar right-handed.