Elgg, ElggChat, and Greener HTTP Polling

At my new job, we maintain a site powered by Elgg, the PHP-based social networking platform. I’m enjoying getting to know the system and the development community, but my biggest criticisms are related to plugins.

In the name of “keeping the core light”, almost all functionality is outsourced to plugins, and you’ll need lots of them. Venturing beyond the “core” plugins—generally solid, but often providing just enough functionality to leave you wanting—is scary, because you’re usually tying third-party code into the event system that runs on every request to the site. Nontrivial plugins have to provide a lot of their own infrastructure, which seems to make conflict bugs with other plugins more likely. And with Elgg being a small-ish project, non-core plugins tend to end up poorly maintained, which makes upgrading to the latest Elgg version a bit scary when there have been API changes. Then there’s the matter of determining the order in which your many plugins sit in the chain; order can mean subtle differences in processing, and you just have to shift things around hoping not to break something while fixing something else. Those are my initial impressions anyway, and no doubt many other open source systems that rely heavily on plugins share these problems. There’s a lot of great rope to hang yourself with.

Jeroen Dalsem’s ElggChat seems to be the slickest chat mod for Elgg. Its UI more or less mirrors Facebook’s chat, making it instantly usable. It’s a nice piece of work. Now for the bad news (as of version 0.4.5):

  • Every tab of every logged-in user polls the server every 5 or 10 seconds. This isn’t a design flaw—all web chat clients must poll or use some form of comet (to which common PHP environments are not well-suited)—but other factors make ElggChat’s polling worse than it needs to be:
      • Each poll action looks up all the user’s friends, existing chat sessions, and messages, and returns all of that in every response. If the user has 20 friends, a table containing all 20 of them is generated and returned every 5 seconds. The visible UI would also become unwieldy if not unusable.
      • The poll actions don’t use Elgg’s “action token” system (added in 1.6 to prevent CSRF). This isn’t much of a security flaw, but in Elgg 1.6 it fills your httpd logs with “WARNING: Action elggchat/poll was called without an action token…” If you average 20 logged-in users browsing the site, that’s 172,800 long, useless error log entries per day (a sea obscuring the errors you want to see). Double that if you’re polling at 5 seconds.
  • The recent Elgg 1.7 makes action tokens mandatory, so the mod won’t work at all if you’ve upgraded.
  • Dalsem hasn’t updated it for 80 days, I can’t find any public repo of the code (to see if he’s working on it), and he doesn’t respond to commenters wondering about its future.

The thought of branching and fixing this myself is not attractive at the moment, for a few reasons (one being that our site would arguably be better served by a system that doesn’t depend on the Elgg backend, since we have content in other systems, too), but here are some thoughts on it.

Adding the action token is obviously the low-hanging fruit. I believe I read that Facebook loads the friends and status list only every 3 minutes, which seems reasonable. That would cut most of the poll actions down to simply maintaining chat sessions. Facebook’s solution to the friends-online UI also seems reasonable: show only active users, not everyone who’s offline.

“Greener” Polling

Setting aside the ideal of comet connections, one of the worst aspects of polling is the added server load of firing up server-side code and your database for each of those extra (and mostly useless) requests. A much lighter mechanism would be to maintain a simple message queue via a single flat file, accessible via HTTP, for each client. The client would simply poll the file with a conditional XHR GET request, and the httpd would handle this with very little overhead, returning a tiny 304 response when nothing has changed.
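To make the mechanics concrete, here’s a minimal sketch (in Python, purely for illustration) of the decision the httpd makes on each conditional GET of a static poll file; the function name and shape are my own, not any real server’s API:

```python
import email.utils
import os

def poll_response(path, if_modified_since=None):
    """Mimic an httpd serving a static poll file: return a cheap 304
    when the client's If-Modified-Since is current, else a full 200."""
    mtime = int(os.path.getmtime(path))
    if if_modified_since is not None:
        since = email.utils.parsedate_to_datetime(if_modified_since)
        if mtime <= since.timestamp():
            return 304, b""  # nothing new; a tiny header-only response
    with open(path, "rb") as f:
        return 200, f.read()
```

The point is that no PHP and no database are involved on the common (unchanged) path; the web server answers from the filesystem alone.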

In its simplest form, the poll file would just be an alerting mechanism: To “alert” a client you simply place a new timestamp in its poll file. On the next poll the client will see the timestamp change and immediately make another XHR request to fetch the new data from the server-side script.
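The alerting mechanism above is tiny. A sketch, with Python standing in for both the server-side writer and the polling client (in reality the client side would be an XHR; the function names are mine):

```python
import time

def alert(poll_path):
    """Server side: "alert" a client by stamping its poll file.
    The httpd's conditional-GET handling does the rest."""
    with open(poll_path, "w") as f:
        f.write(repr(time.time()))

def check(poll_path, last_seen):
    """Client side: return the new stamp if it changed since the
    last poll, else None (meaning: nothing to fetch)."""
    with open(poll_path) as f:
        stamp = f.read()
    return stamp if stamp != last_seen else None
```

On a changed stamp, the client would then make one real request to the server-side script to fetch the new data.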

Integrating this with ElggChat

In ElggChat, clicking a user creates a unique “chatsession” (I’m calling this “CID”) on the server, and each message sent is destined for a particular CID. This makes each tab in the UI like a miniature “room”, with the ability to host multiple users. You can always open a separate room to have a side conversation, even with the same user.

In the new model, before returning the CID to the sender, you’d update the poll files of both the sender and recipient, adding the CID to each. When the files are modified, you really only need to keep a small set of recent messages for each CID: just enough to restore the chat context when the user browses to a new page. The advantage is that all the work of maintaining the chat sessions and queues is done only when posts are sent, never during the many poll requests.
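A hypothetical sketch of that post-time bookkeeping (Python for illustration; the function, file layout, and message-limit constant are all my assumptions, not ElggChat’s actual API):

```python
import json

RECENT_LIMIT = 10  # messages kept per CID; an arbitrary choice

def post_message(poll_paths, cid, message):
    """On each post, append the message under its CID in every
    participant's poll file and trim to the last few messages.
    All queue maintenance happens here, at post time, never in
    the (cheap, static) poll requests."""
    for path in poll_paths:
        try:
            with open(path) as f:
                state = json.load(f)
        except FileNotFoundError:
            state = {}
        queue = state.setdefault(cid, [])
        queue.append(message)
        state[cid] = queue[-RECENT_LIMIT:]
        with open(path, "w") as f:
            json.dump(state, f)
```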

Since these poll files would all be sitting in public directories, their filenames would need to contain an unguessable string associated with each user.
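One way to get a stable but unguessable filename per user (a sketch under my own assumptions; the secret and naming scheme are illustrative) is to HMAC the user’s GUID with a server-side secret:

```python
import hashlib
import hmac
import secrets

# In practice this would live in server config, generated once;
# regenerating it would rename every user's poll file.
SERVER_SECRET = secrets.token_hex(32)

def poll_filename(user_guid):
    """Derive a stable, unguessable poll filename for a user."""
    digest = hmac.new(SERVER_SECRET.encode(), str(user_guid).encode(),
                      hashlib.sha256).hexdigest()
    return "poll-%s.json" % digest
```

Knowing a user’s GUID then gets an attacker nowhere without the secret.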

IE9 May Raise the Bar

Wow.

[screenshot: IE9 passing a bunch of tests]

IE9 is coming, and it looks like it’ll get Microsoft back in the game. Full Developer Guide.

The Good: New standards supported, hardware-accelerated canvas, SVG, JavaScript speed on par with the other browsers, preview installs side-by-side with IE.

The Bad: Not available on XP and no guarantee it will be. XP users will be stuck with, you know, all the other browsers that run their demos fine.

It could always be worse

Very infrequently, with help from my caffeine addiction and Intense Focus On Writing Awesome Code For Employers Who May Read This, empty Coke Zero cans will slowly accumulate in my vicinity. I couldn’t say how many. In the worst of times, enough to not want to know how many.

This morning I stumbled across a 1995 photo of Netscape programmer Jamie Zawinski’s cubicle, or the “Tent of Doom”.

[photo: workstation with many soda cans]

The awareness of more severe dysfunction in others is a cold comfort.

On the FairTax

I’ve not read the original FairTax book, and have only flipped through the follow-up written to answer the critics, but I have spent many hours reading about it online over the years, and back when I listened to Boortz, of course, he pushed it. At the moment, I don’t see it as workable, and I think its rollout could be disastrously disruptive to the economy. Many of the goals and incentives set up by the FT are good, but there are a number of critiques I’ve read more recently that have not been adequately answered, IMO.

Primarily, there’s Bruce Bartlett’s excellent piece “Why the FairTax Won’t Work” (pdf). This should be the first stop for those who’ve only read pro-FT literature. It begins with a decent description of the FT, but obviously it shouldn’t be the only thing you read about the FT to make an informed opinion.

Several folks commenting at the Fair Tax Blog make some compelling arguments against the FT, including weighing in on Bartlett’s critique. Many agree with him that a VAT would be a better consumption tax. This post has some lively discussion worth reading.

A lot of the selling points of the FT just seem too good to be true:

  • You keep 100% of your paycheck. This is the most obvious deception—a mechanical truth of the FT system with the emotional appeal of effectively raising your income. Of course, your paycheck would either be smaller or your expenses larger, too. With the FT’s guarantee of being revenue-neutral, the FT cannot be a win for everyone, and when you start to look at who would greatly benefit from it, it should be obvious who the losers will be. This is not to say that everyone’s current tax level is just, but this particular line of rhetoric seems targeted towards the middle class, who I think would end up paying more under the FT. And retirees living off savings—having already been taxed on earnings—will be taxed again to get by.
  • It’s under 200 pages. Does anyone really believe that suddenly Congress would just have no way of cutting breaks for special interests? The problems of loopholes, unfairness, and the ballooning of the tax code are due to the people who amend it, and the FT won’t replace them. With not even most Republicans willing to touch it, getting a FT through Congress would take a number of sweetheart deals right off the bat, and probably provisions making it easier to tamper with going forward. Remember, there would be $485B going to citizens yearly in “prebate” checks, and Congress would determine who gets what; more room for deal-cutting.
  • No more IRS! For those who strongly believe federal taxation is out-of-control, the notion of sticking it to the IRS will sound satisfying, but if a national consumption tax became the sole source of revenue for the federal government, you’d better believe it would build a new, huge bureaucracy to ensure compliance. Also, since the FT would no longer allow state and local governments tax-free purchasing, the states would likely need to jack up their income taxes to compensate.

Bartlett’s piece really is well-researched and a must-read, and if you know of a serious critique that takes on his arguments head-on, I’d love to read it.

It wasn’t torture when America did it.

Suuure. Salon’s Mark Benjamin on what our last Vice President has described as “a dunk in the water”. Disturbing.

Also looks like Obama’s (unsurprisingly) caving on civilian trials. Nothing says “rule of law” like pre-trial torture sessions and determining location and rule of court by political theater. KSM may be a mass murdering bastard, but aren’t we supposed to be better than regimes who use show trials? Don’t we usually sanction or invade regimes like that?

Cheers for Ruins

If you too love urban ruins, Opacity.us has awesome photography of abandoned hospitals, churches, libraries, factories—you name it—with all structures well categorized and documented. Don’t miss the wallpaper.

A decent Gainesville find is in the woods directly South of where Shealy Dr. ends. My coworkers and I used to walk the mile along Shealy Dr. and Ritchy Rd. every day and I finally convinced them it was a Great Idea to explore the trail into the woods where Shealy Dr. ends. It runs South along the edge of the open field and down to Bivens Arm, but if you head Southwest when you see water you’ll come across an abandoned brick house in the middle of the woods. The roof is caved in and kids have spray painted pentagrams and nonsense on the walls, but still awesome. A trail also heads West along the North side of Bivens Arm.

I just got life insurance; who’s up for some misguided exploring and documentation of “Satan House”?

Guitar Tuning By Ear

(This is edited from an answer I posted to KeyMinor, a music Q&A site. They need more users!)

A lot of recordings end up slightly higher/lower than standardized pitch (I always called this “in the cracks” but don’t google it!), and this is the quickest way I’ve found to tune a guitar to them.  This method seems simplistic but has several advantages going for it:

  • All strings are tuned to the same reference pitch, so there’s no accumulation of error as you move from string to string (and you can tune any of the top 5 in any order).
  • The higher the frequency, the easier it is to notice the beat when two identical notes differ slightly in pitch. Matching high pitches yields lower error.
  • All pitch matching is done with notes in the same octave; this also helps you notice the beat.
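The second point can be made concrete: for the same tuning error in cents, the beat rate scales linearly with the reference frequency, so mismatches are easier to hear up high. A quick sketch (the function is mine, just applying the standard cents formula):

```python
def beat_rate(f_ref, cents_off):
    """Beats per second between a reference pitch f_ref (Hz) and a
    note detuned by cents_off cents: |f_detuned - f_ref|."""
    return abs(f_ref * (2 ** (cents_off / 1200) - 1))
```

For example, a 5-cent error beats about four times faster against a 440 Hz A than against the 110 Hz A two octaves down.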

Here it is.

SQL Server Mgmt Studio Can’t Export CSV

I’m trying to export large (e.g. 16M rows) SQL Server 2008 views for eventual import into MySQL. The underlying setup for these views is bizarre: A SELECT for a single row can take over two minutes, a SELECT * FROM view with no WHERE clause is much faster, but still too slow to finish an export in any client we’ve connected with. The only reasonable solution I’ve found is to use SQL Server Management Studio, right-click the DB > Tasks > Export Data… and target a flat file. This seriously seems about 100x as fast as a direct query in SSMS and 200x as fast as an ODBC connection.

There’s only one (major) gotcha: The flat file writer in SQL Server Management Studio (ver 10.0.2531.0) is broken for CSV output. There’s no configuration I can find that will correctly escape characters that need to be. By default it doesn’t enclose any values in quotes, but even if you specify the double quote as “Text Qualifier” (yay for obscure proprietary descriptions), double quotes within values won’t be escaped in any way. Not with \" nor "" (the CSV standard). This means you can either export fixed-width columns, or try to find chars that never exist in your data to use as field/line delimiters. The escape char \ is also not escaped, so you could end up with a carriage return character in your import when the original data was literally “foo\road”. Stupid.
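For comparison, the CSV convention the tool ignores is simple: wrap a field containing special characters in double quotes and double any embedded quotes, leaving backslashes alone. Python’s csv module (used here just to show what correct output looks like) gets it right:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
# A plain field, a field with embedded quotes, and a field with a
# literal backslash-r (which needs no escaping at all in CSV).
writer.writerow(['plain', 'has "quotes"', 'foo\\road'])
print(buf.getvalue())  # plain,"has ""quotes""",foo\road
```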

Again, I can’t just select all rows in the tool and then save the resulting recordset as CSV (which isn’t broken) because the views are so big that the queries time out (the resultsets would probably crash the tool anyway) and because there’s no way to partition the query that doesn’t destroy performance. I’m going to have to create a local SQL Server DB and do a DB to DB transfer, then use a client connection to export.

Update: You can at least use obscure Unicode chars as delimiters, though it outputs invalid UTF-8 chars after the first row. You have to export as “Unicode” (UTF-16 little-endian: 2-byte chars with a leading \xFF\xFE BOM). If you pick delimiters like ʖ and ʁ and go this path, you won’t be able to use traditional line-by-line file reading functions. You’ll have to stream a fixed number of bytes at a time, building up columns (and validating that the number of columns per row stays constant) and converting to rational CSV that can be handled by MySQL. Another vote for setting up a local SQL Server instance.
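That streaming conversion might look something like this sketch (Python; the delimiter choices, chunk size, and function shape are all my assumptions):

```python
import codecs
import csv

FIELD_SEP = "\u0296"  # ʖ — assumed exotic delimiters from the export
ROW_SEP = "\u0281"    # ʁ

def export_to_csv(src_path, dest_path, expected_cols):
    """Stream the UTF-16LE export a fixed number of bytes at a time,
    split rows/fields on the exotic delimiters, validate the column
    count of each row, and write ordinary CSV that MySQL can load."""
    with open(src_path, "rb") as src, open(dest_path, "w", newline="") as dest:
        writer = csv.writer(dest)
        decoder = codecs.getincrementaldecoder("utf-16-le")()
        if src.read(2) != b"\xff\xfe":  # consume the BOM if present
            src.seek(0)
        text = ""
        while chunk := src.read(4096):
            text += decoder.decode(chunk)
            while ROW_SEP in text:
                row, text = text.split(ROW_SEP, 1)
                cols = row.split(FIELD_SEP)
                if len(cols) != expected_cols:
                    raise ValueError("bad column count in row: %r" % row)
                writer.writerow(cols)
        text += decoder.decode(b"", final=True)
        if text:  # a final row without a trailing delimiter
            cols = text.split(FIELD_SEP)
            if len(cols) != expected_cols:
                raise ValueError("bad column count in row: %r" % text)
            writer.writerow(cols)
```

The incremental decoder handles 2-byte characters split across chunk boundaries, which is the part line-by-line reading can’t do.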

Bash: unbom (to remove UTF-8 BOMs)

Tests for and removes UTF-8 BOMs.

#!/bin/bash
# Loop over all arguments ("$@"), not just $1 — the shell expands
# the *.txt glob before the script ever runs.
for F in "$@"
do
  if [[ -f "$F" && $(head -c 3 "$F") == $'\xef\xbb\xbf' ]]; then
      # file exists and begins with the UTF-8 BOM
      mv "$F" "$F.bak"
      tail -c +4 "$F.bak" > "$F"
      echo "removed BOM from $F"
  fi
done

USAGE: ./unbom *.txt

The magic is tail -c +4, which outputs the file starting at byte 4, stripping the first 3 bytes.