Bash script: recursive diff between remote hosts

This script will generate a recursive, unified diff between the same path on two remote servers. You set the CONNECT1 and CONNECT2 variables as necessary to point to your hosts/paths. Of course, the users you connect as must have read access to the files/directories you’re accessing.

#!/bin/bash

# USAGE: ./sshdiff DIRECTORY
#
# E.g. ./sshdiff static/css
# Generates a recursive diff between /var/www/css/static/css
# on two separate servers.

TEMP1=~/.tmp_site1
TEMP2=~/.tmp_site2

CONNECT1="user1@hostname1:/var/www/$1"
PORT1=22
CONNECT2="user2@hostname2:/var/www/$1"
PORT2=22

mkdir "$TEMP1"
mkdir "$TEMP2"
scp -rq -P "$PORT1" "$CONNECT1" "$TEMP1"
scp -rq -P "$PORT2" "$CONNECT2" "$TEMP2"
echo -e "\n\n\n\n\n"
clear
diff -urb "$TEMP1" "$TEMP2"
rm -r "$TEMP1"
rm -r "$TEMP2"

Magnetar on This American Life

Today I heard the tail end of a fantastic This American Life episode based on a ProRepublica story on the hedge fund Magnetar. The short version is that, when the housing market started to appear unstable in 2005, Magnetar realized that bad incentives at investment banks would allow it to make money by creating a resurgence of CDO investments in the housing market, prolonging the inflation of the housing bubble.

The bad incentives were that investment bankers were paid immediately in handsome fees for creating and selling these CDOs without the requirement of having “skin the game” when they might later became worthless. When the CDO collapses, the bank (and, through bailouts, the taxpayer) takes the loss, but the banker is long gone.

Magnetar made a name on buying the riskiest layer of CDOs, giving other investors the impression that they were a safe investment and causing the CDO market—and connected investments into the housing market—to take off again (when it could have otherwise returned to Earth slowly and less destructively). What was not made clear to CDO investors—and some say this constituted criminal activity by CDO managers—was the fact that Magnetar was simultaneously placing large bets against the same CDOs and actively encouraging banks to create them from much riskier loans.

I.e. Magnetar’s CDO investments were designed to allow future losses, but to temporarily pay the bills and give false information to the market, allowing Magnetar to later gain tremendously on its eventual downturn. Had the CDO creators and dealers passed on this information (by having incentive to do so), the investors would’ve realized these things were being designed to fail and stayed out of them.

Elgg, ElggChat, and Greener HTTP Polling

At my new job, we maintain a site powered by Elgg, the PHP-based social networking platform. I’m enjoying getting to know the system and the development community, but my biggest criticisms are related to plugins.

On the basis of “keeping the core light”, almost all functionality is outsourced to plugins, and you’ll need lots of them. Venturing beyond the “core” plugins—generally solid, but often providing just enough functionality to leave you wanting—is scary because generally you’re tying 3rd-party code into the event system running every request on the site. Nontrivial plugins have to provide a lot of their own infrastructure and this seems to make it more likely that you’ll run into conflict bugs with other plugins. With Elgg being a small-ish project, non-core plugins tend to end up not well-maintained, which makes the notion of upgrading to the latest Elgg version a bit scary when there have been API changes. Then there’s the matter of determining in what order your many plugins sit in the chain; order can mean subtle differences in processing and you just have to shift things around hoping to not break something while fixing something else. Those are my initial impressions anyway, and no doubt many other open source systems relying heavily on plugins have these problems. There’s a lot of great rope to hang yourself with.

Jeroen Dalsem’s ElggChat seems to be the slickest chat mod for Elgg. Its UI more or less mirrors Facebook’s chat, making it instantly usable. It’s a nice piece of work. Now for the bad news (as of version 0.4.5):

  • Every tab of every logged in user polls the server every 5 or 10 seconds. This isn’t a design flaw—all web chat clients must poll or use some form of comet (to which common PHP environments are not well-suited)—but other factors make ElggChat’s polling worse than it needs to be:
  • Each poll action looks up all the user’s friends and existing chat sessions and messages and returns all of that in every response. If the user had 20 friends, a table containing all 20 of them would be generated and returned every 5 seconds. The visible UI would also become unwieldy if not unusable.
  • The poll actions don’t use Elgg’s “action token” system (added in 1.6 to prevent CSRFs). This isn’t much of a security flaw, but in Elgg 1.6 it fills your httpd logs with “WARNING: Action elggchat/poll was called without an action token…” If you average 20 logged in users browsing the site, that’s 172,800 long, useless error log entries (a sea obscuring errors you want to see) per day. Double that if you’re polling at 5 seconds.
  • The recent Elgg 1.7 makes the action tokens mandatory so the mod won’t work at all if you’ve upgraded.
  • Dalsem hasn’t updated it for 80 days, I can’t find any public repo of the code (to see if he’s working on it), and he doesn’t  respond to commenters wondering about its future.

The thought of branching and fixing this myself is not attractive at the moment, for a few reasons (one of which being our site would arguably be better served by a system not depending on the Elgg backend, since we have content in other systems, too), but here are some thoughts on it.

Adding the action token is obviously the low hanging fruit. I believe I read Facebook loads the friends and status list only every 3 minutes, which seems reasonable. That would cut most of the poll actions down to simply maintaining chat sessions. Facebook’s solution to the friends online UI seems reasonable: show only those active, not offline users.

“Greener” Polling

Setting aside the ideal of comet connections, one of the worst aspects of polling is the added server load of firing up server-side code and your database for each of those extra (and mostly useless) requests. A much lighter mechanism would be to maintain a simple message queue via a single flat file, accessible via HTTP, for each client. The client would simply poll the file with a conditional XHR GET request and the httpd would handle this with minimal overhead, returning minimal 304 headers when appropriate.

In its simplest form, the poll file would just be an alerting mechanism: To “alert” a client you simply place a new timestamp in its poll file. On the next poll the client will see the timestamp change and immediately make another XHR request to fetch the new data from the server-side script.

Integrating this with ElggChat

In ElggChat, clicking a user creates a unique “chatsession” (I’m calling this “CID”) on the server, and each message sent is destined for a particular CID. This makes each tab in the UI like a miniature “room”, with the ability to host multiple users. You can always open a separate room to have a side conversation, even with the same user.

In the new model, before returning the CID to the sender, you’d update the poll files of both the sender and recipient, adding the CID to each. When the files are modified, you really need to keep only a subset of recent messages for each CID. Just enough to restore the chat context when the user browses to a new page. The advantage is, all the work of maintaining the chat sessions and queues is only done when posts are sent, never during the many poll requests.

Since these poll files would all be sitting in public directories, their filenames would need to contain an unguessable string associated with each user.

IE9 May Raise the Bar

Wow.

IE9 passing a bunch of tests

IE9 is coming, and it looks like it’ll get Microsoft back in the game. Full Developer Guide.

The Good: New standards supported, hardware-accelerated canvas, SVG, Javascript speed on par with the other browsers, preview installs side-by-side with IE.

The Bad: Not available on XP and no guarantee it will be. XP users will be stuck with, you know, all the other browsers that run their demos fine.

It could always be worse

OccasionallyVery infrequently, with help from my caffeine addiction and Intense Focus On Writing Awesome Code For Employers Who May Read This, empty Coke Zero cans will slowly accumulate in my vicinity. I couldn’t say how many. In the worst of times enough to not want to know how many.

This morning I stumbled across a 1995 photo of Netscape programmer Jamie Zawinski‘s cubicle, or the “Tent of Doom“.

pic of workstation with many soda cans

The awareness of more severe dysfunction in others is a cold comfort.

On the FairTax

I’ve not read the original FairTax book, and have only flipped through the follow-up written to answer the critics, but I have spent many hours reading about it online over the years, and back when I listened to Boortz of course he pushed it. At the moment, I don’t see it as workable and I think its rollout could be disastrously disruptive to the economy. Many of the goals and incentives set up by the FT are good, but there are a number of critiques I’ve read more recently that have not been adequately answered IMO.

Primarily, there’s Bruce Bartlett’s excellent piece “Why the FairTax Won’t Work” (pdf). This should be the first stop for those who’ve only read pro-FT literature. It begins with a decent description of the FT, but obviously it shouldn’t be the only thing you read about the FT to make an informed opinion.

Several folks commenting at the Fair Tax Blog make some compelling arguments against the FT, including weighing in on Bartlett’s critique. Many agree with him that a VAT would be a better consumption tax. This post has some lively discussion worth reading.

A lot of the selling points of the FT just seem too good to be true:

  • You keep 100% of your paycheck. This is the most obvious deception—a mechanical truth of the FT system with the emotional appeal of effectively raising your income. Of course, your paycheck would be either be smaller or your expenses larger, too. With the FT’s guarantee of being revenue-neutral, the FT cannot be a win for everyone, and when you start to look at who would greatly benefit from it, it should be obvious who the losers will be. This is not to say that everyone’s current tax level is just, but this particular line of rhetoric seems targeted towards the middle class, who I think would end up paying more under the FT. And retirees living off savings—having already been taxed on earnings—will be taxed again to get by.
  • It’s under 200 pages. Does anyone really believe that suddenly Congress would just have no way of cutting breaks for special interests? The problems of loopholes, unfairness, and the ballooning of the tax code is due to the people who amend it, and the FT won’t replace them. With not even most Republicans willing to touch it, getting a FT through Congress would take a number of sweetheart deals right off the bat and probably provisions making it easier to tamper with going forward. Remember there would be $485B going to citizens yearly in “prebate” checks, and Congress would determine who gets what; more room for deal-cutting.
  • No more IRS! For those who strongly believe federal taxation is out-of-control, the notion of sticking it to the IRS will sound satisfying, but if a national consumption tax became the sole source of revenue for the federal government, you’d better believe it would build a new, huge bureaucracy to ensure compliance. Also, since the FT would no longer allow state and local governments tax-free purchasing, the states would likely need to jack up their income taxes to compensate.

Bartlett’s piece really is well-researched and a must-read, and if you know of a serious critique that takes on his arguments head-on, I’d love to read it.

It wasn’t torture when America did it.

Suuure. Salon’s Mark Benjamin on what our last Vice President has described as “a dunk in the water”. Disturbing.

Also looks like Obama’s (unsurprisingly) caving on civilian trials. Nothing says “rule of law” like pre-trial torture sessions and determining location and rule of court by political theater. KSM may be a mass murdering bastard, but aren’t we supposed to be better than regimes who use show trials? Don’t we usually sanction or invade regimes like that?