Plugin-based Systems (and Events)

Modern application design is solved! OK, well we at least have a set of camps with their own principles, tools, and paradigms leading us toward the light. One might be “build some components, wire them up with references to each other, and let them talk to each other.” We love/hate static typing, but it allows tools to reason about really large programs.

All that is great, but I feel like plugin system design is in the wild west. To be clear, I mean dropping a new directory of code in place, performing some ritual, and the system behaving radically differently. Of course, we’re getting better at this as apps learn from each other, but I feel like our principles and tools aren’t particularly well matched to this goal of plugins being able to cooperate on a deep level to build up a system.

Allowing plugins to provide APIs introduces a big set of challenges. Just a few:

  • How do we conditionally use API’s if they’re available?
  • How can a plugin decorate/filter the API’s output?
  • How can a plugin replace the API with its own?
  • If two plugins want to replace the API, which “wins”?
  • How can a plugin remove part of the behavior/API of another plugin?
  • How much can plugins detect about each other?

Most robust plugin systems solve these with event systems and some mechanism of ordering plugins to solve disputes (all unique).

A big problem is that events are almost always run-time linking based on strings. Hence it’s very difficult for tools and humans to reason about which listeners will be called, in what order, and what data will flow to each listener and be returned to the dispatching party. Ideally IDEs could sense all this stuff, and help wire up new listeners and events. Instead, devs on a plugin-heavy system must do string searching on event names or at best use some reporting tool built into the event system.

Symfony’s typed event objects and Ruby’s symbols probably help here. Drupal has a strong convention both in code and docs that helps, and is popular enough for toolmakers to focus on it. Middleware systems as in Express (node) and Stack (PHP) offer a formal way to compose applications, which is pretty exciting, but I’m not sure they solve any of the above problems of plugins tightly collaborating on processes.

What’s the future look like here, and what’s the way toward it? Standardizing on a single event system seems unlikely, but what are the best and most powerful ones out there? What language features would make this easier? Can someone stop me from diving deep into Event-driven programming literature?

Elgg’s Path Forward

Like many older PHP projects, Elgg has lots of problems with tight coupling, procedural patterns, and untestability; and has a very web 1.9 model: spit out full page, add a little Ajax. The good news is that Elgg has a ton of great functionality and ideas embedded in that mess, we have a core team which often can find agreement about dev principles and goals, and we have a new schedule-based release process that ensures that hard work going into the product makes it to release more quickly.

Lately I feel like the Elgg core team is excitedly gearing up for a long hike, during which we’ll make tons of hard decisions and churn lots of code remolding Elgg to look more like a modern JavaScript + PHP API framework.

I’m not sure I want to make that hike.

My suspicion is there’s a shorter route around the mountain; some modern framework may be out there whose team has already put in the hard effort of building something close to what we’re looking forward to. I think the time it would take us to get there would be long and filled with tons of wheel-rebuilding—work that won’t be going into improving UX and which provides no cross-project knowledge gain for Elgg devs.

I’m also wondering if we would be wise to ignore our itching about back end code quality for a bit and focus all attention on the front end and on UI/UX problems. As a plugin developer, I certainly see back end design choices that cause problems, but they’re rarely blockers. I spend a lot more time improving the UX and dealing with our incomplete Ajax implementation. The jewel of the 1.9 release isn’t going to be the dependency container and PSR-0 compatible autoloading; it’ll be the responsive Aalborg theme.

For me, back end refactoring work is fun because it’s relatively easy. You’re changing the way the pieces snap together, not necessarily making them work better or solving new problems. It also keeps me in the comfort zone of working mostly with code and people I’m already familiar with. This is OK for a little while but doesn’t push me to grow.

This isn’t to imply that the core team is infected with Not-invented-here. We definitely want to replace as much home-grown code as possible with well-tested alternatives maintained elsewhere. It’s just a hunch I have that this will be a long process involving tons of decisions that have already been made somewhere else.

I’m still having a lot of fun developing for and in Elgg, but I’m itching to pick up something new, and to work in a system that’s already making good use of and establishing newer practices. Hitching Elgg to another project’s wagon seems adventurous.

I also have to vent that the decision to maintain support for PHP 5.2—a branch that ended long-term support 3.5 years ago—during 1.9.x seems disastrously wrong. 1.9 had a long development process during which a significant amount of high-quality, highly-tested, and actively maintained community code was off-the-table because it wasn’t 5.2 compatible. We had to port some things to 5.2 and fix the resulting bugs, and some unit tests are a mess without Closures; just a huge waste of time. Nor could we benefit from the work being done on Drupal or WordPress because both are GPL, as are a lot of other older PHP projects with 5.2-compatible code. PHP 5.2 is still expressive enough to solve most problems without namespaces, Closures, et al., but in 2014 devs don’t want to code with hands tied behind their backs to produce less readable code that will soon have to be refactored. /rant

Get rid of variable variable syntax

Uniform Variable Syntax was voted in (almost unanimously) for PHP6 PHP7 and introduces a rare back compatibility break, changing the semantics of expressions like these:

                           // old meaning            // new meaning
1. $$foo['bar']['baz']     ${$foo['bar']['baz']}     ($$foo)['bar']['baz']
2. $foo->$bar['baz']       $foo->{$bar['baz']}       ($foo->$bar)['baz']
3. $foo->$bar['baz']()     $foo->{$bar['baz']}()     ($foo->$bar)['baz']()
4. Foo::$bar['baz']()      Foo::{$bar['baz']}()      (Foo::$bar)['baz']()

IMO the “variable variable” syntax $$name is a readability disaster that we should get rid of. ${$name} is much clearer about what this is (dynamic name lookup) and what it’s doing: $ is the symbol table, and {$name} tells us that we’re finding an entry in it under a key with the value of $name. PHP should deprecate $$name syntax and emit a notice in the next minor version.

It should also deprecate the syntax ->$name and ::$$name. Both are bad news.

Doing so would completely eliminate the first 3 ambiguous expressions above, and the new warning emitted would call out code that would otherwise silently change meaning in PHP6 (one big negative about this RFC).

As for the fourth, Consider these expressions:

A. $foo->prop
B. $foo->prop()
C. $foo->prop['key']()
D. Foo::$prop
E. Foo::$prop()
F. Foo::$prop['key']()

To the chagrin of JavaScript devs, PHP will not let you reference a dynamic property in an execution context, so expression B will try to call a method “prop”, (and fatal if it can’t). For consistency, PHP should fatal on expression E. Currently PHP just does something completely unexpected and insane, looking up a local variable $prop.

Expression C does what you’d expect, accessing the property named “prop”, so expression F should do the same, which is why I think the RFC is a clear step forward at least.

Unpacking the Access Control Systems in Drupal and Elgg

Elgg’s access control system, which determines what content a user can view, is somewhat limited and very opinionated, with several use cases—access control lists, friends—baked into the core system. In hopes of making this cleaner and more powerful, I’ve been studying Drupal’s access system. (Caveat: My knowledge in this area of Drupal comes mainly from reading code, schema, docs, and two great overviews by Mike Potter and Larry Garfield, so please chime in if I run off the rails.) 

Drupal

Drupal’s system also influences update and delete permissions, but here I’m only interested in the “view” permission. Also, although Drupal has hook_node_access()—a procedural calculation of permissions for a node (like an Elgg entity) already in memory—I’m focusing on the systems that craft SQL conditions to fetch only nodes visible to the user. This is critical to get right in the SQL, because if your access control relies on code, you can never predict the number of queries required to generate a list for browsing. In this area, Drupal’s realms/grants API (hook_node_grants()is extremely powerful.

Realms and Grants in a Nutshell

At a particular time, a user exists in zero or more “realms”; more or less arbitrary labels which may be based on user attributes, roles, associations, the current system state, time…anything. Each realm has been granted (via DB rows) the permission to view individual nodes. So to query, we build up a user’s list of realms, this is baked into the query, and the DB returns nodes matching at least one realm.

E.g., at 2:30 PM today, an anonymous visitor might be in the realms (public, time_afternoon, season_winter*), whereas Mary, who logged in, might exist in the realms (public, logged_in, user_123, friendedby_345, role_developer, team_A, is_over_30, time_afternoon, season_winter). So Mary will likely see more nodes because her queries provide more opportunity to match grant rows. *Note these realms are made up examples.

Clearly this is very expressive, but Drupal (maybe for better) doesn’t provide many features out-of-the-box, so (maybe for worse) doesn’t build in many realms; the API is mostly a framework for implementing an access control system on top of added features. Contrib modules appear up to the task of providing realms based on all kinds of things (groups, taxonomies, associations with particular nodes), but it’s hard to collaboratively build an access control system, so these modules apparently don’t work well with each other and non-access modules must be careful to tap into the appropriate systems to keep nodes protected.

(Implementation oddities: The grants are done in the node_access table, which probably should’ve been called “node_grants”, especially because this table is only somewhat related to the hook called “node_access“. Less seriously—depending on your system size—each realm name (VARCHAR) is duplicated for every node/realm combination, so there’s some opportunity for normalization.)

Elgg

If you squint, Elgg’s system is a bit similar. Each entity has an “access level” (a realm), with values like “public”, “logged in”, “private”, “friends”, or values representing access control lists (a group or a subset of your friends like a Google+ circle).

That an entity can have only one realm is of course the biggest (and most painful) difference, but also the implementation is significantly complicated by some realms needing to map to different tables. E.g. Elgg has to ensure “friends” maps to rows in an entities relationship table based upon the owner of the entity, while also mapping to the ACL table.

I imagine a lot of these differences come from Drupal being old as time with a lot bigger API reboots, and because Elgg’s access system was targeted to meet the needs of features like friends and user groups, which were built-in from the beginning. It’s hard to predict which schema results in faster queries, and will depend on the use case, but Drupal queries I would guess are easier to generate and safer to alter.

Conclusions

I think in the long run Elgg would be wise to adopt a realms/grants schema, though I would probably suggest normalizing with a separate “realms” table to hold the name and other useful bits. Elgg group ACLs and friend collections would map directly into realms, but friend relationships would need to be duplicated into realms just like groups have an ACL distinct from the membership relationship. Really I think a grants table could completely replace Elgg’s “entity_relationships” table, since both tables just map one entity to others with a name.

As for Drupal, I think the docs could more clearly describe realms and grants (unless I’ve totally got this wrong). I’m less sure of the quality of the API that populates/maintains the tables; it looks like the hooks are pretty low-level ways of asking “would you like to dump some rows into node_access?” and it’s not clear how much of the table must be rebuilt or how often this happens.

Character Encoding Bug of the Day

Today I had one of those bugs that starts out looking simple and keeps going deeper and deeper. Video service Kaltura has a plugin for Moodle, that just stopped working one day (no changes on the server).

  • It’s throwing an exception because an expected element isn’t in the page.
  • Oh, the element’s supposed to be delivered by XHR from the plugin.
  • But the plugin’s code is generating correct markup…
  • Why is Moodle’s function to serialize an array into a JS function call returning null for that markup?
  • json_encode is converting the markup string to null?
  • Because json_encode is choking on invalid UTF-8.
  • Because the markup has a right single quotation encoded in Windows-1252 :(
  • And that string is coming from the Kaltura API.

So over 2 years ago someone named a video player Jim’s Test Player and over the weekend Kaltura’s API started returning that single quote in Windows-1252. We removed the quote from the name and the problem disappeared.

Installing xhprof on XAMPP for OSX Lion

Directions adapted from Ben Buckman.

Download xhprof.

cd path/to/xhprof/.../extension

# If you don't have autoconf... I didn't.
sudo chmod o+w /usr/local/bin  #(brew needs to write a symlink there)
brew install autoconf

sudo /Applications/XAMPP/xamppfiles/bin/phpize

Make sure you have a CLI C compiler. I installed one via XCode.

sudo MACOSX_DEPLOYMENT_TARGET=10.6 CFLAGS='-O3 -fno-common -arch i386 -arch x86_64' LDFLAGS='-O3 -arch i386 -arch x86_64' CXXFLAGS='-O3 -fno-common -arch i386 -arch x86_64' ./configure --with-php-config=/Applications/XAMPP/xamppfiles/bin/php-config-5.3.1

sudo make

sudo make install

sudo nano /Applications/XAMPP/xamppfiles/etc/php.ini

Add these lines to php.ini:

[xhprof]
extension = xhprof.so
xhprof.output_dir = "/Applications/XAMPP/xhprof-logs"

Restart Apache.

Decouple User and App State From the Session

When building software components, you want them to be usable in any situation, and especially inside other web apps. The user of your component should be able to inject the user ID and other app state without side effects, and similarly be able to pull that state out after use.

That’s why, beyond all the standard advice (GRASP SOLID), you’d be wise to avoid tight coupling with PHP Sessions.

PHP’s native1 session is the worst kind of singleton.

It’s accessible everywhere; modifiable by anyone; and, because there’s no way to get the save handlers, there’s no way to safely close an open session, use it for your own purposes, and reopen it with all the original state. Effectively only one system can use it while it’s open.

The Take Home

  • Provide an API to allow injecting/requesting user and application state into/from your component.
  • Isolate session use in a “state persister” subcomponent that can be swapped out/disabled, ideally at any time during use.

1Shameless plug: I created UserlandSession, as a native session replacement. You can use multiple at the same time, and there’s no global state2 nor visibility. This is not to suggest that you use it in place of native sessions, but it’s available in a pinch or for experimentation.

2Yes, I know cookies and headers are global state. UserlandSession is not meant to solve all your problems, pal.

In Support of Bloated, Heavyweight IDEs

I’ve done plenty of programming in bare-bones text editors of all kinds over the years. Free/open editors were once pretty bad and a lot of capable commercial ones have been expensive. Today it’s still handy to pop a change into Github’s web editor or nano. Frankly, though, I’m unconvinced by arguments suggesting I use text editors that don’t really understand the codeI believe that, independent of your skill level, you’ll produce better code and faster by using the most powerful IDE you can get your hands on.

To convince you of this, I’ll try to show how each of the following list of features, in isolation, is good for your productivity. Then it should follow that each one you work without will be lowering it. Also it’s important to note that leaving in place or producing bugs that must be fixed later, or that create external costs, reduces your real productivity, and “you” in the list below could also mean you six months from now or another developer. The list is not in any particular order, just numbered for reference.

  1. Syntax highlighting saves you from finding and fixing typos after compilation failures. In a language where a script file may be conditionally executed, like PHP, you may leave a bug that will have to be dug up by someone else after costing end users a lot of time. In rarer cases the code may compile but not do what you expected, costing even more time. SH also makes the many contexts available in code (strings, functions, vars, comments, etc.) significantly easier to see when scanning/scrolling.
  2. Having a background task scan the code can help catch errors that simple syntax highlighting cannot, since most highlighters are designed to expect valid syntax and may not show problems.
  3. Highlighting matching braces/parenthesis eases the writing and reading of code and expressions.
  4. IDEs can show the opening/closing lines of blocks that appear offscreen without you needing to scroll. Although long blocks/function bodies can be a signal to refactor, this can aid you in working on existing code like this.
  5. Highlighting the use of unknown variables/functions/methods can show false positives for problems, but more often signals a bug that’s hidden from sight: E.g. a variable declared above has been removed; the type of a variable is not what is expected; a library upgrade has removed a method, or a piece of code has been transplanted from another context without its dependencies. Missing these problems has a big future cost as these may not always cause compile or even runtime errors.
  6. Highlighting an unused variable warns you that it isn’t being used how it was probably intended. It may uncover a logic bug or mean you can safely remove its declaration, preventing you from later having to wonder what it’s for.
  7. Highlighting the violation of type hints saves you from having to find those problems at compile or run-time.
  8. Auto-completing file paths and highlighting unresolvable ones saves you from time-consumingly debugging 404 errors.
  9. Background scanning other files for problems (applying all the above features to unopened project files) allows you to quickly see and fix bugs that you/others left. Simply opening an existing codebase in a more capable IDE can reveal thousands of certain/potential code problems. If you’re responsible for that code, you’ve potentially saved an enormous amount of time: End users hitting bugs, reporting them, you reading, investigating, fixing, typing summaries, etc. etc. etc. This feature is like having a whole team of programmers scouring your codebase for you; a big productivity boost.
  10. Understanding multiple language contexts can help a great deal when you’re forced to work in files with different contexts embedded within each other.
  11. Parameter/documentation tooltips eliminate the need to look up function purpose, signatures, and return types. While you should memorize commonly used functions, a significant amount of programming involves using new/unfamiliar libraries. Leaving your context to look up docs imposes costs in time and concentration. Sometimes that cost yields later benefits, but often you’ve just forgotten the order of a few parameters.
  12. Jumping to the declaration of a function/variable saves you from having to search for it.
  13. Find usages in an IDE that comprehends code allows you to quickly understand where and how a variable/function is used (or mentioned in comments!) in a codebase with very little error.
  14. Rename refactoring can carefully change an identifier in your code (and optionally filenames and comments) across an entire project of files. This can also apply to CSS; when renaming a class/id, the IDE may offer to replace its usages elsewhere in CSS and HTML markup. The obvious benefits are time savings and reduction in the errors you might make using more simple string/regular expression replacements, but there are other gains: When the cost of changing a name reduces to almost nothing, you will be more inclined to improve names when needed. Better names can reduce the time needed to understand the code and how it should be used, and to recognize when it’s not being used well.
  15. Comprehension of variable/expression type allows the IDE to offer intelligent autocompletion options, reducing your time spent typing, fixing typing errors, and looking up property/method names on classes. But more than saving time, when an expected autocomplete option doesn’t appear, it can let you know that your variable/expression is not of the type that you think it is.
  16. IDEs can automatically suggest variables for function arguments based on type, so if you’re calling a function that needs a Foo and a Bar, the IDE can suggest your local vars of those types. This eliminates the need to remember parameter order or the exact names of your vars. Your role briefly becomes to check the work of the IDE, which is almost always correct. In strongly typed languages like Java, this can be a great boost; I found Java development in NetBeans to be an eye-opening experience to how helpful a good IDE can be.
  17. IDEs can grok Javadoc-style comments, auto-generate them based on code, and highlight discrepancies between the comments and the code. This reduces the time you spend documenting, improves your accuracy when documenting, and can highlight problems where someone has changed the signature/return type of a function, or has documented it incorrectly. IDEs can add support for various libraries so that that code can be understood by the IDE without being in the project.
  18. IDEs can maintain local histories of project files (like committing each change in git) so you can easily revert recent changes (or bring back files accidentally deleted!) or better understand the overall impact of your recent changes.
  19. IDEs can integrate with source control so you can see changes made that aren’t committed. E.g. a color might appear in the scroll bar next to uncommitted changes. You could click to jump to that change, mouse over to get a tooltip of the original code and choose to revert if needed. This could give you a good idea of changes before you switch focus to your source control tools to commit them. Of course being able to perform source control operations inside the IDE saves time, too.
  20. IDEs can maintain a local cache of code on a remote server, making it snappier to work on, reducing the time you’d spend switching to a separate SFTP app, and allowing you to adjust the code outside the IDE. The IDE could monitor local and remote changes and allow merging between the versions as necessary.
  21. IDEs can help you maintain code style standards in different projects, and allow you to instantly restyle a selection or file(s) according to those standards. When contributing to open source projects, this can save you from having to go back and restyle code after having your change rejected.
  22. Integrated debugging offers a huge productivity win. Inline debugging statements are no replacement for the ability to carefully step though execution with access to all variable contents/types and the full call stack. In some cases bugs that are practically impossible to find without debugging can be found in a few minutes with one.
  23. Integrated unit test running also makes for much less context switching when using test-driven development.

This list is obviously not exhaustive, and is geared towards the IDEs that I’m most familiar with, but the kernel is that environments that truly understand the language and your codebase as a whole can give you some powerful advantages over those that don’t. A fair argument is that lightweight editors with few bells and whistles “stay out of your way” and can be more responsive on large codebases–which is true–but you can’t ignore that there’s a large real productivity cost incurred in doing without these features.

For PHP users, PhpStorm* includes almost everything in the list, with Netbeans coming a close second. With my limited experience Eclipse PDT was great for local projects, but I’ve only seen basic syntax highlighting working in the Remote System Explorer. All three also fairly well understand Javascript, CSS, HTML, and to some extent basic SQL and XML DTDs.

*Full disclosure: PhpStorm granted me a copy for my work on Minify, but I requested it, and my ravings about it and other IDEs are all unsolicited.

Define namespace constants using expressions

Since const is parsed at compile-time, you can’t use expressions in namespace constants, but you can use define as long as the name argument is the full name from the global scope (run this):

namespace Foo\Bar;
const CONST1 = 1;
define('CONST2', 1 + 1); // global
define(__NAMESPACE__ . '\\CONST3', 1 + 1 + 1); // in namespace!
echo CONST1, " ", \CONST2, " ", CONST3; // echos '1 2 3'

The bad: PHPStorm comprehends const and global defines, but not define(__NAMESPACE__ . '\\CONST', $value)

PHP RFC Preview: Dynamic Callback Expressions

I’m posting this to get some initial feedback on this idea before I officially submit an RFC.

Background

Even with PHP’s growing object-oriented and functional programming features, the callback remains widely-used and useful. However, forcing authors to create callbacks via strings and arrays presents difficulties:

  1. Most IDEs do not recognize callbacks as such, and so cannot offer autocompletion, rename refactoring, and other benefits of code comprehension.
  2. Authors can misspell identifiers inside strings.
  3. Within namespaced code, authors can forget to prepend the namespace, since function calls within the namespace do not require it.
  4. Where use statements change the identifier for a class, authors can specify the local classname instead of the fully resolved name.

Proposal Continue reading