Making Drupal and Pressflow more mundane

Drupal and Pressflow have too much magic in them, and not the good kind. On the recent Facebook webcast introducing HipHop PHP, their PHP-to-C++ converter, they broke down PHP language features into two categories: magic and mundane. The distinction is how well each capability of PHP, a dynamic language, translates to a static language like C++. “Mundane” features translate well to C++ and get a big performance boost in HipHop PHP. “Magic” features are either unsupported, like eval(), or run about as fast as today’s PHP+APC, like call_user_func_array().

Mundane

  • If/else control blocks
  • Normal function calls
  • Array operations
  • …and most other common operations

Magic

  • eval()
  • call_user_func_array()
  • Code causing side-effects that depends on conditions like function existence
  • Includes within function bodies
  • Other PHP-isms that make Java and C++ developers cringe

How Drupal and Pressflow can run better (or at all) on HipHop PHP

Prelinking

Currently, we invoke hooks using “magic” (though still HipHop-supported) calls to call_user_func_array(). We don’t have to do that; we could be “prelinking” hook invocations by generating the right PHP for the set of enabled modules. If we generate the right PHP here, HipHop can link the function calls during compilation.

This sort of “prelinking” also cleans up profiling results, making it easier to trace function calls through hooks in tools like KCacheGrind.

Compatibility break? Nope, it should be possible to replace the guts of module_invoke_all() with appropriate branching and calls to the generated PHP.
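As a rough illustration of the prelinking idea, the sketch below generates a PHP file in which every hook implementation is called directly by name. All of the example_* names are hypothetical, the dispatcher takes a single argument, and results are simply appended, so this is much simpler than what module_invoke_all() really does. It shows the shape of the generated code under those assumptions and is not a drop-in replacement.

```php
<?php
// Hypothetical sketch: these function names are illustrative and do not
// exist in Drupal or Pressflow.

/**
 * Builds PHP source for a dispatcher that calls hook implementations directly.
 *
 * @param array $implementations Map of hook name => list of module names.
 */
function example_generate_prelinked_source(array $implementations) {
  $out = "<?php\n// Generated file: rebuild whenever the module list changes.\n";
  $out .= "function example_prelinked_invoke_all(\$hook, \$arg = NULL) {\n";
  $out .= "  \$return = array();\n";
  $out .= "  switch (\$hook) {\n";
  foreach ($implementations as $hook => $modules) {
    $out .= "    case " . var_export($hook, TRUE) . ":\n";
    foreach ($modules as $module) {
      $function = $module . '_' . $hook;
      // Names end up in generated code, so refuse anything that is not a
      // plain identifier.
      if (!preg_match('/^[a-zA-Z_][a-zA-Z0-9_]*$/', $function)) {
        throw new InvalidArgumentException("Unsafe function name: $function");
      }
      // Each emitted line is an ordinary, statically resolvable call.
      $out .= "      \$return[] = $function(\$arg);\n";
    }
    $out .= "      break;\n";
  }
  $out .= "  }\n  return \$return;\n}\n";
  return $out;
}

// Two toy hook implementations standing in for enabled modules.
function alpha_greeting($name) { return "alpha says hi to $name"; }
function beta_greeting($name) { return "beta says hi to $name"; }

$source = example_generate_prelinked_source(array('greeting' => array('alpha', 'beta')));

// Write the result to a real file and include it, rather than eval()ing it.
$file = tempnam(sys_get_temp_dir(), 'prelink');
file_put_contents($file, $source);
require $file;
unlink($file);

$results = example_prelinked_invoke_all('greeting', 'world');
$unknown = example_prelinked_invoke_all('no_such_hook');
```

In a real site the generated file would be rebuilt whenever modules are enabled or disabled, and module_invoke_all() would delegate to it when the file is present.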

Including files statically

Drupal 6 introduced an optimization to dynamically load files based on which menu path a user is visiting. This won’t fly in HipHop; it’s simply not supported. Fortunately, this is easy to work around: we can either drop the feature (shared hosters without APC are already booing me) or we could, like in the prelinking example, generate a big, static includes file (which is itself included on HipHop-based systems) that includes all possible page callback handlers based on the hook_menu() entries. Sites that include the static includes file would skip the dynamic includes at runtime.

Compatibility break? None, assuming we take the approach I describe above.
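The static includes file could be produced along these lines. This is a minimal sketch that assumes each router entry carries a 'file' path already resolved relative to the site root; the function name and the sample entries are illustrative only.

```php
<?php
// Hypothetical sketch: the function name and the sample router entries are
// assumptions made for illustration.

/**
 * Builds PHP source that unconditionally includes every callback file.
 *
 * @param array $router_items Map of menu path => router entry.
 */
function example_generate_static_includes(array $router_items) {
  $files = array();
  foreach ($router_items as $item) {
    if (!empty($item['file'])) {
      // Key by path so a file shared by several menu items is listed once.
      $files[$item['file']] = TRUE;
    }
  }
  $files = array_keys($files);
  sort($files);

  $out = "<?php\n// Generated file: every include a page callback may need.\n";
  foreach ($files as $file) {
    $out .= 'require_once ' . var_export($file, TRUE) . ";\n";
  }
  return $out;
}

$router_items = array(
  'node/add' => array('page callback' => 'node_add_page', 'file' => 'modules/node/node.pages.inc'),
  'node/%node/edit' => array('page callback' => 'node_page_edit', 'file' => 'modules/node/node.pages.inc'),
  'admin/user/user' => array('page callback' => 'user_admin', 'file' => 'modules/user/user.admin.inc'),
  // Entries without a 'file' key live in code that is always loaded.
  'node' => array('page callback' => 'node_page_default'),
);

$source = example_generate_static_includes($router_items);
```

The generated file contains only top-level require_once statements with literal paths, which is the form a static compiler can follow.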

Death to eval()

Like dynamic includes, eval() is unsupported on HipHop. Drupal has already relegated core use of eval() to an isolated module, which is great for security. eval() is pretty bad in general: PHP+APC doesn’t support opcode caching for it, so serious code can’t run in eval() sanely. Unfortunately, using the PHP module to allow controlling block display remains quite popular.

We have a few options here:

  • Drop the feature (ouch!)
  • Provide a richer interface for controlling block display, including support for modules to hook in and provide their own extended options
  • Pump out the PHP to functions in a real file, include that, and call those functions to control block display

Compatibility break? Yes, on all but the third option (writing out a PHP file).
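The third option might look roughly like the following. It is a sketch under a simplifying assumption: each stored snippet is a bare PHP statement that returns a boolean, with no surrounding PHP tags. The example_* names are hypothetical.

```php
<?php
// Hypothetical sketch: function names and the snippet format are assumptions.

/**
 * Wraps each stored visibility snippet in a named function.
 *
 * @param array $snippets Map of block id => PHP statement(s) returning a bool.
 */
function example_generate_visibility_source(array $snippets) {
  $out = "<?php\n// Generated from stored block visibility snippets.\n";
  foreach ($snippets as $block_id => $code) {
    // The id becomes part of a function name, so keep it to safe characters.
    if (!preg_match('/^[a-z0-9_]+$/', $block_id)) {
      throw new InvalidArgumentException("Unsafe block id: $block_id");
    }
    $out .= 'function example_block_visible_' . $block_id . "() {\n";
    $out .= '  ' . $code . "\n";
    $out .= "}\n";
  }
  return $out;
}

/**
 * Checks visibility by calling the generated function, if there is one.
 */
function example_block_is_visible($block_id) {
  $function = 'example_block_visible_' . $block_id;
  // Blocks without a snippet are shown by default in this sketch.
  return function_exists($function) ? (bool) $function() : TRUE;
}

$snippets = array(
  'always' => 'return strlen("abc") === 3;',
  'never' => 'return FALSE;',
);

// Write the snippets to a real file and include it instead of calling eval().
$file = tempnam(sys_get_temp_dir(), 'blockvis');
file_put_contents($file, example_generate_visibility_source($snippets));
require $file;
unlink($file);

$always = example_block_is_visible('always');
$never = example_block_is_visible('never');
$missing = example_block_is_visible('no_snippet');
```

The lookup by function name is still a dynamic call, but the snippet bodies now live in a file that an opcode cache or compiler can process like any other code.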

Migrate performance-intensive code to C++

I’m looking at you, drupal_render().

This opportunity is exciting. Without the cruft of Zend’s extension framework, we can migrate performance-critical code paths in core to C++ and make use of STL and Boost, two of the most respected libraries in terms of predictable memory usage and algorithm running time.

Compatibility break? There’s no reason to have one, but keeping C++ and PHP behaviors consistent will be a serious challenge.

The takeaway

  • Use real, file-based PHP, avoiding dynamic language features.
  • Profile the system to find the biggest wins versus development cost for migrating core functionality to C++.

I’ll be presenting the “Ultimate PHP Stack” for large-scale applications at PHP TEK-X. Zend PHP, Quercus, and HipHop PHP (source code release pending) will all be contenders.


Comments

When I was first thinking about this, I wasn’t that interested. I tend to be far more concerned with scale than speed. Yes, Drupal may be heavy on PHP, but I can scale PHP horizontally till the cows come home. Then I considered ongoing projects making use of pluggable field storage. We are soon going to be (like Facebook) constructing fewer huge joins in SQL and more joins in PHP: fetching from multiple NoSQL data sources and combining them at the web node, not in the backend.

This may be somewhat lessened with the advent of document stores, such as MongoDB, but with a move away from a JOIN-centric language, I still imagine logical joins in PHP are on the rise. Thus, the future is one where I have to worry about PHP performance, and projects like this will become critical.

That said, we need good numbers. Even having joins in PHP, there is a limit to how much a change in PHP is going to speed up a web app like Facebook given the static overhead of data fetches across the network. I’d love to see some benchmarks with pure computing on the PHP side.

In Drupal 7, page generation latency in PHP is enemy #1.

Absolutely, really looking forward to numbers on that.

Agreed. Drupal 7 can bootstrap from APC with a warm cache, issues far fewer queries for normal operations, has had a lot of slow queries removed from core, and has great reverse proxy support, and things like MongoDB field storage should really help sites with large datasets. However, raw page execution time in D7 is way up on D6, and this has been the case for over a year now.

This reminded me I hadn’t done any comparisons for a couple of months; there are some fresh ones here: http://drupal.org/node/615822#comment-2552276

Around 1/4 to 1/5th of page execution time for a very simple page is in PHP in D7, compared to around 1/3rd in D6. This goes up once you’re rendering multiple nodes or comments too, since query time is more or less flat for things like that now and rendering fields, links, (and RDF) are some of the worst bottlenecks we have.

Worse, D7 amplifies the PHP overhead for page rendering (increased flexibility and use of drupal_render()ing nested arrays), running queries, and many other operations. So, adding the same module to D7 that does the same queries and renders the same pages will run, as measured in the execution time of the module’s own operations, considerably slower on D7.

Some of our client sites with moderate module counts and theme complexity already take 1-3 seconds in PHP time to build relatively common pages (e.g. viewing a node), and that’s in Drupal/Pressflow 6.

I wrote the fractions reversed, doh!

To clarify (although it will be obvious for someone who looks at the benchmarks).

3/4 to 4/5 of page rendering time is in PHP in Drupal 7 on node/2

D6 spends around 2/3rds of page rendering time in PHP on much the same page.

Since query number and times on that page are more or less constant, this means around 20% slower response times when measured with ab or microtime across the whole page. Yes, this doesn’t take into account network time or horizontal scaling of PHP, but it’s not good.

On call_user_func_array() - how much would we gain from “prelinking” / generated PHP, vs just converting module_invoke_all() to use $function()? $function() is only around 10% more expensive than a standard function call according to Larry’s benchmarks here: http://www.garfieldtech.com/blog/magic-benchmarks We did the same for drupal_alter() and it’s now around 50% faster internally - patch and benchmarks at http://drupal.org/node/593522

To make the same change for module_invoke_all() would be an API change, although there are more or less no-op versions using count(func_num_args()); and $function() which could be done in Drupal 7 without an API change and might help this specific case (not to mention improving stack trace output, which is horrible at the moment).
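For readers unfamiliar with the $function() technique, here is a minimal sketch of an invoke-all loop that uses variable function calls for the common argument counts and falls back to call_user_func_array() otherwise. The example_* names and the fixed module list are hypothetical, and this is not the patch linked above. The merging of results follows the behavior usually described for module_invoke_all(): arrays are merged, other non-NULL values are appended.

```php
<?php
// Hypothetical sketch: example_* names are illustrative, not Drupal APIs.

/**
 * Stand-in for the list of modules implementing a hook.
 */
function example_implementations($hook) {
  return array('alpha', 'beta');
}

/**
 * Invokes a hook in every module, avoiding call_user_func_array() when the
 * argument count is small.
 */
function example_invoke_all($hook) {
  $args = func_get_args();
  array_shift($args);  // Drop $hook; the rest go to each implementation.
  $return = array();
  foreach (example_implementations($hook) as $module) {
    $function = $module . '_' . $hook;
    if (!function_exists($function)) {
      continue;
    }
    switch (count($args)) {
      case 0:
        $result = $function();
        break;
      case 1:
        $result = $function($args[0]);
        break;
      case 2:
        $result = $function($args[0], $args[1]);
        break;
      default:
        // Rare case: fall back to the fully dynamic call.
        $result = call_user_func_array($function, $args);
    }
    if (is_array($result)) {
      $return = array_merge_recursive($return, $result);
    }
    elseif (isset($result)) {
      $return[] = $result;
    }
  }
  return $return;
}

// Toy implementations of a hook named "example_info".
function alpha_example_info($a, $b) { return array('alpha' => $a + $b); }
function beta_example_info($a, $b) { return "beta:$a:$b"; }

$result = example_invoke_all('example_info', 1, 2);
```

Because the function signature is unchanged, callers would not need to change, which is the sense in which such a rewrite avoids an API change.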

Larry’s benchmarks are probably accurate for regular PHP, but HipHop is different. In standard PHP, all function calls are handled dynamically, however they’re invoked. In contrast, direct calls and calls through call_user_func_array() are handled distinctly in HipHop:

  • Direct function calls get translated to real C++ function calls. At compile-time, these are translated to memory addresses so that control can directly pass by changing the stack pointer (SP) and program counter (PC) registers. This makes the function call a hop, skip, and JMP away. Good C++ compilers can even optimize certain function calls to be inlined, where the code is transparently refactored during compilation to make the function call unnecessary at runtime.
  • Indirect calls, like through call_user_func_array(), necessarily involve an additional lookup step to determine where to pass control because, like when calling hooks in Drupal, the name of the function is constructed at runtime.

There are, however, creative ways HipHop could support fast and dynamic-ish function invocation without making things too crazy on the PHP side. One could do this by creating a new class called Function. This class would be constructed with a string name for the function to be called later. This class would have one non-constructor member function: invoke(). In normal PHP, this class would be a simple wrapper around call_user_func_array().

In HipHop, a Function class constructed with a string literal could be handled as a function pointer; C++ can efficiently pass control using these. (Function classes constructed with dynamic strings would still be subject to the slower invocation method.) Drupal would change its hook model to be more like MediaWiki’s, where callbacks are explicitly registered. This approach would still run on Zend PHP with the wrapper class, but HipHop could juggle function pointers to fire off calls to hooks, removing the dilemma of either running hooks slowly or generating and compiling code.
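On regular PHP, the wrapper could be as small as the sketch below. Since function is a reserved word in PHP and cannot be used as a class name, the sketch calls the class ExampleFunction; that name, the registry array, and the toy hooks are all assumptions made for illustration. Nothing here reflects an actual HipHop feature.

```php
<?php
// Hypothetical sketch of the wrapper idea; ExampleFunction is an assumed name.

class ExampleFunction {
  private $name;

  /**
   * @param string $name Name of the function to call later.
   */
  public function __construct($name) {
    if (!function_exists($name)) {
      throw new InvalidArgumentException("Unknown function: $name");
    }
    $this->name = $name;
  }

  /**
   * Calls the wrapped function with whatever arguments are given.
   */
  public function invoke() {
    $args = func_get_args();
    // On regular PHP this is just a thin wrapper around the dynamic call.
    return call_user_func_array($this->name, $args);
  }
}

// Toy hook implementations.
function alpha_node_view($title) { return "alpha viewed $title"; }
function beta_node_view($title) { return "beta viewed $title"; }

// Callbacks are registered explicitly with string literals, which is the case
// a compiler could in principle resolve ahead of time.
$registry = array(
  'node_view' => array(
    new ExampleFunction('alpha_node_view'),
    new ExampleFunction('beta_node_view'),
  ),
);

$output = array();
foreach ($registry['node_view'] as $callback) {
  $output[] = $callback->invoke('Home');
}
```

The point of the design is that the string naming the function appears as a literal at the registration site instead of being assembled at call time.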

This approach is also possible without a wrapper Function class, but the analysis required would be complex. Then again, the type inference logic in HipHop may be capable of such feats.

“To make the same change for module_invoke_all() would be an API change…”

Performing the best optimization would certainly require an API change (unless you resorted to automated rewriting of the source files). However, it’s possible to parse the possible hooks out of every module file, as we did with the now-abandoned function registry, and dynamically write a really long module_invoke_all() implementation that string-matches the hook to invoke and then directly calls the right functions.

It seems that we have again arrived at the same point, although for different reasons: http://drupal4hu.com/node/240 . Your reasoning is another reason why an install profile should be able to swap out some functions for its own versions as needed.

Very interesting article. I had read somewhere that because of slowdowns in D7, there were people looking to offload a bunch of the more intense operations to Java. From the sounds of it, providing code that HipHop can optimize and compile may be a much easier alternative, and provide the same (if not better) performance. Fun stuff!

Oh, providing real block visibility hooks instead of white-screen inducing block eval() is worth the price of admission. Given Facebook’s experience with APC, I’m excited about getting Drupal on HipHop. Thanks for the read.

block.module actually has that now http://api.drupal.org/api/function/hook_block_info_alter/7

Although PHP module still provides the eval() option, so we need to look at a way to replace that everywhere (blocks, CCK, views, rules) by the time D8 comes around, and hopefully something in Drupal 7 contrib.

The underlying concept here, be able to push expensive parts of Drupal off to another language, I support. However, I’ve been pushing for a different approach.

With good Interface-based architecture and PHP autoloading, anything that is a class can be replaced with a PECL module (or Java if you’re on the PHP-on-JVM project, whatever that’s called). Enable the PECL module, and that class is available and the PHP version never gets loaded. Poof, Drupal ported to C (or Java, or C++, or whatever). I’m not sure if “leverage PHP less” is the correct solution, though. PHP has a lot of very nice features that fundamentally affect our architecture. The DB extenders, for instance, rely on __call() to work. We can’t drop that bit of magic without having to completely rearchitect that system, something I am not looking forward to. :-)
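The mechanism this relies on is that PHP only consults autoloaders for classes that are not already defined. The sketch below simulates it without a real extension: ExampleRenderer is declared up front as a stand-in for a class a compiled extension would provide, while ExampleMailer exists only as a PHP file. All class names and the temporary directory layout are hypothetical.

```php
<?php
// Hypothetical sketch: class names and file layout are assumptions.

// Stand-in for a class that a compiled extension would define at startup.
class ExampleRenderer {
  public function render($text) { return 'native:' . $text; }
}

// Pure-PHP implementations on disk, one class per file.
$dir = sys_get_temp_dir() . '/example_autoload_' . uniqid();
mkdir($dir);
file_put_contents("$dir/ExampleRenderer.php",
  "<?php\nclass ExampleRenderer {\n  public function render(\$text) { return 'php:' . \$text; }\n}\n");
file_put_contents("$dir/ExampleMailer.php",
  "<?php\nclass ExampleMailer {\n  public function send(\$text) { return 'php-mail:' . \$text; }\n}\n");

// The autoloader only runs for classes that are not defined yet.
$autoload_log = array();
spl_autoload_register(function ($class) use ($dir, &$autoload_log) {
  $autoload_log[] = $class;
  $file = $dir . '/' . $class . '.php';
  if (is_file($file)) {
    require $file;
  }
});

// Already defined, so the PHP version on disk is never loaded.
$renderer = new ExampleRenderer();
// Not defined yet, so the autoloader pulls in the PHP version.
$mailer = new ExampleMailer();

$rendered = $renderer->render('page');
$sent = $mailer->send('hello');

// Clean up the temporary files.
unlink("$dir/ExampleRenderer.php");
unlink("$dir/ExampleMailer.php");
rmdir($dir);
```

Calling code is identical in both cases, which is what makes the swap transparent as long as both versions honor the same interface.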

Pre-compiling PHP into a single file is actually very bad for code weight if you’re not on APC. Especially for the menu callbacks, you can’t just compile a file that has “all of the callbacks we use”. That’s right back where we started. You’d need to compile “all the callbacks used on this particular menu router item”… which brings us right back where we started.

We should also not see “push code to another language” as a silver bullet. It’s not. That’s an edge case that maybe 1% of sites will be able to leverage. Drupal itself has to be fast enough to still run on shared hosting where people won’t have the opportunity to install PECL modules or use alternate PHP runtimes or move everything to MongoDB.

So +1 on being able to push expensive code to non-PHP, but -1 on doing it in a way that makes people who aren’t professional sysadmins second class citizens.

I am 100% in agreement with Larry. HipHop is a cool tool, and works well for one-off platforms where there is a well known and understood server architecture. But it is far from a panacea, and would likely do more harm to Drupal’s main target audiences (namely, all of those instances running on shared hosts).

PECL provides the ideal route forward: make it possible to swap a C library into place of a PHP library. PECL is standard. PECL is supported all over the place. And PECL is 100% integrated with PHP — for obvious reasons. No weird gotchas. No forced revisions to Drupal’s overall architecture.

Are there any tools to automatically generate PECL libraries from PHP though? We don’t have C developers by the dozen to do all that work for us.

Also, it looks like the only feature we’d actually have to drop is eval(). That would be a good thing in itself, since I’m sure we can replace the flexibility that sometimes provides with something better.

We can look at call_user_func_array() but Drupal will still work with that and HipHop - so the only question is whether there’s internal refactoring we could do to take advantage of the speed improvements if we skip it, and it looks like it’d be entirely possible to do that internally within module.inc - either in core, or in Pressflow for the 1% of sites which really need it. I’m not sure where “forced revisions to overall architecture” comes from - that seems much more a description of changes which would allow stuff to be replaced by C rather than compiled into it.

HipHop PHP provides two very different facilities for us to up performance:

  • The initial transition to compiled C++ nets an average of 50% improvement for Facebook’s work. It may net more for Drupal because so much of our overhead is in the theme layer, playing around with strings.
  • Writing good, truly native C++ code that is accessible from PHP should give a 100-1000% (if not more) improvement over PHP+APC.

Optimizing well for the first requires the architectural changes I’ve mentioned in my original post, but it does not require us to write C code. Even if we did make such architecture changes, I don’t consider it an essential change to Drupal’s model.

Technically, we can do much of the second already with C-based extensions to PHP, but that requires a rewrite of the code and maintenance of equivalent PHP and C versions. However, we could architect around minimizing the amount of code that has to exist in both versions.

It looked to me like it supports most of the things we actually need to run core though, right? eval() is only in PHP module, and they have a C extension for PDO apparently (listen carefully to the end of the screencast).

I’ve recorded about 700-800 /unique/ function calls in a vanilla HEAD installation before. Some functions are called several hundred times each on common pages. $function and call_user_func_array() are < 500 out of probably 10,000 odd - so assuming compatibility, I think we’d see some decent gains even without refactoring.

I find the idea of moving bits of Drupal off to a different language very interesting. I really hope that the community can come together on some sort of consensus on a way forward, as having some people writing PECL extensions, others porting bits to C++ for use with HipHop, and others writing bits to Java for use with Quercus would be untenable.

I’m super excited to be following this conversation in the year ahead.

David, this is an excellent writeup. I was considering doing my own, but like always, you scooped me :)

Here’s an interesting one for you. PHP’s SoapClient library does some magic under the hood to generate class methods on a $client object in a late-binding fashion. I believe the result is that SoapClient::__call() is used to invoke operations specified by a WSDL. Any idea if this kind of thing is supported with HipHop? It’s not with Quercus, since Java doesn’t have the facility to arbitrarily route unknown methods to a __call() method the way PHP does (short of some really funky AOP magic).

According to the webcast, __call() is apparently supported. In fact it looks like nearly everything is except create_function() and eval().

Most of the changes David’s describing are about taking advantage of the performance gains to the greatest extent. From what I saw in the screencast, Drupal 7 ought to more or less run on HipHop already without changes (there’s a PDO C extension shipped with it), as long as you don’t enable PHP module.

My suggested changes for conditional includes are mandatory to run Drupal on HipHop.

This topic reminds me of the talk Rasmus gave at Drupalcon Hungary in 2008. He advised Drupal to move at least some of the lower-level code into an extension written in C. This approach would reduce a lot of system calls and greatly improve performance and scalability.

Would it be a feasible approach to separate some of Drupal’s low-level code, give it the HipHop treatment, and run it in conjunction with the PHP code running on top? This would prevent the creation of two code bases and make it easy to avoid some of the more difficult points you mention in your post for the time being.

For sure I’m not quite understanding all aspects, and I certainly lack the skills to actually play with this at this stage, but I’m intrigued by these developments.

There is not an efficient way to bridge HipHop code with regular PHP.