At 1am Pacific on January 29th, 2015, Entertainment Weekly quietly switched to Drupal 7. Ten WordPress VIP blogs and a monolithic Vignette V6 Content Suite were deprecated. Over a half million articles and images were migrated to a new design and modern publishing platform, and Four Kitchens led the development.
On February 27th, 2015, three of the web chefs on the project presented a migration case study to a full room at SANDcamp 2015. Matt, Patrick, and I fielded questions about the project. While no recording was available, I’d like to share the presentation with you.
Overview and process
Entertainment Weekly is a magazine that covers film, TV, music, Broadway theater, books, and pop culture, and ew.com has (officially) a reach of 13.7 million consumers per week. A massive amount of business logic, advertising, and rapid content creation and curation is required to keep the news flowing and the audience growing.
We started work on the site in early 2014, with the first commit in late April. Over the project, the Four Kitchens team grew from 3 to 6 developers, and the EW internal team grew from 1 to 4. 4K provided project management, data structures, and migration, and implemented the design and advertising requirements. EW was the product owner, created the design, provided the workflow tools, and managed the infrastructure and build process. This was clearly a team effort, and cross-team collaboration was essential to the project’s success.
Process, both in standards and documentation, played a large role in making the project work. For example, we drafted a Definition of Readiness and Completion, an evolving framework of expectations that ensured stakeholders gave the context necessary for development and developers delivered consistent, high-quality, and well-documented results. We also engaged in cross-team peer reviews and collaborated on improving local development environments.
Front end
We discussed the front end, including theming, performance and advertising. We used Aurora as the base theme along with Themekey to switch between a desktop and mobile theme (advertising requirements necessitated two distinct themes) using a cookie provided by edge servers. Gulp compiled the Sass and ran JavaScript checkers like jshint and jscs.
Advertisement rendering was optimized to maximize performance. First, there was no logic in the templates; pre-processing was heavily used. Data attributes contained values, which avoided inline JS and improved rendering performance. Ad rendering occurs in the footer, after the page load.
Content migration
Content migrations were my primary focus. WordPress VIP exports in WXR format, which is basically XML generated by RSS feed helpers. There’s a lot of redundant data in the archives, which are split up arbitrarily instead of being provided as one monolithic file. Some of it (such as comments) was no longer relevant to our work following an older migration to Disqus.
WordPress Migrate works for simple migrations with no custom mapping, but it couldn’t handle the additional fields from WP. However, it does provide a fantastic starting point as extendable classes, and I highly recommend this approach.
Also, Drupal doesn’t understand WordPress shortcodes and filters, for some strange reason. Many of the custom shortcodes were being deprecated as well. My approach was to create a minimal collection of WordPress function files, mock enough classes that WordPress thought it had bootstrapped, and use WordPress itself to render the WordPress shortcodes. Worked like a charm!
In order to perform the filtering, consolidation, cleanup, and rendering, I pre-processed all the files using custom Drush commands. The end result was a set of cleaned files, logically organized and separated by content type. A more traditional migrate approach was taken with the result.
Vignette content was exported in TI generated XML files. These too were pre-processed and cleaned, as close to a decade of data is prone to have edge cases and oddities interspersed.
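To make the pre-processing idea concrete, here is a minimal sketch in Python rather than the Drush (PHP) commands used on the project. It assumes WXR 1.2 exports, where each item carries a wp:post_type element and nests comments as wp:comment elements. The function name and sample data are hypothetical, and the real cleanup involved far more than this.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by WXR 1.2 exports (assumption: other versions use a different URI).
WP_NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical, heavily trimmed export used only for illustration.
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
<item><title>First post</title><wp:post_type>post</wp:post_type>
<wp:comment><wp:comment_id>1</wp:comment_id></wp:comment></item>
<item><title>About</title><wp:post_type>page</wp:post_type></item>
</channel>
</rss>"""


def clean_and_group(wxr_text):
    """Strip comments from every item and group the items by post type."""
    root = ET.fromstring(wxr_text)
    grouped = defaultdict(list)
    # Materialize the item list first so the tree can be modified safely.
    for item in list(root.iter("item")):
        # Comments were no longer needed, so drop them from each item.
        for comment in item.findall("wp:comment", WP_NS):
            item.remove(comment)
        post_type = item.findtext(
            "wp:post_type", default="unknown", namespaces=WP_NS
        )
        grouped[post_type].append(item)
    return dict(grouped)


grouped = clean_and_group(SAMPLE)
for post_type, items in sorted(grouped.items()):
    print(post_type, [item.findtext("title") for item in items])
```

Each group could then be written to its own file and handed to a conventional migration step.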
Another logistical issue was consolidating content and tagging across so many disparate systems. WordPress IDs collided across 10 blogs (we prepended a machine name to the identifier to mitigate).
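As a rough illustration of the identifier fix, the sketch below prefixes each WordPress ID with its blog’s machine name so that two posts with the same numeric ID no longer collide. The blog names, the colon separator, and the function names are assumptions made for the example, not the project’s actual values.

```python
def source_key(blog_machine_name, wp_id):
    """Build a migration-wide unique key from a blog name and a WordPress ID."""
    return f"{blog_machine_name}:{wp_id}"


def build_id_map(posts):
    """Map each (blog, wp_id, title) tuple to its namespaced key.

    Raises ValueError if two posts still end up with the same key.
    """
    mapping = {}
    for blog, wp_id, title in posts:
        key = source_key(blog, wp_id)
        if key in mapping:
            raise ValueError(f"duplicate source key: {key}")
        mapping[key] = title
    return mapping


# Two hypothetical blogs that both have a post with ID 42.
posts = [
    ("blog_a", 42, "Season finale recap"),
    ("blog_b", 42, "Box office report"),
]
id_map = build_id_map(posts)
print(sorted(id_map))
```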
Duplicate tagging and a flat taxonomy were an interesting problem: how do you map, correct, clean, and consolidate upon import? I resolved that by creating a spreadsheet that EW editors could open in their favorite office program, with columns that would rename, ignore, and map tags to destinations. Tags with exactly the same name would be consolidated, with the legacy identifiers added to the existing term.
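A spreadsheet like that can be applied with very little code. The sketch below assumes a CSV export with hypothetical columns named tag, action, and value, where action is rename, ignore, or map. For simplicity it treats rename and map the same way, as "use the value as the final term name"; the real import may well have distinguished them.

```python
import csv
import io

# Hypothetical editor-maintained rules, exported from the spreadsheet as CSV.
RULES_CSV = """tag,action,value
TV Recaps,rename,Recaps
misc,ignore,
Films,map,Movies
"""


def load_rules(csv_text):
    """Return {lowercased tag name: (action, value)} from the CSV text."""
    rules = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["tag"].strip().lower()
        action = row["action"].strip().lower()
        value = (row.get("value") or "").strip()
        rules[name] = (action, value)
    return rules


def consolidate(tags, rules):
    """Apply the rules and merge identically named tags.

    tags is an iterable of (legacy_id, name); the result maps each final
    term name to the list of legacy identifiers folded into it.
    """
    terms = {}
    for legacy_id, name in tags:
        action, value = rules.get(name.strip().lower(), ("keep", ""))
        if action == "ignore":
            continue
        final_name = value if action in ("rename", "map") else name.strip()
        terms.setdefault(final_name, []).append(legacy_id)
    return terms


tags = [
    ("blog_a:1", "Movies"),
    ("blog_b:7", "Films"),
    ("blog_a:2", "misc"),
    ("blog_b:9", "TV Recaps"),
    ("blog_a:5", "Movies"),
]
terms = consolidate(tags, load_rules(RULES_CSV))
print(terms)
```

Keeping the rules in a file the editors own means the mapping can be corrected and the import re-run without code changes.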
Performance and caching
Our deliverable was the site code, not the platform, so instead of relying on horsepower, we focused on building and delivering content as quickly as possible. This required a multifaceted approach, with optimization on both the frontend and the backend.
On the frontend, all CSS was Sass and Compass optimized, and was targeted to only load what was needed. All JavaScript was linted and used strict standards, was placed in the footer to be non-blocking, and leveraged global objects to avoid overhead.
Editors were looking to minimize the publish-to-live window in order to provide breaking news and compete with social media. There’s a constant need to compromise on this front; cached content can be stale content, so a nuanced approach was needed to provide both performance and timeliness. On the frontend, we ensured that proper cacheable headers were set, and used a shorter time-to-live on Akamai than on Varnish. Then, when content is published, purges for the affected pages are sent through the stack to refresh the content.
On the backend, we eliminated all PHP errors, no matter how minor, because every one of them slows performance. We cached and EXPLAINed custom queries, and did everything we could to minimize unnecessary overhead. For example, we minimized the module count and leveraged aggregated benchmark statistics to determine bottlenecks for optimization.
We load tested the production environment with migrated data using blitz.io with New Relic for introspection, exercising multiple content types and intentionally exceeding TTLs (we know that caches serve pages quickly, but what about rebuilding them under load?). The front-end was tested using utilities like WebPageTest.org to analyze the structure and identify blocking events.
Read the slides and see it live
We definitely enjoyed working on this project, and watching the cutover was wonderful – our baby was born and continues to grow! I hope you can learn from our experience by reading the SANDcamp Entertainment Weekly case study presentation on SlideShare!
We’ve also submitted the talk to DrupalCon 2015 Los Angeles. Please comment on the DCLA page if you’re interested in seeing the session live.
Do you have any questions about the Entertainment Weekly migration to Drupal? Let us know in the comments!