One year ago, when I joined Motor Presse GmbH & Co. KG, a well-established and renowned publishing house in Germany, most of the magazines’ web presences were based on a local CMS. The CMS is quite flexible and, at the time it was developed some 4-6 years earlier, it was a good choice. But its main caching concepts and programming paradigms are quite ancient, and there’s no real sign of any improvement in the future. So my starting point was clear: an old CMS, a really slow (SOAP) interface for decoupling, tons of business logic in templates, and webpages that had grown organically over a long time. Time for a change. One thing was clear: since most editors were quite proficient at creating their content with the established CMS, this was not something we could easily change. But we needed new ways of doing things: version control, continuous integration, testing, separate test/stage systems and, most of all, a common development VM with a matching IDE.
When we were tasked with relaunching menshealth.de, we made it our mission to completely restructure our team, development processes and systems. This is how it went.
First of all: we are a small development team of 4 developers and one designer, who worked almost exclusively on the relaunch project, and our experience with more sophisticated frameworks like Symfony2 varied a lot!
First thoughts on infrastructure and design
At first sight, the main performance bottlenecks were the (normalized) MySQL database and the unmaintainable templating structure. Facing these facts and knowing we could not move to another CMS in time, we decided to split the future application into three parts. First, the CMS as it was, to keep our editors satisfied and the migration manageable; this also let us keep our existing infrastructure. Second, an API (REST… what else?) for data transformation and access, later also for external data exchange, plus user authentication. Third and last, a frontend application to display the data from our API.
Because I am a great fan of Symfony2, the choice of framework was quite clear. Needing fast, structured storage for complex data, we decided to try MongoDB, which I had used in other projects and was quite satisfied with. Splitting the application also gave us the possibility to process our data transformation tasks from CMS to API asynchronously. RabbitMQ and the AMQP library were logical choices, since they are stable, performant and integrate well into a Symfony2 project.
Building the application infrastructure and deployment
Having defined our components, and after long talks with our server provider, we came up with the following setup: a Varnish proxy/cache with failover in front of two web servers for load balancing, two MongoDB instances with an arbiter, one Memcache instance each for session sharing, one API server and the already existing MySQL cluster with one CMS (backend) server. Images are uploaded via the backend CMS and delivered from a mounted NFS share by a small “micro-image-server”. The whole setup is managed by Puppet.
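Why an arbiter? Assuming the two MongoDB instances form a replica set, a primary can only be elected by a strict majority of the voting members. An arbiter votes but stores no data, which makes it a cheap way to let two data-bearing nodes survive the loss of one. A minimal arithmetic sketch (the function names are illustrative, not part of any MongoDB API):

```python
def majority(voting_members: int) -> int:
    """Votes a candidate needs to become primary: a strict majority."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """Voting members that may fail while a primary can still be elected."""
    return voting_members - majority(voting_members)


# Two data-bearing instances alone: 2 votes, majority 2, no failure tolerated.
print(tolerated_failures(2))  # 0
# Two data-bearing instances plus an arbiter: 3 votes, majority 2.
print(tolerated_failures(3))  # 1
```

With only two voting members, losing either node leaves no primary and therefore no writes; the arbiter's extra vote is what keeps the set writable.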
QA, release & deployment
For release and deployment we decided to go with “we do everything in master” during the hot phase of the project. As soon as things cooled down, we switched to the more (process-)safe gitflow. For deployments across all systems we use Ansible with a custom role based on ServerGrove’s Symfony2 deployment. We deploy to four environments:
- dev → local virtual machine for each developer, built with Vagrant and Ansible.
- stage → separate server.
- preview → separate server with production settings.
- prod → production servers.
Our QA flow now consists of these components:
- Jira ticket, approved and assigned by project management.
- Development in a feature branch (git flow) on a local VM.
- The develop branch is checked by Jenkins/PHPCI on every commit and then auto-deployed to the test server: first acceptance test stage.
- Release branches are also checked by Jenkins and then deployed to the preview system: second acceptance test stage.
- The master branch is checked by Jenkins after every merge and deployed manually to production.
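The branch-to-environment mapping of this QA flow can be written down as a small lookup. This is a hypothetical Python sketch, not our actual Jenkins or Ansible configuration. It assumes gitflow's default `feature/` and `release/` prefixes and that the "test server" is the stage system; the flow does not say whether release branches deploy automatically, so that flag is an assumption as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    ci: Optional[str]      # CI that checks the branch (None: no CI stated for it)
    target: Optional[str]  # environment deployed to (None: local VM only)
    automatic: bool        # deployed automatically after a green build?


def stage_for(branch: str) -> Stage:
    """Map a gitflow branch name to its QA stage (hypothetical encoding of the flow)."""
    if branch.startswith("feature/"):
        return Stage(ci=None, target=None, automatic=False)
    if branch == "develop":
        # Assumption: the "test server" is the stage system.
        return Stage(ci="Jenkins/PHPCI", target="stage", automatic=True)
    if branch.startswith("release/"):
        # Assumption: whether this deployment is automatic is not stated.
        return Stage(ci="Jenkins", target="preview", automatic=True)
    if branch == "master":
        return Stage(ci="Jenkins", target="prod", automatic=False)
    raise ValueError(f"unexpected branch: {branch}")


for name in ["feature/new-teaser", "develop", "release/1.4", "master"]:
    print(name, stage_for(name))
```

Keeping only master on a manual deployment trigger means every automated step stops short of production.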
What we did for scalability
We started with a short list of known performance bottlenecks/requirements for our special case:
- Slow CMS backend due to heavy normalization in MySQL and a poor PHP application architecture.
- Frontends need no direct write access to database (security & speed!).
- CMS and frontend may be slightly out of sync.
- The frontend must be fast even without Varnish, and therefore allow for a higher content update frequency.
- Varnish cache times should be controllable through frontend responses.
- Make it easy to add new frontend servers if load increases (horizontal scaling).
- Keep the backend decoupled but allow it to scale horizontally, too.
Looking at those requirements, we see nothing really new or unfamiliar. This list led to the following architecture:
- The CMS is completely decoupled and triggers a REST API call with a minimal payload for every CUD (create, update, delete) action. Notice the missing “R”? Reads are done directly from the database for performance reasons.
- The API application processes these messages via RabbitMQ/AMQP to keep the load manageable.
- The API extracts data from the MySQL server and transforms it into MongoDB documents.
- Data goes into a MongoDB cluster.
- Frontends access MongoDB directly, but read-only.
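The CUD-only flow above can be sketched end to end in a few lines. This is an illustration under loud assumptions, not our production code: `sqlite3` stands in for the normalized MySQL database, `queue.Queue` for RabbitMQ and a plain dict for the MongoDB collection, and the schema and function names are hypothetical. The CMS only sends an action and an ID; the consumer re-extracts and denormalizes the data itself.

```python
import json
import queue
import sqlite3
from typing import Optional

# Stand-in for the normalized MySQL schema (hypothetical tables).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE author  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
CREATE TABLE tag     (article_id INTEGER, label TEXT);
INSERT INTO author  VALUES (1, 'Jane Doe');
INSERT INTO article VALUES (42, 'Ten-minute workout', 1);
INSERT INTO tag     VALUES (42, 'fitness'), (42, 'training');
""")

messages = queue.Queue()  # stand-in for a RabbitMQ queue
documents = {}            # stand-in for a MongoDB collection, keyed by _id


def on_cms_change(action: str, article_id: int) -> None:
    """Called for every create/update/delete in the CMS: only action and ID are sent."""
    messages.put(json.dumps({"action": action, "id": article_id}))


def build_document(article_id: int) -> Optional[dict]:
    """Join the normalized rows into one denormalized, read-optimized document."""
    row = db.execute(
        "SELECT a.title, au.name FROM article a "
        "JOIN author au ON au.id = a.author_id WHERE a.id = ?", (article_id,)
    ).fetchone()
    if row is None:  # row vanished before the message was processed
        return None
    title, author = row
    tags = [r[0] for r in db.execute(
        "SELECT label FROM tag WHERE article_id = ? ORDER BY label", (article_id,))]
    return {"_id": article_id, "title": title, "author": author, "tags": tags}


def work_queue() -> None:
    """Consumer: drain the queue, then upsert or delete the matching documents."""
    while not messages.empty():
        msg = json.loads(messages.get())
        doc = None if msg["action"] == "delete" else build_document(msg["id"])
        if doc is None:
            documents.pop(msg["id"], None)
        else:
            documents[msg["id"]] = doc


on_cms_change("update", 42)
work_queue()
print(documents[42])
# {'_id': 42, 'title': 'Ten-minute workout', 'author': 'Jane Doe', 'tags': ['fitness', 'training']}
```

Because a message carries no content, replaying an update simply rebuilds the document from the current source rows, which keeps the CMS-side payload minimal.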
Some more ideas we successfully put to the test:
- Symfony controllers are used as services (see Mathias Noback). This allows for easier testing and much better dependency visibility!
- One-bundle approach as recommended in the Symfony best practices, modified to have src/Core for our library classes.
- gulp/Stylus for CSS/JS management.
- Removed Assetic, since it is not necessary when using gulp.
- Symfony Response objects with custom configuration for ETags and cache lifetimes.
- APCu cache for userland caching.
- All Monolog output is written to a logfile in GELF format and shipped via Awesant to RabbitMQ, from where Graylog pulls it.
- The library is connected to coreBundle via provider services that act only as handlers for private services collected during compiler passes. This noticeably reduced DIC size and improved performance.
- All access to CMS data is hidden behind a facade with interfaces and a provider structure that allows for different API versions and access for a (possible) variety of CMS systems per client.
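To illustrate the Response/ETag item above: a frontend response can carry both a validator (ETag) and a lifetime for shared caches (`s-maxage`, which Varnish honors by default when computing an object's TTL). This is a framework-neutral Python sketch with hypothetical names; in Symfony2 the corresponding calls would be `Response::setEtag()`, `setMaxAge()`, `setSharedMaxAge()` and `isNotModified()`, but our actual configuration is not shown here.

```python
import hashlib
from typing import Dict, Optional, Tuple


def cached_response(body: str, if_none_match: Optional[str],
                    max_age: int, shared_max_age: int) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a cacheable page."""
    # Strong ETag derived from the rendered body (hypothetical choice of hash).
    etag = '"' + hashlib.md5(body.encode("utf-8")).hexdigest() + '"'
    headers = {
        "ETag": etag,
        # max-age applies to browsers; s-maxage overrides it for shared caches like Varnish.
        "Cache-Control": f"public, max-age={max_age}, s-maxage={shared_max_age}",
    }
    # Simplified: a real If-None-Match header may list several ETags or "*".
    if if_none_match == etag:
        return 304, headers, ""  # client copy is still current: Not Modified, no body
    return 200, headers, body


status, headers, _ = cached_response("<h1>Workout</h1>", None, 60, 300)
print(status, headers["Cache-Control"])  # 200 public, max-age=60, s-maxage=300
status, _, body = cached_response("<h1>Workout</h1>", headers["ETag"], 60, 300)
print(status, repr(body))  # 304 ''
```

Hashing the rendered body only saves bandwidth; to save rendering time as well, the ETag would have to come from something cheaper, such as the document's last-modified timestamp.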
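The facade/provider structure for CMS access can be sketched as follows. Python stands in for PHP here, and every class name and registry key is hypothetical. In a Symfony2 project the registration step would typically happen in a compiler pass (for example by collecting tagged services), while frontend code only ever talks to the facade.

```python
from typing import Dict, Protocol


class ArticleProvider(Protocol):
    """Interface every CMS/API-version provider implements (hypothetical)."""

    def find_title(self, article_id: int) -> str: ...


class CmsApiV1Provider:
    def find_title(self, article_id: int) -> str:
        return f"v1 title {article_id}"  # would read via API version 1


class CmsApiV2Provider:
    def find_title(self, article_id: int) -> str:
        return f"v2 title {article_id}"  # would read via API version 2


class ContentFacade:
    """Single entry point for CMS data; callers never see concrete providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, ArticleProvider] = {}

    def register(self, key: str, provider: ArticleProvider) -> None:
        # In Symfony2 this step would typically happen in a compiler pass.
        self._providers[key] = provider

    def title(self, key: str, article_id: int) -> str:
        if key not in self._providers:
            raise KeyError(f"no provider registered for {key!r}")
        return self._providers[key].find_title(article_id)


facade = ContentFacade()
facade.register("menshealth:v1", CmsApiV1Provider())
facade.register("menshealth:v2", CmsApiV2Provider())
print(facade.title("menshealth:v2", 7))  # v2 title 7
```

Supporting another API version or CMS then means registering one more provider, without touching the callers.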
Along the way we encountered some errors that are funny in hindsight:
- When using APCu, always check memory consumption and how quickly the cache fills up and flushes. Check hit rates! In our case the cache filled up and flushed so quickly that it didn’t show up in monitoring. But the hit rates did ;-)
- Logging! I can’t emphasize enough: Every crucial action that fails should be logged - but with enough meta-information to be conclusive!
- If you’re using the reverse proxy cache kernel, make sure it uses the same caching strategy (e.g. cache key) as your Varnish does. We had the user agent in the Varnish cache key, but not in the Vary header, so the reverse proxy cache erratically cached desktop/mobile-specific content and people got a lot of wrong ads. Took me several hours to find :-(
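The mismatch behind our wrong ads can be reproduced with a toy cache. In this Python sketch (hypothetical names; real device detection is more involved), the key consists of the URL plus the request headers named in Vary. This is roughly how an HTTP cache that honors Vary, such as Symfony's HttpCache, separates variants. With an empty Vary, the mobile visitor is served the page cached for the desktop visitor.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, ...]


def cache_key(url: str, request: Dict[str, str], vary: Tuple[str, ...]) -> Key:
    """Build a cache key from the URL plus every request header named in Vary."""
    return (url,) + tuple(request.get(h, "") for h in vary)


def fetch(cache: Dict[Key, str], key: Key, render: Callable[[], str]) -> str:
    """Return the cached page for key, rendering and storing it on a miss."""
    if key not in cache:
        cache[key] = render()
    return cache[key]


def render_for(request: Dict[str, str]) -> str:
    # Hypothetical device detection: mobile user agents get mobile ads.
    return "mobile ads" if "Mobile" in request["User-Agent"] else "desktop ads"


desktop = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0)"}
mobile = {"User-Agent": "Mozilla/5.0 (iPhone) Mobile"}

for vary in [(), ("User-Agent",)]:
    cache: Dict[Key, str] = {}
    first = fetch(cache, cache_key("/", desktop, vary), lambda: render_for(desktop))
    second = fetch(cache, cache_key("/", mobile, vary), lambda: render_for(mobile))
    print(vary, "->", first, "/", second)
# () -> desktop ads / desktop ads
# ('User-Agent',) -> desktop ads / mobile ads
```

Varying on the raw User-Agent splits the cache into one variant per browser string, so many setups normalize it to a device class first and vary on that instead.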
When we first load-tested the application, we were quite surprised by the good overall performance. This continued after go-live, apart from some hiccups with the Varnish cache keys. We’re quite satisfied with our current architecture, especially since we decoupled most of the code from the framework itself and moved it into library structures. But there is still a lot of work to do: our unit test coverage is low, some of the library parts should be extracted into separate libraries and repositories, and, as in nearly every project, a lot of refactoring has to be done in the templating and some library parts.
And this is what we’ve built: menshealth.de