nginx+drupal revisited

nginx

Recent nginx updates support try_files and internal location directives. These features make nginx more flexible as a web server for Drupal.

  • try_files checks for existence of files in order, and returns the first file that is found. In Drupal’s logic, try_files enables the server to check Boost-generated cache, imagecache images, and Drupal installation in order.
  • @location syntax for internal locations. Internal locations are not exposed directly via nginx. They are accessible by try_files, customized 40x messages, and rewrites.

drupal

Using try_files and @location syntax together provides an easier way to run Drupal.

Most FastCGI parameters are in fastcgi_params which comes by default in nginx installation.

security and performance

Apache proxy, cache, and web service optimization

At UPEI our web pages are powered by an open-source web platform Drupal but served as static pages that are mirrored (in our terminology, scraped) by httrack to a front-end server. Most components of web pages are static except emergency messages, contact forms, and some bits of media files. All external access goes to the front-end server, while only a few requests reach our back-end server through the university firewall.

INFRASTRUCTURE

Our system is constructed by five different pieces: A front-end web server (at the same time a reverse proxy), a back-end web service and HTTP media server, a back-end production server, a development server and a database server. The front-end web server, the back-end production server, and the development server are all based on Debian Linux and an old but very stable Apache 1.3. The web service and media server is based upon a very fast and reliable HTTP server Nginx. Our database server is MySQL 5.1.

CHALLENGES

The original infrastructure has only the front-end static HTTP server and the back-end HTTP Drupal server. While most content is static on our website, we still need some dynamic content for feeds, emergency messages and forms. The back-end HTTP Drupal server handles too much PHP requests and is dying.

The major issues I am concerned about:

Performance. Our infrastructure must handle all hits for emergency situations. In other words, external access to Drupal must not rely on Apache.

Security. All external inputs must be filtered, monitored, and isolated from the production server.

Reliability. Production server down time must not affect public access.

Scalability. The infrastructure must be open to future expansion.

The bottleneck of our system was in the dynamic part.

HTTP SERVERS

The front-end server is a stable Debian Linux installation that serves all static pages and acts as a reverse proxy server to web services and legacy systems. Since our daily page views are well under 1 million per day, the server runs happily with Apache 1.3 as a static server. Small media files are reversely proxied to the back-end media servers and kept with Apache caching.

The back-end production server provides Drupal access to all content managers in the university. The development server is a sandbox server for theme development and module development. Both these two servers run on Debian Linux and Apache 1.3 and connect to separate database servers.

The media and forms server runs on Nginx to provide media file downloading/streaming and non-cacheable AJAX responses. It has restrictive access to the production database server and most POST requests are filtered and monitored. Nginx is well-known for its performance and scalability. WordPress.com runs on Nginx as a load-balancer.

OPTIMIZATION

Compression. All texts including html files, javascripts, and css stylesheets are encoded with mod_gzip in the front-end server.

Cache in the client side. All images, and fonts are cached in the browser by Expires header and Cache-Control header for at least 45 days. ETag is properly disabled for binary content. This optimization has significant improvement for the second visit. Our home page is significantly large in size (very graphics oriented for marketing purposes). The first visit may be slow (2.58MB in size). Client-side cache, however, improves the second visit to about 30KB to 50KB. Large images are also loaded in the background instead.

Cache in the server side. Small media files are cached in the front-end server to prevent proxy access to the back end.

Home page CSS refresh issue. HTTP cache control and expires headers are used in the front-end server to make client browsers load the home page every visit.

PHP 5.3 and MySQL 5.1

Running these brand new distributions.

Amazon EC2

Amazon EC2 is an amazing service for those who want stability, scalability, and extensibility. Technically speaking, EC2 is an on-demand VPS (Virtual Private System) for which you pay when you need. EC2’s upside is that no customer service and additional payment transactions are involved if a server is “purchased.” EC2’s service is paid by instance hours. If an instance is not running, you do not need to pay for it. EC2’s instances support up to 8 cores and 17GB memory. Its elastic block store supports unlimited storage space that is pay-as-you-want.

Considering how unstable the MediaTemple (gs) that I am using, EC2 is the next round for me. EC2 provides better supported, more stable, flexible, and robust than any other VPS competitors in the market, iff you are geek-enough to use it.