Building Drupal with Reusable Fields

CCK and Field API

CCK (Content Construction Kit) is arguably the most useful contributed module for Drupal: it adds custom fields and content types. Drupal 7 incorporates the community effort behind CCK into core as the Field API.

Field API UML

The CCK module allows a content type to have multiple fields of various field types, each with its own widget and formatter. A field must be assigned a widget to define its input style and at least one formatter to define its display style. The UML diagram above describes the relationship between content types, fields, field widgets, and field formatters in a CCK-based content type.

Reuse Fields

Add new field

When a CCK field is added to a content type for the first time, the field's definition is created in Drupal as a class and an instance of it is assigned to the given content type. When the field is assigned to a content type, its configuration parameters are stored in the instance rather than in the class. Instead of adding new fields to a content type, adding existing fields is a better option: it reduces the system's complexity and improves scalability.

Add existing field

Adding an existing field only requires the administrator to choose a field from a list of fields defined in other content types and to select a widget for the input style. The newly created field instance inherits the field's default parameters, which can be changed later.

Reuse Fields with the API

CCK allows modules to add customized fields, widgets, and formatters. Many third-party modules (see the Drupal CCK modules list) already exist to handle different tasks, including images, videos, and other internal and external references. CCK for Drupal 6 provides a set of APIs (see the CCK Developer Documentation for Drupal 6) for module developers; Drupal 7 provides the native Field API.
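As a rough sketch of what field reuse looks like through the Drupal 7 Field API (the field name, bundles, and settings below are illustrative assumptions, not taken from the text above): a field is created once and then attached to several content types as separate instances.

```php
<?php
// Sketch only: create one reusable image field (the field "class")
// if it does not exist yet.
if (!field_info_field('field_image_thumbnail')) {
  field_create_field(array(
    'field_name' => 'field_image_thumbnail',
    'type' => 'image',
    'cardinality' => 1,
  ));
}

// Then attach it to two (hypothetical) content types as separate
// instances, each carrying its own configuration: label, widget, display.
foreach (array('article', 'user_blog') as $bundle) {
  field_create_instance(array(
    'field_name' => 'field_image_thumbnail',
    'entity_type' => 'node',
    'bundle' => $bundle,
    'label' => 'Thumbnail',
    'widget' => array('type' => 'image_image'),
    'display' => array(
      'default' => array(
        'type' => 'image',
        'settings' => array('image_style' => 'thumbnail'),
      ),
    ),
  ));
}
```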

Performance Issues

CCK (or the Field API in Drupal 7) adds extra complexity to a Drupal system. When a new field is created, the field's definition is added to the field class table and the field's configuration is added to the field instance table; meanwhile, a new table is added to the Drupal database to store the field data. Extra database tables add complexity to the system. In addition, queries for nodes incur JOINs against the field data tables, and multiple JOINs hurt database performance, since MySQL responds poorly to queries with many JOINs when not properly configured.

Reuse of fields can reduce the number of tables in the Drupal database. For example, if 10 image fields, field_image_a, field_image_b, …, field_image_j, are added to the system, 10 tables are added to the database. If each content type really only needs two image fields, one thumbnail and one full-size image, we can instead define two shared fields, field_image_thumbnail and field_image, and reuse them across content types. Only two tables are introduced to the database with the latter configuration.

Reuse of fields also reduces the system's complexity. Instead of creating and maintaining 10 different fields, Drupal admins maintain only two fields and their documentation, and database administrators only need to tune the performance of two extra tables. KISS is always a good principle.

Building Drupal with Naming Conventions

Creating Drupal sites is easy and requires no fancy skills. Installing Drupal takes one click; setting up modules takes one click; creating new content types is also just a few mouse clicks. Unfortunately, this initial impression of simplicity means the power Drupal offers is often abused. When building a hobby site, Drupal entities such as blocks, content types, views, and URLs are created haphazardly, without deliberate consideration. Drupal gurus, meanwhile, take a different approach and carefully plan names ahead of time.

Naming conventions

Naming conventions are overhead for most casual hobby use of Drupal. A hobbyist installing Drupal for the first time won't gain anything from them. When that hobbyist becomes a Drupal professional and sets up a tenth Drupal installation for a client, however, they may want naming conventions for the blocks, fields, content types, and views they create, so the other nine Drupal websites stay easy to maintain.

The Drupal community has defined coding standards for naming functions and variables, roughly based on the PEAR Coding Standards. Unfortunately, PHP supports neither dot-separated packages nor namespaces, and Drupal likewise does not support namespaces for its variable names or for the machine names of fields, content types, views, and other Drupal entities.
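For illustration, here is a sketch of how the underscore prefix acts as a namespace in module code (the module name webinit_acad_journal and its menu item are hypothetical, not part of the coding standards themselves):

```php
<?php
// Hypothetical module "webinit_acad_journal": every function is prefixed
// with the module name, since underscores are the only namespace mechanism.
function webinit_acad_journal_menu() {
  $items['journals'] = array(
    'title' => 'Journals',
    'page callback' => 'webinit_acad_journal_page',
    'access arguments' => array('access content'),
  );
  return $items;
}

// The page callback also carries the module prefix.
function webinit_acad_journal_page() {
  return t('A listing of academic journals.');
}
```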

Human-readable Names and Machine-readable Names

Drupal demands two different types of names: a human-readable name and a machine-readable name.

Human-readable name and machine-readable name

The human-readable name is a text field that may contain any character. Drupal stores it as plain text in the database and displays it as plain text. Drupal recommends that the human-readable name contain only alphanumerics and spaces, but this is not strictly enforced. The human-readable name can therefore be used to apply naming conventions.

The machine-readable name is a text field that may contain only lowercase letters, numbers, and underscores. Drupal often uses the machine-readable name directly as PHP variable names, database table names, and database column names, so the strict character restrictions must be followed. The underscore thus becomes the only option for defining namespaces in machine names.

For example, for a user-created Blog type:

> * Name: user.Blog
> * Type: user_blog
> * Description: A user Blog content type.

End users may be unable to understand the meaning of "Create a user.Blog". Using underscore-separated machine names and leaving the human-readable names natural is probably a better idea.

The user-created blog type can be rewritten as:

> * Name: Blog
> * Type: user_blog
> * Description: A user Blog content type.

While developers can recognize the user.Blog content type from user_blog, users read "Create a Blog" in their menus.

A longer example might look like:

Journal:
> * Name: Journal
> * Type: webinit_acad_journal

Journal Issue:
> * Name: Journal issue
> * Type: webinit_acad_journal_issue

Journal Article:
> * Name: Journal Article
> * Type: webinit_acad_journal_article

Building Conventions

Naming conventions are to Drupal developers what coding standards are to programmers. Building conventions among Drupal developers means reaching consensus within a team of developers, and the technical leader is responsible for establishing those conventions in the team.

Consumer Consensus Although consumers may not see machine-readable names explicitly on web pages, human-readable names are visible to consumers in many menu items. Developers must realize that human-readable names are consumed by end users: display names should be meaningful not only to developers but also to end users. In addition, end users also care about entity descriptions.

Developer Consensus Developers may reach an agreement about naming conventions.

The above example about academic journals and articles is organized by use case. All journal-related items belong to the journal subsystem within the acad scope, because journals and articles are designed as part of the journal subsystem. Along with the content types, developers can create journal-related blocks and views under the namespace webinit_acad_journal_. For example:

A Journal View:
> * Name: Journal view
> * Machine: webinit_acad_journal_view_journal

A Journal Issue View:
> * Name: Journal issue view
> * Machine: webinit_acad_journal_view_issue

A functionality-based namespace is more useful for helpers such as node reference views and other supporting views. For example, the webinit_acad_journal_issue content type has a node reference field that selects journals through a view dedicated to listing them. That view could follow the pattern:
> * Name: Node reference view of journals
> * Machine: noderef_journal

However, this view can also be placed under the namespace specified above:
> * Name: Node reference view of journals
> * Machine: webinit_acad_journal_noderef_journal

By packaging content types and views into the same namespace, users can focus on the problem scope and the set of features developers provide in Drupal, and developers can more easily track down bugs within that scope during maintenance.

The next article will discuss reusable fields.

nginx+drupal revisited

nginx

Recent nginx releases support the try_files directive and named internal locations. These features make nginx more flexible as a web server for Drupal.

  • try_files checks for the existence of files in order and serves the first one found. In Drupal's case, try_files lets the server check the Boost-generated cache, ImageCache images, and finally the Drupal front controller, in that order.
  • The @location syntax defines named internal locations. Internal locations are not exposed directly by nginx; they are reachable only through try_files, customized 40x error pages, and rewrites.

drupal

Using try_files and the @location syntax together provides an easier way to run Drupal.

Most FastCGI parameters live in fastcgi_params, which ships by default with the nginx installation.
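A minimal sketch of what such a server block might look like (the server name, paths, Boost cache layout, and FastCGI address are all assumptions to adjust for your setup):

```nginx
server {
    listen 80;
    server_name example.com;              # assumption
    root /var/www/drupal;                 # assumption
    index index.php;

    location / {
        # Check the Boost-generated cache first (the cache path follows
        # Boost's usual layout and may differ), then the file on disk,
        # then hand the request to Drupal via the named internal location.
        try_files /cache/normal/$host${uri}_$args.html $uri $uri/ @drupal;
    }

    location @drupal {
        # Internal only: reachable through try_files above, not directly.
        rewrite ^/(.*)$ /index.php?q=$1 last;
    }

    location ~ \.php$ {
        include        fastcgi_params;
        fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass   127.0.0.1:9000;    # assumption: PHP FastCGI address
    }
}
```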

security and performance

Apache proxy, cache, and web service optimization

At UPEI our web pages are powered by the open-source web platform Drupal, but they are served as static pages mirrored (in our terminology, "scraped") by httrack to a front-end server. Most components of our web pages are static, except for emergency messages, contact forms, and a few bits of media. All external access goes to the front-end server, and only a few requests reach our back-end server through the university firewall.

INFRASTRUCTURE

Our system consists of five pieces: a front-end web server (which doubles as a reverse proxy), a back-end web service and HTTP media server, a back-end production server, a development server, and a database server. The front-end web server, the back-end production server, and the development server are all based on Debian Linux and an old but very stable Apache 1.3. The web service and media server is based on the very fast and reliable HTTP server Nginx. Our database server is MySQL 5.1.

CHALLENGES

The original infrastructure had only the front-end static HTTP server and the back-end HTTP Drupal server. While most content on our website is static, we still need some dynamic content for feeds, emergency messages, and forms, and the back-end Drupal server was handling too many PHP requests and struggling to keep up.

The major issues I am concerned about:

Performance. Our infrastructure must handle all hits for emergency situations. In other words, external access to Drupal must not rely on Apache.

Security. All external inputs must be filtered, monitored, and isolated from the production server.

Reliability. Production server down time must not affect public access.

Scalability. The infrastructure must be open to future expansion.

The bottleneck of our system was in the dynamic part.

HTTP SERVERS

The front-end server is a stable Debian Linux installation that serves all static pages and acts as a reverse proxy to web services and legacy systems. Since our page views are well under 1 million per day, the server runs happily with Apache 1.3 as a static server. Small media files are reverse-proxied to the back-end media server and kept in Apache's cache.

The back-end production server provides Drupal access to all content managers in the university. The development server is a sandbox for theme and module development. Both servers run Debian Linux and Apache 1.3 and connect to separate database servers.

The media and forms server runs Nginx to provide media file downloading/streaming and non-cacheable AJAX responses. It has restricted access to the production database server, and most POST requests are filtered and monitored. Nginx is well known for its performance and scalability; WordPress.com, for instance, runs Nginx as a load balancer.

OPTIMIZATION

Compression. All text content, including HTML files, JavaScript, and CSS stylesheets, is compressed with mod_gzip on the front-end server.
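A hedged sketch of the relevant mod_gzip directives for an Apache 1.3 front end (the exact include/exclude list is an assumption):

```apache
# Enable mod_gzip and compress text content only.
mod_gzip_on Yes
mod_gzip_item_include mime ^text/html
mod_gzip_item_include mime ^text/css
mod_gzip_item_include mime ^application/x-javascript
mod_gzip_item_include file \.html$
mod_gzip_item_include file \.js$
mod_gzip_item_include file \.css$
# Never recompress already-compressed binaries such as images.
mod_gzip_item_exclude mime ^image/
```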

Cache on the client side. All images and fonts are cached in the browser via the Expires and Cache-Control headers for at least 45 days, and ETag is deliberately disabled for binary content. This optimization significantly improves repeat visits. Our home page is large (very graphics-oriented for marketing purposes), so the first visit may be slow (2.58 MB); client-side caching, however, brings the second visit down to roughly 30-50 KB. Large images are also loaded in the background.
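A sketch of the corresponding mod_expires/mod_headers configuration (45 days is roughly 3888000 seconds; the matched types and extensions are assumptions):

```apache
ExpiresActive On
# Cache images and fonts in the browser for at least 45 days.
ExpiresByType image/png  "access plus 45 days"
ExpiresByType image/jpeg "access plus 45 days"
ExpiresByType image/gif  "access plus 45 days"
<FilesMatch "\.(png|jpe?g|gif|ico|ttf|otf)$">
    Header set Cache-Control "public, max-age=3888000"
    # Disable ETags for binary content so caching relies on Expires alone.
    FileETag None
</FilesMatch>
```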

Cache on the server side. Small media files are cached on the front-end server to avoid proxying every request to the back end.

Home page CSS refresh issue. HTTP Cache-Control and Expires headers are set on the front-end server so that client browsers reload the home page on every visit.

Use Nginx to run your Drupal site

Type: Tutorial
Difficulty: Intermediate

I have a fresh website based on Apache and PHP5 that needs to be converted to Nginx and PHP5 FastCGI. What can I do?

Stage 1 CGI version of PHP5

Nginx only supports the CGI version of PHP5 (not the Apache module). In FastCGI mode, PHP5 runs like a server that forks a number of children to handle incoming requests. This number is set in the start-up script and can be whatever the workload requires; of course, we do not want to blow up the server, so keep memory_limit × number of PHP children < available memory.

On Debian/Ubuntu systems, we can install php5-cgi in one line:
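```sh
# Debian/Ubuntu; run as root or prefix with sudo.
apt-get install php5-cgi
```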

This installs the CGI version of PHP5, which includes FastCGI support; any modern Linux distribution ships a similar package. After installation, run the following command to confirm that PHP has FastCGI enabled.
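```sh
php5-cgi -v
# The version banner should mention "(cgi-fcgi)", confirming FastCGI support.
```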

Stage 2 Spawn the FastCGI server

The php5-cgi binary can itself serve as a FastCGI server, but setting up the environment with it directly is complicated. Instead, we can use the general-purpose FastCGI spawner shipped with Lighttpd to create the service: download the latest version of Lighttpd, extract the package, run the configure script, run make, and copy the spawn-fcgi binary to /usr/bin.
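Roughly like this (the version number is a placeholder for whatever tarball you download):

```sh
tar xzf lighttpd-1.4.x.tar.gz
cd lighttpd-1.4.x
./configure
make
# spawn-fcgi is built under src/; copy it somewhere in the PATH.
cp src/spawn-fcgi /usr/bin/
```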

Then we can spawn the PHP5 FastCGI processes like this:
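A sketch of the command; -a, -p, -u, -g, -C, and -f are standard spawn-fcgi flags, while -F (the number of processes to fork) may not exist in older builds, so adjust to your version:

```sh
/usr/bin/spawn-fcgi -a 127.0.0.1 -p 16000 \
    -u www-data -g www-data \
    -C 5 -F 2 \
    -f /usr/bin/php5-cgi
# -C 5 sets PHP_FCGI_CHILDREN=5 for each FastCGI process;
# -F 2 forks two processes sharing the same socket.
```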

This command instantiates two PHP5 FastCGI processes (each of which has five children) and binds them to 127.0.0.1 (localhost) on port 16000, so we have ten processes listening for PHP requests. The PHP processes run as the www-data user.

Stage 3 Build Nginx

Imagine how one man can take on the world: Nginx (Engine X) is a blazingly fast HTTP server written by Igor Sysoev. According to Netcraft, as of December 2008 Nginx served or proxied about 3.5 million virtual hosts, placing it third in the market, and two of the Alexa Top 100 sites use Nginx.

Download Nginx from its official site, extract the tarball, then run:
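For example, something along these lines (the prefix, temp paths, and user/group are assumptions to be adjusted for your system):

```sh
./configure \
    --prefix=/usr/local/nginx \
    --user=www-data --group=www-data \
    --http-client-body-temp-path=/var/tmp/nginx/client_body \
    --http-proxy-temp-path=/var/tmp/nginx/proxy \
    --http-fastcgi-temp-path=/var/tmp/nginx/fastcgi \
    --with-http_ssl_module
make
make install
```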

This configures Nginx with its most useful modules. Note that --http-client-body-temp-path, --http-proxy-temp-path, and --http-fastcgi-temp-path are temporary/cache directories used by Nginx. The default user and group can be set to the system's standard HTTP service user instead of nobody, although they can also be configured at runtime.

Stage 4 Run Nginx

Starting up Nginx is simple and straightforward: after properly configuring your nginx settings, just type nginx and hit return, and it will start. I also provide a set of Nginx configuration files here to simplify the process. A few pieces of that configuration are essential for making Drupal work under Nginx.

The location context for PHP scripts makes Nginx talk to the PHP FastCGI server, and the if context for rewriting gives Drupal clean URL support.
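A minimal sketch of those pieces (the FastCGI address matches the 127.0.0.1:16000 used in Stage 2; both it and the paths are assumptions):

```nginx
location / {
    # Clean URLs: if the requested file or directory does not exist,
    # hand the request to Drupal's front controller.
    if (!-e $request_filename) {
        rewrite ^/(.*)$ /index.php?q=$1 last;
    }
}

location ~ \.php$ {
    include        fastcgi_params;
    fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
    fastcgi_pass   127.0.0.1:16000;
}
```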

You’re done!

Download Nginx configuration files

References

An old thread from the official Drupal forum.

A Mobile Website in Drupal

How can you set up a website for mobile browsers in five hours?

First, we have websites with RSS output, such as UPEI's website, so we can use Drupal to aggregate news and information from them. The mobile version should not generate content; it serves only as an aggregator, and Drupal's cron job automatically updates the feed items. UPEI's mobile website aggregates feeds from UPEI websites, including media releases, department notices, and other feed-enabled information.

Second, we use a mobile theme for Drupal as the base theme for mobile browsers. This theme stacks blocks from top to bottom, including the left sidebar, content top, and right sidebar, and the navigation menu can be placed in the left sidebar. We also need to modify the template file page.tpl.php to suit our needs, changing the header, footer, and other signatures.
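A minimal sketch of what such a stripped-down page.tpl.php might look like for a Drupal 6-style mobile theme (the exact variables available depend on the theme):

```php
<?php
// Sketch of a simplified mobile page.tpl.php: regions print top to bottom.
?>
<!DOCTYPE html>
<html>
<head>
  <title><?php print $head_title; ?></title>
  <?php print $head; ?>
  <?php print $styles; ?>
</head>
<body>
  <div id="header"><?php print $header; ?></div>
  <!-- Blocks stack vertically: left sidebar, content, right sidebar. -->
  <div id="sidebar-left"><?php print $left; ?></div>
  <div id="content">
    <h1><?php print $title; ?></h1>
    <?php print $content; ?>
  </div>
  <div id="sidebar-right"><?php print $right; ?></div>
  <div id="footer"><?php print $footer_message; ?></div>
</body>
</html>
```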

Third, we use an override stylesheet to provide extra styles for WebKit-based browsers, such as Mobile Safari on the iPhone and Android's browser. This stylesheet overrides font sizes, element sizes, and word-break settings.
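A few hedged examples of the kind of overrides involved (selectors and values are illustrative):

```css
/* Let WebKit browsers keep our font sizes instead of auto-inflating text. */
html, body {
  -webkit-text-size-adjust: none;
  font-size: 14px;
}

/* Keep long URLs and words from overflowing the narrow screen. */
#content {
  width: 100%;
  word-wrap: break-word;
}
```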

Then there is the final product (Use your iPhone!).

General Representation for Drupal feeds

FeedAPI is an excellent module for delivering information from the outside world into a Drupal installation. Currently FeedAPI supports importing RSS items as nodes, which covers most use cases. But legacy systems (often 3 to 7 years old) and many enterprise information systems have no RSS output, and RSS itself only supports a few fields of a content type. In such cases, a more generic format is required to feed data into Drupal through FeedAPI. One good practice is to pull information from legacy systems in an XML format supported by a FeedAPI parser (as described in the previous post). Although any format is parsable, XML is the most intuitive.

Generally speaking, FeedAPI accepts a list of items that are all described in the same format. To generalize FeedAPI's use of feeds, a simple XML format is proposed:
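A reconstruction of the proposed format might look like this (the foo namespace and its elements are placeholders for the source system's own markup):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<feed id="foo_feed" title="Foo items" description="Items exported from foo"
      link="http://example.com/foo">
  <node id="foo_item_1" title="First item" description="Summary of the first item"
        link="http://example.com/foo/1">
    <data xmlns:foo="http://example.com/ns/foo">
      <!-- Foo's original XML elements, kept in their own namespace. -->
      <foo:item>
        <foo:name>First item</foo:name>
        <foo:body>The original description from foo.</foo:body>
      </foo:item>
    </data>
  </node>
  <!-- more node elements -->
</feed>
```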

In the above example, a simple XML wrapper is used to encapsulate the actual data (foo's item description). Foo's original XML elements can be encapsulated directly in the data field with proper namespace settings.

Elements

feed

This element has four attributes: id, title, description, and link. id is a string that is unique within a Drupal installation; it is imported into Drupal as the guid field. title and description are, respectively, the title and description of a FeedAPI feed: when a feed in the above format is imported, these two fields become the feed's title and description. link is imported as the original_url field of a FeedAPI feed. In the current implementation, at least one of guid and original_url must be present; in my SimpleXML parser implementation, guid (the id attribute) is required.

The feed tag consists of a series of node tags.

node

The node tag is a container for a node's details. Currently only the data tag is allowed under the node tag; other tags could be added under node in the future to specify additional node information. node also carries the same four attributes described for feed, representing Drupal-specific metadata.

data

The data tag is a container for the original data converted (or directly copied) from the source XML. A namespace is suggested to indicate the source of the data. Any XML data can be placed under the data tag; however, the format must be consistent so the parser can extract the same set of information for each node.

Feed Synchronization

FeedAPI follows a standard aggregator's behaviour. For example, when FeedAPI watches a feed that updates periodically, the items of the watched feed are updated rather than synchronized to the Drupal site: if an item no longer exists in the updated feed, it is not removed from the Drupal site. While this behaviour is suitable for most Drupal sites acting as news aggregators, it is not suitable for enterprise applications that need real synchronization between the presentation layer and the EIS. Two methods can synchronize the data. In the first, the EIS exports incremental information (the difference between the previous and current revisions of the data), and the presentation layer parses and applies the increments. In the second, the EIS exports all data, and the presentation layer computes the difference between the previous and current revisions and removes items that no longer appear in the latest export from the EIS.

Although the former method is usually more efficient, the EIS I am working with is a legacy system with no data warehousing support; in short, it keeps no timeline data and cannot track row revisions. Therefore, my implementation is limited to the latter method. Fortunately, FeedAPI provides a flexible interface that lets me implement the synchronization without touching FeedAPI's source code.

FeedAPI provides the feedapi_refresh_feedapi hook for parsers and processors to post-process a feed after a refresh. Synchronization of feeds relies on this post-processing mechanism.
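A hedged sketch of what such a hook implementation could look like in Drupal 6 (the hook name follows the description above, but its exact signature, the structure of $feed->items, and the {feedapi_node_item} table are assumptions about FeedAPI's internals; node_delete(), user_load(), and the global $user switch are core Drupal):

```php
<?php
/**
 * Sketch: implementation of the feedapi_refresh_feedapi hook described above.
 * Removes feed item nodes that no longer appear in the freshly parsed feed.
 */
function mymodule_feedapi_refresh_feedapi($feed) {
  // Collect GUIDs present in the latest update (item structure assumed).
  $current = array();
  foreach ((array) $feed->items as $item) {
    $current[] = $item->options->guid;
  }

  // Assumed lookup table mapping previously imported items to this feed.
  $result = db_query('SELECT nid, guid FROM {feedapi_node_item} WHERE feed_nid = %d', $feed->nid);
  while ($row = db_fetch_object($result)) {
    if (!in_array($row->guid, $current)) {
      // node_delete() checks the current user's permissions, and cron runs
      // as the anonymous user, so temporarily switch to user 1. This is the
      // permission-check workaround discussed below.
      global $user;
      $original_user = $user;
      $user = user_load(1);
      node_delete($row->nid);
      $user = $original_user;
    }
  }
}
```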

This implementation of the feedapi_refresh_feedapi hook removes all deleted items from the imported feed. However, Drupal's node_delete function checks the current user's permissions, while this routine runs against the feed from Drupal's cron; with node_delete alone, the anonymous user would be unable to remove items. The hook therefore circumvents the permission check. Although this introduces a possible security hole, the hack is necessary unless a better cron is implemented.

Update: Because of FeedAPI's mechanism for tracking unique feed items, an item's ID must be unique across ALL feeds, not just within one feed.

Data Feeding in Drupal

Data feeding is an important topic in Enterprise Information Systems (EIS). In a typical three-tier enterprise application, the web tier pulls data from the EIS tier, which usually lives on an internal EIS. Java EE systems either use their APIs and presentation-layer tags directly or, less commonly, use a regulated format (usually XML) to pull the data. The former method introduces code-level or API-level coupling and is therefore not recommended unless the web tier requires it; the latter method decouples the web tier from the EIS tier at the API level. Drupal is a flexible and extensible platform written in PHP with excellent performance when properly configured, and because of its ease of development and use, many small businesses and organizations deploy Drupal in the presentation layer as their external or internal website. FeedAPI is an extensible interface to Drupal: it supports importing feed-based information from other sources, including legacy EIS that can export structured lists of data.

As a community effort, FeedAPI was initially written to aggregate RSS feeds from other websites. However, since it has a sophisticated, extensible interface, other parsers and processors can easily be added to process other structured lists, for instance XML lists. Since practically any EIS can export data as XML files, XML is a natural format for handling structured EIS data.

A node list XML format is designed for EIS to prepare data for FeedAPI.
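The format is the generic wrapper described in the previous post; as a skeleton:

```xml
<feed id="..." title="..." description="..." link="...">
  <node id="..." title="..." description="..." link="...">
    <data><!-- the original EIS record, kept in its own namespace --></data>
  </node>
  <!-- more node elements -->
</feed>
```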

This XML data is then interpreted by a FeedAPI parser module, the SimpleXML parser, which uses the PHP 5 SimpleXML extension to parse the XML. The parser interprets the XML data so that FeedAPI can save the nodes into Drupal. The SimpleXML parser is very similar to the SimplePie parser, but with its own XML parsing function. The parsing function looks like the following:
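A hedged sketch of such a parsing function (the SimpleXML calls are standard PHP 5, but the function name and the exact feed/item object structure FeedAPI expects are assumptions):

```php
<?php
/**
 * Sketch of a FeedAPI-style parse function using SimpleXML.
 *
 * @param string $raw
 *   The raw XML document in the generic feed format described above.
 * @return stdClass
 *   A feed object roughly in the shape FeedAPI expects; the exact
 *   property names may differ from the real SimpleXML parser module.
 */
function simplexml_parser_parse($raw) {
  $xml = simplexml_load_string($raw);

  $feed = new stdClass();
  $feed->title = (string) $xml['title'];
  $feed->description = (string) $xml['description'];
  $feed->options = new stdClass();
  $feed->options->link = (string) $xml['link'];
  $feed->items = array();

  foreach ($xml->node as $node) {
    $item = new stdClass();
    $item->title = (string) $node['title'];
    $item->description = (string) $node['description'];
    $item->options = new stdClass();
    // The id attribute becomes the FeedAPI guid; it must be unique
    // across all feeds in the Drupal installation.
    $item->options->guid = (string) $node['id'];
    $item->options->original_url = (string) $node['link'];
    $feed->items[] = $item;
  }
  return $feed;
}
```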

This parser converts the received XML data into a FeedAPI array. The attributes of a node, including id, title, description, and link, are mapped to a FeedAPI item's guid, title, description, and link. A FeedAPI guid is a unique ID across all feeds; it should be unique within a Drupal installation.