Feed Synchronization

FeedAPI follows a standard aggregator’s behaviour. For example, when FeedAPI watches a feed that updates periodically, the items of the watched feed are updated rather than synchronized to the Drupal site. In other words, if an item no longer exists in the updated feed, it is not removed from the Drupal site. While this behaviour is suitable for most Drupal sites that act as news aggregators, it is unsuitable for some enterprise applications that need true synchronization between the presentation layer and the EIS. Two methods can be used to synchronize data between the presentation layer and the EIS. In the first, the EIS exports incremental data (the difference between the previous revision and the current revision of the data), and the presentation layer parses and applies the increment. In the second, the EIS exports all of its data, and the presentation layer computes the difference between the previous revision and the current revision, removing items that no longer appear in the latest export from the EIS.
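The core of the second method is a set difference between the previously imported item IDs and the IDs in the latest full export. A minimal sketch in PHP (the function name and array shapes are my own, not part of FeedAPI):

```php
<?php
// Sketch of the "full export" synchronization method: given the GUIDs
// stored from the previous refresh and the GUIDs present in the latest
// full export, return the GUIDs the EIS has since deleted.
// Names here are illustrative, not FeedAPI API.
function sync_find_deleted_guids(array $previous_guids, array $current_guids) {
  // Items that existed before but are absent from the latest export.
  return array_values(array_diff($previous_guids, $current_guids));
}
```

The presentation layer would then delete the node corresponding to each returned GUID.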

Although the former method is usually more efficient, the EIS I am working with is a legacy system that supports no data-warehousing technologies: in short, it keeps no timeline data and does not support row revisions. Therefore, my implementation is limited to the latter method. Fortunately, FeedAPI provides a flexible interface that allows me to implement the synchronization without touching FeedAPI’s source code.

FeedAPI provides the feedapi_refresh_feedapi hook for parsers and processors to post-process a feed after a refresh. The feed synchronization relies on this post-processing mechanism.

This implementation of the feedapi_refresh_feedapi hook provides a synchronization mechanism that removes all deleted items from the imported feed. However, Drupal’s node_delete function performs a permission check against the current user, while the routine refreshes feeds from Drupal’s cron, which runs as the anonymous user. With node_delete, an anonymous user is unable to remove items, so the hook circumvents the permission check. Although this introduces a potential security risk, the hack is necessary unless a better cron mechanism is implemented.
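A sketch of such a hook implementation, assuming a custom module named mymodule, a helper that lists previously imported items, and a simplified feed object; the user-switching trick mirrors the permission workaround described above and is an assumption, not FeedAPI’s own code:

```php
<?php
// Illustrative sketch only; 'mymodule', the feed item fields, and the
// mymodule_imported_items() helper are assumptions.

/**
 * Implementation of hook_feedapi_refresh_feedapi().
 * Runs after FeedAPI refreshes a feed, removing nodes whose items no
 * longer appear in the freshly downloaded feed.
 */
function mymodule_feedapi_refresh_feedapi($feed) {
  global $user;

  // Collect the GUIDs present in the latest download of the feed.
  $current = array();
  foreach ($feed->items as $item) {
    $current[$item['options']['guid']] = TRUE;
  }

  // node_delete() checks the current user's permissions, and cron runs
  // as the anonymous user, so temporarily switch to user 1. This is the
  // security trade-off discussed above.
  $original_user = $user;
  $user = user_load(array('uid' => 1));

  // Walk the items previously imported for this feed (hypothetical
  // helper returning nid => guid) and delete the vanished ones.
  foreach (mymodule_imported_items($feed->nid) as $nid => $guid) {
    if (!isset($current[$guid])) {
      node_delete($nid);
    }
  }

  // Restore the original (anonymous) user.
  $user = $original_user;
}
```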

Update: Due to the way FeedAPI handles unique feed items, an item’s ID must be unique across ALL feeds, not just within a single feed.
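One way to satisfy this constraint is to namespace each item’s ID with an identifier of its source feed when building the GUID; the prefix scheme below is my own convention, not FeedAPI’s:

```php
<?php
// Build a GUID that is unique across all feeds by prefixing the
// per-feed item ID with an identifier of the feed itself.
// The separator and naming are illustrative assumptions.
function make_global_guid($feed_id, $item_id) {
  return $feed_id . ':' . $item_id;
}
```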

Data Feeding in Drupal

Data feeding is an important topic in Enterprise Information Systems (EIS). In a typical three-tier enterprise application, the web tier pulls data from the EIS tier, which usually lives on an internal EIS. Java EE systems either use the EIS’s API and presentation-layer tags directly or, less commonly, use a regulated format (usually XML) to pull the data. The former method introduces code-level or API-level coupling and is therefore not recommended unless the web tier requires it. The latter method decouples the web tier from the EIS tier at the API level. Drupal is a flexible and extensible platform written in PHP, with excellent performance when properly configured. Because of its ease of development and use, many small businesses and organizations deploy Drupal in the presentation layer as their external or internal websites. FeedAPI is an extensible interface to Drupal: it supports importing feed-based information from external sources, including legacy EIS that can export structured lists of data.

As a community effort, FeedAPI was initially written to aggregate RSS feeds from other websites. However, since it has a sophisticated extensible interface, parsers and processors can easily be added to handle other structured lists, for instance XML lists. Since virtually any EIS can export data as XML files, XML is a natural format for handling structured EIS data.

A node list XML format is designed for EIS to prepare data for FeedAPI.
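The format itself is not reproduced here; a document along these lines, using the id, title, description, and link fields that the parser maps into FeedAPI items, would fit the description (element names and values are illustrative, not the actual schema):

```xml
<!-- Hypothetical shape of a node list; element names mirror the
     id/title/description/link fields mentioned in the text. -->
<nodes>
  <node>
    <id>1001</id>
    <title>Quarterly report published</title>
    <description>The Q3 report is now available.</description>
    <link>http://eis.example.com/reports/1001</link>
  </node>
  <node>
    <id>1002</id>
    <title>New supplier registered</title>
    <description>A new supplier was added to the registry.</description>
    <link>http://eis.example.com/suppliers/1002</link>
  </node>
</nodes>
```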

The XML data are then interpreted by a FeedAPI parser module, the SimpleXML parser, which employs the PHP 5 SimpleXML extension to parse XML data. The parser interprets the XML data so that FeedAPI can update the nodes into Drupal. The SimpleXML parser is very similar to the SimplePie parser, but with an XML parsing function. The parsing function code snippet is as follows:
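A minimal sketch of such a parsing function, assuming a node list with id, title, description, and link elements and a simplified version of FeedAPI’s item array (the exact array keys are assumptions):

```php
<?php
// Sketch of a SimpleXML-based parsing function; the <node> element
// names and the simplified FeedAPI item array are assumptions.

/**
 * Parse a node-list XML document into a FeedAPI-style feed object.
 */
function simplexml_parser_parse($xml_string) {
  $xml = simplexml_load_string($xml_string);
  if ($xml === FALSE) {
    return FALSE;  // Malformed XML.
  }

  $feed = new stdClass();
  $feed->items = array();

  foreach ($xml->node as $node) {
    $feed->items[] = array(
      'title'       => (string) $node->title,
      'description' => (string) $node->description,
      'options'     => array(
        // The EIS item ID becomes the FeedAPI GUID.
        'guid'         => (string) $node->id,
        'original_url' => (string) $node->link,
      ),
    );
  }
  return $feed;
}
```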

The parser converts the received XML data into a FeedAPI array. The attributes of a node, including id, title, description, and link, are mapped to a FeedAPI item’s guid, title, description, and link. A FeedAPI guid is an ID that must be unique among all feeds, that is, unique within a Drupal installation.