The Manifest Pattern

The Collections Interoperability Hub (described in my previous post) provides access to content from any number of external repositories using a single, standard, web-service-based interface, and implements some basic shared conventions for content. In this post I want to describe the approach we are using to handle requests, which has some important advantages.

The idea was first brought up in one of our weekly calls with the workspace group. (I do not remember who suggested it.) The idea was to support the discovery of content in contributing repositories using the Zotero browser plug-in. The user would find items of interest, bookmark them with Zotero, and then upload the bookmark file. Our code (a modified OpenCMIS server) then parses the bookmark file and provides access to the bookmarked content.

This is a specific instance of what I call the Manifest Pattern. In the Manifest Pattern, a client system logs in on behalf of a specific user and creates a request folder in the hub repository (via the CMIS web services). The client then uploads a "manifest" document to the request folder (in the current implementation, the Zotero bookmark file, but other formats might be supported), and the system responds by acquiring (if necessary) the content mentioned in the manifest and linking it to the request folder. If the manifest document changes, the system responds by re-attaching only the requested content to the folder, so that the folder always mirrors the current manifest. The Manifest Pattern will work best when coupled with a messaging system to notify clients when new content is available.
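To make the "manifest changes, folder follows" step concrete, here is a minimal Python sketch of the reconciliation. It is not the hub's actual code (which is a modified OpenCMIS server written in Java); the RequestFolder class, the reconcile function, and the item identifiers are all hypothetical names chosen for illustration, and parsing the Zotero bookmark file into a set of identifiers is assumed to have happened already.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RequestFolder:
    """Hypothetical stand-in for a request folder in the hub repository."""
    owner: str
    linked: set[str] = field(default_factory=set)


def reconcile(folder: RequestFolder, manifest_ids: set[str]) -> tuple[set[str], set[str]]:
    """Make the folder's links match the manifest; return (added, removed)."""
    added = manifest_ids - folder.linked    # newly requested items to link
    removed = folder.linked - manifest_ids  # items dropped from the manifest
    folder.linked = set(manifest_ids)       # folder now mirrors the manifest
    return added, removed


# The user uploads a manifest, then later edits it.
folder = RequestFolder(owner="alice")
first = reconcile(folder, {"item-1", "item-2"})
second = reconcile(folder, {"item-2", "item-3"})
```

After the second call the folder holds only item-2 and item-3: item-3 was linked, item-1 was detached, and item-2 was left alone. The (added, removed) pair is also the natural payload for the notification messages mentioned above.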

The first advantage of the Manifest Pattern concerns access control. Since the creator of the request folder owns that folder, we can ensure that only the requestor gets access to the requested content. There can be no wandering in to view content requested by other clients. We can keep the system efficient by maintaining a single shared (invisible) cache of processed content, then providing that content to a user by linking it to their request folder only if they are allowed access. I will talk more about this in a future post on access control.
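The combination of a shared cache and owner-only folders can be sketched roughly as follows. This is an illustration under stated assumptions, not the hub's implementation: the Hub class, its methods, and the permissions mapping (item id to the set of users allowed to see it) are all invented for the example, and real access decisions would come from the contributing repositories.

```python
class Hub:
    """Hypothetical hub with one hidden cache and per-user request folders."""

    def __init__(self, permissions):
        self._permissions = permissions  # assumed shape: item id -> allowed users
        self._cache = {}                 # shared cache, never exposed directly
        self._folders = {}               # folder name -> (owner, linked items)
        self.fetch_count = 0             # how often content was actually acquired

    def _acquire(self, item_id):
        # Acquire and process content only once, whoever asks for it.
        if item_id not in self._cache:
            self.fetch_count += 1
            self._cache[item_id] = f"processed:{item_id}"
        return self._cache[item_id]

    def create_folder(self, name, owner):
        self._folders[name] = (owner, {})

    def link(self, folder, item_id):
        """Link cached content into a folder only if its owner may see it."""
        owner, links = self._folders[folder]
        if owner not in self._permissions.get(item_id, set()):
            return False
        links[item_id] = self._acquire(item_id)
        return True

    def view(self, folder, user):
        """Only the folder's owner can see what is linked to it."""
        owner, links = self._folders[folder]
        if user != owner:
            raise PermissionError(f"{user} does not own {folder}")
        return dict(links)


hub = Hub({"img-1": {"alice", "bob"}, "img-2": {"alice"}})
hub.create_folder("alice-requests", "alice")
hub.create_folder("bob-requests", "bob")
linked_for_alice = hub.link("alice-requests", "img-1")
linked_for_bob = hub.link("bob-requests", "img-1")   # served from the cache
denied_for_bob = hub.link("bob-requests", "img-2")   # bob lacks access
```

Both users end up with img-1 in their own folders, yet it was acquired only once, and neither can look into the other's folder or reach the cache directly.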

A second advantage of the Manifest Pattern is that it allows a user to create a collection of content objects that they can access and work on from multiple workspaces and tools. Client systems can allow users to manage the content available via this folder by adding items to or removing items from the manifest document. In effect, a user could be given the ability to manage and re-use their own cache of content.

The Manifest Pattern can also provide updates on the linked content as these become available. The system can consult a contributing repository to make sure that linked content is still available, and still available to the specific user. The real potential of the Manifest approach becomes apparent when we expand our definition of what might be considered a manifest. Here are some examples:

Additional bookmark formats, perhaps supporting content requests from other open sources (Wikimedia Commons, for example) as well as repositories with established relationships.

A folder of XML documents representing requests for the products of processing by external tools. For example, a request that a document be processed by MorphAdorner. The processed content would be added to the request folder.

A Business Process Execution specification (or other workflow definition) defining a process for acquiring and processing content.

A Java jar file containing code to be executed in a sandboxed (restricted) environment. The code might be executed on specific content objects. A restricted Python or Ruby script could be handled similarly.

I recently had a conversation with our media architect about an unrelated project that involved working with image content from the California Digital Library. The situation illustrates how flexible the current architecture is. His group creates online journals, blogs, and collaborative sites using Drupal.

I suggested that if we had a connector for CDL content, he could identify the images he wanted to incorporate in his site using Zotero, upload the bookmark file, and then access the content via an Atom feed provided by the CMIS web service. Since Drupal already supports Atom (and there is also a PHP-based CMIS library) the selected images could be incorporated in his Drupal sites with almost no programming or customization.