The Collections Interoperability Hub

Here is a simplified description of CMIS: it provides a folder and document model in which folders and documents have types and properties. Documents also wrap one or several content streams (I am skipping over policy and relationship objects for simplicity). The folder and document hierarchy is exposed via a RESTful web service using a two-way version of the Atom syndication format.

The Collections Interoperability (CI) hub is based on a modified version of the OpenCMIS server code that is part of the Apache Chemistry project (see http://incubator.apache.org/chemistry/opencmis.html). The OpenCMIS code base provides a CMIS interface backed by a server plug-in, which supplies the content and property information that is published through the web services. We took one of the existing server plug-ins and extended it to create something that looks to service clients like a CMIS repository but is actually a gateway to content from other repositories.

Our modified OpenCMIS server supports any number of connector plug-ins, each of which implements a simple interface and provides content from a particular flavor of contributing repository. The idea is to encapsulate the specifics of access and local content conventions in each flavor of connector plug-in. So far we have a connector that accesses Hathi Trust's Data API and a generic connector for Fedora repositories (in effect providing read-only CMIS access to Fedora content). Connectors for Perseus and TCP are under development.
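
To make the plug-in idea concrete, here is a minimal sketch of how a gateway might route requests to per-repository connectors. Everything in it is an assumption made for illustration: the names RepositoryConnector, InMemoryConnector and GatewayHub, the method signatures, and the "flavor:localId" identifier scheme are hypothetical and are not taken from OpenCMIS or from our actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and are not taken from
// OpenCMIS or from the hub's actual code.
public class ConnectorSketch {

    /** Assumed shape of the small contract each connector plug-in implements. */
    public interface RepositoryConnector {
        String flavor();                       // e.g. "hathi" or "fedora"
        List<String> listDocumentIds();        // ids local to the contributing repository
        byte[] fetchContent(String localId);   // original content, unmodified
    }

    /** Stand-in connector backed by a map instead of a real repository API. */
    public static class InMemoryConnector implements RepositoryConnector {
        private final String flavor;
        private final Map<String, String> documents;

        public InMemoryConnector(String flavor, Map<String, String> documents) {
            this.flavor = flavor;
            this.documents = new LinkedHashMap<>(documents);
        }

        @Override public String flavor() { return flavor; }

        @Override public List<String> listDocumentIds() {
            return new ArrayList<>(documents.keySet());
        }

        @Override public byte[] fetchContent(String localId) {
            String text = documents.get(localId);
            if (text == null) {
                throw new IllegalArgumentException("Unknown document: " + localId);
            }
            return text.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Gateway that routes ids of the assumed form "flavor:localId" to a connector. */
    public static class GatewayHub {
        private final Map<String, RepositoryConnector> connectors = new LinkedHashMap<>();

        public void register(RepositoryConnector connector) {
            connectors.put(connector.flavor(), connector);
        }

        public byte[] fetchContent(String hubId) {
            int sep = hubId.indexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException("Expected flavor:localId, got " + hubId);
            }
            RepositoryConnector connector = connectors.get(hubId.substring(0, sep));
            if (connector == null) {
                throw new IllegalArgumentException("No connector for " + hubId);
            }
            // Everything after the first colon is passed through untouched.
            return connector.fetchContent(hubId.substring(sep + 1));
        }
    }

    public static void main(String[] args) {
        GatewayHub hub = new GatewayHub();
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("vol1", "sample text");
        hub.register(new InMemoryConnector("fedora", docs));
        System.out.println(new String(hub.fetchContent("fedora:vol1"), StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is only that the gateway knows nothing about a repository beyond its connector: adding a new flavor means registering one more implementation of the interface, with no change to the routing code.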

Our connectors provide access to the original content exactly as provided by the contributing repository, but we also process that content to provide a basic rendering using uniform conventions. The goal of this "standard model" is to provide enough information for client systems to create navigation and display without having to know the specific data formats provided by the contributing repository. A client system can implement a single navigation and display mechanism without having to know whether it is displaying content from Hathi or Perseus or TCP.
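
As a rough illustration of what a uniform rendering buys the client, the sketch below maps two differently shaped sources onto one view and builds navigation from that view alone. The fields of StandardItem and the two source shapes (a page-counted volume and a work with named divisions) are assumptions invented for this example; they do not describe the actual standard model or the real formats of any contributing repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the fields and source shapes here are invented for
// illustration and do not describe the hub's actual standard model.
public class StandardModelSketch {

    /** Uniform view of one item: where it came from, a title, ordered sections. */
    public static class StandardItem {
        public final String source;
        public final String title;
        public final List<String> sections;

        public StandardItem(String source, String title, List<String> sections) {
            this.source = source;
            this.title = title;
            this.sections = new ArrayList<>(sections);
        }
    }

    /** Assumed source shape A: a volume described only by its page count. */
    public static StandardItem fromPagedVolume(String source, String title, int pageCount) {
        List<String> sections = new ArrayList<>();
        for (int i = 1; i <= pageCount; i++) {
            sections.add("Page " + i);
        }
        return new StandardItem(source, title, sections);
    }

    /** Assumed source shape B: a work split into named divisions. */
    public static StandardItem fromNamedDivisions(String source, String title, String... divisions) {
        return new StandardItem(source, title, Arrays.asList(divisions));
    }

    /** Client-side navigation built only from the uniform view. */
    public static String renderNavigation(StandardItem item) {
        StringBuilder sb = new StringBuilder(item.title).append('\n');
        for (int i = 0; i < item.sections.size(); i++) {
            sb.append(i + 1).append(". ").append(item.sections.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(renderNavigation(fromPagedVolume("repo-a", "A scanned volume", 3)));
        System.out.print(renderNavigation(fromNamedDivisions("repo-b", "An encoded text", "Book 1", "Book 2")));
    }
}
```

Note that renderNavigation never inspects the source field. That is the property the standard model is meant to give client systems, whatever its real fields turn out to be.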

One of the main advantages of this architecture is that it keeps client systems from having to implement specialized code to access content from each contributing repository. By providing a single access point with some common conventions, each tool avoids having to implement its own connectors to every repository it supports. In other words, it helps solve the n-to-n problem in providing collections interoperability: if n tools each talk directly to m repositories, up to n times m connectors have to be written and maintained, whereas with a hub each repository needs only one connector and each tool needs only one client.

In following posts I will talk about the potential of this approach, about how the way we handle requests offers substantial advantages over just using a web service, and about how we are going about solving issues of access control.