
James Myers, Al Geist, Jens Schwidder, Matt Elder, Alan Chappell, David Jung, Prasad Saripalli, Tara Talbott
During the last two quarters, the SAM team focused on completing core functionality across the Metadata Management Services (MMS) and Notebook Services (NS) layers of SAM, including database mapping and configurable security, culminating in a SAM 1.0 release in June 2003. SAM MMS is a middleware solution that provides HTTP and WebDAV access to content in file systems and databases, with capabilities for metadata generation and data translation. SAM NS builds upon the MMS layer to provide modular services for implementing an electronic notebook. As a demonstration of NS capabilities, the SAM team has implemented a module for the Electronic Laboratory Notebook (ELN) 5.0 client that makes it SAM NS compatible and capable of working with SAM-based notebooks. As noted below, several SAM-related papers and presentations were given during the last two quarters, and SAM team members continued to work actively with collaborators and the community to define relevant capabilities and interface standards. Ongoing work includes preparation for an August project review, implementation of a SAM 1.1 release with incremental MMS and NS enhancements, and planning for a SAM 2.0 release during FY04 that will include Semantic Service functionality.
GridFTP Integration: The SAM team is currently investigating the use of GridFTP in two contexts: a GridFTP Data Store that SAM can use to store submitted content, and GridFTP as an alternate file-like access protocol for SAM. The former capability is now being prototyped using the Java COG Kit, while the latter, which would potentially involve the Extended Retrieval/Extended Store capabilities, is still being investigated.
Semantic Services: With the release of significant metadata management functionality in SAM 1.0, and growing community interest in semantic data mapping, the SAM team is shifting emphasis towards the detailed design of the Semantic Services (SS) layer. Several initial capabilities are being developed to help elicit requirements while design work proceeds towards a more comprehensive mechanism. In particular, a pedigree/provenance property has been defined whose value is dynamically generated based on a description of pedigree in terms of other relationships. The pedigree definition and property value are both in the Resource Description Framework (RDF) format. An additional mechanism to export an RDF description of all the existing WebDAV properties on a resource, which will make SAM metadata available for processing in RDF-capable applications and agents, is in progress. This mechanism will make assumptions about the meaning of properties (i.e., that they all refer to the current resource as the subject of a subject-verb-object relationship). Feedback from CMCS and other projects on the usability of these capabilities will help guide the overall design for the SS layer.
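As a rough illustration of the export described above, the following Python sketch turns a set of WebDAV properties into N-Triples statements, treating the resource URL as the subject of every statement. The function name, the example URL, and the property namespaces are hypothetical and chosen only for illustration; this is not SAM's actual exporter.

```python
def properties_to_ntriples(resource_url, properties):
    """Emit one N-Triples statement per WebDAV property.

    properties maps (namespace, local_name) -> string value. Every property
    is assumed to describe the resource itself, so the resource URL is
    always the subject of the statement.
    """
    lines = []
    for (namespace, name), value in sorted(properties.items()):
        predicate = namespace + name
        # Escape the characters that N-Triples string literals cannot hold raw.
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
        lines.append('<%s> <%s> "%s" .' % (resource_url, predicate, escaped))
    return "\n".join(lines)


# Hypothetical resource and properties, for illustration only.
example = properties_to_ntriples(
    "http://sam.example.org/data/run42.dat",
    {
        ("DAV:", "getcontenttype"): "text/plain",
        ("http://example.org/sam#", "instrument"): 'NMR "A"',
    },
)
print(example)
```

The sketch makes the same simplifying assumption the text mentions: a property whose value really describes some other resource would be exported with the wrong subject.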
SAM Notebook Interface: Work is continuing on the design of a componentized notebook interface to enhance/replace the transitional notebook. To elicit further requirements, some preliminary development work is being undertaken to make components of the existing ELN interface available as portlet-wrapped applets. The note creation/submission capability of the ELN will be the first such applet and, as noted below, the SAM team will be working with collaborating projects to understand how such a component can be used to provide third-party annotation as well as domain-specific data annotation capabilities. Feedback from these efforts will be combined with emerging next-generation notebook concepts to refine component designs.
Slide 2.0/3.0 Migration: In anticipation of a Slide 2.0 release by the Apache Jakarta developers, and the development of Slide 3.0 as a reference implementation of the Java Content Repository (JSR 170) standard, the SAM team is continuing to investigate the changes that will be necessary to migrate SAM MMS and notebook functionality. Slide 2.0 will be a significant upgrade that adds support for DAV versioning and the DASL search language, both of which may impact implementation strategies for SAM capabilities for metadata generation and data translation. Slide 3.0 will provide a higher-level server-side API and will standardize some of the functionality for messaging and configurable security that the SAM project has added to Slide.
SAM 1.0 Release: SAM 1.0, including Metadata Management and Notebook Services, has been released along with an updated SAM-compatible Electronic Lab Notebook (ELN) 5.0 client. Significant efforts were made to provide a simple installation procedure, to create a useful default configuration, and to develop documentation. The SAM 1.0 Release web page includes links for downloading the software, installation instructions, a feature list, information on configuring SAM and using its APIs, ELN Help pages, a FAQ, and a means of requesting support. Open Source licensing of SAM 1.0 is being pursued.
Support For External Data Transformations: The existing metadata generation and translation mechanisms in SAM have been enhanced to support the use of external web-service-based data transforms and to support multistep transformations that may include BFD, web service, and XSLT steps (in that order). This capability will be included in SAM 1.1.
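The ordering constraint on multistep transformations can be sketched as a small pipeline runner. This is a minimal Python sketch under stated assumptions: the step kinds, the `run_pipeline` function, and the stand-in transforms are all hypothetical, and "may include" is read here as meaning each step is optional. Real steps would parse binary data with a BFD description, call a remote web service, or apply an XSLT stylesheet.

```python
# Allowed step kinds, in the only order the text describes.
STEP_ORDER = ["bfd", "web_service", "xslt"]


def run_pipeline(data, steps):
    """Apply (kind, function) steps in sequence to the input data.

    Any step may be omitted, but the ones present must follow STEP_ORDER.
    """
    kinds = [kind for kind, _ in steps]
    positions = [STEP_ORDER.index(kind) for kind in kinds]  # ValueError if unknown
    if positions != sorted(positions) or len(set(kinds)) != len(kinds):
        raise ValueError("steps must be unique and ordered as %s" % STEP_ORDER)
    for _, transform in steps:
        data = transform(data)
    return data


# Stand-in transforms, for illustration only.
def fake_bfd(raw):
    # Pretend to decode binary input into a tiny XML document.
    return "<values>%s</values>" % ",".join(str(b) for b in raw)


def fake_xslt(xml):
    # Pretend to restyle the XML for display.
    return xml.replace("values", "td")


result = run_pipeline(b"\x01\x02", [("bfd", fake_bfd), ("xslt", fake_xslt)])
print(result)
```

Passing the steps in a different order, for example XSLT before BFD, raises a `ValueError` instead of silently producing output.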
SAM Notebook Services and Transitional Electronic Notebook: A suite of notebook services that will provide a full superset of functionality available in the DOE2000-developed electronic notebooks is under development. Basic capabilities for submitting content to SAM and viewing notebook pages generated from SAM content were demonstrated at the SC02 conference. A variety of enhancements have been made to these capabilities to support, for example, referencing existing SAM content in a notebook (versus uploading new files), and using translators registered with the SAM MMS layer to display page content. We have also developed services to support dynamic creation of notebooks via a web interface and dynamic listing of the available notebooks on a server. These capabilities were released as part of SAM 1.0. Additional capabilities for managing access to notebooks, configuring email notification, and performing other administrative tasks are currently being developed for a 1.1 release expected in August. In collaboration with the CMCS project, a portlet mechanism for launching the ELN client has been developed, and we have demonstrated that relationships between data and ELN-generated notes can be viewed, along with relationships generated by other applications, using the graphical CMCS pedigree browser.
Performance Enhancement: In collaboration with the CMCS project, work was performed to profile SAM performance on a variety of WebDAV operations and to investigate potential performance enhancements. Recent tests show up to a fivefold improvement in retrieval times, and we are continuing to investigate a number of potential changes to the Jakarta Slide internals to provide additional increases.
DAV-Database mapping: The mechanism for connecting to “arbitrary” back-end databases completed at the end of 2002 has been enhanced to support a model in which metadata is queried dynamically from the underlying database rather than cached within SAM's local metadata store. Work to document the mapping language and test additional use cases is continuing.
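The difference between querying metadata dynamically and caching it can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table, the column names, the `PROPERTY_COLUMNS` mapping, and `get_property` are hypothetical and do not reflect SAM's mapping language.

```python
import sqlite3

# Hypothetical mapping from WebDAV property names to columns of one table.
PROPERTY_COLUMNS = {"title": "title", "author": "author"}


def get_property(conn, resource_id, prop):
    """Look up a property value in the database at request time (no cache)."""
    column = PROPERTY_COLUMNS[prop]  # only mapped names reach the SQL text
    row = conn.execute(
        "SELECT %s FROM samples WHERE id = ?" % column, (resource_id,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO samples VALUES (1, 'Spectrum A', 'A. Researcher')")

before = get_property(conn, 1, "title")
# The database changes behind the middleware's back ...
conn.execute("UPDATE samples SET title = 'Spectrum B' WHERE id = 1")
# ... and the next request sees the new value because nothing was cached.
after = get_property(conn, 1, "title")
print(before, after)
```

The trade-off is the usual one: a dynamic lookup always reflects the current state of the back-end database, at the cost of one query per request.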
Configurable Authentication and Authorization: The SAM team has implemented a Java Authentication and Authorization Service (JAAS) based mechanism for plugging alternate authentication mechanisms into SAM. Initial capabilities in this area were demonstrated at Supercomputing 2002. For SAM 1.0, work was done to enhance this mechanism to allow additional credential information to be passed to SAM and to create a Grid Security Infrastructure (GSI) / Globus 2.0 compliant JAAS module. This GSI authentication module communicates with a MyProxy server to authenticate the user and retrieve a Grid proxy certificate in their name. The implementation is based on the Java COG Kit and uses a map file to match the Grid credentials to the local SAM user account. Additional JAAS login modules supporting Globus 2.2.1 and Globus 3 are being developed, as is a mechanism to allow anonymous access to public areas of SAM while enforcing strong authentication on access-controlled areas.
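The map-file step can be sketched as follows. The report does not specify the file format, so this Python sketch assumes one similar to a Globus grid-mapfile, in which each line holds a quoted certificate subject followed by a local account name. The function names, subjects, and account names are hypothetical.

```python
import shlex


def parse_map_file(text):
    """Return {certificate_subject: local_account} from map-file text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the subject
        if len(parts) != 2:
            raise ValueError("unrecognised map entry: %r" % line)
        subject, account = parts
        mapping[subject] = account
    return mapping


def local_account(mapping, subject):
    """Return the local account for a Grid subject, or None if unmapped."""
    return mapping.get(subject)


# Assumed example content, for illustration only.
EXAMPLE = '''
# certificate subject              local account
"/O=Grid/OU=Example/CN=Jane Doe" jdoe
"/O=Grid/OU=Example/CN=Sam Service" samsvc
'''

accounts = parse_map_file(EXAMPLE)
print(local_account(accounts, "/O=Grid/OU=Example/CN=Jane Doe"))
```

A subject with no entry yields `None`, which a login module would typically treat as a failed login rather than fall back to a default account.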
An alternate packaging of the authorization mechanism as servlet request filters has been developed as an alternative to the existing implementation as a Tomcat Realm. This work allows SAM's authentication plugin mechanism to be used with other servlet-based application servers. JAAS modules for a specific authentication mechanism are compatible with both the Realm and Filter implementations. The Filter-based mechanism will be available in the SAM 1.1 release.
Work is also being done to develop a general mechanism for supporting alternate authorization mechanisms. In a demonstration at the SC2002 conference, the team showed the ability to replace Slide's standard username/password authentication and access control list security with policy-based fine-grained access control (using a 'random' policy as a demonstration) via a configuration option. Work is proceeding to separately encapsulate authorization requests (a 'principal' requesting to perform an 'action' on a particular 'resource') and the management of indirect principals such as groups and roles. These modifications should allow SAM to be used with a wide range of access control list (ACL) and policy-based access mechanisms.
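The separation described above, between an authorization request and the resolution of indirect principals, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Request` tuple, the rule format, the prefix-matching policy, and the example users and groups are hypothetical and are not SAM's design.

```python
from collections import namedtuple

# An authorization request: a principal asks to perform an action on a resource.
Request = namedtuple("Request", ["principal", "action", "resource"])


def expand_principals(user, memberships):
    """Return the user plus every group or role granted to that user."""
    return {user} | set(memberships.get(user, ()))


def is_authorized(request, rules, memberships):
    """Allow if a rule matches a principal, the action, and a resource prefix."""
    principals = expand_principals(request.principal, memberships)
    for principal, action, prefix in rules:
        if (principal in principals and action == request.action
                and request.resource.startswith(prefix)):
            return True
    return False  # deny by default


# Hypothetical group data and policy, for illustration only.
MEMBERSHIPS = {"alice": ["chemists"]}
RULES = [("chemists", "read", "/notebooks/chem/")]

can_read = is_authorized(
    Request("alice", "read", "/notebooks/chem/page1"), RULES, MEMBERSHIPS)
can_write = is_authorized(
    Request("alice", "write", "/notebooks/chem/page1"), RULES, MEMBERSHIPS)
print(can_read, can_write)
```

Because group lookup and rule evaluation are separate functions, either one could be replaced (for example, by a different group directory or a policy engine) without changing the shape of the request.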
Notarization Service: An initial implementation of a SAM Notarization web service has been completed. The service generates a digital signature and time stamp on data digests submitted by clients such as an electronic notebook, thus providing persistent evidence of data integrity. The Notary service is currently envisioned as a service provided by trusted entities, similar to how identity certificates are now obtained via Verisign and similar entities. The Notarization service is a standard web service that supports signing/notarization as well as optional storage/querying of existing stored notarization records. Notarization records use the XML cryptographic signature standard. A separate Notarization proxy service has been created providing functionality useful within a trusted domain. The service acts as a proxy, giving the (virtual) organization the ability to route all clients to a particular outside service, and also provides services for computing canonical data formats and data digests and verifying notarization records.
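The basic notarization flow (the client computes a digest, the notary signs the digest together with a time stamp, and anyone can later verify the record) can be sketched as follows. This Python sketch makes loud simplifications: it uses an HMAC with a shared placeholder key as a stand-in for a real public-key digital signature, and a plain dictionary instead of an XML signature record. All names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Placeholder secret; a real notary would sign with a private key instead.
NOTARY_KEY = b"demo-key-not-for-real-use"


def digest(data):
    """Client side: compute the digest that is sent to the notary."""
    return hashlib.sha256(data).hexdigest()


def _sign(data_digest, timestamp):
    message = ("%s|%s" % (data_digest, timestamp)).encode("utf-8")
    return hmac.new(NOTARY_KEY, message, hashlib.sha256).hexdigest()


def notarize(data_digest, now=None):
    """Notary side: bind the digest to a time stamp and sign both."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {"digest": data_digest, "timestamp": timestamp,
            "signature": _sign(data_digest, timestamp)}


def verify(record, data):
    """Check that the record is intact and matches the supplied data."""
    expected = _sign(record["digest"], record["timestamp"])
    return (hmac.compare_digest(expected, record["signature"])
            and record["digest"] == digest(data))


page = b"Notebook page 12: sample heated to 350 K."
record = notarize(digest(page))
print(verify(record, page), verify(record, page + b" (edited)"))
```

Note that only the digest travels to the notary, so the service can attest to the integrity of data it never sees.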
SAM team members participated in a wide range of meetings, workshops, reviews, and collaboration discussions in the first 6 months of 2003:
Java Content Repository API: The Java Community Process JSR 170 group, of which Myers is an Expert Group member, is nearing the first public release of its Java Content Repository API standard specification. Work is ongoing within the Jakarta Slide project (which is used by the Scientific Annotation Middleware) to make Slide a JSR 170 reference implementation.
Portal Framework/Middleware: Members of the SAM project teams are involved in ongoing discussions for the Development of Grid User Computing Environments (GUCE) Project, with developers of the CHEF portal framework at U. Michigan and the NSF NMI funded GUCE effort. The purpose of the collaboration is to coordinate the development of an open source science/education portal toolkit that would include tools such as content repositories and electronic notebooks.
Grid Information Retrieval Working Group: The SAM team is participating in discussions in the GIR working group related to search and retrieval of scientific data in Grid environments and contributing to several Grid documents.
Data Format Description Language (DFDL) Working Group: The SAM team is working to leverage the development of the Binary Format Description (BFD) language in creating a Grid standard for describing the syntax and semantic labelling of scientific data. Alan Chappell is serving as co-chair of this working group.
Collaboratory for Multiscale Chemical Science (CMCS): The SAM team has an ongoing collaboration with CMCS related to the use of SAM as the primary CMCS data/metadata management system. Modifications to SAM were made to support anonymous browsing of public data managed by SAM, supporting the CMCS model of allowing groups to publish data for unrestricted public access.
Discussions with potential user groups: The SAM team has held a number of discussions and demonstrations with projects involved in biology, climate research, astrophysics, and other disciplines, as well as software developers interested in data/metadata management capabilities. We anticipate at least some of these discussions will proceed to pilot testing of SAM and more formal collaborations.