
James Myers, Al Geist, Jens Schwidder, Alan Chappell, Tara Talbott, Mike Peterson, David Jung, Prasad Saripalli, Matt Elder
During the last quarter, the SAM team focused on preparation for an August project review, implementing a SAM 1.1 release with full support for the ELN notebook client, and planning for a SAM 2.0 release during FY04 including Semantic Service functionality. As noted below, several SAM-related papers and presentations were given during the last quarter, and SAM team members continued to actively work with collaborators and the community to define relevant capabilities and interface standards. Ongoing work includes development of SAM 2.0 including support for versioning and semantically-scoped queries, development of the Data Format Description Language (DFDL) within Global Grid Forum, and preparation for SuperComputing 2003.
Data Grid Integration: The SAM team is currently investigating options for integrating SAM's naming, annotation, translation, and records capabilities with underlying Data Grid repositories. Initial development work is being performed to demonstrate connections to Data Grids via GridFTP using the Java CoG Kit.
Slide 3.0 Migration: In anticipation of a Slide 3.0 release as a reference implementation of the Java Content Repository (JSR 170) standard, the SAM team is continuing to investigate the changes that will be necessary to migrate SAM MMS and notebook functionality. Slide 3.0 will provide a higher-level server-side API and will standardize some of the functionality for messaging and configurable security that the SAM project has added to Slide.
Semantic Grid: With the release of initial RDF capabilities in SAM 1.1, and growing community interest in semantic data mapping, the SAM team is shifting emphasis towards the detailed design of the Semantic Services (SS) layer and Semantic Grid concepts. Several initial capabilities are being developed to help elicit requirements while design work proceeds towards a more comprehensive mechanism. In particular, a pedigree/provenance property has been defined whose value is dynamically generated based on a description of pedigree in terms of other relationships. The pedigree definition and property value are both in the Resource Description Framework (RDF) format. An additional mechanism to export an RDF description of all the existing WebDAV properties on a resource, which will make SAM metadata available for processing in RDF-capable applications and agents, is in progress. SAM team members are preparing papers in this area that will be presented during the next quarter.
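The pedigree idea above can be illustrated with a minimal Python sketch. It is not SAM's implementation: the in-memory repository, the example.org namespace, the "derivedFrom" and "pedigree" property names, and the use of a plain Python list as the pedigree definition (SAM expresses the definition in RDF) are all assumptions made for illustration. The sketch computes a pedigree value by following a configured relationship from resource to resource, then writes the stored properties and the computed pedigree as N-Triples, one simple RDF serialization.

```python
# Hypothetical stand-in for a SAM repository: each resource URL maps to its
# WebDAV-style properties, keyed by a full property URI.
EX = "http://example.org/sam/props#"  # assumed namespace, not SAM's real one

REPOSITORY = {
    "http://example.org/data/raw.dat": {
        EX + "creator": "instrument-7",
    },
    "http://example.org/data/calibrated.dat": {
        EX + "creator": "calibration-script",
        EX + "derivedFrom": "http://example.org/data/raw.dat",
    },
    "http://example.org/data/plot.png": {
        EX + "creator": "plotting-tool",
        EX + "derivedFrom": "http://example.org/data/calibrated.dat",
    },
}

# Assumed pedigree "definition": the relationships whose targets are ancestors.
PEDIGREE_RELATIONSHIPS = [EX + "derivedFrom"]


def pedigree(resource, repository=REPOSITORY, relationships=PEDIGREE_RELATIONSHIPS):
    """Return the ancestors of `resource`, nearest first, computed on demand."""
    ancestors, seen, frontier = [], {resource}, [resource]
    while frontier:
        current = frontier.pop(0)
        props = repository.get(current, {})
        for rel in relationships:
            target = props.get(rel)
            if target is not None and target not in seen:
                seen.add(target)
                ancestors.append(target)
                frontier.append(target)
    return ancestors


def _escape(literal):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return literal.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(resource, repository=REPOSITORY):
    """Serialize stored properties plus the computed pedigree as N-Triples."""
    lines = []
    for prop, value in sorted(repository.get(resource, {}).items()):
        # Values naming another known resource become URI references;
        # everything else is emitted as a plain literal.
        obj = f"<{value}>" if value in repository else f'"{_escape(value)}"'
        lines.append(f"<{resource}> <{prop}> {obj} .")
    for ancestor in pedigree(resource, repository):
        lines.append(f"<{resource}> <{EX}pedigree> <{ancestor}> .")
    return "\n".join(lines)
```

Calling to_ntriples("http://example.org/data/plot.png") yields the stored creator and derivedFrom statements followed by one pedigree statement per ancestor. Because the pedigree is recomputed on each call, it stays consistent when the underlying relationships change, which is the point of a dynamically generated property.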
SAM 1.1 Release: SAM 1.1 has been released, along with an updated SAM-compatible Electronic Lab Notebook (ELN) 5.1 client. Significant efforts were made to further simplify the installation procedure and the default configuration. New functionality includes support for anonymous access to public data, web-based SAM and notebook management interfaces, completion of the SAM transitional notebook client (ELN 5.1) and notebook services, RDF export of pedigree information, and an initial notarization service implementation. SAM performance was improved via inclusion of an optimized data store developed by Tom Allison (NIST). Logic for launching the ELN client was modified to support Mac OS X. The SAM 1.1 Release web page includes links for downloading the software and requesting support, installation instructions, a feature list, information on configuring SAM and using its APIs, ELN Help pages, and a FAQ.
Open Source Software Licensing: During the last quarter, the SAM team held numerous discussions with PNNL Intellectual Property staff and Paul Gottlieb (DOE HQ) to update the PNNL Open Source process and develop a standard OSS license template. A significant aspect of this work was agreement that ongoing project work could be hosted at third-party sites such as SourceForge. This work is essentially complete, which will allow release of the ELN 5.1, SAM 1.1, and BFD 1.1 software in the next quarter.
Slide 2.0 Migration: Migration to the Slide 2 codebase has begun. This will allow significant enhancements to SAM, including support for DAV versioning, binding (hard links), and the DASL search language, all of which impact implementation strategies for SAM capabilities for metadata generation, data translation, and semantic services. This upgrade will also allow support of HTTPS in the standard SAM client library.
Support For External Data Transformations: The existing translation mechanism in SAM has been enhanced to support multi-step transformations that may include BFD and/or XSLT steps. This capability is included in SAM 1.1. A further enhancement to support BFD, web service, and XSLT steps (in that order) is in development.
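The multi-step idea can be sketched as a chain of steps in which each step consumes the previous step's output. The Python sketch below is illustrative only and assumes nothing about SAM's actual interfaces: binary_to_xml is a hypothetical stand-in for a BFD step (binary data in, XML out), summarize_xml is a hypothetical stand-in for an XSLT step (XML in, XML out), and run_pipeline is an assumed name for the driver. Neither stand-in uses real BFD or XSLT processing.

```python
import struct
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

# Every step has the same shape, bytes in and bytes out, so steps compose freely.
Step = Callable[[bytes], bytes]


def binary_to_xml(data: bytes) -> bytes:
    """Stand-in for a BFD step: read big-endian 32-bit floats and emit XML."""
    count = len(data) // 4
    values = struct.unpack(f">{count}f", data[: count * 4])
    root = ET.Element("samples")
    for value in values:
        ET.SubElement(root, "sample").text = repr(value)
    return ET.tostring(root)


def summarize_xml(data: bytes) -> bytes:
    """Stand-in for an XSLT step: reduce the sample list to a summary element."""
    root = ET.fromstring(data)
    values = [float(element.text) for element in root.iter("sample")]
    attributes = {"count": str(len(values))}
    if values:
        attributes["max"] = repr(max(values))
    return ET.tostring(ET.Element("summary", attributes))


def run_pipeline(data: bytes, steps: Iterable[Step]) -> bytes:
    """Apply each step in order, feeding each output to the next step."""
    for step in steps:
        data = step(data)
    return data
```

A two-step translation would then be requested as run_pipeline(raw_bytes, [binary_to_xml, summarize_xml]). Under this design, a web service step between the two would only need to be another function with the same bytes-to-bytes signature.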
SAM Notebook Interface: Work is continuing on the design of a componentized notebook interface to enhance/replace the transitional notebook. Preliminary development work has been done to make components of the existing ELN interface available as portlet-wrapped applets, and a lightweight note creation/submission capability using this work has been created by a collaborating project.
SAM Notebook Services and Transitional Electronic Notebook: A suite of notebook services that provide a full superset of functionality available in the DOE2000-developed electronic notebooks has been developed. Capabilities for managing access to notebooks, configuring email notification, and performing other administrative tasks have been released as part of SAM 1.1. Work was also performed to remove a long-standing limitation on being able to automatically launch the ELN client on platforms such as Mac OS X; the need for the "LiveConnect" Java-JavaScript communication interface has been eliminated in this release.
Configurable Authentication and Authorization:
The SAM team has implemented a Java Authentication and Authorization Service (JAAS) based mechanism for plugging alternate authentication mechanisms into SAM. During the last quarter, the SAM team worked with collaborators to test these capabilities and to connect to MyProxy servers to authenticate the user and retrieve a Grid proxy certificate in their name.
An alternate packaging of the authorization mechanism as servlet request filters has been developed as an alternative to the existing implementation as a Tomcat Realm. This work allows SAM's authentication plugin mechanism to be used with other servlet-based application servers. JAAS modules for a specific authentication mechanism are compatible with both the Realm and Filter implementations. The Filter-based mechanism is available in the SAM 1.1 release.
Work is also being done to develop a general mechanism for supporting alternate authorization mechanisms. Work has been done to separately encapsulate authorization requests (a 'principal' requesting to perform an 'action' on a particular 'resource') and the management of indirect principals such as groups and roles. These modifications, and related web-based administration interfaces, are available in SAM 1.1 as a configurable option.
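The separation described above, between the authorization request itself and the management of indirect principals, can be sketched as follows. This is a conceptual illustration in Python, not SAM's Java code: the class names AuthorizationRequest, PrincipalDirectory, and Authorizer, the path-prefix grant model, and the sample users and groups are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorizationRequest:
    """A 'principal' asking to perform an 'action' on a 'resource'."""
    principal: str
    action: str
    resource: str


class PrincipalDirectory:
    """Manages indirect principals: the groups and roles a principal holds."""

    def __init__(self, memberships):
        # Maps a principal (user, group, or role) to the groups/roles it belongs to.
        self._memberships = memberships

    def expand(self, principal):
        """Return the principal plus every group or role reachable from it."""
        resolved, pending = set(), [principal]
        while pending:
            current = pending.pop()
            if current not in resolved:
                resolved.add(current)
                pending.extend(self._memberships.get(current, ()))
        return resolved


class Authorizer:
    """Decides requests against grants, independent of how principals resolve."""

    def __init__(self, directory, grants):
        self._directory = directory
        # Each grant is (principal, action, resource path prefix); assumed model.
        self._grants = grants

    def is_permitted(self, request):
        principals = self._directory.expand(request.principal)
        return any(
            holder in principals
            and action == request.action
            and request.resource.startswith(prefix)
            for holder, action, prefix in self._grants
        )
```

With memberships {"alice": {"chemists"}, "chemists": {"role:author"}} and a single grant ("role:author", "write", "/notebooks/"), a write request from alice on /notebooks/run1 is permitted through two levels of indirection. Swapping in a different PrincipalDirectory or grant store would not change the request type, which is the benefit of encapsulating the two concerns separately.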
SAM team members participated in a wide range of meetings, workshops, reviews, and collaboration discussions during this quarter:
Java Content Repository API: The Java Community Process JSR 170 group, of which Myers is an Expert Group member, released the 0.8 draft of its Java Content Repository API specification to the Java community for review. Work is ongoing within the Jakarta Slide project (which is used by the Scientific Annotation Middleware) to make Slide a JSR 170 reference implementation.
Portal Framework/Middleware: Members of the SAM project teams are involved in ongoing discussions for the Development of Grid User Computing Environments (GUCE) Project, with developers of the CHEF portal framework at U. Michigan and the NSF NMI funded GUCE effort. The purpose of the collaboration is to coordinate the development of an open source science/education portal toolkit that would include tools such as content repositories and electronic notebooks.
Grid Information Retrieval Working Group: The SAM team is participating in discussions in the GIR working group related to search and retrieval of scientific data in Grid environments and contributing to several Grid documents.
Data Format Description Language (DFDL) Working Group: The SAM team is working to leverage the development of the Binary Format Description (BFD) language in creating a Grid standard for describing the syntax and semantic labeling of scientific data. Alan Chappell is serving as co-chair of this working group.
NSF SBIR Program/Educational Technologies: Myers participated as an ad hoc reviewer.
Collaboratory for Multiscale Chemical Science (CMCS): Collaboration is ongoing on the use of SAM as the primary CMCS data/metadata management system. Modifications to SAM were made to support anonymous browsing of public data managed by SAM, supporting the CMCS model of allowing groups to publish data for unrestricted public access.
Genomes To Life Program: Two of the five large GTL centers are investigating the use of SAM technology. One application currently being developed is a Matlab-like biology tool that will be able to access genomics and annotation data and metadata stored in SAM repositories, perform tasks on the data, and write the results back to yet other repositories. Another task in the GTL project is the development of a bio-aware electronic notebook system that will provide capabilities for handling data types natural to the life sciences. It is anticipated that this notebook implementation will use the software developed by the SAM project, including the notebook, metadata, and semantic services.
Discussions with potential user groups: The SAM team has held a number of discussions and demonstrations with projects involved in biology, climate research, astrophysics, and other disciplines, as well as software developers interested in data/metadata management capabilities. Some of these discussions are proceeding to pilot testing of SAM and the ELN.