
James Myers, Al Geist, Jens Schwidder, Matt Elder, David Jung
The SAM team's focus during the last quarter has been to complete development of demonstrable functionality across the Metadata Management Services layer, including database mapping, configurable security, and a transitional electronic laboratory notebook. Combined with earlier efforts to implement metadata generation and translation capabilities, the work this quarter will form the basis of a SAM MMS beta release. As noted below, SAM capabilities were presented in several exhibits at the SC2002 conference in November. Team members have also been very active in SAM-related conferences and workshops this quarter.
SAM Notebook Interface: Initial work has begun to investigate technologies and design a componentized notebook interface to enhance/replace the transitional notebook. Such a task is necessary to fully leverage technologies such as XML and RDF and to allow third party applications and environments to directly incorporate notebook functionality.
Slide 2.0 Migration: In anticipation of a Slide 2.0 release by the Apache Jakarta developers, the SAM team has begun investigating the changes that will be necessary to migrate SAM MMS and notebook functionality. Slide 2.0 will be a significant upgrade that adds support for DAV versioning and the DASL search language, both of which may impact implementation strategies for SAM capabilities for metadata generation and data translation. As part of this work, we will be collaborating with the CMCS team to measure the performance of various configurations and versions of Slide and SAM with the aim of identifying potential bottlenecks that can be addressed by the SAM team.
Configurable Authentication and Authorization: Interfaces to allow use of "arbitrary" third-party authentication and authorization components/services have been developed. In a demonstration at the SC2002 conference, SAM was shown running with Slide's standard username/password authentication and access control list security and then, with only a change in configuration, running with an external username/password database and policy-based fine-grained access control (a 'random' policy served as the demonstration).
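The idea of swapping security components through configuration alone can be illustrated with a minimal sketch. It is written in Python for brevity and does not reproduce the SAM or Slide interfaces; every class, function, and configuration key below is a hypothetical name chosen for illustration, and the plain-text password tables are for demonstration only.

```python
import random
from typing import Callable, Dict, Optional, Protocol, Set, Tuple


class Authenticator(Protocol):
    def authenticate(self, user: str, password: str) -> bool: ...


class Authorizer(Protocol):
    def is_allowed(self, user: str, action: str, resource: str) -> bool: ...


class LocalPasswordAuthenticator:
    """Checks credentials against an in-process table (illustration only;
    a real deployment would store salted hashes, not plain passwords)."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = users

    def authenticate(self, user: str, password: str) -> bool:
        return user in self._users and self._users[user] == password


class ExternalPasswordAuthenticator:
    """Delegates the check to a caller-supplied lookup, standing in for an
    external username/password database."""

    def __init__(self, lookup: Callable[[str], Optional[str]]) -> None:
        self._lookup = lookup

    def authenticate(self, user: str, password: str) -> bool:
        stored = self._lookup(user)
        return stored is not None and stored == password


class AclAuthorizer:
    """Access control list keyed by (user, resource) with allowed actions."""

    def __init__(self, acl: Dict[Tuple[str, str], Set[str]]) -> None:
        self._acl = acl

    def is_allowed(self, user: str, action: str, resource: str) -> bool:
        return action in self._acl.get((user, resource), set())


class RandomPolicyAuthorizer:
    """Demonstration policy that grants or denies access at random."""

    def __init__(self, rng: random.Random) -> None:
        self._rng = rng

    def is_allowed(self, user: str, action: str, resource: str) -> bool:
        return self._rng.random() < 0.5


class SecurityLayer:
    """Combines one authenticator and one authorizer behind a single check."""

    def __init__(self, authenticator: Authenticator, authorizer: Authorizer) -> None:
        self.authenticator = authenticator
        self.authorizer = authorizer

    def check(self, user: str, password: str, action: str, resource: str) -> bool:
        return (self.authenticator.authenticate(user, password)
                and self.authorizer.is_allowed(user, action, resource))


def build_security(config: Dict[str, str], components: Dict[str, object]) -> SecurityLayer:
    """Select components purely by the names given in the configuration."""
    return SecurityLayer(components[config["authenticator"]],
                         components[config["authorizer"]])


# Registered components (hypothetical names and data).
components = {
    "local": LocalPasswordAuthenticator({"alice": "secret"}),
    "external": ExternalPasswordAuthenticator({"bob": "hunter2"}.get),
    "acl": AclAuthorizer({("alice", "/notebook/page1"): {"read"}}),
    "random": RandomPolicyAuthorizer(random.Random(0)),
}

# Default configuration: local passwords plus access control lists.
default = build_security({"authenticator": "local", "authorizer": "acl"}, components)

# Alternative configuration: external passwords plus the random policy.
alternative = build_security({"authenticator": "external", "authorizer": "random"}, components)
```

Only the configuration dictionary differs between the two setups; the calling code uses `check` in the same way in both cases.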
SAM Transitional Electronic Notebook: Development of core capabilities for adding and viewing content in an electronic notebook was completed, as demonstrated at SC2002. Creation of the SAM-based notebook involved writing DAV- and SAM-specific components implementing the newly refactored ELN 5.0 notebook object storage and retrieval functionality as well as the development of preliminary SAM Notebook Services (NS) for notebook discovery and launching, page display, etc. One benefit of this design is the ability to use the ELN 5.0 client with both CGI- and SAM-based notebook servers.
While the CGI server, developed as part of DOE2000, implements its own file-based data store, the SAM server stores notebook content as standard DAV-resources and makes notebook object metadata, e.g. author name and creation time as well as the notebook/chapter/page/note relationships, available as standard DAV properties. Thus, notebook content and metadata are directly available to other applications and can be viewed through generic tools such as the CMCS portal explorer and pedigree browser.
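As a rough illustration of how a generic client might read such metadata, the following Python sketch builds a WebDAV PROPFIND request body and parses a 207 Multi-Status response. The `DAV:` namespace and the `propfind`, `multistatus`, `propstat`, and `creationdate` elements come from the WebDAV specification; the notebook namespace, the `author` and `parent` property names, and the sample resource path are assumptions made for this example and are not taken from SAM.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
# Hypothetical namespace for notebook properties; not taken from SAM.
NB_NS = "http://example.org/notebook#"


def build_propfind(prop_names):
    """Return a PROPFIND body requesting the given (namespace, name) properties."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for ns, name in prop_names:
        ET.SubElement(prop, f"{{{ns}}}{name}")
    return ET.tostring(root, encoding="unicode")


def parse_multistatus(xml_text):
    """Map each href in a Multi-Status body to the properties returned with status 200."""
    result = {}
    root = ET.fromstring(xml_text)
    for response in root.findall(f"{{{DAV_NS}}}response"):
        href = response.findtext(f"{{{DAV_NS}}}href")
        props = {}
        for propstat in response.findall(f"{{{DAV_NS}}}propstat"):
            status = propstat.findtext(f"{{{DAV_NS}}}status", default="")
            if " 200 " not in status:
                continue  # skip properties that were not found or not readable
            prop = propstat.find(f"{{{DAV_NS}}}prop")
            if prop is None:
                continue
            for child in prop:
                if child.tag.startswith("{"):
                    ns, _, name = child.tag[1:].partition("}")
                else:
                    ns, name = "", child.tag
                props[(ns, name)] = (child.text or "").strip()
        result[href] = props
    return result


# Sample response a server might return for a notebook page (invented data).
SAMPLE = """<?xml version="1.0"?>
<D:multistatus xmlns:D="DAV:" xmlns:N="http://example.org/notebook#">
  <D:response>
    <D:href>/notebooks/demo/chapter1/page1</D:href>
    <D:propstat>
      <D:prop>
        <D:creationdate>2002-11-18T10:00:00Z</D:creationdate>
        <N:author>A. Researcher</N:author>
        <N:parent>/notebooks/demo/chapter1</N:parent>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

request_body = build_propfind([(DAV_NS, "creationdate"), (NB_NS, "author"), (NB_NS, "parent")])
page_props = parse_multistatus(SAMPLE)["/notebooks/demo/chapter1/page1"]
```

Because the request and response are ordinary WebDAV messages, any DAV-aware tool could issue the same query without knowledge of the notebook client.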
Work is continuing on the transitional notebook to implement administrative and view customization capabilities that are available in the CGI-based ELN but were not yet included in the SAM-based version demonstrated at SC2002. When this work is complete, the new notebook will be released with a mechanism for migrating data from existing CGI servers.
DAV-Database mapping: Work to create a mechanism for connecting to "arbitrary" back-end databases was completed at the end of the quarter. The implementation includes a new component using Slide's Store interface that reads a registered DAV-database map and dynamically implements the mapping between the database schema and a DAV view. Some additional work to document the mapping language and test additional use cases will be required before the software is released.
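The general idea of a map-driven DAV view over a relational table can be sketched as follows. This is an illustrative Python example using an in-memory SQLite database, not the SAM component or Slide's Store interface; the map format, the `MappedStore` class, and the table and column names are all assumptions, since the actual mapping language is not described here.

```python
import sqlite3

# Hypothetical map format: one DAV collection per table, one resource per row.
MAPPING = {
    "collection": "/db/samples",
    "table": "samples",
    "key_column": "sample_id",
    # DAV property name -> database column name
    "properties": {"formula": "formula", "mass": "mass_amu"},
}


class MappedStore:
    """Read-only DAV-style view over a relational table, driven by a map.

    Table and column names come from the (trusted) map; row keys taken
    from request paths are always passed as bound parameters.
    """

    def __init__(self, conn, mapping):
        self._conn = conn
        self._map = mapping
        self._prefix = mapping["collection"].rstrip("/") + "/"

    def list_resources(self):
        """Return the DAV paths of all rows in the mapped table."""
        table = self._map["table"]
        key = self._map["key_column"]
        sql = f'SELECT "{key}" FROM "{table}" ORDER BY "{key}"'
        return [self._prefix + str(row[0]) for row in self._conn.execute(sql)]

    def get_properties(self, path):
        """Return the mapped properties for a DAV path, or None if absent."""
        if not path.startswith(self._prefix):
            return None
        row_key = path[len(self._prefix):]
        props = self._map["properties"]
        columns = ", ".join(f'"{col}"' for col in props.values())
        table = self._map["table"]
        key = self._map["key_column"]
        sql = f'SELECT {columns} FROM "{table}" WHERE "{key}" = ?'
        row = self._conn.execute(sql, (row_key,)).fetchone()
        if row is None:
            return None
        return dict(zip(props.keys(), row))


# Small in-memory database standing in for an existing back-end (invented data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (sample_id TEXT PRIMARY KEY, formula TEXT, mass_amu REAL)"
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("s1", "H2O", 18.015), ("s2", "CO2", 44.009)],
)

store = MappedStore(conn, MAPPING)
resources = store.list_resources()
co2_props = store.get_properties("/db/samples/s2")
```

In this sketch, pointing the view at a different schema means changing only `MAPPING`; the store code itself is unchanged.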
SAM team members participated in several meetings, workshops, and other discussions during the last quarter:
Workshop on Data Derivation and Provenance, Chicago, Oct. 17-18, 2002 Participated in general discussions of provenance issues from file and database perspectives, co-chaired a session on “Provenance and Annotation”, and co-authored position papers related to the SAM and CMCS projects:
CompreHensive collaborativE Framework (CHEF) Developers Meeting, Oct. 14-15, 2002, Ann Arbor, MI Participated in discussions concerning the use of SAM MMS and notebook capabilities within the Jetspeed-based CHEF collaborative portal environment, including an informal presentation ("The Scientific Annotation Middleware Project from A Portal Perspective").
Computing in Science and Engineering special issue on Scientific Databases, 2003 Prepared a draft of an invited paper “Re-integrating the Research Record” detailing the design goals and initial implementation of SAM.
International Symposium on Collaborative Technologies and Systems, 2003 Western MultiConference, Jan. 19-23, 2003, Orlando, FL Served as a member of the technical program committee and provided peer review of submissions. A paper, "Collaborative Electronic Notebooks as Electronic Records: Design Issues for the Secure Electronic Laboratory Notebook (ELN)", and a demonstration of the ELN have been accepted for presentation at the conference.
Howard Hughes Medical Institute 2002 Undergraduate Program Directors Meeting: Collaborations in Science, Oct. 28-30, 2002, Chevy Chase, MD Presented a plenary talk on "The Technological Tools For Collaboration: The PNNL/EMSL Collaboratory Project".
Grid Computing Planet, Oct. 28, 2002, Boston, MA Presented a talk on "The Grid As Infrastructure for Collaborative Science" outlining DOE Grid efforts and the need for higher-level middleware such as SAM.
Advanced Collaboration Workshop, Center for Behavioral Neuroscience, Emory University, Oct. 11, 2002, Atlanta, GA Remote presentation of "The PNNL/EMSL Collaboratory Project" describing the Virtual NMR Facility and including a demonstration of the ELN and discussion of SAM directions.
Executive IT Life Science Forum, Dec. 4-5, 2002, New York, NY Invited presentation of "The Grid as Infrastructure for Collaborative Science" (revised) providing an overview of DOE Grid and Collaboratory efforts.
NSF Biological Database and Informatics (BDI) Program, 2002 Ad-hoc reviewer of a notebook-related proposal to the BDI program.
Collaboratory for Multiscale Chemistry: Provided ongoing assistance in using SAM as the primary CMCS data/metadata management system and contributed to several CMCS demonstrations given at the SC2002 conference.
Workflow and Provenance Discussions: Discussions were held with the Pervasive Collaborative Computing Environment (PCCE) and Extensible Computational Chemistry Environment investigators concerning the possibility for integrating efforts related to workflow and provenance tracking across the projects. As part of the discussions, a mini-symposium entitled "Cyberinfrastructure for Scientific Research: Managing Complexity" was held at PNNL to introduce researchers across the lab to the concepts of workflow and provenance and to describe progress being made in these areas within the SAM, CMCS, and PCCE projects.