
James Myers, Al Geist, Elena Mendoza, Jens Schwidder, Alan Chappell
The SAM team's focus during the last quarter has been the development of SAM's Metadata Management Services layer. As detailed below, significant progress has been made in developing MMS capabilities. Versions of MMS basic metadata/data management functionality, notification capabilities, and client-side utilities have been created and are available to collaborators for initial testing and development efforts. Effort continues in refining project requirements, investigating relevant technologies, and refining detailed design and implementation plans. Team members have also been active in community activities this quarter.
Revised requirements documentation and an outline of specific implementation details for SAM MMS functionality have been posted to the website as an aid to collaborating projects and to elicit feedback. These documents will be 'living' and will be periodically updated.
Work continues on the design of a security framework that allows different authentication and authorization technologies to be used. Efforts are underway to understand details of generic security technologies such as the Java Authentication and Authorization Service (JAAS) and to identify the appropriate integration points within the SAM middleware.
Additional design details are being generated for the implementation of metadata generation and translation services, as well as for the implementation of a transitional DAV-enabled electronic notebook.
Messaging
We've extended the Slide webDAV servlet to publish Java Message Service (JMS) messages for any operation that changes data or metadata on the server. This implementation was developed using the open source OpenJMS engine and has slight dependencies on it. The JMS properties reported, which JMS subscribers can use to filter the messages they receive, are shown below. Configuration options can be set in the servlet's web.xml file. An option is available to report the values of updated DAV properties in the JMS message body. Subscribers are responsible for contacting the DAV server directly for any additional information required (e.g. to retrieve the content of a new DAV object).
Properties reported (with example values):

| JMS Property | DAV Method(s) | Example Value |
|---|---|---|
| SAM_DavMethod: | All | PROPPATCH |
| JMSXRcvTimestamp: | All | 1017243344844 |
| SAM_Timestamp: | All | 27 Mar 2002 15:35:44 GMT |
| SAM_HttpStatusCode: | All | 207: Multi-Status |
| SAM_Userid: | All | JimMyers |
| SAM_ResourceURL: | All | http://localhost:8080/slide/files/get_interests.html |
| SAM_DavPropertiesAdded: | PROPPATCH | cmcs:CASNumber, dc:contributor |
| SAM_DavPropertiesRemoved: | PROPPATCH | test |
| SAM_NewResourceURL: | COPY/MOVE | http://localhost:8080/slide/files/testcopy |
This software is available upon request and will be posted to the SAM website.
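As an illustration of how a subscriber might use these properties, the following is a minimal sketch in Python that models one message as a plain dictionary holding the example values from the table. It is not the SAM or JMS API: the helper names (`added_properties`, `status_code`, `event_time`, `wants`) are hypothetical, and the trailing colons shown in the table are assumed to be display formatting rather than part of the property names. In a real JMS client the same filter would more likely be expressed as a message selector along the lines of `SAM_DavMethod = 'PROPPATCH' AND SAM_DavPropertiesAdded LIKE '%cmcs:CASNumber%'`.

```python
from datetime import datetime, timezone

# Example values copied from the table of reported properties.
# Assumption: the trailing colons in the table are not part of the names.
message = {
    "SAM_DavMethod": "PROPPATCH",
    "JMSXRcvTimestamp": 1017243344844,  # milliseconds since the epoch
    "SAM_Timestamp": "27 Mar 2002 15:35:44 GMT",
    "SAM_HttpStatusCode": "207: Multi-Status",
    "SAM_Userid": "JimMyers",
    "SAM_ResourceURL": "http://localhost:8080/slide/files/get_interests.html",
    "SAM_DavPropertiesAdded": "cmcs:CASNumber, dc:contributor",
    "SAM_DavPropertiesRemoved": "test",
}


def added_properties(msg):
    """Split the comma-separated list of added DAV property names."""
    raw = msg.get("SAM_DavPropertiesAdded", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def status_code(msg):
    """Return the numeric part of a value such as '207: Multi-Status'."""
    return int(msg["SAM_HttpStatusCode"].split(":", 1)[0])


def event_time(msg):
    """Parse SAM_Timestamp into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(msg["SAM_Timestamp"], "%d %b %Y %H:%M:%S GMT")
    return parsed.replace(tzinfo=timezone.utc)


def wants(msg, method, prop):
    """Hypothetical subscriber filter: a given method that added a given property."""
    return msg.get("SAM_DavMethod") == method and prop in added_properties(msg)


if wants(message, "PROPPATCH", "cmcs:CASNumber"):
    # The message carries only a notification; the subscriber would now
    # contact the DAV server at SAM_ResourceURL for anything further.
    print(message["SAM_ResourceURL"], status_code(message), event_time(message))
```

Filtering on the properties rather than the message body keeps subscribers independent of whether the optional reporting of updated property values is enabled.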
Metadata generation & translation
Two efforts are progressing in this area. In the first, servlets are being built that will allow users to specify a source file and an XSLT or Binary Format Description (BFD) translator and to receive the result back via the browser. This capability is being developed both as a prototype for integration with the DAV server and as a stand-alone utility that will aid in the development and refinement of translators. In the second, the BFD language and engine have been extended to support additional binary format elements common in files produced from Fortran codes. These updates have been tested using output from Chemkin, a widely used Fortran chemistry application of interest in the CMCS project.
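To give a sense of the kind of binary structure that Fortran codes produce, the sketch below reads Fortran "unformatted sequential" records, in which many compilers frame each record with a leading and trailing length marker. This is an illustration only: it is Python rather than BFD, the report does not state which format elements were added to BFD, and the 4-byte little-endian marker, the function names, and the sample data are all assumptions (marker size and byte order are compiler dependent).

```python
import io
import struct


def read_fortran_records(stream, marker="<i"):
    """Yield the payload of each length-framed record in the stream.

    Assumes each record is: length marker, payload bytes, same length marker.
    """
    size = struct.calcsize(marker)
    while True:
        head = stream.read(size)
        if not head:
            return  # clean end of file
        if len(head) != size:
            raise ValueError("truncated record header")
        (length,) = struct.unpack(marker, head)
        payload = stream.read(length)
        tail = stream.read(size)
        if len(payload) != length or len(tail) != size:
            raise ValueError("truncated record")
        if struct.unpack(marker, tail)[0] != length:
            raise ValueError("leading and trailing record markers disagree")
        yield payload


def write_record(payload, marker="<i"):
    """Frame a payload the same way, used here only to build sample data."""
    framing = struct.pack(marker, len(payload))
    return framing + payload + framing


# Made-up sample: one record of three doubles, one record of 8 characters.
sample = write_record(struct.pack("<3d", 1.0, 2.5, -4.0)) + write_record(b"H2O     ")
records = list(read_fortran_records(io.BytesIO(sample)))
values = struct.unpack("<3d", records[0])
print(values, records[1])
```

A format description for such a file has to capture both the framing and the layout of each payload, which is why a declarative description language can be easier to maintain than hand-written readers.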
Security/JAAS
In support of the research into JAAS, we've developed a simple application that exercises JAAS capabilities. It can be used to demonstrate the concept of developing security-implementation-independent software and as a test harness for developing new JAAS provider libraries. At present, the prototype includes a JAAS provider that makes authentication decisions based on an XML configuration file.
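The idea of security-implementation-independent software can be sketched as follows. This is a language-neutral illustration written in Python, not the JAAS prototype itself: in JAAS the provider role is played by a `LoginModule` and the application works through a `LoginContext`, whereas here the class names, the `open_notebook` function, and the XML layout are all hypothetical. The point is only that the application code calls a single `authenticate` method and does not change when the provider is swapped.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET


def digest(password):
    """Hash a password. A real provider would use a salted, slow hash."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical configuration layout; the prototype's actual schema is not
# described in this report.
CONFIG_XML = f"""<users>
  <user name="alice" sha256="{digest('correct horse')}"/>
</users>"""


class XmlFileAuthenticator:
    """Provider that decides based on an XML list of users and digests."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        self._digests = {u.get("name"): u.get("sha256") for u in root.iter("user")}

    def authenticate(self, name, password):
        expected = self._digests.get(name)
        if expected is None:
            return False
        # Constant-time comparison of the stored and computed digests.
        return hmac.compare_digest(expected, digest(password))


class AllowAllAuthenticator:
    """Trivial stand-in provider, useful only for testing the application."""

    def authenticate(self, name, password):
        return True


def open_notebook(authenticator, name, password):
    """Application code: depends only on the authenticate() method."""
    if not authenticator.authenticate(name, password):
        raise PermissionError(f"authentication failed for {name}")
    return f"notebook opened for {name}"


print(open_notebook(XmlFileAuthenticator(CONFIG_XML), "alice", "correct horse"))
```

Swapping `XmlFileAuthenticator` for another provider requires no change to `open_notebook`, which is the property the test harness is meant to exercise.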
webDAV Client
We have extended the webDAV client software that is part of the Slide project with functionality targeted towards site management and working with complex properties. Specifically, we have implemented a site-to-site copy capability for copying DAV objects between servers. This capability is critical for development efforts. Additionally, we have implemented support for setting and retrieving DAV properties in non-DAV: namespaces and with nested XML values. These capabilities are allowed by the DAV protocol and are implemented in the Slide server, but were not yet available in the webDAV client. This functionality will be important for working with metadata involving multiple namespaces, e.g. Dublin Core for pedigree information and RDF for semantic relationships.
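For readers unfamiliar with what such properties look like on the wire, the following sketch builds the XML body of a PROPPATCH request that sets one Dublin Core property and one property with a nested XML value. It uses only the Python standard library and is not the Slide client API. The `cmcs` namespace URI, the `species` and `name` elements, and the example values are hypothetical placeholders; only the `DAV:` and Dublin Core namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
CMCS_NS = "urn:example:cmcs"                # hypothetical placeholder URI

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("D", DAV_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("cmcs", CMCS_NS)


def build_proppatch(props):
    """Return a PROPPATCH request body that sets each element in props."""
    root = ET.Element(f"{{{DAV_NS}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV_NS}}}set")
    prop_el = ET.SubElement(set_el, f"{{{DAV_NS}}}prop")
    prop_el.extend(props)
    return ET.tostring(root, encoding="unicode")


# A simple property in a non-DAV: namespace.
contributor = ET.Element(f"{{{DC_NS}}}contributor")
contributor.text = "JimMyers"

# A property whose value is nested XML (illustrative structure and values).
species = ET.Element(f"{{{CMCS_NS}}}species")
ET.SubElement(species, f"{{{CMCS_NS}}}CASNumber").text = "7732-18-5"
ET.SubElement(species, f"{{{CMCS_NS}}}name").text = "water"

body = build_proppatch([contributor, species])
print(body)
```

A client would send this body with the PROPPATCH method to the resource URL; a server that accepts the update typically answers with a 207 Multi-Status response listing the result for each property.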
SAM team members participated in a variety of meetings, workshops, and other discussions during the last quarter:
SciDAC/NC investigators meeting, January 15-17, 2002: The SAM project was presented as a poster, was mentioned in the CMCS poster, and was represented on the "Metadata" panel. These presentations, and similar ones by other SciDAC and NC projects, led to significant information exchange and new contacts. It appears that the primary interest in SAM at present is as a means to manage pedigree information and to handle heterogeneous/evolving metadata. The conceptual prototype for a pedigree browsing component generated significant discussion, and it became clear that this component should be a priority for development. As a result of the meeting, we are continuing to exchange information with the Earth Systems Grid (ESG) and "Middleware Technology to Support Science Portals" projects.
Global Grid Forum 4, February 17-20, 2002: A relevant highlight of this meeting was the repeated call, across several working groups, for metadata services, described in terms of metadata catalogs, provenance/pedigree systems, notebooks, workflow records, etc. This appears to be driven by the current wave of ‘community-level’ Grid projects that are dealing with globally shared data sets (data from accelerator runs, climate models, observatories, etc.). Communities are struggling to standardize and categorize their metadata while data storage projects are trying to extend their functionality to more fully/broadly support the use of metadata. A session was held to focus on standardizing ‘applications’ metadata – primarily specification of the run-time environment required by an application (e.g. directories, libraries, disk space needed), but potentially including aspects of pedigree. It appears that this meeting will result in the collection of existing applications metadata schemas within the Grid Computing Environments working group. A good general listing of pedigree requirements was given by members of the EU DataGrid effort: see in particular pages 14-25 of http://www.cs.man.ac.uk/grid-db/papers/Requirements.pdf.
Good Laboratory Practice (GLP) Compliance Strategies workshop, February 21-22, 2002, Philadelphia, PA: Jim Myers gave a talk entitled "Implementing a Software System that is 21 CFR part 11 Compliant: Improving Electronic Laboratory Notebook (ELN) Documentation".
"Defining the Mandate for Proteomics in the Post-Genomics Era", February 25, 2002, The National Academies, Washington, DC: Jim Myers co-chaired a breakout session on "Policy and Infrastructure for Collaboration". A written report from the workshop is expected within the next 4-6 months. In the breakout, several aspects of collaboration that are somewhat unique to proteomics were discussed, including that, like genomics, proteomics promises many practical medical benefits that strongly drive the push from research to production/application; that there is an overall shortage of trained experts (especially multidisciplinary experts); and that collaboration is needed to identify opportunities to address completeness (supplementing work done for specific projects to create a community resource that can be mined for other purposes). Intellectual property and credit issues were seen as key barriers. The fact that proteomics researchers share physical samples was seen as a unique addition to the traditional model of collaboratories focusing on shared data, instruments, and expertise. Potential solutions were seen in both the technology and policy areas: metadata can be used to track intellectual contributions at a finer grain, which might help tenure committees find metrics other than first authorship on publications to evaluate candidates; fellowships and sabbaticals can provide training opportunities, and collaboratories can make remote fellowships a possibility; and user facilities, including virtual user facilities, can help address the scarcity of resources and expertise.
Electronic Notebooks as Official DOE Records (ongoing): The SAM project is participating in discussions at PNNL to define the specific steps that will be necessary to deploy an electronic notebook as an official laboratory record. SAM is providing technical guidance related to digital signatures and timestamps and gathering operational requirements for advanced notebooks.
Collaboratory for Multiscale Chemical Science (CMCS) (ongoing): Discussions continued regarding requirements for SAM event publishing, pedigree support, and the use of advanced DAV capabilities (e.g. versioning, search). Several of the tools noted above have been made available to CMCS for beta testing.
"Laboratory Notebooks 2002" (upcoming, Philadelphia, May 16-17, 2002): Al Geist has been invited to present a talk entitled "Supporting the Laboratory Notebook and Its Information—Capturing and Storing Meta-Data" that will describe the SAM infrastructure supporting notebooks.