
James Myers, Al Geist, Jens Schwidder, Tara Talbott, Mike Peterson, Alan Chappell, Carina Lansing
During the last quarter, the SAM team focused on delivering key enhancements to SAM including
Complementing this work were a large number of community interactions at major conferences and workshops, including the DOE Data Management Workshop Series, and continuing progress in Java- and Grid-related standardization efforts. SAM-related presentations were given at the Global Grid Forum and the Challenges of Large Applications in Distributed Environments (CLADE) Workshop.
Semantic Grid: Work is continuing on the detailed design of SAM's Semantic Services (SS) layer and Semantic Grid concepts. Several initial capabilities have been developed and are being evolved to help elicit requirements while design work proceeds towards a more comprehensive mechanism. The RDF pedigree/provenance property, which includes reified information concerning the software used to generate derived data, has been ported to the SAM 2.0 framework (based on Jakarta Slide 2.0). Implementation has begun on a semantic search capability accessible via WebDAV's Distributed Searching and Locating (DASL) search mechanism, based in part on the requirements identified in the JSR 170 standardization effort. It is anticipated that the CMCS project will begin using this capability to provide flexible provenance-based queries in the next quarter. At the conceptual level, the work in this area is feeding into activities such as the DOE Data Management Workshop series, the GGF Semantic Grid Research Group, and proposal development.
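To make the provenance-based query idea concrete, the following is a minimal sketch of how a client might assemble a DASL request body. It assumes the DAV:basicsearch grammar (as later published in RFC 5323); the class name, the property name `generatedBy`, its namespace, and the scope path are all hypothetical and are not taken from SAM.

```java
class DaslQuerySketch {

    // Escape characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build a DAV:basicsearch body that selects resources under scopeHref whose
    // property {propNamespace}propName equals the given literal.
    // Assumption: propName is already a valid XML element name.
    static String buildBasicSearch(String scopeHref, String propNamespace,
                                   String propName, String literal) {
        String prop = "<D:prop><p:" + propName + "/></D:prop>";
        return "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<D:searchrequest xmlns:D=\"DAV:\" xmlns:p=\""
            + escape(propNamespace) + "\">\n"
            + "  <D:basicsearch>\n"
            + "    <D:select>" + prop + "</D:select>\n"
            + "    <D:from><D:scope><D:href>" + escape(scopeHref) + "</D:href>"
            + "<D:depth>infinity</D:depth></D:scope></D:from>\n"
            + "    <D:where><D:eq>" + prop + "<D:literal>" + escape(literal)
            + "</D:literal></D:eq></D:where>\n"
            + "  </D:basicsearch>\n"
            + "</D:searchrequest>\n";
    }

    public static void main(String[] args) {
        // Hypothetical query: find data whose provenance names a given tool.
        System.out.println(buildBasicSearch(
            "/sam/data/", "http://example.org/sam-sketch#",
            "generatedBy", "BFD translator"));
    }
}
```

A body like this would be sent to the server with the WebDAV SEARCH method; how SAM maps its RDF provenance property onto searchable WebDAV properties is not specified here and may differ.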
Data Format Description Language (DFDL): Work is continuing on the design of a standard language that can describe the content of arbitrary data files. The Grid Forum DFDL working group is very active and is currently developing an XML Schema-related syntax for the DFDL language. Examples forming a set of 'unit tests' and 'integration tests' are in development and will drive the specification. The SAM team is very involved in crafting the standard and is contributing significant concepts that derive from the development of the BFD language and its extensions within the SAM project. As the language nears standardization, the SAM team will investigate design options for building a DFDL parser. Interest in DFDL is growing, both as a technology for persistent archiving of arbitrary data and as a grid data virtualization mechanism.
Java Content Repository (JSR 170) Standard: The JSR 170 specification has now passed through public review and is nearing a proposed final draft. The SAM team is participating in the design of the specification and evaluating it as a potential data grid integration mechanism. To prepare for such use, the SAM team is continuing to investigate the changes that will be necessary to migrate SAM MMS and notebook functionality to the JSR 170 reference implementation being developed within the Jakarta Slide project (as a potential Slide 3.0 interface on the client and server sides). While volatility in the specification has limited progress in moving towards implementation, the JSR 170 specification is expected to move quickly towards a final version over the next few months.
SAM Notebook Services Layer: Building on work to refactor the transitional ELN notebook support within SAM to create a high-level notebook services API, ORNL will take the lead in migrating the DOE2000 eNote notebook to use SAM services. This work will involve generalization of the existing notebook service layer which will provide a migration path for eNote users and will be leveraged in the development of web-based SAM annotation and notebook interfaces.
SAM 1.2 Release: SAM 1.2 has been released to support the immediate needs of collaborating projects. It includes minor bug fixes, support for multi-step metadata generation and data transformation mechanisms that may include BFD, web service, and/or XSLT steps, and a new ServletFilter-based security mechanism that simplifies integration with third party authentication services and use of SAM 1.1 on non-Tomcat application servers. SAM 1.2 is anticipated to be the last 1.x release.
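The multi-step transformation idea can be illustrated with a small sketch that chains two steps using only the standard Java XSLT API (`javax.xml.transform`). This is not SAM's implementation: the `Step` interface, the class name, the key/value input format, and the stylesheet are all hypothetical, and the first step is only a stand-in for a binary-format translation step such as BFD.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformChainSketch {

    // One stage of the chain: takes a document as text, returns a new one.
    interface Step { String apply(String input); }

    // Hypothetical stylesheet that pulls one metadata value out of a record.
    static final String EXTRACT_XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/record\">"
        + "<metadata><temp><xsl:value-of select=\"temperature\"/></temp></metadata>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Stand-in for a format-translation step: turns "key=value" into XML.
    static Step keyValueToXml() {
        return input -> {
            String[] kv = input.trim().split("=", 2);
            return "<record><" + kv[0] + ">" + kv[1] + "</" + kv[0] + "></record>";
        };
    }

    // XSLT step built on the standard javax.xml.transform API.
    static Step xsltStep(String stylesheet) {
        return input -> {
            try {
                Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(input)),
                            new StreamResult(out));
                return out.toString();
            } catch (TransformerException e) {
                throw new IllegalStateException("XSLT step failed", e);
            }
        };
    }

    // Run the steps in order, feeding each output into the next step.
    static String run(List<Step> steps, String input) {
        String current = input;
        for (Step step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Step> chain = Arrays.asList(keyValueToXml(), xsltStep(EXTRACT_XSLT));
        System.out.println(run(chain, "temperature=300"));
    }
}
```

Keeping every step behind the same text-in, text-out interface is one simple way to let translation, web service, and XSLT stages be mixed in any order; a production design would more likely pass streams and carry content types between steps.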
SAM 2.0: Migration to the Slide 2 codebase has been completed. On the client side, this work introduced support for HTTPS in the standard SAM library, based on the Apache Commons http-client package. The SAM server has also been refactored to work with Slide 2 and to expose and exploit newly available capabilities including DAV versioning, binding (hard links), and the DASL search language. The SAM team has investigated new data storage modules developed for Slide and identified configurations that provide significant overall performance increases (i.e., a factor of 2-4 improvement depending on the operation performed). Bug fixes and additional performance and scaling improvements are being shared with the Jakarta Slide project.
PNNL Notebook Deployment: Work has been done to prepare for an upgrade of all ELN servers operated at PNNL to use a SAM 1.2 server with SSL encryption. Due to unexpected delays in the availability of the EMSL web server for debugging and testing, an interim strategy of converting the current Perl-based notebooks to SSL has been adopted. Upgrading to SAM-based servers is now expected in the fall. In response to renewed interest within Battelle in the use of electronic notebooks as laboratory records, the SAM team has investigated the use of SAM's flexible authentication mechanism to support integration of PNNL's Entrust-based digital signature mechanism with the ELN. This renewed interest is also expected to raise the priority of research related to digitally signed annotation capabilities consistent with SAM's flexible annotation model.
Open Source Software Licensing: SAM source code is now available upon request under an Apache/BSD-style license. The source code has been distributed to collaborating projects and we anticipate moving the live project CVS repository to www.sourceforge.net this fiscal year.
SAM team members participated in a wide range of meetings, workshops, reviews, and collaboration discussions during this quarter:
Collaboratory for Multiscale Chemical Science (CMCS): An ongoing collaboration related to the use of SAM as the primary CMCS data/metadata management system. CMCS and SAM are currently collaborating on performance enhancements, porting CMCS to SAM 2.0, adopting semantic search capabilities, and general hardening.
Network for Earthquake Engineering and Simulation (NEES) Grid: PNNL has accepted a subcontract from the NEESgrid project. This effort, which focuses on integration with the NEESgrid portal, security mechanism, and metadata/data repository, is nearing completion. An extension, which will address internationalization of the ELN (support for alternate languages for the labels used within the ELN user interface, and the ability to enter textual notes in alternate languages), is anticipated in the near future.
Web Downloads: Registrations to download SAM and notebook software are continuing at a pace of 1-2 per day.
International Conference on Semantics for a Networked World, with a focus on Grid Databases: Jim Myers was invited to serve on the Program Committee for this conference, which will be held July 17-19, 2004, in Paris, France.
GridSem 2004, 1st International Workshop on the Semantic Grid: Jim Myers was invited to serve on the Program Committee for this workshop, which will be held August 23-24 in Valencia, Spain.