Workshop on Building a Collaboratory in Environmental and Molecular Science

Richard Kouzes, James Myers, and John Price

A workshop was held by the U.S. Department of Energy's (DOE) Pacific Northwest Laboratory (PNL) on March 17-19, 1994, to discuss the development of a "Collaboratory" in the environmental and molecular sciences. The workshop was attended by representatives from the molecular, environmental, computer, and social sciences. A Collaboratory is a meta-laboratory that spans multiple geographical areas, with collaborators interacting electronically. The National Research Council (NRC) recently published a study entitled National Collaboratories: Applying Information Technology for Scientific Research (V.G. Cerf et al., Washington, D.C.: National Academy Press, 1993). The Collaboratory concept represents a qualitatively different way of using communication and information technologies. It promises to accelerate the development and dissemination of basic knowledge and to minimize the time lag between discoveries and their practical application.

The environmental and molecular sciences are a compelling target for a Collaboratory development project. The task of environmental remediation facing our nation is enormous, and solutions will require the integration of knowledge from many fields. The breadth of knowledge required to take basic molecular research to new remediation technologies necessitates close intra- and interdisciplinary communication on a national scale. PNL has embraced the idea of an Environmental and Molecular Sciences Collaboratory (EMSC) as a powerful tool to connect researchers across the nation as they create solutions to environmental problems.

The DOE's commitment to environmental cleanup at its sites presents significant scientific and technical challenges. These challenges are exemplified by the environmental problems at the Hanford Site in southeastern Washington state. The Hanford Site has approximately 1.4 cubic kilometers of hazardous and radioactive wastes, 150 square miles of contaminated aquifer, 60 million gallons of radioactive wastes (260 MCi) stored in underground storage tanks (of which more than one-third are believed to be leaking), 270 tons of spent fuel, 9 inactive reactors, and 7 major inactive reprocessing plants. The site is the equivalent of nearly 1400 Superfund sites divided into 78 distinct groups sharing common traits and geographies. PNL, the DOE's multiprogram national laboratory in Richland, Washington, is tasked with developing innovative, cost-effective technologies to address the environmental challenges at Hanford and across the nation, and with facilitating the application and commercialization of these technologies.

The DOE's Environmental and Molecular Sciences Laboratory (EMSL) at PNL is key to accelerating this cleanup effort and ensuring its effectiveness. The EMSL will be a national focus for the environmental and molecular science research communities. The new laboratory will house about 250 resident and visiting scientists in a 200,000-square-foot building. Over a five-year period, the DOE is investing $229.9 million to design and develop prototypical instrumentation, construct the building, and acquire and install the advanced instrumentation that will make the EMSL a unique national resource. When complete, the EMSL's research programs and operations budget will be approximately $60 million per year. The EMSL's special resources will enable scientists to apply advanced capabilities to research and technology development in areas such as contaminated soils and groundwater; waste analysis, characterization, processing, and storage; and human and ecological health effects. The EMSL will be part of DOE's high-performance computing network, allowing data and information generated in the EMSL to be shared electronically with the national and international scientific communities.

As a national collaborative research and technology laboratory, the EMSL will attract hundreds of scientists from academia, industry, and other government laboratories across the United States and around the world. These visitors will work with PNL staff and other collaborators to develop new levels of understanding and permanent solutions to the nation's environmental problems.

PNL's motivation for hosting the Collaboratory workshop and promoting an EMSC grows naturally out of the EMSL project. PNL's Molecular Science Research Center (MSRC), which will manage the EMSL, is a unique combination of physical, biological, computer, and cognitive scientists charged with developing the instruments and computing infrastructure for the laboratory. While the MSRC will provide a coupled core of developers and end users for an EMSC, the realization that such a collaboratory must serve a national community prompted PNL to gather advisors, potential co-developers, and potential participants at a very early stage to help define the EMSC concept.

The objective of the EMSC Workshop was to define both long- and short-term goals for an EMSC and to guide current PNL efforts toward a national scope. The workshop considered molecular science (experimental, theoretical, and computational), computer (hardware and software), social (human interaction), and management (funding, legal) issues that must be faced in developing an EMSC. Sixty participants from academia, industry, and government, with expertise spanning the scientific fields identified above, gathered in Richland, Washington, to discuss the requirements of an EMSC and develop recommendations for its design and implementation. (Roughly 40 other interested scientists participated in the workshop via an MBONE audio/video multicast on the Internet.)

As envisioned by the workshop participants, the EMSC would provide a means for closer coordination and collaboration among scientists conducting research relevant to environmental remediation, regardless of their geographic location. It would offer a common set of computer hardware and software tools to support remote collaboration. It would also develop the social and policy structure required to establish a true "collaborative culture" of scientists in the theoretical, computational, and experimental molecular sciences across the nation. The long-term goal of an EMSC should be the establishment of a diverse electronic community of scientists, from basic researchers in biology, chemistry, and materials science, to applied researchers involved in atmospheric and terrestrial monitoring, modeling, and simulation, to engineers developing innovative environmental preservation and restoration technologies.

The long-term vision expressed in the workshop is predicated on the idea of an integrated electronic environment where data, analysis tools, and equipment may be easily accessed from remote locations. Four broad categories of information sharing have been identified in the NRC study of collaboratories: 1) data sharing, 2) software sharing, 3) remote instrument control, and 4) communication with remote colleagues. Several generic tools in these categories are now available, but the workshop participants agreed that further development, and customization to the specific needs of the environmental and molecular sciences, will be required. The wide variety of data generated in the molecular sciences was contrasted with the large, uniform datasets that have formed the basis for other collaborative efforts. The requirement to share data generated by many techniques at an interpreted level (as information about molecular systems) was seen to distinguish EMSC development from other efforts.

While the required path of technology development for an EMSC seemed fairly well defined, solutions to the social and legal issues that will arise with distributed groups were not as clear. Most of these issues will be generic across all collaboratories:

* establishing and maintaining trust between remote collaborators

* credit/funding for electronic and/or collaborative contributions

    - for contributions outside home institution/discipline

    - for tool building

    - for reanalysis of existing data/data mining

    - for assuming the borrower burden - accepting the costs of maintenance, and user education and support

* electronic peer review/quality control of electronic information

* security of electronic information/access privilege levels

* intellectual property

    - can you protect 'hypertext links'?

    - the granularity of intellectual property will increase

* scheduling remote resources/multiple remote resources

The participants recommended that prototype software for an EMSC be created and deployed as soon as possible, as the best way to gather information on user requirements and to begin to resolve the social issues. The workshop participants felt that collaboratory software must be extremely flexible and allow for individual styles of use. Another recommendation was to make prototypes span institutions from the start to ensure that cross-organizational issues are not ignored. A rapid iteration of design, development, and testing, with close coupling of developers and end users, was seen as essential for EMSC development.

In response to the workshop, PNL is continuing to test and deploy:

* MBONE Internet audio/video software from Lawrence Berkeley Laboratory

* Electronic Data Notebooks (such as The Forefront Group's VNS)

* The Mosaic World Wide Web browser from the National Center for Supercomputing Applications (NCSA)

* EPICS networked instrument control software from Los Alamos National Laboratory

The MSRC and the Applied Physics Center at PNL are presently developing a prototype shared software display tool (a TeleViewer) and integrating it with a suite of existing tools to create a prototype for a collaboratory software environment. This development is a quick response to the workshop's call for more practical data on the use of collaborative tools by molecular science researchers. Plans call for deploying this environment to support an EMSL research project in NMR involving collaborators at PNL and at the University of Washington, and for monitoring how the environment is actually used. Additional uses of the environment will be sought, and iterative enhancements will be made where possible.

It is clear that an EMSC has the potential to greatly benefit the DOE and the scientific community in general by expanding the resources available to individual researchers, increasing the efficiency of our research system, and coupling basic and applied research efforts more tightly to national environmental goals. It has the potential to remove the walls around departments and organizations, and could lead to the creation of a meta-laboratory with capabilities - in both expertise and equipment - that far exceed those available in any one laboratory alone. The Environmental and Molecular Science Collaboratory Workshop has played an important role in defining a path toward this goal.

At the workshop, staff from the MSRC presented a whitepaper that explains the collaboratory concept and PNL's interest in an EMSC. A second version of the whitepaper, which incorporates the results of the workshop, is being prepared. Copies of this whitepaper, preliminary conference proceedings, and information about the EMSL will be incorporated into the MSRC Mosaic pages (URL http://www.msrc.pnl.gov:2080) as they become available. Interested parties are encouraged to contact the authors for more information on participating in the development and testing of an EMSC.

Pacific Northwest Laboratory is operated for the U.S. Department of Energy (DOE) by Battelle Memorial Institute under contract DE-AC06-76RLO 1830.