This material has been published in the Proceedings of the 1999 International Chemical Information Conference, ISBN: 1-873699-63-8. 196 pages. by Infonortics Ltd. This material may not be copied or reposted without explicit permission.
(The version posted here does not include figures.)
James D. Myers
W.R. Wiley Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Richland, Washington, USA
Collaboratories and virtual facilities (VFs) are a new way of organizing and performing scientific work that holds tremendous promise. Researchers accessing these facilities remotely can securely control instruments, run analysis and visualization tools, store notes in a shared electronic notebook, and converse with colleagues using videoconferencing, whiteboards, and shared applications, as easily as if they were on site. Pacific Northwest National Laboratory's Environmental Molecular Sciences Laboratory (EMSL) is a new national user facility that is adopting the Collaboratory as a primary means of supporting users and interacting with collaborators and partners. The EMSL Virtual NMR Facility (VNMRF), already being discussed as a national model for future NMR facilities, provides a good example of the state of the art and of the specific benefits that can be obtained. (Details of the EMSL Collaboratory efforts are available at http://www.emsl.pnl.gov:2080/docs/collab/.)
The term Collaboratory was coined in 1989 by William Wulf while he worked for the US National Science Foundation. Wulf's vision was a
For many researchers, the VNMRF is already preferable to a physical visit to the facility, and as technologies and our understanding of distance collaboration improve, we expect VFs to become even more compelling. The convenience and power provided to individual researchers and, at the organizational level, the cost and time savings to be had, will make the VF concept a primary component of new scientific resource centers. Over time, VFs will become drivers of instrumentation advances and will likely play an increasing role in managing members' data and their software environments. Linkages between the curators of community databases and the "curators" of virtual instrumentation and computation centers are also likely as all these facilities work to develop complete collaborative problem solving environments. These environments will support the full range of research tasks, from background research and problem definition, through data acquisition and analysis, to comparison with previous work and publication. As this evolution occurs, it will affect scientific data and programming standards and software licensing practices, and may even change individuals' job descriptions and the nature of research organizations.
The Collaboratory project at EMSL began more than six years ago, before the EMSL building itself existed. Sparked by a 1993 report from the National Research Council, we began a program linking research and development of collaborative software with deployment and investigation of the processes and dynamics of scientific collaboration. Through a series of U.S. DOE and internally funded projects (3), we have worked with partners at several national laboratories and universities to develop and deploy powerful, extensible, cross-platform tools for real-time and asynchronous collaboration. These tools, which are publicly available via the WWW, are being used to support a variety of multi-institution research and education projects and are a standard service available to users of the EMSL facility. In the VNMRF, we have used the programming interfaces of these tools to provide custom research environments tailored to NMR, e.g. an electronic notebook that accepts NMR experiment parameters directly from the NMR console and can display rotatable 3-D views of protein structures. The VNMRF was developed in close collaboration with NMR researchers and facility operators and, though still evolving, is already having a major impact with more than 25% of external users performing experiments remotely.
The EMSL Collaboratory software itself has been described in more detail elsewhere (4-6). The current release of the software and supporting information is freely available from our website (7). A brief description is provided here, focussing on aspects important in the VNMRF.
CORE2000. The suite of real-time collaboration tools developed at EMSL is called the COllaborative Research Environment, or CORE2000. CORE2000 is an extension of the National Center for Supercomputing Applications (NCSA) Java-based Habanero (8) environment. CORE2000 adds shared computer screens, remote cameras, and third party audio and video conferencing to Habanero's whiteboard, chat box, and other tools. The CORE2000 client allows users to start or join sessions by supplying the session name, the server hostname (or IP address), and an optional port number. (Users can access a continuously running CORE2000 server maintained at EMSL or start their own locally.) When a user starts or joins a session, they see a palette of icons representing the available tools (Figure 2). Anyone in the session can then click at any time on whatever tools are needed. CORE2000 starts each tool simultaneously on whatever mixture of PC, Mac, and UNIX systems the remote collaborators are using. A future version of CORE2000 will allow collaborators to start, monitor, and join sessions via a Web page.
CORE2000 offers a variety of collaboration capabilities. The third party audio and video tools allow participants to converse and to see each other. CORE2000 can launch the publicly available Mbone (9) tools, the option used in our VNMRF project, or CUSeeMe (10) (limited to non-Unix participants). The chatbox tool is used to exchange short text messages. The whiteboard tool allows users to create sketches and diagrams together using a variety of pen colors. Users can drag-and-drop geometric shapes (lines, rectangles, ellipses, etc.), type text, or draw freehand on the whiteboard. They can also import GIF or JPEG images, such as NMR spectra, pulse sequence diagrams, electrophoresis gels, or molecular models onto the whiteboard, and mark them up as the discussion proceeds. The TeleViewer is CORE2000's dynamic screen sharing tool. Developed several years ago at the EMSL, the TeleViewer allows users to transmit a live view of any rectangle or window on their screen to all session participants. Collaborators simply click-and-drag a rectangle over the area they wish to share and transmission to the group begins. The TeleViewer is very effective in supporting spontaneous interactions, allowing researchers to share their latest results and visualizations, or work as a group to edit a document. The TeleViewer is also effective for training, allowing a mentor to demonstrate the use of a program to a student and watch as the student works. There are also tools in CORE2000 specifically for viewing 3D molecular models: the Molecular Modeler displays pdb-formatted molecular structures, and the 3D XYZ tool displays molecules stored in the .xyz format.
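As an illustration of the kind of input the 3D XYZ tool consumes, the following is a minimal Python sketch of a reader for the common .xyz layout: an atom count on the first line, a free-text comment on the second, then one "element x y z" line per atom. The paper does not say which variant of the format CORE2000 accepts, so the layout, the names Atom and parse_xyz, and the water coordinates are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    element: str
    x: float
    y: float
    z: float


def parse_xyz(text: str) -> List[Atom]:
    """Parse the common .xyz layout: count line, comment line, atom lines."""
    lines = text.strip().splitlines()
    count = int(lines[0].strip())
    atoms = []
    # Line 1 (index 1) is a comment and is skipped.
    for line in lines[2:2 + count]:
        fields = line.split()
        atoms.append(Atom(fields[0], float(fields[1]),
                          float(fields[2]), float(fields[3])))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return atoms


# Hypothetical sample: approximate water geometry, for illustration only.
WATER_XYZ = """3
water (illustrative coordinates)
O 0.000 0.000 0.117
H 0.000 0.757 -0.471
H 0.000 -0.757 -0.471
"""

if __name__ == "__main__":
    for atom in parse_xyz(WATER_XYZ):
        print(atom)
```

A viewer would pass the resulting atom list to its rendering code; the count check guards against files truncated in transfer.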
CORE2000 also has a simple programming interface in common with Habanero that allows new tools to be added as needed. Various groups have used this interface to develop sophisticated, domain-specific tools including collaboratively controlled geographical information system viewers, and image analysis software for the Visible Human project (8). At PNNL, a data acquisition system for a mass spectrometer was developed using this programming interface. During the VNMRF project, we used this interface to develop a collaborative remote pan-tilt-zoom controller for cameras (specifically, the Canon VC-C1 Communication Camera) positioned in the EMSL NMR labs, allowing researchers to get a sense of lab activity and to view some important non-computerized instrument status displays.
Electronic Laboratory Notebook (ELN). An ELN is an analog of a paper laboratory notebook, designed to allow distributed teams to record and share a wide range of notes, sketches, graphs, pictures, and other information. The ELN can be used to store literature references, experimental procedures, equipment design drawings, summary tables, annotated graphs and visualizations, etc. The Web-based EMSL ELN was developed as part of a collaboration with researchers at Lawrence Berkeley National Laboratory (LBNL) and Oak Ridge National Laboratory (ORNL). The EMSL ELN (Figure 3) presents an initial login screen requiring the user's name and password, and then displays a main window containing a table of contents with a user-defined hierarchy of chapters, pages, and notes. The content of the currently selected page appears in a separate browser window. All entries are keyword searchable. Notes on a page are created using a variety of "entry editors" which are launched from the main window. The notebook currently includes editors to create text (plain, HTML, or rich text), equations (LaTeX), and whiteboard sketches (using the CORE2000 whiteboard), to capture screen images, and to upload arbitrary files. Once a note is created, a click on the "submit" button publishes it to the notebook page and makes it available to other authorized users of the notebook. Entries are shown as part of a page, tagged with the author's name and the date and time of the entry. The current ELN restricts access to group members using passwords. The next version of the notebook will include certificate-based user authentication, encrypted data transmission, and digital signatures to provide stronger protection and to begin to address the issues related to using a notebook as a legally defensible document.
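The notebook organization described above (chapters containing pages, pages containing notes tagged with author and time, all keyword searchable) can be sketched as a small data model. This is not the EMSL ELN implementation, which is Web-based and is not shown in this paper; the class names, the fixed three-level hierarchy, and the substring search are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Note:
    author: str
    created: datetime
    kind: str      # e.g. "text", "equation", "sketch", "image", "file"
    content: str


@dataclass
class Page:
    title: str
    notes: List[Note] = field(default_factory=list)


@dataclass
class Chapter:
    title: str
    pages: Dict[str, Page] = field(default_factory=dict)


class Notebook:
    """Hypothetical in-memory notebook: chapters -> pages -> notes."""

    def __init__(self) -> None:
        self.chapters: Dict[str, Chapter] = {}

    def submit(self, chapter: str, page: str, note: Note) -> None:
        # Create the chapter and page on first use, then append the note.
        ch = self.chapters.setdefault(chapter, Chapter(chapter))
        pg = ch.pages.setdefault(page, Page(page))
        pg.notes.append(note)

    def search(self, keyword: str) -> List[Tuple[str, str, Note]]:
        # Case-insensitive substring match over note content.
        kw = keyword.lower()
        return [(ch.title, pg.title, note)
                for ch in self.chapters.values()
                for pg in ch.pages.values()
                for note in pg.notes
                if kw in note.content.lower()]


if __name__ == "__main__":
    nb = Notebook()
    nb.submit("Sample prep", "Day 1",
              Note("alice", datetime(1999, 5, 3, 9, 30), "text",
                   "Purified protein sample, ran gel."))
    nb.submit("Experiments", "HSQC",
              Note("bob", datetime(1999, 5, 4, 14, 0), "text",
                   "Started HSQC experiment with default parameters."))
    for chapter, page, note in nb.search("hsqc"):
        print(chapter, page, note.author, note.created.isoformat())
```

Appending notes rather than overwriting them mirrors the paper-notebook analogy, in which each entry keeps its author and timestamp.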
Each "note" can be rendered by the browser (e.g. text, images), by using external applications (e.g. by launching Microsoft Word), or by using Java applets (e.g. equations, molecular structures). The creation and display of entries is fully customizable via simple editor and viewer programming interfaces. Over the course of the VNMRF project, these interfaces were used to create an NMR Spectroscopists' "version" of the ELN. One of the first customizations of the notebook involved linking in a Java applet viewer for protein structures entered into the ELN as protein data bank (pdb) formatted files. After a brief search and some initial tests, we integrated the WebMol Java applet (11). WebMol displays pdb-formatted molecular structures in a 3D, rotatable format and allows users to display inter-atom distances and angles - enough information to allow quick analysis and comparisons without having to launch a stand-alone analysis package. We have also developed some Java applets for the ELN, including one to display NMR parameter files. This applet shows the parameters not as a long text list, but in a more usable interactive window format that displays only the lines of text associated with the selected parameter. We have also extended the ELN by creating an "ELNWizard" that can be called from within other programs, e.g. from the spectrometer control software, to automate transfer of parameter sets and screen snapshots immediately to a user-specified chapter and page. The ELNWizard can also be used to create scripts that automatically record instrument status at predefined intervals or in response to events.
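The inter-atom distances and angles mentioned above follow directly from the Cartesian coordinates stored in a pdb file. The sketch below is not WebMol's code; it is a minimal Python illustration that reads the fixed-column ATOM/HETATM records of the PDB format (atom name in columns 13-16, residue number in columns 23-26, x, y, z in columns 31-54) and computes a distance and a bond angle. The function names are hypothetical, chain identifiers are ignored, and the three sample atoms use invented coordinates chosen so the results are easy to check by hand.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]
AtomKey = Tuple[int, str]  # (residue number, atom name)


def parse_pdb_atoms(text: str) -> Dict[AtomKey, Coord]:
    """Read coordinates from fixed-column ATOM/HETATM records."""
    atoms: Dict[AtomKey, Coord] = {}
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            res_seq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms[(res_seq, name)] = xyz
    return atoms


def distance(a: Coord, b: Coord) -> float:
    """Euclidean distance, in the file's units (angstroms for PDB)."""
    return math.dist(a, b)


def angle_deg(a: Coord, b: Coord, c: Coord) -> float:
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# Invented coordinates for illustration; not a real structure.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   ALA A   1       0.000   0.000   0.000",
    "ATOM      2  CA  ALA A   1       1.458   0.000   0.000",
    "ATOM      3  C   ALA A   1       1.458   1.525   0.000",
])

if __name__ == "__main__":
    atoms = parse_pdb_atoms(SAMPLE_PDB)
    n, ca, c = atoms[(1, "N")], atoms[(1, "CA")], atoms[(1, "C")]
    print("N-CA distance:", distance(n, ca))
    print("N-CA-C angle:", angle_deg(n, ca, c))
```

The sketch requires Python 3.8 or later for math.dist and the multi-argument form of math.hypot.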
Secure Instrument Control and Data Access. At the beginning of the VNMRF project, the EMSL already had mechanisms in place to allow remote users to access EMSL computer resources and data. Since the NMR spectrometer console software is based on Unix and X-Windows, these mechanisms were also sufficient to allow remote users to control the spectrometer. (The spectrometer manufacturer often takes advantage of this to install and troubleshoot spectrometers over the Internet.) While X-Windows makes it possible to run programs over the Internet, it provides no protection against "session hijacking" and other attacks that could allow hackers to take control. We felt more protection would be needed, especially as we began advertising the continuous availability of EMSL's high field, high profile spectrometers. EMSL NMR operations staff collaborated with EMSL's Computing and Network Services group to set up and use secure shell (ssh) (12), a publicly available tool that provides authenticated, encrypted telnet-like functionality along with encryption for X-Windows. We also felt it would be necessary to allow collaborative, rather than simply remote, access to the spectrometer so that local experts and others could observe the spectrometer console in real-time to advise and/or learn from the remote operator. Our simplest real-time collaboration solution is to have the remote operator capture and share the spectrometer console using the TeleViewer, allowing the rest of the group to observe, but not control, the spectrometer. An advantage of this method is that only the remote operator needs to be running X-Windows. Other collaborators can then use CORE2000 and do not need to run X-Windows. We are currently investigating ways to provide secure group control of the spectrometer, again in a platform-independent manner.
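To make the ssh arrangement concrete, the sketch below assembles the command line a remote operator might use to run an X-Windows program through an encrypted ssh connection. It assumes an OpenSSH-style client, where the -X option requests X11 forwarding; the ssh implementation in use at EMSL may have behaved differently. The user name, the host nmr.example.org, and the program xterm are placeholders, not actual EMSL accounts, hosts, or console software, and the command is only built, not executed.

```python
import shlex
from typing import List


def build_ssh_x11_command(user: str, host: str, remote_program: str) -> List[str]:
    """Build an ssh command that runs remote_program with X11 forwarding.

    Assumes an OpenSSH-style client in which -X enables X11 forwarding.
    """
    return ["ssh", "-X", f"{user}@{host}", remote_program]


if __name__ == "__main__":
    # Placeholder values; substitute a real account, host, and program.
    cmd = build_ssh_x11_command("alice", "nmr.example.org", "xterm")
    # Show the command instead of running it, since the host is fictional.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run would start the session; keeping the arguments as a list avoids shell quoting problems.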
Hardware and Network Setup. The computers used for VNMRF collaborations are the researchers' existing desktop machines. Modern Microsoft Windows and Unix-based machines, e.g. 400 MHz Windows95 PCs and Sun Ultra series machines with 128 Mbytes of memory, are sufficient to run all necessary client and server software. As part of the project, cameras and echo cancellers were installed on each machine. (Echo cancellers are small hardware devices that attach to the audio input and output of the computer and serve as both microphone and speaker, allowing both parties to speak at the same time, as with a telephone. Without one of these devices, only one party can speak at a time, as with walkie-talkies.) We have also experimented with providing small scanners that allow researchers to conveniently scan gels of the purified protein samples and other paper "documents" into the ELN.
A research team with members at PNNL and LBNL did the first VNMRF experiments over a T3 network link (45 Mbits/sec) between the laboratories, part of DOE's Energy Sciences Network. Other users have found that a T1 (1.2-1.4 Mbits/sec) network connection is sufficient to effectively use all of the tools for one collaborating group (2-3 participants) although this does not provide full-motion video and there may be short delays during spikes in other network traffic. Faster networks and the ability to prioritize traffic (to guarantee a requested quality-of-service) should make this situation less common over time. For the remote user securely logged into an EMSL spectrometer, the response over a T1 or faster link is nearly as fast as for a local user. At the low end, even a modem provides sufficient bandwidth for using electronic notebooks and some screen sharing, and combined with a telephone line for audio, can allow a remote expert to participate in discussions and give advice.
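A rough calculation shows why the link speeds above matter. The sketch below estimates the ideal time to move a file over each link, using the T3 and T1 rates quoted in the text. The modem rate of 56 kbits/sec and the 1 Mbyte example file are assumptions, since the paper gives neither, and the estimate ignores protocol overhead, latency, and competing traffic, so real transfers will be slower.

```python
from typing import Dict

# Link rates in Mbits/sec. T3 and T1 values are those quoted in the text;
# the modem value is an assumption for illustration.
LINK_MBITS_PER_SEC: Dict[str, float] = {
    "T3": 45.0,
    "T1 (low estimate)": 1.2,
    "T1 (high estimate)": 1.4,
    "modem (assumed 56 kbits/sec)": 0.056,
}


def transfer_seconds(size_bytes: int, mbits_per_sec: float) -> float:
    """Ideal transfer time: bits to send divided by bits per second."""
    return size_bytes * 8 / (mbits_per_sec * 1_000_000)


if __name__ == "__main__":
    size = 1_000_000  # hypothetical 1 Mbyte screen image or data file
    for name, rate in LINK_MBITS_PER_SEC.items():
        print(f"{name}: {transfer_seconds(size, rate):.1f} s")
```

Under these assumptions the same file takes a fraction of a second over the T3 link, several seconds over a T1 link, and a couple of minutes over a modem, which is consistent with the modem being adequate for notebook use but not for video.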
The EMSL VNMRF has been in general use for less than a year. Today, 25% of the external users of EMSL's NMR spectrometers operate them via the Internet, and this number is expected to rise to 50% as users who wish to make one physical visit to the facility switch to remote operations for subsequent work. The success of the VNMRF is not unique; collaboratories and VFs are under development in many fields, from space physics (13) and fusion (14) to combustion (15) and materials micro-characterization (16), and they too are changing the way research is being done in their target communities. VFs are becoming integral to the plans for the future. A likely scenario for NMR, as recently envisioned by the Committee for High Field NMR, is the creation of a set of complementary ultrahigh field user facility sites, or "sectors", that would become part of a "National Magnetic Resonance Collaboratorium" (17).
It is interesting to speculate about the long term impacts of VFs on researchers, research institutions, and commercial providers of scientific equipment and software. Clearly, VFs have the potential to make state-of-the-art instrumentation, computational resources, and software accessible to a broader audience of faculty, students, and industrial and government researchers. They will form at a scale sufficient to provide advanced services and support (perhaps shared with other virtual or physical facilities) in areas ranging from machining and electronics fabrication, to software engineering, training, etc. Their scale will also allow individual researchers associated with such a facility to specialize, using these services and focusing on the development of new instrumentation, experimental procedures, or software for analysis and visualization. The ability to support such specialists will keep VFs at the forefront of scientific instrumentation and software advances. Other researchers may generalize, focussing more on the scientific questions driving their research and relying on multiple VFs for the resources and expertise required to obtain, interpret, and compare data from different techniques. VFs may merge or form alliances to provide cross-disciplinary capabilities.
The need to compare data from multiple techniques and to translate information across disciplines will drive the adoption of open data standards and lead to the concept of applications as components connected via a scientific workflow. VFs will add value by pushing toward integrated problem solving environments that support background research, project planning, data acquisition, analysis and visualization, comparison with previous work, publication, etc. These environments will link commercial, public, and custom tools with the facility-wide security, messaging, data storage, notebook, workflow, collaboration, reference database, and other services. Researchers will expect applications to work with each other and with the collaborative problem solving environment infrastructure. The applications will need to be written to take advantage of these services and to expect that their events (e.g. "data acquisition complete") and outputs (e.g. results and experimental parameters) will be interpreted and used by other applications. Reference data from one application/service will be used to guide data acquisition and analysis in another package. As data moves between applications, metadata about its processing history will be automatically recorded. This metadata could eventually become part of the publication/database submission package, allowing much more thorough validity checking.
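The idea of automatically recording processing history as data moves between applications can be sketched in a few lines. The example below is speculative, like the scenario it illustrates: the Dataset class, the apply_step helper, and the two processing functions are hypothetical names, and no existing standard or EMSL service is implied. Each step returns a new dataset whose history carries one more record naming the step, its parameters, and the time it ran.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class Dataset:
    values: List[float]
    # One record per processing step, oldest first.
    history: List[Dict[str, Any]] = field(default_factory=list)


def apply_step(data: Dataset, name: str,
               func: Callable[..., List[float]], **params: Any) -> Dataset:
    """Run func on the values and append a provenance record."""
    result = func(data.values, **params)
    record = {
        "step": name,
        "params": params,
        "time": datetime.now(timezone.utc).isoformat(),
    }
    # Build a new history list so the input dataset is left unchanged.
    return Dataset(result, data.history + [record])


def scale(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]


def subtract_baseline(values: List[float], baseline: float) -> List[float]:
    return [v - baseline for v in values]


if __name__ == "__main__":
    raw = Dataset([1.0, 2.0, 3.0])
    scaled = apply_step(raw, "scale", scale, factor=2.0)
    final = apply_step(scaled, "subtract_baseline", subtract_baseline,
                       baseline=1.0)
    print(final.values)
    for record in final.history:
        print(record["step"], record["params"])
```

A history of this kind is what could accompany a publication or database submission to support the validity checking described above.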
Today, VFs are primarily being built at existing instrument centers, where they can take advantage of existing operations, computing, and management infrastructures. One of the promises of the VF approach is that they can be built/expanded by upgrading and re-purposing resources at multiple sites, thereby saving time and money. As it becomes more common for VFs to have both members and resources at multiple sites, their role as organizations with distinct management and services will become more apparent. For the NMR Collaboratorium, it has been suggested that policies, procedures, and software infrastructures across the host sites be compatible enough that users can effortlessly switch between the sites. This will require host institutions to coordinate everything from security policies to proposal handling and user support. VFs will expect software "site" licenses for development and end-user tools and for scientific database subscriptions that apply across the VF rather than across physical institutions. The software systems provided to access the VF will need to be well supported with installation scripts, help systems, error recovery, etc., as a production environment. It is even conceivable that VF operations could be outsourced to a commercial entity. In addition to providing remote access and collaboration capabilities, a company might offer to provide instrument time, data storage, analysis tools, and computation resources. In essence, this would be a switch from selling products (an instrument or analysis program) to services (hours of time on a specific class of instrument and use of analysis tools for the acquired data), analogous to the move towards web-hosting centers and application service providers (ASPs) in the business world.
Collaboratories and VFs are quickly becoming a viable means of conducting scientific research. They hold promises of cost saving and convenience and will likely become an important means of providing advanced scientific resources to a wide range of users. They will enhance our ability to quickly address new scientific questions and will lower the barriers to cross-disciplinary research. Our own experiences with the EMSL VNMRF have shown that researchers can quickly become productive using the Internet to run experiments and work with colleagues. As VFs mature, they will expand their capabilities, integrating, packaging, and supporting an ensemble of tools tailored to make their user communities more efficient and effective. While the exact nature of their evolution is unclear, their scale, and their ability to support both research into new experimental techniques and the rapid deployment of the resulting advances, will place them at the forefront of 21st century scientific research.
The author would like to acknowledge the efforts of the Collaboratory project team and the NMR researchers and facility operators who contributed to the development of the VNMRF, and would like to thank many individuals for helpful discussions on the impacts of Collaboratories, including members of the Collaboratory development team, Ray Bair, and Deborah Gracio. This work was supported by the U.S. Department of Energy through the DOE2000 program and the Distributed Collaboratory Experiment Environments (DCEE) program, both sponsored by the Mathematical, Information and Computational Sciences Division of the Office of Energy Research, and through the Laboratory Directed Research and Development program at Pacific Northwest National Laboratory (PNNL). PNNL is operated for the U.S. Department of Energy by Battelle. The W. R. Wiley Environmental Molecular Sciences Laboratory (EMSL) is a national scientific user facility sponsored by the U.S. Department of Energy's Office of Biological and Environmental Research and located at PNNL.
Figure 1: The EMSL Virtual NMR Facility provides desktop access to people, spectrometers, data, and more.
Figure 2: The main CORE2000 interface showing some of the tools that can be launched with a click of a button.
Figure 3: The table of contents and a sample page from the EMSL Electronic Laboratory Notebook. Editors (lower left) allow entry of text, images, drawings, files, equations, etc.