SGCI Client Cosmic^2 and Its Use of Globus
By Nayiri Mullinix
"We knew that the need was there for a cryo-EM gateway, and having the experts at SGCI available made the gateway development process much more efficient since they have the starting technologies to work with and they know how to solve the unique problems facing gateway developers."
The COSMIC2 science gateway offers an easy, web-based platform for structural biologists to determine macromolecular structures using cryo-electron microscopy (cryo-EM). Over the last five years, advances in cryo-EM have made it possible to determine atomic structures, which has led to a revolution in the field of structural biology. In fact, the revolution made headlines earlier this month when the Nobel Prize in Chemistry 2017 was awarded to three scientists for cryo-EM. Worldwide, researchers are now collecting and analyzing unprecedented amounts of data. An individual scientist could easily process 10–30 terabytes of data from a single dataset collected over the course of a week. These datasets then require at least 50,000 core hours per structure, so the analysis depends on High-Performance Computing (HPC) resources. As a result, new users of cryo-EM face a number of obstacles, including handling these large datasets and managing and submitting jobs to HPC resources. The COSMIC2 science gateway aims to lower these barriers.
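To put these figures in perspective, the arithmetic can be sketched in a few lines of Python. The 24-core and 1,000-core allocations and the 1 Gbit/s link speed are illustrative assumptions, not figures from the COSMIC2 project. The sketch also assumes a perfectly parallel workload and decimal terabytes.

```python
def wall_clock_hours(core_hours: float, cores: int) -> float:
    """Elapsed hours if the work spreads evenly over `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return core_hours / cores


def transfer_hours(terabytes: float, gigabits_per_second: float) -> float:
    """Hours needed to move `terabytes` (decimal TB) over a sustained link."""
    if gigabits_per_second <= 0:
        raise ValueError("link speed must be positive")
    bits = terabytes * 1e12 * 8
    seconds = bits / (gigabits_per_second * 1e9)
    return seconds / 3600


CORE_HOURS_PER_STRUCTURE = 50_000  # lower bound quoted in the article

if __name__ == "__main__":
    for cores in (24, 1_000):  # hypothetical allocations
        hours = wall_clock_hours(CORE_HOURS_PER_STRUCTURE, cores)
        print(f"{cores} cores: {hours:,.0f} hours ({hours / 24:.1f} days)")
    for tb in (10, 30):  # dataset sizes quoted in the article
        print(f"{tb} TB at 1 Gbit/s: {transfer_hours(tb, 1.0):.1f} hours")
```

Under these assumptions, a single 24-core machine would need roughly 87 days per structure, while 1,000 cores would need about two days. A 10-terabyte dataset would take about 22 hours to move over a sustained 1 Gbit/s link.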
As a client of the SGCI, the COSMIC2 team began development with our Extended Developer Support consultants Mona Wong and Choonhan Youn. Initial work was led by Michael Cianfrocco and Andres Leschziner at the University of California, San Diego, and the project has moved with Cianfrocco as he starts his own research laboratory at the Life Sciences Institute at the University of Michigan.
To implement terabyte-sized data transfers, Mona and Choonhan customized and extended the existing CIPRES Workbench framework to integrate two Globus cloud services, Globus Auth and Globus Transfer. Globus Auth is a federated identity management service that manages user identities and access to resources. It lets users access the gateway with an existing organizational login (XSEDE, their own university, Google, etc.). Globus Transfer provides secure, reliable, and asynchronous transfer of large data, giving COSMIC2 users the ability to transfer their terabyte-sized datasets to the gateway. Once the data has been successfully transferred, the user can set up an analysis task and submit it as a job to the XSEDE HPC resource Comet at the San Diego Supercomputer Center. When the analysis job has completed, the user can view and download their results from the gateway. The new Globus integration Java package is publicly available via GitHub.
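The workflow described above can be sketched as a small state model. This is an illustration only, not the gateway's actual Java implementation. The class, function, and stage names are hypothetical, and so is the 2 GB browser-upload cutoff: the article does not state the size at which COSMIC2 switches from browser upload to Globus Transfer.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    CREATED = "created"
    DATA_UPLOADED = "data_uploaded"
    JOB_SUBMITTED = "job_submitted"
    JOB_COMPLETED = "job_completed"


# Hypothetical cutoff; the real gateway's threshold is not given in the article.
BROWSER_UPLOAD_LIMIT_BYTES = 2 * 10**9

# Each stage advances only to the next one in the workflow.
NEXT_STAGE = {
    Stage.CREATED: Stage.DATA_UPLOADED,
    Stage.DATA_UPLOADED: Stage.JOB_SUBMITTED,
    Stage.JOB_SUBMITTED: Stage.JOB_COMPLETED,
}


def choose_upload_method(size_bytes: int) -> str:
    """Pick browser upload for small files and Globus Transfer for large ones."""
    if size_bytes <= BROWSER_UPLOAD_LIMIT_BYTES:
        return "browser"
    return "globus_transfer"


@dataclass
class AnalysisTask:
    dataset_bytes: int
    stage: Stage = Stage.CREATED
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next workflow stage; raises once the job has completed."""
        if self.stage not in NEXT_STAGE:
            raise RuntimeError("job already completed; results can be downloaded")
        self.history.append(self.stage)
        self.stage = NEXT_STAGE[self.stage]
        return self.stage

    def results_available(self) -> bool:
        """Results can be viewed and downloaded only after the job completes."""
        return self.stage is Stage.JOB_COMPLETED
```

In this sketch, a 10-terabyte dataset would be routed to Globus Transfer, and the task would then pass through upload, job submission, and completion before its results become available.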
About the work completed on COSMIC2, Cianfrocco had this to say:
"By working with SGCI, we gained access to experts who provided comprehensive assistance in the many aspects of developing our gateway. It takes the burden off the PI to not have to piecemeal services together but, instead, to know that, by engaging with SGCI, we'd have access to all the consultants needed to help with a variety of integrations. We knew that the need was there for a cryo-EM gateway, and having the experts at SGCI available made the gateway development process much more efficient since they have the starting technologies to work with and they know how to solve the unique problems facing gateway developers. With cryo-EM, there are interesting challenges in data size and computing, so there was no simple turnkey solution. While CIPRES and the workbench framework provided great starting points for COSMIC2, it still needed to be extended to handle large data transfers, asynchronous transfers, multiple file uploads, etc. Having experienced developers to help with this provides a significant advantage."
The new COSMIC2 gateway will soon be released to beta testers. Attendees of the Gateways 2017 conference in October will have the opportunity to see a demo of the gateway given by Cianfrocco, which will serve as the official beta release.
Figure 1. Diagram of a typical COSMIC2 user workflow. First, the user uploads their data using either Globus Transfer or the browser, depending on the size of the data. Next, the user creates and configures an analysis task and submits it to the HPC resource as a computation job. When the job is done, the user can view and download the output results. Lastly, the user can perform simple management steps on their data, tasks, and jobs.
Figure 2. The COSMIC2 login page automatically redirects to the Globus login for user authentication and authorization.
Figure 3. Diagram of the components contributing to the functions of COSMIC2. COSMIC2 serves as the intermediary between a user's dataset and analysis, connecting storage, Globus services, and XSEDE high-performance computing.