Tech Blog: How SGCI Helped nSides Use OSG and the Agave Platform to Augment Its Drug Side Effects Database
By Choonhan Youn
Explanation of the Agave Platform
The Agave Platform is an open source, science-as-a-service API platform for powering your digital lab. Agave allows you to bring together your public, private, and shared high-performance computing (HPC), high-throughput computing (HTC), cloud, and big data resources under a single, web-friendly REST API. To ease access to HPC resources, SGCI consultants worked with the nSides project to adopt the Agave framework, which can wrap existing HPC resources as web services. The Agave API covers core computing concepts such as security, groups, data, applications, jobs, and systems. It also provides support services, such as unified accounting, provenance, metadata, tags, notifications, and global events. These support services bind the core concepts together into a cohesive interface.
What are the benefits to science gateway developers? What problems does it address?
The Agave platform is designed to operate as a multitenant, cloud-based, Software-as-a-Service (SaaS) solution. For gateway developers, the main benefit is that the Agave APIs expose HPC systems as web services, which made it straightforward to integrate those systems into the nSides web application.
For the nSides project, we developed a gateway that acts as an interface to a database of side effects resulting from single and multiple drug exposures. The side effects are computed via deep learning models that are designed to run on the Open Science Grid (OSG). Since it is not feasible to create models for every possible combination of drugs, we needed to develop a RESTful Web API through which users can request that a model be computed for a specific combination of drugs not already in the database.
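
The request flow described above (serve a combination that is already in the database, otherwise request a new model) can be sketched as follows. This is a minimal illustration and not the nSides implementation: `combination_key`, `SideEffectStore`, and the placeholder drug and effect names are all hypothetical, and an in-memory dictionary stands in for the real database.

```python
from dataclasses import dataclass, field


def combination_key(drugs):
    """Return an order-independent key for a collection of drug names."""
    cleaned = {d.strip().lower() for d in drugs if d.strip()}
    if not cleaned:
        raise ValueError("at least one drug name is required")
    return "+".join(sorted(cleaned))


@dataclass
class SideEffectStore:
    """In-memory stand-in for the side effects database (hypothetical)."""
    results: dict = field(default_factory=dict)  # key -> computed side effects
    pending: set = field(default_factory=set)    # keys with a job in flight

    def lookup_or_request(self, drugs):
        key = combination_key(drugs)
        if key in self.results:
            return {"status": "available", "key": key,
                    "effects": self.results[key]}
        if key in self.pending:
            # A model for this combination was already requested.
            return {"status": "pending", "key": key}
        # A real gateway would submit a compute job here.
        self.pending.add(key)
        return {"status": "requested", "key": key}


if __name__ == "__main__":
    store = SideEffectStore(results={"drug_a+drug_b": ["effect_x"]})
    print(store.lookup_or_request(["Drug_B", "drug_a"]))
    print(store.lookup_or_request(["drug_a", "drug_c"]))
```

Normalizing the drug names into a sorted key means that the same combination entered in a different order maps to the same database entry, so a duplicate request does not trigger a second job.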
How has it been implemented by SGCI staff?
Implementing the nSides gateway for running nSides scripts on OSG involved the following:
- We originally tested our methods manually on the OSG, a distributed computing resource, using a command line interface and shell scripts. To generate and submit jobs to the OSG automatically, we developed Agave wrapper scripts that contain the specific job attributes. An Agave app is executable code that is available for invocation through the Agave Jobs service on a specific execution system. Using the Agave ToGo tool, we created an app on the Agave platform that describes this script.
- The Systems service in Agave gives you the ability to add and discover storage and compute resources for use in the rest of the API. We configured two systems in Agave: a storage system, which is used only for storing and interacting with data, and an execution system, which is used for running jobs as well as storing and interacting with data.
- The back end of the nSides gateway is implemented in Python (using the Flask web framework), including a fully featured REST Web API for managing the job lifecycle components (see the example). The Agave API is fully integrated into the nSides web front end, which includes secure user authentication and a dashboard for visualizing the status of submitted jobs.
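
To make the distinction between the two kinds of systems concrete, the sketch below builds a storage system description and an execution system description as Python dictionaries. The field names (`type`, `storage`, `login`, `executionType`, `scheduler`, `queues`) follow my recollection of Agave's system JSON and should be checked against the Agave documentation. The system IDs, host names, directories, and queue name are placeholders rather than the nSides configuration, and credentials are deliberately left out.

```python
import json


def storage_system(system_id, host, root_dir):
    """Describe a system used only for storing and accessing data."""
    return {
        "id": system_id,
        "name": "Example storage system",
        "type": "STORAGE",
        "storage": {
            "host": host,
            "port": 22,
            "protocol": "SFTP",
            "rootDir": root_dir,
            "homeDir": "/",
            # An "auth" block with credentials would go here.
        },
    }


def execution_system(system_id, host, root_dir):
    """Describe a system that runs jobs and also stores data."""
    system = storage_system(system_id, host, root_dir)
    system.update({
        "name": "Example execution system",
        "type": "EXECUTION",
        "executionType": "CONDOR",  # assumed value for a Condor-based resource
        "scheduler": "CONDOR",
        "login": {"host": host, "port": 22, "protocol": "SSH"},
        "queues": [{"name": "condorqueue", "default": True}],
    })
    return system


if __name__ == "__main__":
    print(json.dumps(storage_system("example-storage",
                                    "storage.example.org", "/data"), indent=2))
    print(json.dumps(execution_system("example-osg",
                                      "login.example.org", "/home/example"),
                     indent=2))
```

The execution system reuses the storage description because, as noted above, an execution system also stores and serves data; it only adds the login and scheduler details needed to run jobs.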
A diagram of the computational infrastructure. Spontaneous reporting data is processed on grid computing systems to generate an adverse event database (lower left) with a web front end (lower right), which has an on-demand interface (top center) for requesting drug combination adverse reactions. User queries are handled by a set of middleware tools and the Agave API (bottom center).
What would make someone choose this solution over another?
I think this is a good approach for projects that have traditionally used HPC resources in various manual ways. Many components can be wrapped as web services, which can then be exposed and orchestrated in web-based frameworks. For the nSides project, we developed a robust, searchable, shared-usage gateway to the OSG resources. This gateway uses a three-tiered architecture consisting of browser-based user interfaces on the front end, the OSG job submission system on the back end, and middleware to facilitate communication between the two. We serve the user interfaces from an application written with the Python Flask framework, which accesses a variety of web services that constitute the middle tier of the gateway. These web services are arranged so that a heterogeneous collection of resources can be accessed remotely in a uniform fashion.
Additionally, we bundled the user interface front end with a RESTful Web API that allows authenticated users to submit jobs programmatically. This API is implemented using the Agave tenant service, a cloud-based system designed for building APIs for scientific computing. The Agave job API manages all aspects of job execution and management, including data staging, job submission, job monitoring, output archiving, event logging, sharing, and notifications.
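
As an illustration of the job lifecycle described above, the sketch below builds a job request and polls until the job reaches a terminal state. It is a sketch under stated assumptions, not the gateway's code: the request fields (`name`, `appId`, `archive`, `inputs`, `parameters`, `notifications`) and the terminal status names reflect my understanding of the Agave Jobs service and should be verified against its documentation. The `drugs` parameter, the app ID, and the `fetch_status` callable are hypothetical.

```python
import time

# Statuses assumed to mean that a job will not change state again.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED", "STOPPED"}


def build_job_request(app_id, drugs, callback_url=None):
    """Build a job request body for a given app and drug combination."""
    names = sorted(drugs)
    request = {
        "name": "nsides-" + "-".join(names),
        "appId": app_id,
        "archive": True,  # ask the platform to archive the job outputs
        "inputs": {},
        "parameters": {"drugs": ",".join(names)},  # hypothetical parameter
    }
    if callback_url:
        request["notifications"] = [
            {"url": callback_url, "event": "*", "persistent": True}
        ]
    return request


def wait_for_job(fetch_status, job_id, interval=30.0, max_polls=120,
                 sleep=time.sleep):
    """Poll fetch_status(job_id) until a terminal state is reported."""
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


if __name__ == "__main__":
    print(build_job_request("example-app-0.1", ["drug_b", "drug_a"]))
    # Simulated status sequence in place of a real call to the Jobs service.
    states = iter(["PENDING", "RUNNING", "FINISHED"])
    print(wait_for_job(lambda job_id: next(states), "job-1",
                       sleep=lambda seconds: None))
```

Passing `fetch_status` in as a callable keeps the polling logic independent of the HTTP client, so it can be exercised with a simulated status sequence as shown. Where notifications are available, a callback URL avoids polling altogether.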
- Agave ToGo is a full-featured, open source reference web application that allows you to leverage the data, computation, and collaboration features of the Agave platform. Gateway developers can use it to test the platform, build custom integrations, and develop Agave services. Through Agave ToGo, the nSides project uses the Agave public tenant service hosted by the Texas Advanced Computing Center (TACC). To get started, sign up for the Agave ToGo app and log in. We then created Agave systems for storing code and data and for executing code on the OSG. The Agave software development kit (SDK) and command line interface (CLI) tools are also available to gateway developers.
- Starting from the nSides scripts, we developed an Agave wrapper template script that specifies the inputs and parameters for the job, and that submits and monitors Condor DAG (Directed Acyclic Graph) jobs. This script, which is forked at the command line, is registered as an Agave app and run through the Agave Jobs service to execute the code on the OSG and populate the job outputs.
- The resulting Agave REST APIs were fully integrated with the nSides gateway (built on the Python Flask web framework), including the Agave authentication and authorization services. The Agave authorization service is a fully compliant OAuth2 implementation, and the access token it generates is used to invoke the Agave APIs.
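
The token-based access described in the last item can be sketched with the Python standard library. The example only constructs the HTTP requests and does not send them. The `/token` path, the `PRODUCTION` scope, and the use of the password grant with client credentials are assumptions based on my understanding of Agave's OAuth2 service. The base URL, client key and secret, and user credentials are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def token_request(base_url, client_key, client_secret, username, password):
    """Build (but do not send) an OAuth2 password-grant token request."""
    credentials = f"{client_key}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
        "scope": "PRODUCTION",  # assumed scope name
    }).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/token",
        data=body,
        headers={
            "Authorization": "Basic "
                             + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def authorized_request(base_url, path, access_token):
    """Build a GET request that carries the access token as a bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": "Bearer " + access_token},
    )


if __name__ == "__main__":
    req = token_request("https://agave.example.org", "key", "secret",
                        "user", "password")
    print(req.get_method(), req.full_url)
    api = authorized_request("https://agave.example.org", "/jobs/v2",
                             "example-token")
    print(api.get_method(), api.full_url)
```

In a gateway, the access token returned by the token endpoint would be stored in the user's session and attached to each subsequent API call, as `authorized_request` does here.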