sciencegateways - Gateways 2016: Abstracts

Gateways 2016: Abstracts

Published on Friday, 24 September 2021 16:40

Concurrent Sessions A (Wednesday at 2:10pm)

Often Overlooked but Always Important: Considerations for Gateway Creation

Science Gateways and Cybersecurity: Learning from the Past and Preparing for the Future (Randy Heiland and Von Welch)

Science gateways connect communities of scientists and engineers to distributed cyberinfrastructure (CI). Cybersecurity is therefore essential to protecting the people, machines, and data from malicious activity and accidental mistakes. The Science Gateways Community Institute (SGCI) and Center for Trustworthy Scientific Cyberinfrastructure (CTSC) have partnered to address cybersecurity for gateway development and operation. This paper and presentation will provide an overview of this partnership and its goals.

SciStarter 2.0: A Gateway to Drive Research, Participation, and Community-building in Citizen Science (Catherine Hoffman, Caren Cooper, Steve Gano, Ira Bennett, and Darlene Cavalier)

Citizen science is exploding. Each day, more projects are announced, citizen scientists release exciting new findings, and new people discover a passion for citizen science. The ecosystem of projects continues to expand, and SciStarter is the place for people to navigate this new, empowering world of discovery.

SciStarter is a popular online hotspot for citizen science. As a “Match.com meets Amazon” for citizen science projects, SciStarter connects millions of citizen scientists to thousands of projects and events, and to the resources they need to participate. These opportunities give members of the general public, from kids to adults, ways to get involved in scientific research.

Recently, SciStarter developed a new digital gateway to support and sustain engagement in citizen science. It includes contribution-tracking tools that make it easier to participate in multiple projects, enhanced GIS information to promote locally relevant projects, and an online personal dashboard to keep track of contributions. By tracking their contributions to citizen science with SciStarter, participants will gain credit for their work. We’ll soon be building ways for that credit to turn into real rewards, like course credit, badges on online social profiles, or even discounts at local retailers! Finally, we’re building an inventory of tools used in citizen science projects to make it easy to get equipped. With one click, someone can find, join, and get the tools needed for any SciStarter citizen science project.

This talk will explain the past, present, and future of SciStarter as a gateway to citizen science, with a particular focus on how we build community outside our online presence. We'll showcase our partnerships with the Science Cheerleaders and the Expert and Citizen Assessment of Science and Technology as unique ways to build our community.

Capturing the “Whole Tale” of Computational Research: Reproducibility in Computing Environments (Bertram Ludascher, Kyle Chard, Niall Gaffney, Matthew Jones, Jaroslaw Nabrzyski, Matthew Turk and Victoria Stodden)

We present an overview of the recently funded “Merging Science and Cyberinfrastructure Pathways: The Whole Tale” project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process, including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as the data, code, and workflows. To enable this, Whole Tale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification. Published data and applications can either be consumed directly by users through Whole Tale or be integrated into existing or future domain science gateways. Our approach is analogous to a general science gateway in that it aims to deliver a collection of services and integration software that simplifies the use of scientific services and cyberinfrastructure.

Resource Efficiency

MyGeoHub Science Gateway for Spatial Data and a Model for Sustainability (Lan Zhao, Carol Song and Larry Biehl)

Science gateways have gained popularity and adoption in the past few years as an efficient and user-friendly way to share scientific data, models, tools, and workflows in a community. However, all science gateways face a major long-term sustainability hurdle: when funding ends, the gateway developed for a project is likely to become obsolete because of non-trivial maintenance costs. We describe here a cost-sharing solution based on the HUBzero gateway framework. Instead of dedicating one hub to each project, this solution hosts multiple projects with synergies on one physical hub while still providing branded individual web space for each project, by leveraging HUBzero’s support for super groups. Because the underlying infrastructure is shared among multiple projects, the cost of maintaining it is reduced, which keeps the hosted gateways alive in the long term. We applied this model to MyGeoHub and will discuss our implementation experience.

Interactive Leaderboard for Requesting and Tracking Expensive Calculations of Optional Properties across a Database of Materials (Donald Winston, Joseph Montoya and Kristin Persson)

Databases of crystalline compounds typically have core properties, such as relaxed structure and energy of formation, pre-computed and available to users. However, additional material properties, such as the full elastic tensor or band structure, may not be of interest to all users of the database and may be computationally expensive to determine. Furthermore, researchers interested in such properties may only want their values for specific compounds or chemistries. To crowdsource the prioritization of optional-property calculations, we built an interactive leaderboard that serves as a gateway to our cyberinfrastructure (CI) for queuing jobs and dynamically reassigning job priorities. Users browse an ordered list of queued jobs, filter by chemistry for materials that are missing the property of interest, vote on existing or new jobs, and receive email notification when voted-on material properties are online. Here we will detail the use case of full elastic tensor calculations for the Materials Project, though the leaderboard interface can be run as a standalone web service and customized for contexts other than computational materials science.
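
The vote-driven prioritization described above can be illustrated with a small in-memory model. This is a minimal sketch, not the Materials Project implementation: the `Leaderboard` class, its method names, and the material IDs are hypothetical, and a real service would presumably persist jobs in a database and hand the top-ranked job to the CI queue. In this sketch each user's vote counts once per (material, property) job, ties go to the earlier submission, and completing a job returns the voters to notify by email.

```python
from itertools import count


class Leaderboard:
    """Hypothetical vote-ordered queue of optional-property calculations."""

    def __init__(self):
        self._votes = {}     # (material_id, prop) -> number of distinct voters
        self._voters = {}    # (material_id, prop) -> emails to notify on completion
        self._arrival = {}   # (material_id, prop) -> submission order, used as tie-breaker
        self._counter = count()

    def vote(self, material_id, prop, email):
        """Queue a new job on its first vote; each user counts once per job."""
        key = (material_id, prop)
        if key not in self._votes:
            self._votes[key] = 0
            self._voters[key] = set()
            self._arrival[key] = next(self._counter)
        if email not in self._voters[key]:
            self._voters[key].add(email)
            self._votes[key] += 1

    def ranked(self):
        """Queued jobs, most votes first; ties go to the earlier submission."""
        return sorted(self._votes, key=lambda k: (-self._votes[k], self._arrival[k]))

    def complete(self, material_id, prop):
        """Drop a finished job and return the (sorted) emails to notify."""
        key = (material_id, prop)
        del self._votes[key], self._arrival[key]
        return sorted(self._voters.pop(key))


board = Leaderboard()
board.vote("mat-1", "elastic_tensor", "a@example.org")
board.vote("mat-2", "elastic_tensor", "b@example.org")
board.vote("mat-2", "elastic_tensor", "c@example.org")
print(board.ranked())  # mat-2 (2 votes) ranks ahead of mat-1 (1 vote)
```

Recomputing the order on every call keeps the sketch simple; a busier service might instead keep a priority queue and update entries as votes arrive.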

Better Data Discoverability in Science Gateways (Supun Nakandala, Sudhakar Pamidighantam, Suresh Marru and Marlon Pierce)

Science gateways primarily focused on remote job execution management generate domain-specific output data that is mainly readable by application-specific parsers and post-processing utilities. For example, computational chemistry outputs encode molecule information, simulation convergence, and energy values. Such domain-specific information is non-trivial to search in a generic fashion. It is thus desirable to add a wide range of application-specific and user-specific post-processing features, which may include remote execution of scripts and smaller applications that don’t require scheduling on clusters. It is also desirable to support integration with the searching, indexing, and general-purpose data analysis and mining tools provided by the Apache “big data” software stack. As gateways become tenants of general-purpose platform services, providing a general-purpose infrastructure that enables these application-specific post-processing steps is an interesting architectural challenge. Furthermore, it is desirable to share the results of post-processing and indexing. In this paper, we discuss how we have incorporated a new automated application output indexing system into the SEAGrid Science Gateway, using Apache Airavata, that parses and indexes generated output for easy querying. We also examine data sharing and automated data publication, so that another user can reuse results without rerunning an already executed experiment, thereby reducing resource utilization.
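
To make the parse-then-index idea concrete, here is a toy sketch of indexing fields extracted from application output. It is not the SEAGrid/Airavata implementation: the `Key: value` output format, `parse_output`, and `OutputIndex` are hypothetical, and a production gateway would more likely hand parsed fields to a search engine from the Apache stack than keep an in-memory dictionary. Each parsed (field, value) pair maps to the set of experiments that produced it, so a multi-field query becomes a set intersection.

```python
import re
from collections import defaultdict

# Hypothetical output format: one "Key: value" pair per line.
FIELD = re.compile(r"^\s*(\w[\w ]*?)\s*:\s*(.+?)\s*$")


def parse_output(text):
    """Extract key/value metadata from a (hypothetical) application output file."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


class OutputIndex:
    """Toy in-memory inverted index: (field, value) -> experiment ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, experiment_id, text):
        for key, value in parse_output(text).items():
            self._postings[(key, value.lower())].add(experiment_id)

    def search(self, **criteria):
        """Ids of experiments whose parsed fields match every criterion."""
        hits = [self._postings.get((k.lower(), str(v).lower()), set())
                for k, v in criteria.items()]
        return set.intersection(*hits) if hits else set()


index = OutputIndex()
index.add("exp-1", "Formula: H2O\nConverged: yes\nEnergy: -76.02")
index.add("exp-2", "Formula: CH4\nConverged: no")
print(index.search(formula="H2O", converged="yes"))  # {'exp-1'}
```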

Concurrent Sessions B (Wednesday at 3:30pm)

Gateway Case Studies with a Data Focus

A Novel Gateway for Visual Exploratory Analysis and Sharing of Survey Data (Ilya Zaslavsky, Side Li, Stephan Lamont, David Valentine and Thomas Whitenack)

SuAVE (Survey Analysis via Visual Exploration) is a key component of an emerging infrastructure enabling users to publish, share, and collaboratively analyze survey data in natural and social sciences. We discuss SuAVE analytical capabilities, which combine visual, statistical and cartographic analyses, as well as associated tools for publishing surveys, annotating them, and sharing annotations with partners. In addition to presenting this application as a scalable science gateway for survey data of different types, we demonstrate it using the 2015 Science Gateways Institute survey as a sample dataset. The Institute survey was published in SuAVE to make analysis of raw survey data easily accessible, intuitive, and visually attractive, while offering a combination of powerful data exploration tools.

The DesignSafe-CI Architecture (Stephen Mock, Matthew Hanlon, Dan Stanzione and Ellen Rathje)

Natural hazards threaten life and property across the United States and the world. The DesignSafe cyberinfrastructure (CI) supports multi-hazard engineering by addressing the big challenges associated with multi-hazard research. The DesignSafe CI was recently deployed and is under active development. This paper outlines the existing and planned architecture of the DesignSafe CI.

Gateway Case Studies

The Renaissance Simulations Laboratory (Kacper Kowalik, Michael Norman, Britton Smith and Matthew Turk)

The Renaissance Simulations are a set of astrophysical calculations designed to study how the Universe was reionized. The simulations are cutting edge, featuring radiation transport, chemical processes, hydrodynamics, gravity, and star formation prescriptions; they were run on the Blue Waters supercomputer using the ENZO simulation platform. We present the Renaissance Simulations Laboratory (RSL), a web-based mechanism for accessing, analyzing, visualizing, and studying the outputs of the Renaissance Simulations. RSL takes a hybrid approach, using both general-purpose systems such as the Jupyter Notebook and bespoke, purpose-built widgets such as SQL queries over halo catalogs.

CoESRA: From virtual desktop to science gateway (Hoang Anh Nguyen, David Abramson, Siddesware Guru and Yi Sun)

The Collaborative Environment for Ecosystem Science Research and Analysis (CoESRA) is a Web-based virtual desktop environment that integrates existing eResearch infrastructure in Australia for the synthesis and analysis of scientific data by the ecological science community. Data synthesis and analysis are performed through scientific workflows. Although powerful, this approach has a steep learning curve for novice users. We have implemented a Web layer on top of the existing virtual desktop layer to hide this complexity from users. This new layer allows users to execute scientific workflows without a desktop, which reduces the learning curve. The virtual desktop remains accessible to more advanced users.

Using Containers for Gateway Construction

Containers-as-a-service via the Actor Model (Joe Stubbs, Rion Dooley and Matthew Vaughn)

Modern science gateways face distributed systems engineering challenges from multiple fronts. In this extended abstract, we introduce Abaco (Actor Based Containers), a framework and hosted service based on Linux containers and the actor model of concurrent computing for tackling problems related to portability, scalability, reproducibility, and the heterogeneity of physical infrastructure powering science gateways. We describe the initial implementation of Abaco, its use in the Agave science-as-a-service platform, as well as some related projects and areas of future work.
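
The actor model at the heart of Abaco can be summarized as: each actor owns its state and a mailbox, and processes messages one at a time. The following sketch shows that pattern with a Python thread standing in for what, in Abaco, is a Linux container; the `Actor` class and its methods are illustrative only and are not Abaco's API.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the actor to shut down


class Actor:
    """Minimal actor: a mailbox drained in order by a single worker thread."""

    def __init__(self, handler):
        self._handler = handler          # called once per message; may close over state
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Process everything already sent, then stop the worker thread."""
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self._handler(message)


results = []
doubler = Actor(lambda n: results.append(2 * n))
for n in range(3):
    doubler.send(n)
doubler.stop()
print(results)  # [0, 2, 4]
```

Because the mailbox is drained by a single thread, the handler never runs concurrently with itself, which lets an actor keep state without locks; in this model, scaling out means running more actors rather than sharing one.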

Demo: Creating Sandboxed User Environments with Jupyterhub and Docker (Shreyas Cholia, Donald Winston and Daniel Gunter)

In this demo we would like to showcase an infrastructure for sandboxed notebook platforms using Docker and Jupyterhub. We will show how this infrastructure is used for two scientific applications: in The Materials Project, we created a per-user sandboxed environment for workshops and tutorials that can be directly accessed over the web; for the Institute for the Design of Advanced Energy Systems (IDAES), we enabled playgrounds for users where they can explore scientific models and data without the difficulties of going through a complex installation and setup process.

Concurrent Sessions C (Wednesday at 4:20pm)

Education and Gateways

Creating a Gateway for Collaborative Learning (Bev Corwin, Christie Ewen, Sean Auriti, John Ulmer, Janine Medina and Noreen Whysel)

The OWASP Learning Gateway Project proposes developing a platform that is adaptable to the learning needs and user requirements of a global community of technology learners. The gateway will offer learning pathways that connect learners with mentors and the knowledge needed to be successful. Our ultimate goal is to be a successful Science Gateways Incubation Project.

The Barcode of Life Student Data Portal: Supporting a National Network of Students in Cataloging Multicellular Life with DNA Barcodes (Ralph Imondi, Linda Santschi and Sujeevan Ratnasingham)

We present a series of key issues, challenges, and solutions associated with a classroom model of citizen science that engages a nationally distributed network of high school students (and other non-experts) in contributing professional-quality biodiversity genomics data to the International Barcode of Life (iBOL) project. The successful implementation of this research-based educational model was significantly aided by the creation of the BOLD Student Data Portal (BOLD-SDP), an open-access interface to the Barcode of Life Data Systems (BOLD) platform that iBOL practitioners use to create a digital genetic registry of Earth’s plants with DNA barcodes. By eliminating barriers associated with class management, data record assembly, data validation, and data publication, BOLD-SDP has created a remarkably effective pathway for secondary and postsecondary students not only to contribute to the landmark iBOL effort but also to engage in a constellation of 21st-century science practices centered on data literacy and big data science.

VIDIA: a HUBzero Gateway for Data Analytics Education, plus a demo of the VIDIA Gateway: a Virtual Infrastructure for Data Intensive Analysis (Jeanette Sperhac and Steven M. Gallo) (30 minute presentation and demo)

We describe a scientific gateway collaboration undertaken by members of the State University of New York (SUNY) system. The University at Buffalo's Center for Computational Research (CCR) partnered with SUNY College at Oneonta to offer a gateway for teaching data analytics. The result, called VIDIA, hosts open-source tools that have been used by more than 250 students enrolled in 16 SUNY courses. Additional tools enable researchers across the SUNY system to submit larger jobs to CCR's compute cluster. VIDIA supports data-intensive computation for teaching and research at campuses that lack access to traditional high-performance computing (HPC) resources.

Wrangling Data

DARE-BigNGS: A Science Gateway Model for Scalable NGS Data Analytics Over Distributed HPCs and Clouds (Joohyun Kim, Shayan Shams, Nayong Kim, Mohammad Jalalzai and Seung-Jong Park)

We introduce the science gateway project DARE-BigNGS, which is built on a gateway model whose primary goal is to provide scalable Next-Generation Sequencing (NGS) data analytics as a service. As use cases, two signature pipelines, one for transcriptome/metagenome analysis and one for somatic mutation discovery, have been developed and are offered as services via the gateway. In this work, we report the core strategies, benchmark results, and technical details of how we achieve scalability for NGS data sets, whose analysis is intrinsically challenging because of ever-growing data volumes and the complexity introduced by errors and artifacts of the sequencing technology. Recent enhancements to the gateway's user-friendly interface components are also described.

Demo: Interactive Access to Internet Topology Data (Kc Claffy, Marina Fomenkov, Bradley Huffaker and Young Hyunn)

CAIDA researchers have gathered the largest set of network topology data used for a broad spectrum of scientific research: from physics to biology, and from cyberinfrastructure vulnerability assessments to the theory of complex networks. To broaden the research community's access to the wealth of raw and curated data that we produce, we are developing two interactive portals, AS Rank and Vela, which we would like to demonstrate at the Gateways Conference.

Effective and interactive dissemination of diffusion data using MPContribs, plus a demo of UW/SI2 and MPContribs (Patrick Huck, Kristin Persson, Dan Gunter, Dane Morgan, Henry Wu and Tam Mayeshiba) (30-minute presentation and demo)

We will describe in this talk how the general approach taken by MPContribs solves the very specific challenges faced by the UW researchers in effectively disseminating their data to the public. The presented solution developed in the collaborative effort between UW and LBNL is the first to demonstrate how MPContribs can empower research groups through the rapid development and deployment of customized but MP-compatible web applications either using on-site or MP resources. It will also be shown how these efforts directly translate into solutions for the ongoing collaboration with researchers at the Advanced Light Source at LBNL [1] in which we aim to develop a processing pipeline for experimental XAS data from the beamline computer to integrated analysis web apps on MP.

In our demo portion, we show the integration of the UW/SI2 workflow with MPContribs and JupyterHub. See [2] for a quick impression of the general functionality for the UW/SI2 use case. The video and the demo illustrate how MPContribs can be used to contribute, explore and feed data to the generic contribution details pages as well as a project-specific web application.

[1] MPContribs, arXiv:1510.05024, arXiv:1510.05727, MRS Spring 2016

[2] https://www.youtube.com/watch?v=wbWde5StHnU (3:43 min)

Concurrent Sessions D (Thursday at 9:25am)

Platforms for Gateway Building

GenApp: Extensible Tool for Rapid Generation of Web and Native GUI Applications (Alexey Savelyev and Emre Brookes) (30-minute presentation)

GenApp is a universal and extensible tool for rapid deployment of applications. GenApp builds fully functioning science gateways and standalone GUI applications from collections of definition files and libraries of code fragments. Its main features include a minimal requirement for technical expertise on the part of the end user and an open-ended design that ensures the sustainability of generated applications. Because it is conceptually simple to use, GenApp is ideally suited to scientists who are not professional developers, allowing them to disseminate the theoretical and experimental expertise embodied in their code to their communities by rapidly deploying advanced applications.

Enabling Interactive Notebooks on Supercomputers with Jupyterhub (Shreyas Cholia, R. Shane Canon and Rollin Thomas)

Interactive notebook systems such as Jupyter represent a new paradigm in web science gateways that can combine interactive code execution with data analysis and exploration. In our work we demonstrate how one can create and manage interactive notebooks in a multi-user supercomputing environment using the Jupyterhub platform. We describe our architecture along with custom modules that we developed for Jupyterhub to manage authentication, notebook execution and interaction with the job queueing system. We illustrate the power of this system through the OpenMSI use case, and outline future directions for this effort.

Data Infrastructure Building Blocks (DIBB)

SeedMe2: Data sharing building blocks (Amit Chourasia, David Nadeau, John Moreland, Dmitry Mishin and Michael Norman), plus a demo of SeedMe.org – Share data and visualization (Amit Chourasia, Mona Wong, Dmitry Mishin, David Nadeau and Michael Norman) (30-minute presentation and demo)

The need for data sharing and rapid data access has become central with the rise of collaborative research in many disciplines. Several data sharing approaches have emerged for consumer use cases that primarily need an easy way to share files using web browsers. However, these approaches are not well suited to the particular demands of large-scale data sharing for computational research. Whereas consumer approaches primarily support manual user interfaces to add and remove files, the huge number of files that can be generated during and after a large-scale computation job makes manual data sharing interfaces impractical. Instead, these tasks require mechanisms that integrate into computation workflows to automatically post files during and after computation jobs. Furthermore, scientific data sharing requires additional metadata and descriptive information that characterizes shared data to record job and compute platform characteristics, input data, job parameters, job completion status, and other record-keeping required to document the trajectory of computational research. Without these features, consumer data sharing approaches are not well suited for computational science.

Brown Dog: A Science-Driven Data Transformation Service (Kenton McHenry, Jong Lee, Praveen Kumar, Barbara Minsker, Jay Alameda, Shannon Bradley, Luigi Marini, Rob Kooper, Christopher Navarro, Smruti Padhy, Marcus Slavenas, Sandeep Satheesan, Yan Zhao, Bing Zhang, Inna Zharnitsky and Eugene Roeder)

With growing and diverse volumes of digital data becoming part of modern scientific workflows, many research projects today begin with a process of data wrangling, i.e. finding, manipulating, indexing, cleaning, and bringing together the needed datasets. Brown Dog, a Science-Driven Data Transformation service, aims to alleviate much of the overhead and heterogeneity involved in this step, which in turn hinders scientific reproducibility, by providing data transformations such as format conversions and content-based extractions as a service. Through a REST API, Brown Dog supports diverse usage by various clients such as gateways, programming languages, and tools. As a gateway, it provides a venue to access and preserve data transformation tools, track provenance, track information loss, manage data movement, and process jobs in a scalable manner across a diverse set of computational resources. Overall, Brown Dog provides a low-level data infrastructure for interfacing with digital data contents and, through its capabilities, enables a new era of science and applications over otherwise difficult-to-access datasets. Further, Brown Dog aims to serve not just the scientific community but also the general public as a “DNS” for data, moving civilization towards an era where applications can be largely agnostic to the format and structure of data and can instead focus on novel processes and applications built on its contents.
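
Format conversion “as a service” can be thought of as path-finding over a graph of converters: if no single tool converts format A to format C, a chain A→B→C may still exist. The sketch below assumes exactly that model. The registry entries and tool names are hypothetical, and the code is not Brown Dog's API; it only illustrates a breadth-first search for the shortest chain of registered converters.

```python
from collections import deque

# Hypothetical converter registry: (input format, output format) -> tool name.
CONVERTERS = {
    ("doc", "pdf"): "office-converter",
    ("pdf", "txt"): "pdf-text-extractor",
    ("png", "jpg"): "image-converter",
    ("jpg", "pdf"): "image-to-pdf",
}


def conversion_path(src, dst, converters=CONVERTERS):
    """Breadth-first search for the shortest chain of tools turning src into dst."""
    graph = {}
    for (fmt_in, fmt_out), tool in converters.items():
        graph.setdefault(fmt_in, []).append((fmt_out, tool))
    frontier = deque([(src, [])])
    seen = {src}
    while frontier:
        fmt, tools = frontier.popleft()
        if fmt == dst:
            return tools
        for nxt, tool in graph.get(fmt, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, tools + [tool]))
    return None  # no chain of registered converters reaches dst


print(conversion_path("png", "txt"))
# ['image-converter', 'image-to-pdf', 'pdf-text-extractor']
```

Breadth-first search minimizes the number of conversion steps; a service that also tracks information loss might instead weight each edge by its expected loss and use a shortest-path algorithm such as Dijkstra's.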

“Lessons Learned” Plenary Panel (Thursday at 10:40am)

This panel will feature four 15-minute presentations about gateway building lessons that are broadly applicable to many projects, followed by an open discussion and Q&A.

Organizational Attributes of Successful Science Gateways and Cyberinfrastructure Projects (Kerk Kee)

While much of the adoption and diffusion of science gateways and/or computational tools for e-science is driven by the attributes of the tools [see 1], the organizations behind the tools play a critical role in determining the ultimate diffusion. This paper reports 10 organizational attributes of successful gateways and cyberinfrastructure projects. Based on a systematic analysis of 135 interviews, the organizational attributes include having leaders with credibility, multidisciplinary expertise, collaborative environment, shared goals, a common language, strategic structure, productive routines, organizational capacity, sustainable funding, and personnel continuity.

Analysis of nanoHUB.org from a Business Perspective (Lynn Zentner, Michael Zentner and Gerhard Klimeck)

Science Gateways are typically implemented by scientists and cyberinfrastructure experts to serve a specific, externally funded scientific focus. Sustainability of those gateways and the knowledge and tools they contain continues to be of strong interest to both the gateways and the funding agencies that support their creation and initial operation. nanoHUB.org is an established gateway entering its 15th year of operation and now serving over 1.4 million visitors annually. The leadership at nanoHUB has partnered with Purdue’s Krannert School of Management to examine nanoHUB from a business perspective. The results of these collaborations are outlined in this extended abstract and will be expanded upon in the corresponding presentation.

The CIPRES Science Gateway at Year 6: Lessons learned and best practices (Mark Miller, Terri Schwartz and Wayne Pfeiffer)

The CIPRES Science Gateway was created in 2009 to provide the international phylogenetic community with easy access to parallel versions of community codes run on High Performance Computing resources available through the XSEDE project. After six years of operation, job submissions continue to increase each month, and more than 19,000 individual users have been provided with access. Here we describe the lessons learned and best practices acquired over the lifetime of the CIPRES Science Gateway.

Evidencing the Value of Australian Science Gateways (Michelle Barker) [CANCELLED]

Impact metrics play a key role in evidencing the value of science gateways, and a range of studies are now emerging that demonstrate the significance of science gateways in different ways. This paper examines some of the research approaches that are being utilized in this field, and discusses the value of different approaches in demonstrating different types of outcomes. The paper concludes by examining the Australian context, and reflects on how this is affecting the analytical approaches being employed to measure the impact of Australian-based science gateways.

Concurrent Sessions E (Thursday at 1:00pm)

Scheduling Strategies

MultiCloud Resource Management using Apache Mesos with Apache Airavata (Pankaj Saha, Madhusudhan Govindaraju, Suresh Marru and Marlon Pierce)

We discuss initial results and our planned approach for incorporating Apache Mesos-based resource management, which will enable the design and development of scheduling strategies for Apache Airavata jobs so that they can be launched on multiple clouds, where several VMs do not have public IP addresses. We present initial work and next steps on the design of a meta-scheduler for Apache Airavata that uses Apache Mesos to present a unified view of the resources available across several clouds and clusters. This meta-scheduler can potentially identify cases where multiple small jobs have been submitted by the same scientist and then redirect jobs from the same community account or user to different clusters. Our approach uses a NAT firewall to make cloud VMs without a public IP visible to Mesos for the unified view.
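
The idea of spreading one scientist's many small jobs over several clusters can be sketched as a greedy placement over a unified table of free slots. This is a hypothetical illustration, not the Airavata meta-scheduler: the `place_jobs` function, the slot counts, and the tie-breaking policy are assumptions, and in a Mesos deployment the free capacity would come from resource offers rather than a static dictionary.

```python
from collections import defaultdict


def place_jobs(jobs, free_slots):
    """Greedily spread each user's jobs across clusters.

    jobs: list of (job_id, user) in submission order.
    free_slots: cluster name -> free slots, a stand-in for the unified
    resource view a Mesos-based meta-scheduler would maintain.
    Each job goes to the cluster where its user has the fewest jobs so far,
    breaking ties by most free slots, then by name. Jobs that fit nowhere
    are returned as pending.
    """
    slots = dict(free_slots)
    per_user = defaultdict(lambda: defaultdict(int))  # user -> cluster -> jobs placed
    placement, pending = {}, []
    for job_id, user in jobs:
        open_clusters = [c for c, n in slots.items() if n > 0]
        if not open_clusters:
            pending.append(job_id)
            continue
        best = min(open_clusters,
                   key=lambda c: (per_user[user][c], -slots[c], c))
        placement[job_id] = best
        slots[best] -= 1
        per_user[user][best] += 1
    return placement, pending


jobs = [("j1", "alice"), ("j2", "alice"), ("j3", "alice"), ("j4", "alice")]
placement, pending = place_jobs(jobs, {"cloud-a": 2, "cloud-b": 1})
print(placement, pending)
# {'j1': 'cloud-a', 'j2': 'cloud-b', 'j3': 'cloud-a'} ['j4']
```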

Demo: Co-Scheduling HPC and BigData jobs using Apache Mesos (Shameera Yodage, Suresh Marru and Marlon Pierce)

Scientific computing, by many measures, is becoming more complicated and heterogeneous. At the application level, cloud-native and data-intensive computing based on MapReduce and its descendants require computing resources that go beyond the traditional batch queuing model. Computing resources themselves have become more heterogeneous, as hybrid architectures with graphics processing units (GPUs) and coprocessors (Xeon Phi) provide alternatives to CPU-based parallelization techniques for traditional High-Performance Computing (HPC). Science gateways have traditionally provided simplified interfaces that hide the complexity of these resources from end users. Thus the core reason for science gateways to exist is becoming more important, while the challenges of building science gateway middleware increase. In this demonstration, we showcase developments in the Apache Airavata science gateway framework that address these challenges by leveraging the Apache Mesos ecosystem to schedule both HPC and BigData jobs; we use XSEDE’s next-generation Jetstream, Comet, Bridges, and Wrangler clusters. Each of these clusters provides unique capabilities; this demonstration will highlight how gateways can take advantage of these capabilities through a unified approach to middleware.

Reactive, Interactive, and High Throughput Computation in the Agave Platform (Rion Dooley and Joe Stubbs)

The Agave Platform is an open Science-as-a-Service platform that empowers users to run code, manage data, collaborate meaningfully, and integrate easily with the world around them. In this extended abstract, we examine how new use cases have helped evolve the ways in which Agave supports code execution. We start by identifying three code execution paradigms that have emerged in the past two years: Interactive, Reactive, and High Throughput computing. We then briefly describe the primary use cases driving each paradigm and how the platform evolved in response to them. Finally, we highlight areas of future work in the platform related to code execution.

Gateway Workhorses

Dressing Apache Airavata Services With Automatically User-Generated Interfaces (Daniele D’Agostino, Luca Roverelli, Gabriele Zereik, Emanuele Danovaro, Andrea Clematis and Antonella Galizia)

The development of community-specific user interfaces of a science gateway can be a challenging task for non-IT experts. This contribution proposes an original, easy-to-use solution to tackle this issue based on the PortalTS Web portal. In particular, we present how PortalTS can “dress” Apache Airavata to manage job submissions.

Demo: Pegasus Scientific Workflows within HUBzero (Mats Rynge, Karan Vahi and Steven Clark)

The HUBzero platform for scientific collaboration enables tool developers to build tools that are easily shared with both researchers and educators. Users can log in and start their analysis without worrying about the setup and configuration of the tools. Once the analysis is done, researchers can examine the results using various built-in capabilities for plotting and visualization. To facilitate handling of more complex workloads, we have integrated the Pegasus Workflow Management System with “submit”, the main tool that HUBzero tool developers use to submit analyses to local and remote compute resources. Pegasus WMS represents the application workflow in an abstract form that is independent of the resources available to run it and of the location of data and executables. It compiles these abstract workflows into an executable form that can run on local or remote distributed resources. Pegasus also captures the provenance of the entire workflow lifecycle, from the planning stage through execution to the final output data, which enables users to easily debug and monitor computations that occur on remote resources. The advanced data management capabilities of Pegasus allow tool developers to execute the tightly coupled parts of their workloads on an HPC cluster while farming out the remaining tasks to a distributed HTCondor-based computing infrastructure. The demo will introduce scientific workflows with Pegasus, focusing on the integration of Pegasus WMS with “submit” and on how it enables tool developers using the Rappture toolkit, or “submit” directly, to use scientific workflows.
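
The separation Pegasus draws between an abstract workflow and its executable form can be illustrated with a toy planner: the workflow is a DAG that names only tasks and dependencies, and planning orders it topologically and binds each task to a site. This sketch does not use the Pegasus API; the task names, site names, and the rule that sends tightly coupled tasks to an HPC cluster and everything else to an HTCondor pool are assumptions that mirror the description above. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical abstract workflow: task -> tasks it depends on.
# No machine names or file locations appear at this level.
WORKFLOW = {
    "preprocess": set(),
    "simulate": {"preprocess"},
    "analyze_a": {"simulate"},
    "analyze_b": {"simulate"},
    "summarize": {"analyze_a", "analyze_b"},
}
TIGHTLY_COUPLED = {"simulate"}  # e.g. an MPI job that needs an HPC cluster


def plan(workflow, tightly_coupled, hpc_site="hpc-cluster", htc_site="condor-pool"):
    """Bind an abstract DAG to sites, returning (task, site) in dependency order."""
    order = TopologicalSorter(workflow).static_order()
    return [(task, hpc_site if task in tightly_coupled else htc_site)
            for task in order]


for task, site in plan(WORKFLOW, TIGHTLY_COUPLED):
    print(f"{task:<11} -> {site}")
```

Keeping site choices out of `WORKFLOW` is the point of the exercise: the same DAG could be re-planned onto different resources by changing only the arguments to `plan`.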

The Globus Platform for Science Gateways (Steven Tuecke)

The Globus service leverages advanced network, computing, and authentication infrastructures to provide a new data and security platform on which you can create entirely new classes of portals, science gateways, and web applications for research. In this talk we will provide a brief overview of the Globus platform and the Science DMZ, and explore real-world examples that demonstrate how these technologies can be applied to a broad range of use cases, such as: the National Center for Atmospheric Research’s Research Data Archive, which provides high-speed delivery of research data to thousands of geoscientists; the Globus data portal, which provides interactive data staging to and from experimental facilities and computing centres, and the publication of data generated at such facilities and centres; and Jetstream’s Atmosphere interface, which provides a web interface to the Jetstream cloud resources.
