Webinar: Do you have real-time, streaming-data, event-driven workflows? The Tapis Streams API can help.
July 14, 2021
Presented by Smruti Padhy, Engineering Scientist, Cloud and Interactive Computing group, Texas Advanced Computing Center
The Tapis framework, an NSF-funded project, is an open-source, scalable API platform that enables researchers to perform distributed computational experiments securely and achieve faster scientific results with increased reproducibility. The Tapis Streams API focuses on supporting scientific use cases that require working with real-time sensor data. The Streams service, built on top of the CHORDS time-series data service, supports storing, processing, annotating, querying, and archiving time-series data. This presentation will focus on new Tapis Streams API functionality that enables researchers to design and execute real-time, data-event-driven workflows for their research. We describe the architecture and design choices behind this new capability. Specifically, we demonstrate the integration of the Streams API with Kapacitor, the native data processing engine for the time-series database InfluxDB, and with Abaco, an NSF-funded web service and distributed computing platform providing functions-as-a-service (FaaS). The Streams API includes a wrapper interface for the Kapacitor alerting system, through which alerts can be defined and enabled. Finally, water-quality use-case simulation results show that the Streams API’s new capabilities can support real-time, streaming-data, event-driven workflows.
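The core pattern described above can be illustrated with a minimal sketch: a threshold alert on a stream of time-series measurements triggers a handler function, much as a Streams/Kapacitor alert would trigger an Abaco actor. All names, thresholds, and data here are hypothetical illustrations; this is not the Tapis API itself.

```python
# Illustrative sketch of a streaming-data, event-driven workflow:
# readings above a threshold fire a handler, the way a Kapacitor
# alert configured through the Streams API would invoke an Abaco actor.

def check_alerts(measurements, threshold, handler):
    """Fire handler(t, value) for each reading above threshold."""
    fired = []
    for t, value in measurements:
        if value > threshold:
            fired.append(handler(t, value))
    return fired

# Simulated water-quality turbidity readings: (timestamp, NTU)
readings = [(0, 1.2), (1, 5.7), (2, 0.9), (3, 6.3)]

alerts = check_alerts(readings, threshold=5.0,
                      handler=lambda t, v: f"alert at t={t}: {v} NTU")
print(alerts)  # two readings exceed the threshold
```

In the real system, the handler would be an Abaco actor registered against the alert channel, so the processing function runs remotely each time the condition fires.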
Catch the Tapis team at PEARC21:
Building Tapis v3 Streams API Support for Real-Time Streaming Data Event-Driven Workflows, July 20, 2021
Tapis User Meeting, July 22, 2021
Resources mentioned during the webinar:
Webinar: Delta—A Large-Scale GPU Resource for Science Gateways
June 9, 2021
Tim Boerner, Senior Assistant Director for the Delta Project Office, NCSA
Brett Bode, Assistant Director for the Blue Waters Project Office, NCSA
Greg Bauer, Lead for the Science and Engineering Applications Support, NCSA
Delta will be the largest GPU resource in the NSF portfolio and is scheduled to launch in fall 2021. Optimizing research codes to utilize GPUs is a rapidly evolving area. This corner of research computing has grown from GPUs as accelerators for a few specialized applications to a broad mix of uses, including hybrid computing, machine learning, accelerated computing, visualization, and interactive computing. This presentation will share details about Delta and how you can get your science gateway’s research codes up and running.
SGCI is proud to be a partner in the development of Delta. Read more about the collaboration.
Webinar: CloudBank and Cloud Implications for Science Gateways
May 12, 2021
Presented by Rob Fatland, UW Research Computing Director and CloudBank lead for Education, Outreach, and Training, and Shava Smallen, San Diego Supercomputer Center and lead architect of the CloudBank user portal
CloudBank supports learning about and using the cloud stack from proposal writing through to post-grant data provisioning. In this webinar, we present the thesis of cloud viability in relation to the science gateway paradigm. This naturally touches on reproducibility and data sharing, extending to the technical challenges of building and using science gateways. We will describe the cloud development environment in terms of services: tools and technologies built on top of basic compute, storage, and networking that simplify and accelerate infrastructure building.
About CloudBank: CloudBank is an NSF-sponsored collaboration among UC San Diego, the University of Washington, and UC Berkeley to promote and support cloud computing adoption in data science, both in the strategic sense of understanding cloud technology and the tactical sense of managing cloud accounting and spend, security, and resource management. CloudBank comprehensively answers the question 'How does moving data and computation to a cloud platform work in practice over the full arc of a research grant?' CloudBank originated in NSF's Computer and Information Science and Engineering (CISE) directorate and will expand its supportive role to additional domains. CloudBank's focus is accelerating research by supporting the technical and sociological shifts needed in the research community's perception of the cloud.
Gateways and the Cloud Padlet (contributions are welcome!)
Q: When using public cloud provider calculators, are prospective PIs still finding it challenging to identify what resources they need to use and how much consumption there may be? How do researchers gain support services from cloud service partners?
A: Yes, absolutely a challenge! For optimization support, CloudBank provides first-order help directly; beyond that, we coordinate support with the vendors and act as a networking resource to additional service providers.
Webinar: Starfish—Taking the Meh Out of Metadata
April 14, 2021
Presented by Jacob Farmer, Founder and Chief Evangelist, and Peter Galvin, Principal Solutions Architect and GUI Product Manager, Starfish Storage
Jacob Farmer is the Chief Technology Officer of Cambridge Computer, a position he has held since co-founding the company in 1991. Jacob has been a familiar face in the data storage industry for over 25 years. He has written for a number of trade publications and has been a featured speaker at many conferences and educational events, most notably USENIX, LOPSA, and SNIA.
In the data storage industry, Jacob is best known for authoring best practices on enterprise data protection and for helping numerous early-stage companies define their target use cases and establish their first customers. In academic circles, he is highly regarded for his work in defining best practices to manage the life cycle of scientific research data and for identifying novel solutions to reduce costs and streamline operations related to digital preservation. Jacob is a graduate of Yale University.
A: To further clarify: we are a metadata management software product, not a file system or data storage product. Your files/objects are yours and stay where they currently are unless you ask Starfish to, for example, copy or move the data. Then we’ll put it onto one of your file systems or your cloud/object stores.
A: If Starfish moves the files, we know where we put them and can bring them back. We will try to show that in the demo if we have time. When we scan a file system, you can then ask Starfish which files are new and act on those files. Or you could create a signal file to say, “Hey, the instrument is done with this run; Starfish, it’s okay to process them now." The time of detection depends on how often you have Starfish scan a given file system (it’s configurable). On Lustre and GPFS we can listen to file-system events; on other systems we (currently) scan.
A: "Scans" is a Starfish term. “Signal files” is a generic idea: you create a file to indicate that the data is done being written.
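The signal-file idea described above is a generic technique, so a minimal sketch may help: an instrument writes its data files, then drops a marker file to indicate the run is complete and safe to process. The marker name `DONE` is a hypothetical choice, not anything specific to Starfish.

```python
# Hedged sketch of the "signal file" pattern: a downstream tool only
# processes a run directory once the instrument has dropped a marker
# file there. The marker name is an illustrative convention.
import os
import tempfile

def run_is_complete(run_dir, marker="DONE"):
    """Return True once the signal file exists in the run directory."""
    return os.path.exists(os.path.join(run_dir, marker))

run_dir = tempfile.mkdtemp()
print(run_is_complete(run_dir))   # no marker yet: False

# Instrument finishes writing: drop the signal file
open(os.path.join(run_dir, "DONE"), "w").close()
print(run_is_complete(run_dir))   # marker present: True
```

A tool like Starfish would pick this up on its next scan (or via file-system events on Lustre/GPFS) and know the run is ready to process.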
Q: Do file systems have to be "public"? Otherwise, how does Starfish gain access?
A: We are a general-purpose unstructured data management system, so if you have files in a file system, we can manage them. That is separate from whether they are public or private. You run us on site and point us at file systems, etc. We could, for example, copy files to some public gateway if you want, or use us for anything else...
Duke Health did a webinar to explain how Starfish helped them save over $1 million by enabling researchers to manage their own files and objects. The recording is available here: https://starfishstorage.com/
Anyone with questions may contact Peter Galvin at firstname.lastname@example.org.
Webinar: Simplifying Cloud HPC Integration for Gateways
March 10, 2021
Presented by Boyd Wilson, Executive & Architect, CloudyCluster Team
Transitioning high-performance computing (HPC) workloads and data to various sites is a longstanding tradition and challenge in the scientific software and gateway developer communities. Recently, more institutions are looking at integrating the cloud for HPC. During this session, we will explore ideas and options for integrating the cloud for gateways and HPC workloads. A few approaches will be discussed, including what led to the development of CloudyCluster. An additional focus will be the latency improvements seen in the cloud over the past few years. Furthermore, this webinar will be an open platform for discussion, so we can all learn from each other’s experiences.
Q: If the cloud is elastic, why do we need schedulers? Can't jobs start immediately?
A: The scheduler manages the types of compute resources each job needs, optimizes the compute nodes, and then tears down completed instances to keep costs managed.
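The answer above can be sketched in a few lines: even with elastic capacity, something must match each job to an instance type, launch only what is needed, and terminate instances when jobs finish so cost stays bounded. Instance names and hourly prices here are illustrative, not any provider's actual pricing.

```python
# Hedged sketch of why a scheduler still matters in an elastic cloud:
# it selects an instance type per job and "tears down" the instance
# afterward, so you pay only for hours actually used.

INSTANCE_TYPES = {"cpu-small": 1, "gpu-large": 8}  # $/hour, illustrative

def schedule(jobs):
    """Pick an instance type per job and total the on-demand cost."""
    cost = 0
    for job in jobs:
        itype = "gpu-large" if job["needs_gpu"] else "cpu-small"
        cost += INSTANCE_TYPES[itype] * job["hours"]
        # a real scheduler would: launch instance, run job, terminate it
    return cost

jobs = [{"needs_gpu": False, "hours": 2}, {"needs_gpu": True, "hours": 1}]
print(schedule(jobs))  # 2*$1 + 1*$8 = 10
```

Without the teardown step, idle instances would keep accruing charges, which is exactly the cost risk the scheduler manages.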
A: GCP has Billing Alerts that help designated Project Managers/Owners keep on top of certain billing thresholds that you define as an Administrator.
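The budget-alert idea can be reduced to a small sketch: given cumulative spend and a budget, report which administrator-defined thresholds have been crossed. GCP Billing Alerts behave along these lines; the function and default thresholds below are hypothetical, not the GCP API.

```python
# Hedged sketch of billing-threshold alerts: report which configured
# fraction-of-budget thresholds the current spend has reached.
# Threshold values are illustrative defaults, not GCP's.

def crossed_thresholds(spend, budget, thresholds=(0.5, 0.9, 1.0)):
    """Return the fraction-of-budget thresholds that spend has reached."""
    return [t for t in thresholds if spend >= t * budget]

print(crossed_thresholds(spend=95, budget=100))  # [0.5, 0.9]
```

In GCP, crossing a threshold notifies the designated project owners/billing administrators, so overruns are caught before the budget is fully exhausted.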