SGCI webinars bring together community members across the globe.


Webinar: Delta—A Large-Scale GPU Resource for Science Gateways

June 9, 2021

Presented by:
Tim Boerner, Senior Assistant Director for the Delta Project Office, NCSA
Brett Bode, Assistant Director for the Blue Waters Project Office, NCSA
Greg Bauer, Lead for Science and Engineering Applications Support, NCSA

Delta will be the largest GPU resource in the NSF portfolio and is scheduled to launch in fall 2021. Optimizing research codes to utilize GPUs is a rapidly evolving area of research computing, one that has grown from GPUs as accelerated computing support for specialized applications to a broad mix of uses, including hybrid computing, machine learning, accelerated computing, visualization, and interactive user computing. This presentation will share details about Delta and how you can get your science gateway's research codes up and running.
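As a first step toward getting a research code running on a GPU system, a minimal sanity check like the sketch below can help. It assumes a PyTorch-based stack; Delta's actual software environment and module names are not specified here:

```python
# Minimal sketch: verify a research code can see and exercise a GPU.
# Assumes PyTorch is installed; the target system's modules and
# partitions are not covered here.
import torch

def describe_gpus():
    """Print the accelerators visible to this process."""
    if not torch.cuda.is_available():
        print("No GPU visible; running on CPU.")
        return
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

def smoke_test(n=4096):
    """Run one matrix multiply on the GPU as a quick sanity check."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    c = a @ b  # executes on the GPU when one is available
    print(f"matmul on {device}: result norm {c.norm().item():.3e}")

if __name__ == "__main__":
    describe_gpus()
    smoke_test()
```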

SGCI is proud to be a partner in the development of Delta. Read more about the collaboration.

Webinar Slides


Webinar: CloudBank and Cloud Implications for Science Gateways

May 12, 2021

Presented by Rob Fatland, UW Research Computing Director and CloudBank Lead of Education, Outreach, and Training, and Shava Smallen, San Diego Supercomputer Center, lead architect of the CloudBank user portal

CloudBank supports learning about and using the cloud stack, from proposal writing through to post-grant data provisioning. In this webinar we present the thesis of cloud viability in relation to the science gateway paradigm. This naturally touches on reproducibility and data sharing, extending to the technical challenges of building and using science gateways. We will describe the cloud development environment in terms of services: tools and technologies built on top of the basics of compute, storage, and networking that simplify and accelerate infrastructure building.
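As a concrete, hedged illustration of building on such a managed service, the sketch below stages gateway data in AWS S3 via boto3; the bucket name is a placeholder, and credentials are assumed to be configured in the environment:

```python
# Illustrative sketch of "infrastructure building" against a managed
# cloud service, here AWS S3 via boto3. The bucket name is hypothetical;
# credentials are assumed to be configured in the environment.
import boto3

s3 = boto3.client("s3")

# Create a bucket to hold gateway datasets (name must be globally unique).
s3.create_bucket(Bucket="my-gateway-datasets-example")

# Upload a local result file so gateway users can retrieve it later.
s3.upload_file("results.csv", "my-gateway-datasets-example", "runs/results.csv")

# List what the gateway has stored so far.
for obj in s3.list_objects_v2(Bucket="my-gateway-datasets-example").get("Contents", []):
    print(obj["Key"], obj["Size"])
```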

About CloudBank: CloudBank is an NSF-sponsored collaboration among UC San Diego, the University of Washington, and UC Berkeley to promote and support cloud computing adoption in data science, both in the strategic sense of understanding cloud technology and in the tactical sense of managing cloud accounting and spend, security, and resource management. CloudBank comprehensively answers the question, "How does moving data and computation to a cloud platform work in practice over the full arc of a research grant?" CloudBank originated in NSF's Computer and Information Science and Engineering (CISE) Directorate and will expand its supportive role to additional domains. CloudBank's focus is accelerating research by fostering the technical and sociological shifts needed in the research community's perception of the cloud.
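One tactical piece of that accounting work is tracking actual spend programmatically. As a minimal sketch (assuming AWS and boto3; the dates are placeholders), a grant team could pull monthly costs from the Cost Explorer API:

```python
# Hedged sketch of tracking cloud spend over the arc of a grant,
# using the AWS Cost Explorer API via boto3. Dates are placeholders.
import boto3

ce = boto3.client("ce")  # Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2021-04-01", "End": "2021-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
)

# Print one line per billing period with the total unblended cost.
for period in response["ResultsByTime"]:
    amount = period["Total"]["UnblendedCost"]["Amount"]
    print(period["TimePeriod"]["Start"], f"${float(amount):,.2f}")
```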

Gateways and the Cloud Padlet (contributions are welcome!)

Webinar Q&A

Q: When using public cloud provider calculators, are prospective PIs still finding it challenging to identify what resources they need and how much consumption there may be? How do researchers get support services from cloud service partners?
A: Yes, this is absolutely a challenge. CloudBank provides first-order optimization support directly; beyond that, we integrate support with the vendors and serve as a networking resource to additional service providers.
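A "first order" estimate often comes down to simple arithmetic over workload assumptions. The sketch below is purely illustrative; every rate and workload number in it is a hypothetical placeholder to be replaced with current vendor pricing:

```python
# Back-of-envelope estimate of monthly cloud consumption, the kind of
# calculation PIs struggle to extract from provider calculators.
# All prices and workload figures are hypothetical placeholders.
GPU_HOURLY_RATE = 3.00   # $/GPU-hour (placeholder; check vendor pricing)
STORAGE_RATE = 0.023     # $/GB-month (placeholder)

jobs_per_month = 200
gpu_hours_per_job = 2.5
dataset_gb = 500

compute_cost = jobs_per_month * gpu_hours_per_job * GPU_HOURLY_RATE
storage_cost = dataset_gb * STORAGE_RATE

print(f"Estimated compute: ${compute_cost:,.2f}/month")
print(f"Estimated storage: ${storage_cost:,.2f}/month")
print(f"Estimated total:   ${compute_cost + storage_cost:,.2f}/month")
```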

Q: When the data the PI puts in CloudBank is generated at the home institution and is sensitive, do you have a BAA (Business Associate Agreement) that covers the home institution with the cloud vendor? What about security and compliance?
A: This doesn't fully resolve the question, but it is a step toward it: CloudBank places no restrictions on account access, so you can grant access to research team members and IT staff from the awardee institution. As for BAAs on Google Cloud, they are accepted by individual users to meet compliance needs. Cloud projects have Security Command Center references to any potential risks; these can be controlled by policy per organization, which can cascade to the projects within that organization. Each public vendor varies in its approach to confidential computing.

Webinar Slides


Webinar: Starfish—Taking the Meh Out of Metadata

April 14, 2021

Presented by Jacob Farmer, Founder and Chief Evangelist, and Peter Galvin, Principal Solutions Architect and GUI Product Manager, Starfish Storage 

Starfish is a sophisticated platform for unstructured data management. It gives all the stakeholders—management, storage administrators, content creators, and content curators—visibility into a rich array of both current and historical file information. In this session, we will explore how Starfish can enrich data to drive value for gateway users and simplify data management for gateway administrators.

Jacob Farmer is the Chief Technology Officer of Cambridge Computer, a position he has held since co-founding the company in 1991. Jacob has been a familiar face in the data storage industry for over 25 years. He has written for a number of trade publications and has been a featured speaker at many conferences and educational events, most notably USENIX, LOPSA, and SNIA.

In the data storage industry, Jacob is best known for authoring best practices on enterprise data protection and for helping numerous early-stage companies define their target use cases and establish their first customers. In academic circles, he is highly regarded for his work in defining best practices to manage the life cycle of scientific research data and for identifying novel solutions to reduce costs and streamline operations related to digital preservation. Jacob is a graduate of Yale University.

Webinar Q&A

Q: How does the time travel interact with moving files over time? You can’t necessarily access moved data, right? Or can it be “restored” from where it was moved? Asking about the actual files, not the historical metadata.
A: To clarify: Starfish is a metadata management software product, not a file system or data storage product. Your files and objects are yours and stay where they currently are unless you ask Starfish to, for example, copy or move the data; then we'll put it onto one of your file systems or your cloud/object stores.
 
Q: Can jobs be triggered based on sensing that certain files are added or updated somewhere? And apart from sensing, is there scheduling for jobs, or does one use e.g. cron to call the Starfish API to start jobs?
A: If Starfish moves the files, we know where we put them and can bring them back; we will try to show that in the demo if we have time. As for scans: yes, once we scan a file system you can ask Starfish which files are new and do things to those files. Or you could create a signal file to say, "Hey, the instrument is done with this run; Starfish, it's okay to process them now." The time of detection depends on how often you have Starfish scan a given file system (it's configurable). On Lustre and GPFS we can listen to file system events; on other systems we currently scan.
 
Q: So it seems that "scans" and "signal files" are terms Starfish uses?
A: "Scans" is a Starfish term. "Signal files" is a generic idea: you create a file whose presence indicates that the data is done being written.

Q: Do file systems have to be "public"? Otherwise, how does Starfish gain access?
A: Starfish is a general-purpose unstructured data management system: if you have files in a file system, we can manage them, regardless of whether they are public or private. You run Starfish on site and point it at your file systems. We could, for example, copy files to some public gateway if you want, or you can use Starfish for anything else.

Duke Health did a webinar to explain how Starfish helped them save over $1 million by enabling researchers to manage their own files and objects. The recording is available here: https://starfishstorage.com/springclean/.

Anyone with questions may contact Peter Galvin at pgalvin@starfishstorage.com.


Webinar: Simplifying Cloud HPC Integration for Gateways

March 10, 2021

Presented by Boyd Wilson, Executive & Architect, CloudyCluster Team

Transitioning high-performance computing (HPC) workloads and data between sites is a longstanding tradition, and challenge, in the scientific software and gateway developer communities. Recently, more institutions have been looking at integrating the cloud for HPC. During this session, we will explore ideas and options for integrating the cloud for gateways and HPC workloads. A few approaches will be discussed, including what led to the development of CloudyCluster. An additional focus will be the improvements in cloud latency seen over the past few years. This webinar will also be an open platform for discussion, so we can all learn from each other's experiences.

Webinar Q&A

Q: If the cloud is elastic, why do we need schedulers? Can't jobs start immediately?
A: The scheduler manages the types of compute resources needed, optimizes use of the compute nodes, and then tears down completed instances to keep costs under control.
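To make that cost argument concrete, here is a toy simulation (not a real cloud API; the hourly rate is a placeholder) of a scheduler that scales nodes up for a backlog and tears down idle instances:

```python
# Toy simulation of why a scheduler still matters in an elastic cloud:
# it matches node count to the queue and tears down idle instances so
# spend tracks actual work. Provisioning is simulated, not a cloud call.
RATE = 1.50  # $/node-hour (hypothetical)

def autoscale(queue_depth_by_hour, jobs_per_node=4):
    nodes, cost = 0, 0.0
    for hour, pending in enumerate(queue_depth_by_hour):
        needed = -(-pending // jobs_per_node)  # ceiling division
        if needed > nodes:
            nodes = needed                     # scale up for the backlog
        elif needed < nodes:
            nodes = needed                     # tear down idle instances
        cost += nodes * RATE
        print(f"hour {hour}: {pending} jobs pending, {nodes} nodes running")
    return cost

total = autoscale([12, 20, 6, 0, 0, 8])
print(f"simulated spend: ${total:.2f}")
```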

Q: Is there an enforcement mechanism for Google Cloud subscriptions? Or do we have to constantly monitor?
A: GCP has Billing Alerts that help designated project managers/owners keep on top of billing thresholds that you define as an administrator.


Webinar: PresQT—How to achieve FAIRness in your gateway

February 10, 2021

Presented by Sandra Gesing, Associate Research Professor, Notre Dame, and SGCI Team Member

The wide uptake of the FAIR concept (Findability, Accessibility, Interoperability, and Reusability) in academia reflects the importance of sharing data and software for research and teaching. While the concept is widely used, metrics and tests to measure FAIRness are mostly still works in progress. The PresQT (Preservation Quality Tool) project serves as a bridge between repositories and science gateways, adding beneficial metadata and FAIR tests using FAIRsharing and FAIRshake. The PresQT services extend the science gateway landscape so that stakeholders can keep working in their chosen computational environment and gain additional features instead of having to switch to different software. This webinar will introduce PresQT and FAIR and then demo the available PresQT services, especially the FAIR testing.

More about PresQT: PresQT and its standards-based design with RESTful web services have been informed by user-centered design; it is a collaborative, open-source implementation effort. Current partners and implementations for its open APIs include OSF, CurateND, EaaSI, GitHub, GitLab, Zenodo, figshare, WholeTale, Jupyter, and HUBzero. To present the capabilities of PresQT to researchers, developers of computational solutions, users of PresQT services, and funding bodies, we have developed a demo user interface that allows testing of the different features of the PresQT services. Visit https://presqt-prod.crc.nd.edu/ui/ to try it.
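Because PresQT exposes its features as RESTful web services, a gateway can call them directly over HTTP. The sketch below uses Python's requests library; the endpoint path is an assumption for illustration, so consult the PresQT documentation for the actual routes:

```python
# Hedged sketch of calling a PresQT-style RESTful service.
# The route below is assumed for illustration; check the PresQT
# documentation for the real API paths.
import requests

BASE = "https://presqt-prod.crc.nd.edu"

# Ask the service which repository targets it supports (assumed route).
resp = requests.get(f"{BASE}/api_v1/targets/", timeout=30)
resp.raise_for_status()

for target in resp.json():
    print(target.get("name"))
```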

Other useful links:
Software Preservation Network: Emulation as a Service Infrastructure (EaaSI)
Try the EaaSI sandbox
EaaSI GitLab
FAIRshake rubrics

PresQT Documentation
Welcome to PresQT
Target Integration
Web Services
Services
Verifying Keyword Enhancement
Keyword Assignment
Architecture/Infrastructure
QA Testing

Webinar Slides