SGCI webinars bring together community members across the globe.


Webinar: Collecting Docker metrics with Python and Prometheus

May 8, 2019

Collecting Docker metrics with Python and Prometheus

Presented by Julia Looney, SGCI Consultant and Python Developer, Texas Advanced Computing Center

Prometheus is an open-source tool for collecting and displaying metrics and statistics. It can capture metrics such as memory usage in Docker containers, along with many other measurements that can help with managing operations or obtaining funding. Using a basic Python API, you can collect metrics in Prometheus for applications running in Docker.
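
The webinar's GitHub repository has the full walkthrough. As a taste of the approach (a minimal sketch, not the presenter's exact code), the snippet below uses the prometheus_client and docker Python packages to expose container memory usage for Prometheus to scrape; the metric name, label, port, and interval are illustrative assumptions.

```python
import time

import docker
from prometheus_client import Gauge, start_http_server

# One gauge with a container_name label, so each container gets its own series.
MEMORY_USAGE = Gauge(
    "container_memory_usage_bytes",
    "Current memory usage of a running container, in bytes",
    ["container_name"],
)

def record_memory_usage(client):
    """Take a one-shot stats snapshot of every running container."""
    for container in client.containers.list():
        stats = container.stats(stream=False)
        usage = stats.get("memory_stats", {}).get("usage", 0)
        MEMORY_USAGE.labels(container_name=container.name).set(usage)

if __name__ == "__main__":
    start_http_server(8000)      # Prometheus scrapes http://<host>:8000/metrics
    docker_client = docker.from_env()
    while True:
        record_memory_usage(docker_client)
        time.sleep(15)           # roughly match the Prometheus scrape interval
```

Pointing a Prometheus scrape job at port 8000 then makes the per-container series available for dashboards or alerts.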

Follow along in GitHub

Watch on YouTube


Webinar: DataONE—Data Partner for Science Communities

April 10, 2019

DataONE—Data Partner for Science Communities

Presented by

  • Robert J. Sandusky, Ph.D., Associate Dean for Information Technology & Associate Professor, University Library, University of Illinois at Chicago
  • Amber E. Budden, Ph.D., DataONE Co-PI and Director for Community Engagement & Outreach, Arctic Data Center Co-PI, Senior Research Fellow at the National Center for Ecological Analysis & Synthesis
  • Karl Benedict, Ph.D., Director of Research Data Services and Associate Professor, College of University Libraries & Learning Sciences, University of New Mexico

DataONE federates more than 45 independent scientific data repositories across six continents, providing access and discovery, preservation, data provenance, metadata quality services, data citations, data science training, and more. This webinar will provide an overview of existing capabilities as well as services currently in development to support a wide range of scientific communities and data formats.

DataONE was funded in 2009 by the National Science Foundation to provide solutions for long-term access to and preservation of data generated by other projects, and to conduct a broad education and outreach campaign to transform scientific practice. DataONE is a global leader in research data infrastructure development, research, and education.

Webinar Slides

Resources mentioned during the webinar:
DataONE API 
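
For readers who want to experiment with the DataONE API mentioned above, here is a minimal sketch of a federated search. It assumes the coordinating node's public Solr query endpoint (https://cn.dataone.org/cn/v2/query/solr/); the search terms and field names are illustrative only.

```python
import requests

# Assumed endpoint: the DataONE coordinating node's Solr query service.
CN_QUERY = "https://cn.dataone.org/cn/v2/query/solr/"

params = {
    "q": "ocean temperature",   # free-text search term (illustrative)
    "fl": "id,title,author",    # fields to return (illustrative)
    "rows": 5,
    "wt": "json",
}

response = requests.get(CN_QUERY, params=params, timeout=30)
response.raise_for_status()
for doc in response.json()["response"]["docs"]:
    print(doc.get("id"), "|", doc.get("title"))
```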

Watch on YouTube


Webinar: Every gateway has competition—Identifying, differentiating from, and working with competitors

March 13, 2019

Every gateway has competition—Identifying, differentiating from, and working with competitors

Presented by Juliana Casavan, Entrepreneurial Programs Manager at the Purdue Foundry and SGCI's Science Gateways Bootcamp Instructor

Every product, service, or project has competitors, but how do you identify them? Do you need to worry about competitors or maybe collaborate with them? During this session, we will first identify your audience so that you can consider your competitors from their perspective. We will then talk about how to identify competitors and find your differentiators. We will also discuss how to consider collaboration and the benefits of working with your competitors.

Webinar Slides

Watch on YouTube


Webinar: What's Wrong with Computational Science Software? Absolutely Nothing (nearly)

February 13, 2019

Q: What's Wrong with Computational Science Software? A: Absolutely Nothing (nearly)

Presented by Dr. Timothy Menzies, Professor of Computer Science, NC State University, and IEEE Fellow

Much of the data described in this presentation is drawn from computational science research software, and Tim's research team engaged deeply with the PIs funded by NSF's Software Infrastructure for Sustained Innovation (SI2) program, launched in 2010 to fund software research at multiple scales. The program has included annual PI meetings, where Tim presented and interacted with this developer community. When Tim refers to "you" in this presentation, he means this community of computational science developers and PIs, including but not limited to those in the SI2 program.

Funded by an NSF EAGER grant, we have been applying empirical software engineering (SE) methods to software systems built for computational science. What we expected to see, and what we saw, were two very different things.
 
Initially, we spent time talking to computational scientists from CERN and around America. Most expressed concerns that their software was somehow not as good as it might be. Yet those concerns proved unfounded, at least for the code we could access.
 
Using feedback from the computational science community, we found and analyzed 40+ packages (some of which were very widely used in computational science). Many of those systems had been maintained by large teams for many years. For example:
  • LAMMPS has 16,000+ commits from 80 developers (since 2012).
  • Trilinos, a more recent package, has been built from 80,000 commits by over 200 developers.
  • Elasticsearch is over eight years old and has been built from 40,000+ commits by over 1,100 developers.
  • deal.II has been maintained and extended since 1990 via 40,000+ commits from 100 active developers.
Note that some of these projects (e.g., deal.II) are much larger and show greater longevity than many open source projects. When we talked to the developers of these 40+ packages (particularly the postdocs), we found a group that was very well versed in current coding practices (GitHub, Travis CI, etc.).
 
LESSON 1: Many of those systems were written in modern languages (e.g., Python) or used modern programming tools (e.g., version control).
 
The reasons for this were economic and sociological: these developers are smart people who know that after their NSF funding ends, they might get well-paid jobs in the software industry. Hence, it was in their interest to know current practices. Accordingly:
 
LESSON 2: Increasingly, computational software is being written using state-of-the-art software tools and practices.
 
When we applied standard SE defect predictors to that code, to our surprise they mostly failed, since:
 
LESSON 3: Computational science code has a different (and lower) bug rate than other kinds of software.
 
Standard empirical SE methods, when applied to computational science code, failed to build useful defect predictors. In fact, we had to develop new methods that could handle such exemplary software. In the end, such predictors could be built, but only after significantly extending standard empirical SE methods. Hence:
 
LESSON 4: Computational science is an excellent testbed for the rest of the SE community to stress test their tools.
 
Note that the above suffers from a sampling bias (we could only examine the open source packages). But one thing is clear: the state of the practice in computational science software is much healthier than is commonly believed.
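
For readers outside the empirical SE community, the sketch below shows the general shape of a "standard SE defect predictor" of the kind referred to above: a classifier trained on per-module code metrics to flag modules likely to contain defects. The features and data are invented for illustration and are not the metrics, datasets, or methods used in this study.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical per-module metrics; real studies mine these from version control.
data = pd.DataFrame({
    "loc":        [120, 800, 45, 3000, 210, 95, 640, 1500],
    "complexity": [8, 40, 3, 120, 15, 6, 35, 60],
    "churn":      [5, 60, 1, 200, 12, 4, 45, 90],
    "defective":  [0, 1, 0, 1, 0, 0, 1, 1],   # 1 = a defect was later found
})

X_train, X_test, y_train, y_test = train_test_split(
    data[["loc", "complexity", "churn"]], data["defective"],
    test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```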
 
 

Watch on YouTube


Webinar: Authorizing Access to Science Gateway Resources

January 9, 2019

Authorizing Access to Science Gateway Resources

Presented by Jim Basney of NCSA & Trusted CI, Marlon Pierce of Indiana University & SGCI, and Tom Barton of the University of Chicago & Internet2

Data use agreements, controlled-access data sets, and restricted-access scientific instruments are just a few examples of authorization challenges faced by science gateways. There are many options for authenticating science gateway users, but fewer options for implementing complex authorization policies after users log on. The three panelists for this webinar will present their perspectives and experiences with authorization solutions applicable to science gateways.

Webinar Slides

Q&A from the webinar:

  • Q: In OAuth, can the user choose which items they allow and which not from the list of access requested by the app?
    A: In general, a gateway should request only the items it needs, so a user would accept or deny the whole request (see the sketch after this list).
  • Q: What is a good resource for getting started with Research & Scholarship attributes? I collect these attributes for my gateway with a custom sign-up form.
    A: InCommon's Research & Scholarship info is here: https://spaces.at.internet2.edu/display/InCFederation/Research+and+Scholarship+Category
  • Q: What is the URL for the paper "Federated Identity Management for Research Collaborations"?
    A: https://zenodo.org/record/1307551#.XDY8989KhAY
  • Q: Is there any sort of federation body that takes into consideration students in K-12?
    A: The Steward program was started to address this audience https://www.incommon.org/steward/, but there hasn't been enough perceived need to carry the project forward. As of now, there is no federation body for K-12.
  • Q: Any tips or success paths for gateways that must deal with PII information? I know this is not directly related to security but security plays a huge role in the overall infrastructure plan.
    A: That's a big question! Some things that come to mind: First, I hope that on your campus the CISO's office is viewed as a good enabling resource. If so, they should be able to provide in-depth guidance and assistance. Second, the TrustedCI Open Science Cyber Risk Profile https://trustedci.org/oscrp can be useful to help you think through how to suitably protect the PII. 
  • Q: How do I learn more about Airavata?
    A: See http://airavata.apache.org/mailing-list.html to subscribe to the Airavata dev list.
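
As a concrete illustration of the first answer above, this sketch shows how a gateway names the specific items it needs as OAuth 2.0 scopes when redirecting a user to an authorization endpoint; the user then approves or denies the request as a whole. The endpoint, client ID, and scope names are placeholders, not any particular provider's values.

```python
from urllib.parse import urlencode

params = {
    "response_type": "code",
    "client_id": "my-gateway",                               # placeholder
    "redirect_uri": "https://gateway.example.org/callback",  # placeholder
    "scope": "openid profile compute:submit-job",            # request only what is needed
    "state": "random-anti-csrf-token",                       # placeholder
}
authorize_url = "https://auth.example.org/authorize?" + urlencode(params)
print(authorize_url)   # the user accepts or denies this request in its entirety
```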

Watch on YouTube