Teaching Science Gateways: A Workforce Development Strategy
By Marlon Pierce and Suresh Marru
Overview of the Science Gateways Architecture Course
Over the last two decades, we’ve come to appreciate how challenging it is to run a successful gateway. It can take a lot of work. You need someone to write software, someone to integrate third-party software and services, someone to operate services, someone to provide quality assurance, and people to handle many other tasks.
As everyone probably knows, it is a challenge to hire people to do all of these types of work in academic environments. Universities can’t match the salaries that people can make elsewhere, and not everyone wants to live in a college town.
At the Science Gateways Research Center (SGRC) at Indiana University, we develop a number of science gateways (usually ones that focus on providing software as a service) using the Apache Airavata software framework, which we run as a service. Hiring people is something we spend a lot of time on, and we need a good way to do it.
A big part of the answer is to hire students. If you do things well, you can get 6-24 months or even more of time and contributions from some very capable students. The currency we have is experience: by making public contributions to our open source projects, students can demonstrate their skills and set themselves apart from their peers in the job market.
But we can’t hire just anyone. Resumes all look the same, and interviews take a lot of time. In our experience, you don’t really know until you’ve worked with someone how well they will do. How can we scalably attract a lot of interest in our projects, engage with lots of students, and find out who is talented and motivated?
We realized that there was a solution to this problem right in front of us: teach a class that covers science gateways and all the skills that it takes to develop and operate one.
We started the class, which we call Science Gateway Architectures, in Spring 2016 with 13 students and have taught a version of it every semester since then. The course website is http://courses.airavata.org/ and the GitHub repository for materials is https://github.com/airavata-courses. We regularly revise the content, but generally we focus on things that are relevant to science gateways that will also get students’ attention: microservices, messaging systems, service coordination technologies, security topics, “DevOps” tools, open source (especially Apache-style) development practices, container technologies like Docker, and so on. Science gateways need all of this as well, so we use this as an organizational framework for putting a bunch of otherwise random technologies and topics into context. We also try to point out where all of this connects to more fundamental computer science topics in distributed systems.
What are the benefits of offering this course, and what problems does it address?
The primary benefit is workforce development. We have developed course material that can be used to train a new generation of science gateway developers, teaching them a wide range of useful skills.
Another benefit to science gateway developers and operators is that we have connected science gateway systems to more traditional concepts, such as distributed systems, that are taught, or should be taught, in graduate computer science programs. This makes the course an easier sell to Computer Science departments, while also allowing us to bridge between the more academic topics (say, consensus algorithms), the more practical topics (like virtualization or the common practices for using computing clusters), and the orphaned topics (like REST or the OAuth2 protocol) that may not be taught anywhere. Since gateways touch so many areas, the course is a good way to cover the practical things that people need to know and the more theoretical things that they ought to know.
We also cover open source methodologies and teach students how to be good open source citizens. This ranges from just getting everyone on to GitHub and learning how to work in the open to interacting on developer forums, submitting pull requests, and so on. We hope we are fostering within the students an understanding that open source software only thrives if there is a community, and they need to get involved in the projects they use and like.
As with most classes, the instructors probably get the most benefit. I (Marlon) use preparation for the class as a spur to keep myself up to date on tools and technologies and also to dive a bit deeper into distributed systems myself. My personal favorite lectures to prepare included one on the Raft protocol (used for maintaining consensus about state in distributed systems and underlying HashiCorp’s popular Consul software) and one on the Byzantine Generals Problem (it’s Raft plus security).
How can this course be implemented at other institutions, and what would be some benefits in doing so?
This is an open question. We would like for the course to be more widely offered, but for now, it is taught only at Indiana University. We have worked with Linda Hayden’s SGCI workforce development teaching materials for short courses, but the current course is really designed for a full semester, plus an additional semester of advanced topics.
We would really like to find a way to offer the full course to a broader audience. The issue is the time commitment and the return on investment. Again, a large reason for us to do this is to find students who are particularly good and motivated to work on our projects. This takes a lot of contact time, so remote instruction is not so useful.
We put all of our materials on our course website and its associated GitHub repository, so one possibility is to franchise the course, encouraging others to use our materials to teach their own local versions of the course. We would like to see, however, some better coordination than this.
We have found that teaching a course on science gateways, if positioned properly with the right content so that students see the benefit to their job searches, is a good way to reach a lot of students. It is essentially a recruiting device for our team. It also works as an initial gate or filter for finding the best students, whom we then hire as hourlies, research assistants, and so on. This has proven to be a much better approach than generic student outreach or relying on personal networks to find people to hire. We have been very pleased with the work of many of the students we have met this way. As of Spring 2018, after offering the course since Spring 2016, we’ve had seven students make it all the way to “committer” status in the Apache Airavata project after taking the course, and they have made contributions that we use every day.
What does it take to offer the course at Indiana University, and how is the course presented?
We offer an introductory version of the course in Indiana University’s Fall semester, followed by a by-invitation advanced course in the Spring semester. The courses are project-based, with grades based primarily on project work that the students do either individually or in teams. We sometimes also reserve points to encourage students to grade other teams, participate in mailing list discussions, and demonstrate good open source practices such as making pull requests to open source code bases.
Classes are lecture-based: we present a wide range of topics that are part of the science gateway ecosystem and relevant to the current assignments (although timing can be tough, as the students sometimes point out). Students also give presentations. In the introductory course, these are limited to mid-term and final project presentations, but in the advanced course, we flip the format and let students do most of the presenting. This is because students in the advanced course are working on separate topics (rather than the same projects, as in the introductory course), so the student-led presentations keep everyone informed about the different projects.