By Marcus Christie
Explanation of the Airavata tool, Keycloak:
Apache Airavata (an SGCI Affiliate, check the Gateway Catalog to learn more) is middleware that provides abstractions for managing and utilizing computational resources, with an emphasis on running high-performance scientific applications. Airavata can thus be used to build a science gateway that provides higher-level user interfaces for users to configure, execute, monitor and analyze the outputs of a scientific application. Here are some of the major components in Airavata:
Registry: A metadata repository where gateway developers can register computational resources and data storage resources. The registry also stores a list of applications, the interface for each application (i.e., what inputs are required and what outputs are produced), and the computational resources on which each application is deployed.
Execution management: Manages all aspects of running a scientific application on a computational resource: staging input data on the resource, submitting the application run to the resource’s scheduler, monitoring the execution, and staging out the data generated by the application.
API server: Airavata exposes these capabilities via an RPC (remote procedure call) API. We describe the RPC interface with Apache Thrift, which lets us generate API clients in several popular programming languages and makes it simple to integrate Airavata into new or existing science gateway applications.
PGA web application: PGA is our showcase API client, a web application that demonstrates the capabilities of Airavata. PGA can be used as is, customized, or used simply as inspiration for building a new gateway application from scratch.
Keycloak is an open source identity and access management (IAM) server. It handles authentication and authorization for the users of an application and can be used in a number of different ways. For Airavata we use Keycloak to handle:
User authentication to PGA
API access authorization in the API server
A user logs into PGA via Keycloak (this is generally transparent to the user; they don’t ever “see” Keycloak). Once the user is logged in, Keycloak issues an access token, which PGA then manages on behalf of that user. When the user initiates an action in PGA that requires calling the Airavata API, PGA sends this access token along with the request, and the API server uses it to verify the user’s identity and the user’s authorization to call that API method. See the following figure:
The main impetus for using Keycloak in Airavata is that it integrates very well with other identity providers like CILogon. CILogon is a service that federates authentication, through InCommon, to many universities and other institutions. By integrating with CILogon we can let users log into PGA with their own institutional credentials. This is important for “campus gateways,” which are like science gateways but are geared toward giving members of an institution a gateway to that campus’s computational resources.
What are the benefits to science gateway developers? What problems does it address?
For a science gateway developer, Keycloak manages user accounts and their passwords so that this functionality doesn’t need to be implemented in the science gateway application itself. For example, securely storing user password hashes is a tricky problem to get right; it’s nice to be able to lean on popular third-party software like Keycloak for that. Keycloak also makes it easier to add new identity providers through configuration in its admin dashboard. For example, if a science gateway wants to let its users log in with their Google accounts, a gateway developer can add that option simply by filling out a form in the Keycloak admin dashboard.
How can/has it been implemented by SGCI staff?
We’ve integrated Keycloak as the Identity and Access Management (IAM) service for Apache Airavata. Airavata is used in several science gateways that are supported by the work of SGCI staff.
On the backend we implemented our TenantManagementInterface for Keycloak. This uses the Keycloak REST API to automate creating Keycloak realms (which map to “tenants” in Airavata) and to create and manage users.
On the frontend we implemented a Laravel controller to consume the authorization code in the OpenID Connect flow after Keycloak redirects back to the portal. This code demonstrates how to exchange the authorization code for an access token and use that token to get the user’s profile from the identity provider (things like first name, last name, email, etc.).
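The exchange described above can be sketched in a few lines. The following is a minimal illustration in Python rather than the Laravel code itself, and it only builds the two HTTP requests without sending them. The function names, the example hostnames, and the client ID are hypothetical; the endpoint paths follow Keycloak’s usual OpenID Connect layout, and older Keycloak versions may need "/auth" appended to the base URL.

```python
import urllib.parse
import urllib.request


def oidc_endpoint(base_url, realm, name):
    """Return a realm's OpenID Connect endpoint URL (for example "token")."""
    realm_path = urllib.parse.quote(realm)
    return f"{base_url.rstrip('/')}/realms/{realm_path}/protocol/openid-connect/{name}"


def build_token_request(base_url, realm, client_id, client_secret, code, redirect_uri):
    """Build the POST that exchanges an authorization code for an access token."""
    form = {
        "grant_type": "authorization_code",
        "code": code,
        # Must match the redirect URI used when the login was started.
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    return urllib.request.Request(
        oidc_endpoint(base_url, realm, "token"),
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_userinfo_request(base_url, realm, access_token):
    """Build the GET that fetches the user's profile with the access token."""
    return urllib.request.Request(
        oidc_endpoint(base_url, realm, "userinfo"),
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

Passing either request to urllib.request.urlopen would perform the call. The token response is JSON containing an access_token field, and the userinfo response is JSON carrying standard claims such as given_name, family_name, and email.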
What would make someone choose this solution over another?
A few criteria led us to pick Keycloak over the other options:
Keycloak uses standard protocols like OpenID Connect and OAuth 2.0. This means that we aren’t too tightly bound to Keycloak should we decide to migrate away from it.
Keycloak has a very thorough REST API. In fact, their entire admin dashboard application uses the REST API so everything you can do in the admin dashboard you can do in the API. This is important for us because Apache Airavata is “multi-tenanted” (which means Airavata runs several different gateways at once on a single installation) and Keycloak’s REST API allows us to automate both the setup of each gateway in Keycloak and the ongoing administration of the gateway’s user accounts.
Keycloak is multi-tenanted, like Airavata.
Keycloak is open source and licensed under the Apache License 2.0. Because Airavata is an Apache project, any third-party software it uses must have a license that is compatible with the Apache License. Since Keycloak uses the same license, it’s trivially compatible!
Keycloak has a very active and diverse development community that still manages to make frequent releases, as can be seen from their GitHub statistics.
Steps of implementation:
We downloaded and installed Keycloak on one of our servers and created a MariaDB database for it to use.
For each science gateway that we host we create a “realm” in Keycloak. Keycloak is multi-tenanted, and “realm” is the term Keycloak uses for a tenant. Each realm is completely separate from other realms and can have its own configuration and its own user store.
We then wrote code to integrate PGA with Keycloak for login and logout, as well as for getting user information like first name, last name, and email address. This integration uses the OpenID Connect protocol, for which existing libraries can make the work fairly simple.
We also wrote code on the backend, in the Airavata middleware, to use the Keycloak REST API to do things like create users, enable users, create new realms (which we do when creating a new gateway), get a list of roles for each user, search for users by username, etc.
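To make these backend operations more concrete, here is a minimal Python sketch of how such calls to the Keycloak Admin REST API might be built. It is not Airavata’s actual code: the function names are hypothetical, the requests are constructed but not sent, and the admin access token is assumed to have been obtained separately.

```python
import json
import urllib.parse
import urllib.request


def admin_request(base_url, admin_token, method, path="", body=None, query=None):
    """Build (but do not send) an authenticated Admin REST API request."""
    url = f"{base_url.rstrip('/')}/admin/realms{path}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    headers = {"Authorization": f"Bearer {admin_token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def create_realm(base_url, token, realm):
    """One realm per gateway: POST a minimal realm representation."""
    return admin_request(base_url, token, "POST", body={"realm": realm, "enabled": True})


def create_user(base_url, token, realm, username, email, first_name, last_name):
    """Create a user, initially disabled, in the given realm."""
    user = {
        "username": username,
        "email": email,
        "firstName": first_name,
        "lastName": last_name,
        "enabled": False,
    }
    return admin_request(base_url, token, "POST", f"/{urllib.parse.quote(realm)}/users", body=user)


def enable_user(base_url, token, realm, user_id):
    """Enable an existing user by updating its representation."""
    path = f"/{urllib.parse.quote(realm)}/users/{urllib.parse.quote(user_id)}"
    return admin_request(base_url, token, "PUT", path, body={"enabled": True})


def find_users_by_username(base_url, token, realm, username):
    """Search the realm's users by username."""
    path = f"/{urllib.parse.quote(realm)}/users"
    return admin_request(base_url, token, "GET", path, query={"username": username})


def get_realm_roles(base_url, token, realm, user_id):
    """List the realm-level roles mapped to a user."""
    path = f"/{urllib.parse.quote(realm)}/users/{urllib.parse.quote(user_id)}/role-mappings/realm"
    return admin_request(base_url, token, "GET", path)
```

Keeping URL construction and authentication in one helper means each operation only has to state its HTTP method, path, and payload, which is convenient when the same calls must be repeated for every gateway’s realm.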
We had users in another Identity and Access Management service called WSO2 Identity Server, so we also wrote some code to export users from WSO2 Identity Server and import them into Keycloak. For a new gateway this won’t be an issue, but for a gateway with an existing user base you will likely need a way to migrate users into Keycloak. Since we weren’t able to export the users’ passwords (the passwords aren’t stored in WSO2 Identity Server), we required each user to do a password reset.
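As an illustration of such a migration, the Python sketch below maps an exported user record to a Keycloak user without credentials and builds a request asking Keycloak to email that user a link for setting a new password. This is one possible approach, not necessarily how our migration triggered the reset. The exported record’s field names and both function names are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def to_keycloak_user(exported):
    """Map one exported record (hypothetical field names) to a Keycloak user."""
    return {
        "username": exported["username"],
        "email": exported.get("email"),
        "firstName": exported.get("first_name"),
        "lastName": exported.get("last_name"),
        "enabled": True,
        # No "credentials" entry: passwords could not be exported.
    }


def build_password_reset_email_request(base_url, admin_token, realm, user_id):
    """Build the PUT asking Keycloak to email an 'update password' link."""
    url = (
        f"{base_url.rstrip('/')}/admin/realms/{realm}"
        f"/users/{user_id}/execute-actions-email"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(["UPDATE_PASSWORD"]).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

The email step assumes the realm has SMTP settings configured and that each imported user has an email address on record.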
Additionally, to use CILogon, for each gateway we registered Keycloak as a CILogon client. CILogon returns a client ID and secret that can be entered in the Keycloak admin dashboard to add a new Identity Provider.
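We did this step through the admin dashboard, but the same configuration can in principle be scripted. The Python sketch below is an assumption-laden illustration: it describes CILogon as a generic OpenID Connect identity provider and builds (without sending) the Admin REST API request that would register it in a realm. The function names, the "cilogon" alias, and the default scope are hypothetical choices, and the three endpoint URLs are parameters that should be taken from CILogon’s documentation.

```python
import json
import urllib.request


def oidc_identity_provider(alias, client_id, client_secret,
                           authorization_url, token_url, userinfo_url):
    """Describe an external OpenID Connect identity provider for Keycloak."""
    return {
        "alias": alias,
        "providerId": "oidc",  # Keycloak's generic OpenID Connect provider
        "enabled": True,
        "config": {
            "clientId": client_id,          # issued when registering the client
            "clientSecret": client_secret,  # issued when registering the client
            "authorizationUrl": authorization_url,
            "tokenUrl": token_url,
            "userInfoUrl": userinfo_url,
            "defaultScope": "openid email profile",  # assumed scope set
        },
    }


def build_add_identity_provider_request(base_url, admin_token, realm, provider):
    """Build the POST that adds the identity provider to a realm."""
    url = f"{base_url.rstrip('/')}/admin/realms/{realm}/identity-provider/instances"
    return urllib.request.Request(
        url,
        data=json.dumps(provider).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because each gateway has its own realm and its own CILogon client, a script like this would be run once per gateway with that gateway’s client ID and secret.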