Science Node article "One-stop data shop" highlights the work of Bootcamp alumni
We recently reported that Carol Song of Purdue University and four co-PIs, including Jack Smith of Marshall University, were awarded a five-year, $4.5 million grant from the National Science Foundation. Both Song and Smith participated in SGCI’s Science Gateways Bootcamps while working on separate gateway projects.
The grant will allow the team to build a “plug and play” platform, called GeoEDF, that will give researchers the ability to easily access and process geospatial data.
The article below, which was published by Science Node on January 14, 2019, provides more detail about the important work being done by the GeoEDF team.
One-stop data shop
- Geospatial data includes everything from traditional maps to geotagged tweets
- This diversity makes it hard to share and compare datasets
- GeoEDF gateway incorporates diverse data and easy-to-use tools in a single platform
What if a farmer could scan a soybean leaf and discover that her crop isn’t flourishing because the soil needs more potassium?
What about a scientist in Appalachia accessing sensor data from a mountain stream that indicates it’s contaminated with coal ash—and alerting local householders before they get sick?
That—and other feats—is what a new NSF-funded project hopes to do. Led by computer scientist Carol Song of Purdue University, the GeoEDF platform brings together many different types of geospatial data. This one-stop-shop will help scientists expand their research and public officials fine-tune policies.
“We want to help with decision making,” says Song. “To let people compare the data and be able to say, ‘If we do this, this might happen. If we do that, that might happen.’”
Putting it on the map
Geospatial data is any information that has a geographic aspect. This includes obvious examples like maps that show roads, rivers, and boundaries, but also measurements from sensors (such as seismometers or water quality meters) that include a geographic location.
Government agencies like NOAA, NASA, and the USGS already collect valuable information such as satellite remote sensing, land elevation, census, agricultural, economic, and other data. Citizens contribute data too, through citizen science platforms or geotagged tweets.
Such a wealth of information could help scientists solve a lot of problems. But because there are so many different types of data, it can be hard to collect, compare, and share it all.
“We're trying to help solve new problems,” says Song. “Right now, people working with these very diverse data sets have an army of students that collect relevant data from different repositories and then massage it into something actually usable. It's a long process—sometimes six months just wrangling the data. We want to speed that up.”
One necessary component of reaching that desired speed is computers that can handle demands for all that data.
GeoEDF is an expansion of the existing MyGeoHub science gateway, which draws on local computing resources at Purdue but is also connected to supercomputers like Comet and Stampede through XSEDE (Extreme Science and Engineering Discovery Environment).
As the project scales up with the new GeoEDF platform, the developers will need to rely even more on XSEDE. “We are using Jetstream to do our development and test our hypotheses,” Song says. “To find out what we need and how we need it to work.”
Song’s team also depends on the open-source HUBzero platform, which supplies the gateway’s infrastructure, from hosting the website to providing user-oriented elements such as discussion forums and collaborative features for sharing tools and datasets.
But ultimately, science is what this project’s all about. Even in her early work as a graduate student at the National Center for Supercomputing Applications (NCSA), Song was drawn to working with applications that could bring computation to domain scientists.
“Although I’m not an expert in any of the domains I’m working with, I’ve learned enough to appreciate their problems and to see how what we do can directly impact them,” says Song. “That’s where I get my sense of accomplishment and satisfaction.”
From data to science
Song provided a sneak peek at what some of those science partners are up to:
- Agricultural economist Uris Baldos will integrate socio-economic and environmental data to examine long-term food sustainability issues and the consequences of changing land use.
- Jack Smith, of Marshall University, is converting data from field sensors into a standard EPA format. A resulting water quality data portal supporting the Appalachian Freshwater Initiative will focus on understanding and detecting the ecological and biological effects of contaminants in water under varying climate change scenarios.
- Civil engineer Venkatesh Merwade will use GeoEDF to examine increased flooding risks in response to climate change, deforestation, and urban development. The advanced hydrologic model will provide real-time, street-level visualization down to the smallest streams and streets in both rural and urban areas.
- Farmers will appreciate Jian Jin’s multispectral handheld crop scanner that diagnoses plant health and provides stress and disease early warnings, and watering and fertilizing advice. GeoEDF will expedite the upload and storage of data from these smartphone-connected sensors to provide county and statewide estimates of expected crop yield performance.
Data alone can’t solve our problems. But we live in a world in which data is streaming from every corner: from every back alley, mountain stream, and soybean field. It’s up to us to take advantage of the available data and transform it into safer homes, drier basements, stronger crops, and clean, drinkable water.