Science Node article "One-stop data shop" highlights the work of Bootcamp alumni

We recently reported that Carol Song of Purdue University and four co-PIs, including Jack Smith of Marshall University, were awarded a five-year, $4.5 million grant from the National Science Foundation. Both Song and Smith participated in SGCI’s Science Gateways Bootcamps while working on separate gateway projects.

The grant will allow the team to build a “plug and play” platform, called GeoEDF, that will give researchers the ability to easily access and process geospatial data.

The article below, which was published by Science Node on January 14, 2019, provides more detail about the important work being done by the GeoEDF team. 

One-stop data shop

Speed read
  • Geospatial data includes everything from traditional maps to geotagged tweets
  • This diversity makes it hard to share and compare datasets
  • GeoEDF gateway incorporates diverse data and easy-to-use tools in a single platform

What if a farmer could scan a soybean leaf and discover that her crop isn’t flourishing because the soil needs more potassium? 

Raw data. An agricultural research technician collects weekly water samples from a creek to determine the effects of farming practices on water quality. A new platform, GeoEDF, may help officials combine this information with other data to make better decisions. Courtesy USDA.

What about a scientist in Appalachia accessing sensor data from a mountain stream that indicates it’s contaminated with coal ash, and alerting local householders before they get sick?

That—and other feats—is what a new NSF-funded project hopes to do. Led by computer scientist Carol Song of Purdue University, the GeoEDF platform brings together many different types of geospatial data. This one-stop-shop will help scientists expand their research and public officials fine-tune policies.

“We want to help with decision making,” says Song. “To let people compare the data and be able to say, ‘If we do this, this might happen. If we do that, that might happen.’”

Putting it on the map

Geospatial data is any information that has a geographic aspect. This includes obvious examples like maps that show roads, rivers, and boundaries, but also measurements from sensors (such as seismometers or water quality meters) that include a geographic location. 

Geospatial riches. This directory of geospatial data for Montana includes information about elevation, soils, wetlands, climate, population, transportation, wildlife, land use, and much more. Courtesy USDA.

Government agencies like NOAA, NASA, and the USGS already collect valuable information such as satellite remote sensing, land elevation, census, agricultural, economic, and other data. Citizens contribute data too, through citizen science platforms or geotagged tweets.

Such a wealth of information could help scientists solve a lot of problems. But because there are so many different types of data, it can be hard to collect, compare, and share it all.

“We're trying to help solve new problems,” says Song. “Right now, people working with these very diverse data sets have an army of students that collect relevant data from different repositories and then massage it into something actually usable. It's a long process—sometimes six months just wrangling the data. We want to speed that up.” 

One necessary component of reaching that desired speed is computers that can handle demands for all that data.

GeoEDF is an expansion of the existing MyGeoHub science gateway, which draws on local computing resources at Purdue but is also connected to supercomputers like Comet and Stampede through XSEDE (Extreme Science and Engineering Discovery Environment).

As the project scales up with the new GeoEDF platform, the developers will need to rely even more on XSEDE. “We are using Jetstream to do our development and test our hypotheses,” Song says. “To find out what we need and how we need it to work.”

One GeoEDF project will aggregate data from multispectral handheld scanners that diagnose plant health and provide county- and state-wide estimates of expected crop performance. Courtesy Purdue University.

Song’s team also depends on the open-source HUBZero platform, which supplies the gateway’s infrastructure—from hosting the website to providing user-oriented elements such as discussion forums and collaborative features for sharing tools and datasets.

But ultimately, science is what this project’s all about. Even in her early work as a graduate student at the National Center for Supercomputing Applications (NCSA), Song was drawn to working with applications that could bring computation to domain scientists.

“Although I’m not an expert in any of the domains I’m working with, I’ve learned enough to appreciate their problems and to see how what we do can directly impact them,” says Song. “That’s where I get my sense of accomplishment and satisfaction.”

From data to science

Song provided a sneak peek at what some of those science partners are up to:

  • Agricultural economist Uris Baldos will integrate socio-economic and environmental data to examine long-term food sustainability issues and the consequences of changing land use.
  • Jack Smith, of Marshall University, is converting data from field sensors into a standard EPA format. A resulting water quality data portal supporting the Appalachian Freshwater Initiative will focus on understanding and detecting the ecological and biological effects of contaminants in water under varying climate change scenarios.
  • Civil engineer Venkatesh Merwade will use GeoEDF to examine increased flooding risks in response to climate change, deforestation, and urban development. The advanced hydrologic model will provide real-time, street-level visualization down to the smallest streams and streets in both rural and urban areas.
  • Farmers will appreciate Jian Jin’s multispectral handheld crop scanner that diagnoses plant health and provides stress and disease early warnings, and watering and fertilizing advice. GeoEDF will expedite the upload and storage of data from these smartphone-connected sensors to provide county and statewide estimates of expected crop yield performance.

Data alone can’t solve our problems. But we live in a world in which information is streaming from every corner: from every back alley, mountain stream, and soybean field. It’s up to us to take advantage of the available data and transform it into safer homes, drier basements, stronger crops, and clean, drinkable water.

This article was originally published on Science Node.
