In the Science Node article "Staying ahead of the data tsunami," Joe Yun says the Macroscope project wouldn't exist if it weren't for the Focus Week experience.
Staying ahead of the data tsunami
The Social Media Macroscope helps researchers go with the big data flow
- Social media platforms create a lot of data that scientists can use
- Accessing this information often requires advanced computer science knowledge
- Social Media Macroscope gateway allows researchers outside of STEM to navigate social media data
Scientists spend years studying to become experts in their field—but every once in a while a tidal wave comes along and upsets everything they think they know. For many right now, that tsunami is big data.
“In academia, there’s so much pressure for everyone to present themselves as really smart in everything,” says Joseph Yun. “But this whole computational data science wave has made a lot of researchers feel like their field is being totally turned upside down and they don’t know anything anymore.”
Helping them stay on top of that wave is what Yun had in mind when he created the Social Media Macroscope. A research assistant professor at the University of Illinois at Urbana-Champaign (U of I), Yun has a background in both computer science and social psychology. He believes this helps him understand the struggle many social science researchers go through.
“The typical example with the social science researcher is that their primary background is not computer science,” says Yun. “And yet, they want to use social media data to answer a research question, such as how do conversations on Twitter affect people's psychologies towards their mood and emotion?”
Yun hopes the Social Media Macroscope will help bring resources to scientists who otherwise wouldn’t be able to access them.
Making research easier
“The Social Media Macroscope is a science gateway to make analyzing social media data accessible for those without computational skills,” says Yun. “To collect data, you need to understand how to program against the APIs. Once you pull the data, then you need to understand how to build a machine-learning model to correlate with psychological data. But the reality is, a lot of that work has already been done.”
For instance, the Social Media Macroscope includes a tool called the Social Media Intelligence & Learning Environment (SMILE). This resource helps scientists collect and analyze data through text preprocessing, phrase mining, text classification, and more. SMILE currently focuses on Reddit and Twitter, but more social media platforms will be added in the future.
The Macroscope is working so well that it is already in use around the globe.
“It’s being used at about 90 different research institutions that I know of,” says Yun. “I know one person is using it to study conversations about Zika virus in Colombia and trying to geolocate tweets and then analyze bodies of conversation. It's also been used quite a bit in classroom teaching.”
Beyond higher education, Yun says he would like to see young children use the Macroscope to advance their learning. After all, social media data is inherently interesting to them, since many of them are producing the content.
“Data science concepts aren’t that complicated,” Yun says. “I've always felt like the Social Media Macroscope could be a platform where even elementary school students could be exposed to data science in a way that they're interested in.”
The secret to his success
Despite his belief in the Macroscope’s potential, Yun found that building the platform from scratch and making it available on a large scale stretched beyond his own skills.
“I didn't know what a science gateway was. But then I went to Gateway Focus Week and they started talking about gateways and I recognized my own project. I was like ‘That's what I'm building right now!’” says Yun. “That's when I started to believe that maybe this could actually be more than just a little tool that I put out there that helps a few researchers on my own campus.”
In fact, Yun credits the Science Gateways Community Institute (SGCI), which runs Gateway Focus Week, with the Macroscope’s entire existence. “I don’t know if any other gateway has been impacted as dramatically as mine,” he says. “Because—I really mean this—the Macroscope wouldn’t exist if it weren’t for that experience.”
But once he started to hammer out the details, Yun realized he’d also need help on the back end. Luckily, that help was available from the National Center for Supercomputing Applications (NCSA) at U of I.
“The Macroscope has gone way beyond my capabilities as a computer scientist,” says Yun. “NCSA really knows how to build production software, and they’ve made all of the code fully open-source. My job now is just driving the direction and the vision of where it’s going.”
The next step for Yun will be exploring ways to tag social media data with security information so that researchers can more easily comply with data privacy regulations.
“When I talk to researchers, they say the Macroscope has given them hope for their future research because now they can go into the computational realm without having to try to figure out how to do a whole other degree in computer science,” says Yun. “That’s been really rewarding for me.”