Challenges in Mining Social Media Data



Researchers must overcome a number of hurdles in order to find insight in social media data.

Social media and Big Data have radically changed how people communicate, interact, work, conduct business, and more. The buzz around these phenomena continues to grow as people and organizations begin to tap into the potential of a socially connected, data-rich world. The excitement is hard to overstate, and there are many examples to illustrate the insights and benefits that can be gained by examining social media data. (See our research page for one source of examples.) However, some very real hurdles stand between the unending supply of data and those who would like to mine and use it.

Dr. Huan Liu is one of the scholars tackling the challenges inherent in mining social media data for meaning, insight, and value. Liu is a professor and researcher at Arizona State University. Last week he visited the University of Minnesota as part of the speaker series hosted by the Social Media and Business Analytics Collaborative. During his time on campus, Liu articulated some of the problems that come with mining social media data and shared solutions he and his colleagues have devised to address them.

Real Challenges

Liu discussed several concrete challenges researchers encounter when they mine social media data for answers to business or other questions:

  • Evaluation dilemma: Traditional data mining often sets aside a portion of the dataset for testing. This provides a means to develop and evaluate models against some kind of ground truth. But with social media data, traditional test data may not be viable, forcing researchers to consider how they will evaluate their claims in the absence of an identified ground truth.
  • Sampling bias: Because Big Data is so big, APIs and scrapers often return only a small sample of the whole. A relatively small sample can be biased, threatening the credibility of research results derived from it.
  • Noise-removal fallacy: Posts about what people eat for breakfast are just one source of noise in social media data. But removing the noise can render data from Twitter and other sources useless, according to Liu. The inherently linked nature of social media data further complicates the task and requires researchers to approach noise removal differently than they would with attribute-value data.
  • Studying distrust in social media: Trust is a critical human construct that plays a role in many decisions and actions. Says Liu, "Distrust may play an equally important, if not more critical role in consumer decisions." Challenges arise from a lack of computational understanding of distrust with social media data, as well as the absence of the information social scientists use to study distrust.
  • Deception detection: Information intended to deceive can spread through social media the same way valid information does. This raises questions of how to detect different types of deception (e.g., manipulating information, changing context, or outright fabrication) in different social channels and formats (e.g., text, link, audio, photo, video, multimedia).
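
To make the evaluation dilemma more concrete, here is a minimal Python sketch of the traditional hold-out approach mentioned in the first item above. It is an illustration only, not code from Liu's talk: the function name `holdout_split`, the 20 percent test fraction, and the made-up labeled posts are all assumptions. The labels play the role of ground truth, which is exactly what social media data often lacks.

```python
import random


def holdout_split(records, test_fraction=0.2, seed=0):
    """Shuffle records and set aside a fraction of them for testing.

    Hypothetical helper for illustration; not taken from the talk.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # Return (training records, held-out test records).
    return shuffled[n_test:], shuffled[:n_test]


# Made-up data: each post carries a label that serves as ground truth.
posts = [(f"post {i}", i % 2) for i in range(10)]
train, test = holdout_split(posts)
print(len(train), len(test))
```

A model would be fit on `train` and scored against the labels in `test`. When no trustworthy labels exist, as is common with social media data, there is nothing to score against, and researchers need another way to evaluate their claims.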

Resources for Mining Social Media Data

To help students, researchers, and organizations grapple with these and other challenges in mining social media data, Liu offered the following resources, available for free download in whole or in part:

Social Media Mining: An Introduction

By Reza Zafarani, Mohammad Ali Abbasi, Huan Liu

Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. (Cambridge University Press, 2014)



Twitter Data Analytics

By Shamanth Kumar, Fred Morstatter, Huan Liu

A guide to harnessing data through the Twitter API. Examples show real-world applications in the context of various intriguing questions. (Springer, 2013)



Provenance in Social Media

By Geoffrey Barbier, Zhuo Feng, Pritam Gundecha, Huan Liu

Provenance data associated with a social media statement can help dispel rumors, clarify opinions, and confirm facts. (Morgan & Claypool, 2013)


Trust in Social Media

By Jiliang Tang and Huan Liu

The study and understanding of trust can lead to an effective approach to addressing both information overload and credibility problems. (WWW2014 Tutorial)




Dr. Huan Liu is a professor of Computer Science and Engineering at Arizona State University. He obtained his PhD in Computer Science at the University of Southern California and his B.Eng. in EECS at Shanghai JiaoTong University. He was recognized for excellence in teaching and research in Computer Science and Engineering at Arizona State University. His research interests are in data mining, machine learning, social computing, and artificial intelligence, investigating problems that arise in real-world applications with high-dimensional data of disparate forms. His well-cited publications include books, book chapters, and encyclopedia entries, as well as conference and journal papers. He serves on journal editorial and advisory boards and numerous conference program committees. He is a Fellow of IEEE and a member of several professional societies.

Related Media

From April 22, 2014: Speaker Huan Liu of Arizona State University lectures at the University of Minnesota.