News & Events
Faculty, students, and industry professionals will discuss issues and research findings related to social media and data analytics.
As the largest database of encyclopedic knowledge in all of human history, Wikipedia has had a transformative effect on computer science research. However, it remains difficult for software developers in academia and industry to leverage Wikipedia as the knowledge resource computer science has shown it to be. WikiBrain is a software library that seeks to remedy this problem by allowing researchers and practitioners to easily incorporate Wikipedia-based intelligence into their systems and studies. WikiBrain not only provides simple access to the key structures of Wikipedia (e.g., link graph, text models), it also includes easy-to-use implementations of state-of-the-art artificial intelligence algorithms that have been developed using Wikipedia data.
Demand for online privacy is at an all-time high. Sensitive to this, many online venues offer users privacy control mechanisms that enable them to specify which pieces of their information are publicly visible, at the content and activity levels. We explore the impact of these mechanisms on user behavior, acknowledging possible positive (e.g., comfort) and negative (e.g., privacy priming) impacts. Employing a randomized controlled trial at an online crowdfunding platform, we demonstrate these competing effects. Reducing access to privacy controls induces a net increase in fundraising, yet this outcome results from two competing effects: the treatment increases willingness to transact (a 6% increase in probability of contribution) and simultaneously decreases the average contribution amount (~$3 or 5.5% decline). This decline derives from a publicity effect, wherein contributors respond to a lack of privacy by tempering extreme contributions. We discuss the implications of our findings for the design of online platforms.
Firms competing to get consumers to adopt new platforms have incentives to charge low prices to promote adoption, followed by higher prices later on. This study explores Amazon’s dynamic pricing strategy by comparing its contemporary pricing on e-books, a new product with complementary hardware, a proprietary file format, and switching costs, with its current pricing on physical books, a now-mature product without complementary hardware or switching costs. Using more than 150,000 hourly observations on prices and sales ranks for electronic and physical bestseller books in late 2012 and early 2013, in conjunction with actual quantity data, we estimate the price elasticities of demand for books at Amazon. Not surprisingly, e-books appear to be priced below the static profit-maximizing levels. More surprisingly, we find that physical book prices also fall substantially short of the static profit-maximizing level two decades after Amazon’s launch. These findings raise questions for both policymakers and shareholders.
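To make "static profit-maximizing level" concrete, here is a minimal sketch. It assumes constant-elasticity demand and a known marginal cost, neither of which is stated in the abstract, and it is not the authors' estimation procedure (which works from sales ranks and quantity data). Under those assumptions the elasticity is the slope of log quantity on log price, and the static optimum follows the standard markup rule p* = c · e / (e + 1) for e < -1. The function names and the sample numbers are hypothetical.

```python
import math


def estimate_elasticity(prices, quantities):
    """OLS slope of log(quantity) on log(price), i.e. a constant-elasticity estimate."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var


def static_optimal_price(marginal_cost, elasticity):
    """Static profit-maximizing price under constant-elasticity demand.

    Only defined for elastic demand (elasticity < -1); otherwise profit
    keeps rising with price and no finite optimum exists.
    """
    if elasticity >= -1:
        raise ValueError("elasticity must be below -1 for a finite optimum")
    return marginal_cost * elasticity / (elasticity + 1)


# Hypothetical data generated from demand q = 1000 * p^(-2).
prices = [8.0, 10.0, 12.0, 15.0]
quantities = [1000 * p ** -2 for p in prices]
elasticity = estimate_elasticity(prices, quantities)
print(elasticity, static_optimal_price(5.0, elasticity))
```

A seller whose observed price sits below the value returned by `static_optimal_price` is pricing below the static optimum in the sense used above.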
Social media platforms such as Facebook and Twitter provide new venues for companies to engage consumers and build brand loyalty. By 2013, there were about 18 million business pages on Facebook, with 1 million being added each month. To leverage these new platforms, companies have been exploring new ways to curate content, stimulate word-of-mouth, and create customer engagement. Despite the sheer amount of traffic, the engagement level on Facebook business pages is surprisingly low. According to one source, only 1% of Facebook fans engage with the brand through posts or comments. There is also very little research on the business impact of this new form of word-of-mouth. Our work includes research findings from two projects that examine (a) the challenge of building a vibrant brand community on Facebook, and (b) the impact of Facebook word-of-mouth on brand evaluation. The first study combines text mining and statistical modeling to analyze half a million posts from Facebook. The second study includes two lab experiments. Findings from both studies should provide practical guidance to help companies effectively engage customers and manage social media conversations.
Recommendation models use users' past choices to infer their preferences, which form the basis for making future recommendations. Changing preferences are a significant challenge for these systems because they lack reliable predictors of preference change. In this work, we propose a novel model of satiation for familiar content. Our model identifies three latent preference states for items, called the Sensitization, Boredom, and Recurrence states. Dynamics in a user’s preferences for items are attributed to the dynamics in these item states. This allows us to generate better state-dependent recommendations for users, as shown through a pilot study.
We have developed a context-based search engine for research datasets. Although search engines like Google and Bing are popular for general-purpose search, they might not be the best options for searching for datasets. We present DataGopher.org, a context-based search paradigm. This search engine is useful for finding datasets when you do not know their name or location. As input, it takes the research topic (context) as keywords and retrieves relevant datasets.
Top-N recommender systems are designed to generate a ranked list of items that a user will find useful based on the user’s prior activity. These systems have become ubiquitous and are an essential tool for information filtering and (e-)commerce. We present our recent work, which focuses on three important aspects of top-N recommender systems: first, effectively handling sparse datasets by learning item relationships in a low-dimensional space (FISM); second, learning higher-order relationships between items and item-sets to improve recommendation quality (SLIM++); and third, learning feature-based similarity models to handle the cold-start problem (items with no preference history) in top-N recommender systems (FSM).
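For readers new to the top-N setting, the sketch below shows a plain item-based neighborhood baseline. It is not FISM, SLIM++, or FSM, which learn their item relationships from data rather than computing them directly. The assumptions here are binary feedback (a user either has an item in their history or does not) and cosine similarity between item columns; the function name and the sample histories are hypothetical.

```python
import math
from collections import defaultdict


def top_n(histories, user, n=10):
    """Rank unseen items for `user` by summed cosine similarity to seen items.

    `histories` maps each user to the set of items they interacted with.
    """
    # Invert the histories: which users touched each item?
    users_by_item = defaultdict(set)
    for u, items in histories.items():
        for item in items:
            users_by_item[item].add(u)

    seen = histories[user]
    scores = {}
    for candidate, cand_users in users_by_item.items():
        if candidate in seen:
            continue
        score = 0.0
        for item in seen:
            item_users = users_by_item[item]
            overlap = len(cand_users & item_users)
            if overlap:
                # Cosine similarity between two binary item columns.
                score += overlap / math.sqrt(len(cand_users) * len(item_users))
        if score > 0:
            scores[candidate] = score

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:n]


histories = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"a"},
}
print(top_n(histories, "u4", n=2))
```

With sparse histories most pairwise overlaps are zero, which is the limitation that motivates learning item relationships in a low-dimensional space instead.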
Many real-world processes evolve in cascades over networks, whose topologies are often unobservable and change over time. However, the so-termed adoption times when, for instance, blogs mention popular news items, are typically known, and are implicitly dependent on the underlying network. To infer the network topology, a dynamic structural equation model is adopted to capture the relationship between observed adoption times and the unknown edge weights. Assuming a slowly time-varying topology and leveraging the sparse connectivity inherent to social networks, edge weights are estimated by minimizing a sparsity-regularized exponentially-weighted least-squares criterion. To this end, a solver is developed by leveraging (pseudo) real-time sparsity-promoting proximal gradient iterations. Numerical tests with synthetic data and real cascades of online media demonstrate the effectiveness of the novel algorithm in unveiling sparse dynamically-evolving topologies, while accounting for external influences in the adoption times.
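The estimation step can be illustrated with a generic sketch of proximal gradient iterations on a sparsity-regularized, exponentially weighted least-squares criterion. This is a batch solver for an ordinary linear model under assumed notation, not the authors' dynamic structural equation model or their (pseudo) real-time algorithm: it minimizes the sum over samples t of forgetting^(T-1-t) · ½(y_t − x_t·w)² plus lam·‖w‖₁. The function names, the forgetting factor default, and the toy data are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_weighted_least_squares(rows, targets, lam, forgetting=0.95,
                                  iterations=500):
    """Proximal gradient (ISTA) for exponentially weighted, L1-regularized LS.

    Older samples (smaller index) are down-weighted by powers of `forgetting`.
    Assumes at least one row is nonzero.
    """
    n_samples = len(rows)
    n_features = len(rows[0])
    sample_weights = [forgetting ** (n_samples - 1 - t) for t in range(n_samples)]

    # The trace of the weighted Gram matrix upper-bounds its largest
    # eigenvalue, so 1 / trace is a safe (conservative) step size.
    trace = sum(w * sum(x * x for x in row)
                for w, row in zip(sample_weights, rows))
    step = 1.0 / trace

    coef = [0.0] * n_features
    for _ in range(iterations):
        # Gradient of the smooth (weighted least-squares) part.
        grad = [0.0] * n_features
        for w, row, y in zip(sample_weights, rows, targets):
            residual = sum(c * x for c, x in zip(coef, row)) - y
            for j, x in enumerate(row):
                grad[j] += w * residual * x
        # Gradient step followed by the sparsity-promoting proximal step.
        coef = [soft_threshold(c - step * g, step * lam)
                for c, g in zip(coef, grad)]
    return coef


# Toy example: only the first coefficient is truly nonzero.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [2.0, 0.0, 2.0]
print(sparse_weighted_least_squares(rows, targets, lam=0.01, forgetting=1.0))
```

The soft-thresholding step is what drives small edge weights exactly to zero, while a forgetting factor below 1 lets the estimate track a slowly varying topology by discounting old observations.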
We report the development of a predictive model that can estimate ex ante the hazard of technological innovation failure under different product and market conditions. The central question that motivates this study is: Can user-level market feedback related to episodic device failures predict product-level innovation failures? The theoretical lenses we have identified to frame our research questions into testable hypotheses are signal detection and system neglect. The primary dataset we have used for this study is the Food and Drug Administration’s (FDA) “Manufacturer and User-facility Device Experience (MAUDE)” dataset. MAUDE represents “big data” generated through reports of adverse incidents involving the usage of medical devices. We have integrated the MAUDE dataset with several other datasets. The time period over which we collected data from all the data sources was 1998-2010. The significant contribution of this study is in demonstrating the application of predictive analytic methods to analyze a “big” dataset on market failures of medical devices to yield the following novel insights: First, it is possible to reliably predict innovation failure using field (i.e., market) failure data. Second, the precision of prediction can be improved by incorporating information on factors related to products, firms and industries in the model. Third, the existence of systematic judgment bias in detecting innovation failures is established; we identify conditions under which over-reaction bias and under-reaction bias are more likely to happen, and thereby improve the consistency of the prediction process. Finally, we identify several fine-grained nonlinear variable relationships in the context of innovation failure.
We believe that the findings of this study will provide high tech firms and decision makers an improved understanding of technological innovation failures, and, in turn, help improve decision making pertaining to key strategic issues such as product investments, globalization, supply chain and manufacturing process changes, and firm product scope and portfolio.
The proliferation of smartphones and other mobile devices has led to numerous companies developing mobile alternatives for consumers, in domains ranging from job search websites to online dating platforms. An estimate by Morgan Stanley says that based on the current rate of change and adoption, the mobile Internet will be bigger than the desktop Internet by 2015. While it has been widely documented that mobile users tend to differ from PC users in their observed behavior, such differences cannot necessarily be attributed to adoption of the mobile app because of endogeneity issues in drawing inferences from observing users who decide to adopt the mobile app. In this paper, we causally explore the changes in user behavior, in terms of ubiquity in use and social engagement, as well as matching outcomes due to adopting a mobile app in the online dating context. We do this by utilizing a novel propensity-score matching technique, where we match current adopters to similar future adopters who have not yet adopted the mobile app. We demonstrate that once users have adopted the mobile app, they become more ubiquitous in their use. They also become more socially engaged by viewing more profiles and sending more messages. Looking at outcomes, we find that female mobile app adopters achieve more matches than the control group of similar users who do not adopt a mobile app, and become more efficient in achieving matches per message sent, indicating higher returns on engagement for females. As mobile app adoption becomes widespread, understanding the causal impact on social engagement and outcomes has implications for both end users and businesses investing in app development.
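The matching idea can be sketched as follows. This is a generic greedy one-to-one nearest-neighbor match on a propensity score, not the authors' novel technique. It assumes the propensity scores have already been estimated (for example by a logistic regression, which is not shown), that the control pool consists of future adopters who have not yet adopted, and that a caliper of 0.05 is acceptable. All names, scores, and outcomes below are hypothetical.

```python
def match_nearest(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Both arguments map a user id to an already-estimated propensity score.
    A treated user stays unmatched if no remaining control lies within
    `caliper` of their score.
    """
    available = dict(control_scores)
    pairs = []
    # Process treated users in score order so the result is deterministic.
    for t_id, t_score in sorted(treated_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


def average_effect_on_treated(pairs, outcomes):
    """Mean outcome difference between each treated user and their match."""
    diffs = [outcomes[t_id] - outcomes[c_id] for t_id, c_id in pairs]
    return sum(diffs) / len(diffs)


# Hypothetical adopters, not-yet-adopters, and messages sent per week.
adopters = {"a": 0.30, "b": 0.70}
future_adopters = {"x": 0.31, "y": 0.68, "z": 0.95}
messages = {"a": 12, "b": 20, "x": 10, "y": 15, "z": 40}

pairs = match_nearest(adopters, future_adopters)
print(pairs, average_effect_on_treated(pairs, messages))
```

Drawing controls from future adopters is what addresses the endogeneity concern: both groups eventually choose the app, so they are less likely to differ in the unobserved traits that drive adoption.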
The two largest U.S. wireless ISPs have recently moved towards usage-based pricing to better manage the growing demand on their networks. Yet usage-based pricing still requires ISPs to over-provision capacity for demand at peak times of the day. Time-dependent pricing (TDP) addresses this problem by considering when a user consumes data, in addition to how much is used. We present the architecture, implementation, and a user trial of an end-to-end TDP system called TUBE. TUBE creates a price-based feedback control loop between an ISP and its end users. On the ISP side, it computes TDP prices so as to balance the cost of congestion during peak periods with that of offering lower prices in less congested periods. On mobile devices, it provides a graphical user interface that allows users to respond to the offered prices either by themselves or using an autopilot mode. We conducted a pilot TUBE trial with 50 iPhone or iPad 3G data users, who were charged according to our TDP algorithms. Our results show that TDP benefits both operators and customers, flattening the temporal fluctuation of demand while allowing users to save money by choosing the time and volume of their usage.
Our work is an analysis of deviant behavior in a popular online game, League of Legends. Deviance in online systems is a difficult problem to address, with consequences that include driving users away and tarnishing the system’s public image. We develop a metric to identify deviant players, and look at the effects of interacting with deviant players, including effects on retention. Based on our findings, we suggest methods to better identify and counteract the negative effects of deviance.
This research used individual-level usage data from a mobile game app to study how effectively influential users promote new adoptions. Specifically, three types of influential users were examined: high-engagement users, high-connectivity users, and high-mobility users. Regression analyses showed that all three types of users are associated with significantly more new adoptions. Preliminary results were presented and discussed.
The recommender systems research community faces a problem where a significant number of research papers lack the rigor and evaluation to be properly judged and, therefore, have little to contribute to collective knowledge. To make recommender research results cumulative, research needs to be documented thoroughly, conducted on data made available to others, and follow best practices. We are developing a community website where members can propose, discuss, and adopt best practice guidelines; a checklist tool for algorithmic recommendation paper authors and reviewers; and research tools to simplify adhering to community guidelines.
Pinterest is a popular social curation site where people collect, organize, and share pictures of items. We studied a fundamental issue for such sites: what patterns of activity attract attention (audience and content reposting)? We organized our studies around two key factors: the extent to which users specialize in particular topics, and homophily among users. We also considered the existence of differences between female and male users. We found: (a) women and men differed in the types of content they collected and the degree to which they specialized; male Pinterest users were not particularly interested in stereotypically male topics; (b) sharing diverse types of content increases your following, but only up to a certain point; (c) homophily drives repinning: people repin content from other users who share their interests; homophily also affects following, but to a lesser extent. Our findings suggest strategies both for users (e.g., strategies to attract an audience) and maintainers (e.g., content recommendation methods) of social curation sites.
LensKit is an open-source toolkit for building, studying, and researching recommender systems. We have used it to support several recommender systems research projects, as well as the live MovieLens and BookLens recommender systems. It is intended to make it easier for recommender systems researchers to conduct their work in a robust and reproducible fashion.
We study user-generated content communities through synthetic and applied research to understand how these communities work and what technological solutions can be developed to assist community members on a daily basis.
Our study examines how reputation affects online employers' ability to attract a quality workforce in an online labor market, Amazon Mechanical Turk (AMT). The study is designed as a randomized field experiment in which we create our own employers and exogenously manipulate their rankings in order to assess the impact of these ratings on the quality and the speed of the work done by real workers. We hypothesize that employers with good reputations will be able to solicit work of higher quantity and quality, since workers screen employers.
Our past research in the community question-and-answer domain has primarily been done with data provided by other systems. Access to our own site, GopherAnswers, opens up the ability to monitor factors normally hidden in public sites. Running controlled experiments while building deep logging of user behavior on a live site allows us to address questions such as extrinsic vs. intrinsic motivation, algorithm-based question recommendation vs. user-centered question referral systems, and the use of explicit user behavior to overcome system/user/item cold-start problems.