Loyola Digital Ethics presentation: “The Ethics of Twitter Research: A Topology of Disciplines, Methods and Ethics Review Boards”

Today I have the great privilege of presenting the preliminary results of a research project exploring the ethics of Twitter-based research, co-authored with Nick Proferes, at the second annual International Symposium on Digital Ethics, hosted by the Center for Digital Ethics & Policy at Loyola University Chicago.

The abstract and slides are available below. Look for the full paper soon.

The Ethics of Twitter Research: A Topology of Disciplines, Methods and Ethics Review Boards

In the five years since its launch, the social networking and microblogging service Twitter has quickly grown to over 300 million users, generating over 300 million tweets each day. By providing a simple platform for users to explain “what’s happening” in 140 characters or less, Twitter has become the Internet’s de facto public forum for sharing “pretty much anything [users] wanted, be it information, relationships, entertainment, citizen journalism, and beyond” (Dybwad, 2009). This breadth of sharing has made Twitter a cultural phenomenon.

Beyond the utility Twitter provides its millions of users, it also has emerged as a valuable resource for tapping into the zeitgeist of the Internet and its users. There is cultural and historical value in the information that flows across Twitter’s servers, notes Dylan Casey, a Google product manager: “Tweets and other short-form updates create a history of commentary that can provide valuable insights into what’s happened and how people have reacted” (Singel, 2010). Researchers have been quick to recognize the value in studying Twitter users and activities to gain a better understanding of its users, uses, and impacts on society and culture from a variety of perspectives. The Library of Congress recognized the importance of Twitter when it announced in 2010 that, “Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress” (Raymond, 2010, ¶2).

The Library of Congress’s announcement clearly validates the research importance of Twitter, but it also prompted concerns about creating a permanent archive of tweets, and whether such a proposal was properly aligned with users’ understanding of how the platform worked and their privacy expectations. Even in broadcasting the news, the language Wired Magazine chose underscored the apparent transition from a fleeting existence for tweets to a newly instilled sense of permanence when it stated, “While the short form musings of a generation chronicled by Twitter might seem ephemeral, the Library of Congress wants to save them for posterity” (Singel, 2010).

In the wake of the Library of Congress announcement, debates over the appropriateness of archiving public tweets for research purposes have intensified (see, for example, Vieweg, 2010; Zimmer, 2010). Particularly relevant are numerous questions regarding how academic research on Twitter has proceeded thus far, such as: What disciplines are engaging in Twitter research and what amount of scrutiny of research ethics is typical within these fields? What research questions are being investigated, what data is being gathered, and how? Are subjects notified or given the opportunity to opt out of being studied? How are research ethics boards evaluating such projects?

The goal of this paper is to seek initial answers to these questions by surveying academic research that relies on the collection and use of Twitter data. The body of research to be surveyed includes over 200 scholarly articles, dissertations and theses from disciplines including communications, political science, health sciences, economics and computer science, among others. In building this corpus, this project will create a topology of disciplinary approaches to research around Twitter, methods used to collect and analyze Twitter data, and accounts of research ethics boards’ oversight of these projects. Through this analysis, we will gain insight into the current state of research on Twitter, providing a better understanding of the methodological and ethical challenges before us.

How do you address the gap between public and private sector research? The private sector has been vacuuming up as much data as it can to drive market research and other things. The (beneficent) public sector tip-toes around alarmist ethicists who concoct any possible way harm can come to the users.

Instead of stifling research, perhaps we need to educate the public on the permanent nature of electronic communications. If someone is intent on prowling the internet for other people’s sensitive information, and other people are flippant about how they use it (e.g., posting about illegal behaviors publicly), that poses a problem for the way people use the services. It does not pose a problem for those who wish to study communication on those services.

Most of all, if the research consists of doing broad analyses (such as RAND’s LIWC analysis on Twitter) that present no raw electronic content, why should the researchers worry about the sensitive nature of their data-set?

If your objection is that such a data-set cannot be anonymized, I agree completely. The standard of sharing data-sets from publicly funded research does not account for the nature of electronic communications and should not apply to it. The kicker here is that such information is already available to all. We can’t coddle the most irresponsible users of public internet venues.

If you’re so intent on protecting the internet-using public, perhaps your efforts would have more impact if you spent them educating the public about the nature (the NATURE) of the internet, which won’t change no matter how much you talk about ethics. Maybe then researchers can get back to benevolently analyzing the unprecedentedly immense and rich data set that has been so easily made available to them.


Thanks for this comment, Future.

I’m very familiar with the private sector’s ability to acquire similar data for their own purposes, and I’ve been involved in efforts to better understand, manage, and mitigate the privacy concerns related to commercial surveillance for quite some time.

That said, the common argument that “but companies are doing this, so why can’t researchers?” isn’t valid since, as scholars, we should certainly be willing to hold ourselves to a higher standard than mere profit-seeking.

If you are including me in your concern over “alarmist ethicists who concoct any possible way harm can come to the users”, then you’re not understanding my position. My writing and presentations on issues of Internet research ethics always stress the need for balancing harm vs. benefit, as well as the need to enable research to take place. Part of my role as a scholar and advocate is to illuminate the conceptual gaps that I see occurring in how we understand and approach these important ethical issues, and then to work collaboratively to fill these gaps.

Certainly, your suggestion that greater user education and digital literacy are needed is correct. Users definitely need to gain a better understanding of the tools they’re using and how their data flows (both visibly and invisibly). But that responsibility does not end with the user. The platforms themselves must do a better job explaining how the technology works, and provide meaningful opportunities to consent or adjust settings. And researchers have a duty to ensure consent is informed, and harm is reasonably avoided.

It sounds like you might be a researcher or similarly involved in projects that are impacted by these concerns. Please feel free to contact me privately so we can continue to discuss these important issues.

