Open Questions about Library of Congress Archiving Twitter Streams

Posted by michaelzimmer in Privacy, Social Media
Apr 14

(See update below referencing Twitter’s announcement; this post about how your private tweets might end up in the archive; and this post where more details about the agreement have been provided)

The Library of Congress tweeted today that they are acquiring the entire archive of public Twitter activity since March 2006. (The official blog post is down, but a copy is on the LOC’s Facebook page.)

Have you ever sent out a “tweet” on the popular Twitter social media service? Congratulations: Your 140 characters or less will now be housed in the Library of Congress.

That’s right. Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress.

… We will also be putting out a press release later with even more details and quotes. Expect to see an emphasis on the scholarly and research implications of the acquisition. I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I’m certain we’ll learn things that none of us now can possibly conceive.

This is big. Huge.

And while the LOC stresses that they’re doing this for historical and scholarly reasons, there are major implications regarding the privacy and contextual expectations of Twitter users. Now, suddenly, all their tweets are being archived by the world’s largest library. Yes, the tweets were always public and discoverable, but the searchability and accessibility will increase drastically if/when the LOC processes this archive.

Here are some immediate questions that need to be addressed:

  1. Will user profile information also be archived and made accessible? And historical changes to user profile information? If so, can users update the profile information that might be archived at LOC?
  2. Will lists of followers and who is followed be included? If so, how will they be updated?
  3. Will geo-locational data be included?
  4. Will the LOC allow automated scraping of the database (by search engine crawlers or other bots)?
  5. Will the LOC allow commercial use of the archive?
  6. Will the LOC process the archive in such a way as to create categories of users or tweets? Essentially, are we going to see a Library of Congress Classification scheme for tweets?
  7. Currently users can delete tweets from Twitter, which (presumably within a reasonable time) are then deleted from Twitter’s logs and are no longer discoverable. Will users have the ability to remove unwanted tweets from the LOC? (I presume not.)

I look forward to seeing more information as the day progresses.

UPDATE: Twitter has now posted its own announcement, which provides some further details:

It is our pleasure to donate access to the entire archive of public Tweets to the Library of Congress for preservation and research. It’s very exciting that tweets are becoming part of history. It should be noted that there are some specifics regarding this arrangement. Only after a six-month delay can the Tweets be used for internal library use, for non-commercial research, public display by the library itself, and preservation.

Interestingly, they’re enforcing a six-month delay before public Tweets are made available to the Library of Congress. What remains unclear is whether the LOC is given live feed streams and must simply embargo them for six months, or whether Twitter is only providing the LOC the archives after six months have passed. (The latter would give users more opportunity to delete tweets that they might want taken out of public circulation; see item 7 above.)

This is also a bit odd since Google apparently is providing real-time access to all Tweets without any such delay. Why a commercial entity is allowed immediate access, while the Library of Congress must wait, remains a mystery.

Finally, while Twitter notes that the archive can be used only for non-commercial research (good), it remains unclear whether the restriction to “internal library use” means that only the library can use the archive, or whether the “public display” provision also means a searchable database will be made available.

Time to request a copy of the agreement.
