Online Musings of a Public Historian

Archive for February, 2014

The Legalities of Culturomics

In previous posts, we’ve discussed issues of fair use and taken a look at the Google Books corpus and the new trend in culturomic analysis. Now, let’s do a mash-up of the two as we examine the legality of the TIME Magazine Corpus of American English:

Entry Page for the TIME Magazine Corpus

Working with Mark Davies, a corpus linguistics professor at Brigham Young University, TIME Magazine has put together its own text database through which users can:

…quickly and easily search more than 100 million words of text of American English from 1923 to the present, as found in TIME Magazine. You can see how words, phrases and grammatical constructions have increased or decreased in frequency and see how words have changed meaning over time.

Compared with the size and scope of the Google Books corpus project (5.2 million digitized books spanning a period of several hundred years), the 100 million words and roughly eighty-year timespan (1923-2006) featured in the TIME corpus appear positively minuscule. This small scale is not necessarily a setback, however, particularly when it comes to matters of copyright and fair use.

Part of the reason for the smaller scope of the TIME corpus is that all of its data is culled from the TIME Magazine archives. As such, all of the data within the corpus is owned by the entity maintaining it. This allows TIME to share that data without worrying about violating anyone else’s copyright, and also to give users of the corpus the opportunity to read the highlighted text in its original context, with access to full articles as they were initially published.

Because TIME owns all of the corpus’s content and can therefore offer access to the original publications, the TIME corpus functions safely within the parameters of fair use. This benefits corpus operators and potential users alike, allowing the corpus to provide a meaningful research experience for all involved.

Quantifying Culture? Culturomics and the Google Books Corpus

Can tracing linguistic changes over time reflect shifts in cultural trends?

According to Jean-Baptiste Michel and the other minds behind the culturomic analysis movement, the answer is a resounding “yes.”

Working with the team responsible for the Google Books online collection, Michel and his fellow researchers constructed a corpus of almost 5.2 million digitized books. Using this Google Books corpus, the team of scholars conducted a quantitative study analyzing the relationship between linguistic and cultural change over the period between 1800 and 2000. Referring to this quantitative approach to measuring cultural trends as “culturomics,” Michel and company used their findings to produce the Google Ngram Viewer, an online research tool through which everyday users can conduct their own studies within the Google Books dataset. Users simply enter a word or phrase (called an “ngram”) into the Viewer’s search bar, and the Viewer produces a line graph charting that particular ngram’s level of usage within the corpus throughout the two-hundred-year timeframe the study samples.
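To make the idea of an ngram’s “level of usage” a little more concrete, here is a minimal sketch in Python of how a yearly usage share might be computed. It is not the Ngram Viewer’s actual code: the function name ngram_frequency_by_year and the three-sentence toy corpus are invented for illustration, and the sketch assumes that usage means the share of all same-length word sequences in a given year that match the phrase. It also ignores punctuation, which a real tool would have to handle.

```python
from collections import defaultdict


def ngram_frequency_by_year(corpus, phrase):
    """Return {year: share of that year's n-word sequences that match phrase}.

    corpus is a list of (year, text) pairs. This is a hypothetical helper,
    not part of any Google tool.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    matches = defaultdict(int)  # matching n-grams per year
    totals = defaultdict(int)   # all n-grams per year
    for year, text in corpus:
        words = text.lower().split()
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            totals[year] += 1
            if tuple(words[i:i + n]) == target:
                matches[year] += 1
    return {year: matches[year] / totals[year] for year in sorted(totals)}


# Invented toy data standing in for millions of digitized books.
toy_corpus = [
    (1860, "abraham lincoln wins the election"),
    (1860, "the election divides the nation"),
    (1900, "the new century begins"),
]
freq = ngram_frequency_by_year(toy_corpus, "Abraham Lincoln")
```

Plotting the values in freq against their years would give a (very crude) version of the line graph the Viewer draws.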

Sample Ngram Viewer study tracing the name "Abraham Lincoln"

As seen in the sample Ngram Viewer study above, users can additionally use the data provided in the line graph to link particular peaks in ngram usage to significant historical events and/or cultural movements. Using the name of one of our nation’s most renowned leaders, “Abraham Lincoln,” as an example, we can see that the initial spike in usage of his name in published works falls (predictably) within the period of his election, presidency, and the Civil War. Subsequent spikes occur in the years following World War I (a period of intense nationalism, during which Lincoln and other figures came to be looked upon as national heroes) and during the time surrounding the Civil Rights Movement, when the Emancipation Proclamation and the ending of slavery in the U.S. were strongly linked to the mid-20th century struggle for racial equality.

While the arguments laid out by Michel and company highlighting the benefits of using the quantitative methods associated with culturomics to gain fuller insight into traditionally humanist topics certainly make a strong point, it is clear that the field still has a long way to go. Glancing at the Culturomics FAQ page set up by the project’s participants, one can see that there are still several kinks to be worked out with this method of study, particularly in relation to the quality of data.

Despite the undoubtedly large size of the Google Books corpus, the 5.2 million digitized works still make up only around four percent of all published materials. Similarly, the study focuses primarily on the years between 1800 and 2000 (despite the presence of materials dating as far back as the 16th century), since the data originating in that period has proven most reliable.

Do these constraints undermine the quality of data produced by the Ngram Viewer? What can be done to widen these parameters? What should we as historians bear in mind while using the culturomic approach in our own work?

Final Project Proposal: Resurrecting Resurrection City

1968 was a big year in American history. In April, the assassination of civil rights leader Martin Luther King, Jr. rocked the country. Tragedy struck again in June, when another assailant took the life of popular politician Robert F. Kennedy. That year, the war in Vietnam also saw several major developments, most notably the infamous Tet Offensive. With so many substantial events occurring within the space of that one year, it is not surprising that other issues of significance run the risk of becoming eclipsed in the broader scheme of the national historical narrative.

One such instance is that of the Poor People’s Campaign, the brainchild of Martin Luther King, Jr.,  intended as a “multiracial coalition of poor people who would confront Congress and the White House in…a nonviolent insurrection in the nation’s Capitol” (Terry Messman, “The Poor People’s Campaign: Nonviolent Insurrection for Economic Justice,” Race, Poverty and the Environment 14, no. 1, Spring 2007: 31).  Working with the Southern Christian Leadership Conference, King envisioned the Poor People’s Campaign as a second March on Washington of sorts, culminating with the construction of several “shantytowns near the White House, to make poverty visible” (Messman, 31).

An unexpected wrench was thrown into preparations for the Poor People’s Campaign march, however, upon King’s assassination on April 4th, 1968. Following the appointment of Ralph Abernathy as leader of the SCLC in the aftermath of King’s death, plans for the Poor People’s Campaign and its occupation of Washington resumed at a heightened pace. The first protestors arrived in Washington a little over a month following the King assassination, and within a week a shantytown consisting of “tents made of plywood and yellow tarp [was] constructed on a sixteen acre site near the Lincoln Memorial” on the grounds of the National Mall (Robert Houston and Aaron Bryant, “Most Daring Dream: Robert Houston Photography & the 1968 Poor People’s Campaign,” Callaloo 31, no. 4, Fall 2008: 1273). Popularly referred to as “Resurrection City,” this encampment grew to house thousands of people throughout the campaign’s duration. Featuring its own zip code and mayored by Jesse Jackson, Resurrection City quickly became its own community, with an identity separate from that of the city on whose grounds it stood (John Kelly, “Before Occupy D.C., there was Resurrection City,” The Washington Post, December 3rd, 2011).

Jill Freedman, Resurrection City, 1968. Higher Pictures.

Unfortunately, Resurrection City was not devoid of the “riots, protests, and violent repression that had followed King’s assassination” throughout the country (Messman, 32). One particularly nasty encounter on the evening of June 20th, a few days before the camp’s land use permit was set to expire, included the deployment of a Molotov cocktail within the vicinity (whether it was thrown at or by Poor People’s Campaign members remains a matter of debate to this day) and culminated with D.C. police officers “fir[ing] more than a dozen tear-gas canisters into the encampment” (Kelly). Four days later, on June 24th, the permit authorizing the Resurrection City encampment expired and the site was dismantled and cleared.

Today, there are no traces of the Poor People’s Campaign tent city that once sprawled across the Mall’s grounds near the Lincoln Memorial. The site once occupied by Resurrection City now houses a portion of the Korean War Veterans Memorial, completed in 1995. Recognizing that it would be a shame for the memory of the physical embodiment of one of Martin Luther King, Jr.’s final efforts toward universal equality to fade into obscurity, the National Park Service (the agency with jurisdiction over the National Mall and its associated monuments) hopes to commemorate the Poor People’s Campaign and the ideals represented by the Resurrection City demonstrators.

For our final project, Kristen Horning, Joanna Capps, and I are partnering with the National Park Service to create a digital forum through which the Resurrection City experience can be interpreted to the public. We hope to create a webpage to be featured in the “History and Culture” section of the National Mall’s public website. Additionally, we aim to increase the opportunity for interpretive connection by developing a mobile phone application featuring a walking tour that follows the layout of the original Resurrection City, in conjunction with photographs and interpretive texts and/or audio recordings providing additional information. To conduct the research necessary to complete this project, we will collect data from a number of different sources, drawing on the collections of the National Archives, the Library of Congress, and other public cultural institutions, and conducting oral interviews with individuals who experienced Resurrection City firsthand, whether as active participants in the Poor People’s Campaign or as locals living in the area at the time.

Ultimately, we imagine the finished product will serve as a digital means of connecting audiences to the cultural resource embodied in the key issues, beliefs, and values associated with Resurrection City and the Poor People’s Campaign. Our final hope for this project is that the issues raised in our digital discussion of Resurrection City will resonate with modern audiences and encourage a deeper appreciation not only for the equal rights movements of the past, but also for what more can be done to assist those facing similar struggles in today’s society.

Image Manipulation: Friend or Foe?

Photo manipulation: we all do it. Whether it’s choosing the perfect filter for your Instagram, cropping images to remove visual clutter, or adding cosmetic touches, ours is a generation of posed perfection. These actions are commonplace, so routine that we engage in them on an almost subconscious level, barely recognizing that we are actively altering an image from its original state. I mean, what’s the harm, right? It’s the final product that matters, the one you put on display for all to see. With those small changes, that final product simply turns out looking better and carries its message more effectively. If the end goal is to have a memorable picture that attracts attention, and manual adjustments help to do that, what’s the big deal?

New York Times columnist Errol Morris tackles this issue of photographic manipulation and its effects on public perception and the historic record in his thought-provoking opinion piece, “Which Came First, the Chicken or the Egg?” In this series of three articles, Morris examines a pair of hotly debated images taken by British photographer Roger Fenton in the midst of the Crimean War. Both images depict the same stretch of road in a particularly battle-riddled area. The presence of fallen cannonballs off the side of the road in one picture, and their seemingly subsequent placement on the road in the other, lies at the center of the controversy:

Fenton, Roger. Valley of the Shadow of Death. Harry Ransom Humanities Research Center. The University of Texas at Austin.

Fenton, Roger. Valley of the Shadow of Death. Harry Ransom Humanities Research Center. The University of Texas at Austin.

Were the cannonballs placed on the road by Fenton and his assistant in the hopes of achieving a more dramatic image? Or were they moved off the road and harvested by British soldiers to be fired again at the next battle? Seeking to definitively determine the photographs’ ordering, Morris travels to modern-day Crimea to investigate. Through personal examination of the valley’s terrain, along with consultation of experts in various fields, from museum curators to forensic photography analysts, Morris ultimately concludes that the cannonballs were, in fact, placed on the road after the photograph of the cleared road was taken. The second image was, in a sense, staged.

Do we know Fenton’s reasoning behind staging the second photograph? Not definitively.  From viewing the two images, the “cannonballs-on” one certainly carries a more dramatic impact than its cleared-road counterpart.  The possibility that Fenton was aiming primarily for an emotional impact, sending a specific message to his viewers, is very real.

Does the knowledge that Fenton staged this image affect its credibility as a resource? Perhaps not entirely.  Despite its artificial origins, the image itself remains an evocative piece, depicting the grimmer aspects of a war-plagued landscape.

So how do historians deal with altered or staged images? Do we write them off as “fake” and thus cast them aside? In a separate article, “Photography as a Weapon,” Morris argues that we actually stand to benefit from further examination of such photos. Studying these images, according to Morris, offers a prime opportunity to investigate a photographer’s motivations, audience reactions, and the relationship between photography’s dual roles as an artistic and social medium, and as a purported purveyor of historic truth.

Do you agree? Can historians stand to learn as much from an altered image as from an untouched one? What does the present commonplace nature of photographic manipulation tell us about today’s society?

Review: Biography Through a Digital Lens

The Creation of Anne Boleyn: A New Look at England’s Most Notorious Queen by Susan Bordo. New York: Houghton Mifflin Harcourt Publishing, 2013; 343 pp.

Homepage for the Creation of Anne Boleyn website

Most scholarly works on Anne Boleyn are biographical ones, tracing the details of her life (with particular emphasis on her time as Queen of England and her death by beheading) through the consultation of various primary sources: letters, diaries, documents of state, et cetera. Drawing upon a mixture of traditional “hard” primary sources and modern digital-based methodologies, Susan Bordo’s The Creation of Anne Boleyn: A New Look at England’s Most Notorious Queen brings a fresh angle to the already dense body of literature surrounding the elusive figure of Anne Boleyn, second wife of King Henry VIII and mother of Queen Elizabeth I.

The Creation of Anne Boleyn differs from the traditional narrative model in that Bordo examines both Anne’s life and the changing cultural representations of her as society evolves. Divided into three parts, the book begins with Bordo’s own retelling of Anne Boleyn’s experience. Here, she examines the usual batch of primary sources used by scholars and fiction authors alike, and investigates how such writers manipulate these sources to present a specific image of Anne to their audience. Next, Bordo includes a second section exploring the shifting representations of Anne’s character in popular culture, from contemporary accounts through the first half of the twentieth century. Picking up where the previous section leaves off, the final chapters of The Creation of Anne Boleyn focus on popular notions of Anne in late twentieth- and early twenty-first-century society, specifically her portrayal via film and online.

From inception to publication, The Creation of Anne Boleyn serves as an exemplary representation of the digitized research process. The project’s origin in an email correspondence between Bordo and an English journalist (Bordo, xii) is itself indicative of the increasingly expansive influence of digital media in modern scholarship. As an American author writing on an English subject, Bordo conducted much of her research digitally via online communication and database collections of images, text transcriptions, and articles (Bordo, 117). The book itself is accompanied by both a Facebook page and a WordPress blog, meant to chronicle the research and production processes, as well as provide a forum for readers to contribute to discussions and offer feedback. This use of social media as a means of directly connecting with the public allowed Bordo to work collaboratively with her audience, and to incorporate their input into the final product.

Bordo’s extensive use of digital tools in The Creation of Anne Boleyn puts a unique spin on the traditional narrative featured in most other scholarship surrounding Anne Boleyn. In fact, the completion of the book itself would have been considerably more difficult without the aid of digital media. Some of Bordo’s strongest writing appears in her analysis of the perpetually changing cultural representations of Anne over time. Much of that research involved watching a number of films about Anne (many of which, I’m sure, are available for viewing or rental on streaming sites such as Netflix), perusing other popular blogs and websites, and interacting online with audience members, whether in the form of formal interviews or casual Facebook conversations. The wide variety of digital means through which scholars can connect more easily not only with their topic of research, but also with the public, opens several new doors in terms of the production and presentation of scholarly works. With its combination of such digital aids and traditional methodologies, The Creation of Anne Boleyn offers readers a glimpse into what increased digitization holds in store for the research process, along with an interesting view of the production and publication of scholarly works in the modern digital age.

All’s Fair Use in Love and War

With all this talk of fair use and the intricacies of copyrighting in this week’s readings, I couldn’t help but think of the ongoing plagiarism scandal surrounding film star Shia LaBeouf. To briefly recap: back in December, Mr. LaBeouf (of Transformers and Even Stevens fame, among other film and television projects) released a short film that was revealed to be largely lifted from a graphic novel authored by a guy named Daniel Clowes. In the months following, LaBeouf has issued a number of public statements apologizing for his various plagiarisms, which in turn appear to plagiarize other public apologies. This has all led to the most recent development in the saga, in which the actor has launched a performance art piece entitled #IAMSORRY involving him sitting silently in a room wearing a tuxedo and, over his head, a paper bag featuring the words “I AM NOT FAMOUS ANYMORE” (a look LaBeouf debuted last weekend at a Berlin film premiere).

Getty Images; taken from TIME Article “A Brief History of Shia LaBeouf Copying the Works of Others.”

Now, you’re probably asking “What does Shia LaBeouf copying a bunch of stuff and generally acting like kind of a crazypants have to do with digital history and fair use?” The answer: quite a bit, actually. Although we as historians may not be plagiarizing wholesale from anyone else’s works, nor exhibiting such erratic behavior after doing so, we can still use the LaBeouf scandal to our advantage in better understanding what does and does not constitute “fair use” in the online world.

Defined by Roy Rosenzweig and Daniel Cohen in “Owning the Past?” as the practice of  “limited borrowing from the work of others [that is] acceptable when that borrowing produces something new and useful,” fair use is regarded by authors Patricia Aufderheide and Peter Jaszi as a “tool of creative freedom.” In their book, Reclaiming Fair Use: How to Put Balance Back in Copyright, Aufderheide and Jaszi argue that the practice of fair use serves as a shield of sorts, defending the creative process from the clutches of power-hungry copyright holders seeking to become “chiefs of private fiefdoms of culture, and private censors of future culture.” While this description appears a tad extreme, and clearly does not speak to the aims of all copyright holders – several of whom simply seek to protect the rights to what is intellectually theirs – Aufderheide and Jaszi make a good point in characterizing fair use as an important tool in the creative process. 

So was Shia LaBeouf simply exercising his right to fair use in this string of copyright scandals? If you’re going by the four factors of fair use that Rosenzweig and Cohen outline (the purpose of the use, the nature of the copyrighted work, the amount borrowed, and the effect on the market for the original), then probably not. Taken together, LaBeouf’s plagiarism snafus run afoul of each of these four factors in one form or another.

Of course, not all cases are as high-profile as the LaBeouf scandal, and some areas are significantly grayer than others. At what point does exercising a right to fair use turn into wholesale breach of copyright? For that matter, to what extent does a copyright protect certain materials? As indicated by Rosenzweig and Cohen, the intricacies of copyright law are constantly changing. Can a conclusive decision ever be reached? What would this set boundary look like and how would it be enforced?

More importantly, will Shia LaBeouf ever take that bag off his head?

An Ode to Buzzfeed: 3 Reasons Why Buzzfeed Deserves Digital Preservation

Like many members of my generation, I have an online addiction. In this case, my cyberdrug of choice comes in the form of popular news and entertainment website Buzzfeed. One of my computer’s top-visited websites, Buzzfeed meets many of my digital needs, whether I’m aiming to brush up on current events or simply searching for an engaging diversion to alleviate boredom. At first glance, Buzzfeed’s homepage appears as a jumble of content ranging from the serious to the absurd.


Moving beyond this frenzied appearance, the content presented by Buzzfeed offers visitors an opportunity to simultaneously experience several aspects of contemporary young adult culture. This, coupled with its ranking among the top 50 websites visited in the U.S., makes Buzzfeed, at least in my opinion, a suitable candidate for online archiving by a cultural institution. Using the “listicle” structure for which Buzzfeed articles are notorious, let’s take a closer look at some of the reasons why the Buzzfeed website should be digitally archived for future study:

1. It provides an inclusive snapshot of 21st century society.

Where else can you go to catch up on Olympic coverage, find out the latest news stories, and take a quiz determining which Mean Girls character best reflects your personality? A self-described “social news and entertainment company,” Buzzfeed serves as a one-stop shop for all of these and more. Simply reading through the site’s headings and sub-headings indicates that the posts offered cover a myriad of subjects, ranging from world news to food recipes. As illustrated by the “About” video above, Buzzfeed’s aim is to combine news reporting, advertising, and storytelling into one medium, where users can satisfy curiosity, increase knowledge, and digitally contribute to cultural trends through the creation of their own posts and listicles. This level of interactivity, coupled with the site’s diverse content, allows Buzzfeed visitors an opportunity for an “inside look” into 21st century American culture. Through browsing the various posts, whether current or housed in the site’s in-house archive, readers can track social changes over time, from personalities and events regarded as culturally significant to popular food types or clothing styles. This insight into both nationally (or globally) significant content and that which pertains more to the nuances of the everyday provides a relatively inclusive representation of 21st century life in the United States, making Buzzfeed a site worthy of digital preservation.

2. It’s not going to be popular forever.


Just as former social networking supersite Myspace was slowly eclipsed by the rise of Facebook (which in turn is beginning its slow descent as users flock to other forms of social media, such as Twitter and Instagram), Buzzfeed faces similar challenges. Competitors, such as rival site EliteDaily (whose homepage is pictured above), use page models and article structures resembling those made popular by Buzzfeed to promote their own content. While Buzzfeed remains the top site of its kind for the time being, it is inevitable that the fast-paced nature of the web and continued technological advances will eventually produce a new social medium that will take its place. With this in mind, it is important that Buzzfeed and its already-archived content be preserved on a larger scale by an institution or organization before this decline occurs.

3. It serves as an example of the ongoing issue of fair use and other questions of “netiquette.”


No website is perfect, and in this new digital era the question of ownership and fair use is murkier than ever. Buzzfeed, like other sites, has run into its own issues of fair use over the years, as evidenced by this Gawker article chronicling controversies surrounding the page’s earlier days. While new management and a re-evaluation of site goals have led to better fair-use practice among Buzzfeed writers and contributors, the accusations regarding content plagiarism from other community-based sites such as Reddit provide a learning opportunity in terms of the extent of intellectual ownership and the parameters of authority over online content.

Of course, archiving a site operating in real time (as Buzzfeed does) comes with its challenges. The top stories featured on Buzzfeed’s homepage change every few minutes, based on post popularity and visitor traffic. This means that the page layout and posts themselves would have to be archived as the appearance changes in order to maintain an accurate representation of the site. As such, an institution or organization (like the Internet Archive, for example) with the capacity to preserve whole websites and pages in addition to video, audio, image, and text content would most likely be ideal for taking on the Buzzfeed preservation project.
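For the technically curious, the idea of archiving a page “as the appearance changes” can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a description of how the Internet Archive or any other institution actually works: the function name archive_if_changed, the timestamps, and the tiny HTML strings are all made up, and a real project would fetch live pages rather than use hard-coded text.

```python
import hashlib


def archive_if_changed(snapshots, captured_at, html):
    """Store a copy of the page only if it differs from the last stored copy.

    snapshots is a list of (timestamp, digest, html) tuples. Returns True
    when a new snapshot was stored. Hypothetical helper for illustration.
    """
    # A hash acts as a fingerprint of the page's content.
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    if snapshots and snapshots[-1][1] == digest:
        return False  # nothing changed since the last capture
    snapshots.append((captured_at, digest, html))
    return True


# Invented captures of a homepage taken five minutes apart.
snapshots = []
captures = [
    ("2014-02-20T10:00", "<h1>Top story A</h1>"),
    ("2014-02-20T10:05", "<h1>Top story A</h1>"),
    ("2014-02-20T10:10", "<h1>Top story B</h1>"),
]
results = [archive_if_changed(snapshots, ts, html) for ts, html in captures]
```

Here the second capture is skipped because the homepage had not changed, so only two snapshots are kept. The trade-off is that checking too rarely can miss a version of the page entirely.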

What do you think? Should Buzzfeed be archived? What other sites do you think deserve preservation?

Archiving the Web: A Question of Content

From my understanding of Abbie Grotke’s “Web Archiving at the Library of Congress,” the main idea behind archiving online content is “to reflect the true evolution of society, government, and culture online…[in order to] ensure that a representative sample of the web is preserved.” While this seems a simple enough notion, the actual implementation of this preservation process carries with it a number of challenges and questions.

The largest of these challenges is the question of what exactly should be preserved in order to fully realize the goal of presenting the changing ideas, values, and beliefs of contemporary society for future study. In some cases, such as Roy Rosenzweig’s case study of the online collection effort following the terrorist attacks of September 11th, 2001, the answer is obvious. Events carrying such heavy global, national, and historic significance clearly require documentation, as does any digital content associated with them. The question of what sort of content is worthy of preservation is where the proverbial waters tend to get murky. Functioning as a simultaneously public and personal space, the web serves as a platform for not only “official” responses to such high-profile events (in the form of online newspapers, web pages for government organizations, and the like), but also the reactions of everyday individuals through social media networks, interpersonal messages, and online journals and blogs.

This combination of professional and recreational content offers historians a vast array of material from which to gather a well-rounded snapshot of the core ideals, beliefs, and values of contemporary society. As pointed out by Jinfang Niu in “An Overview of Web Archiving,” this snapshot is not limited to what can be considered the “better” aspects of society (“literature, scientific publishing”), but is also representative of some of the “worst” (“advertising, pornography”). Such an array of material, ranging from the ephemeral to the long-lasting and from the enlightening to the unsavory, forms the basis for the challenge of deciding which content to preserve within the digital archiving system. This initial question spawns further issues, such as those of copyright, ownership, and fair use.

How far should an institution go when it comes to archiving online material? Should a certain type of content take priority for preservation over others (scholarly vs. personal, for example)? If so, does this skew the “well-rounded” snapshot of society that historians are striving to preserve?

Check out some video interviews for the Association of Research Libraries’ Code of Practices in Fair Use here. Does this Code of Practices ease some of the challenges faced in web archiving?