Online Musings of a Public Historian

Can tracing linguistic changes over time reflect shifts in cultural trends?

According to Jean-Baptiste Michel and the other minds behind the culturomic analysis movement, the answer is a resounding “yes.”

Working with the team responsible for the Google Books online collection, Michel and his fellow researchers constructed a corpus of almost 5.2 million digitized books. Using this Google Books corpus, the team of scholars conducted a quantitative study analyzing the relationship between linguistic and cultural change over the period between 1800 and 2000. Referring to this quantitative approach to measuring cultural trends as “culturomics,” Michel and company used their findings to produce the Google Ngram Viewer, an online research tool through which everyday users can conduct their own studies within the Google Books dataset. Users simply enter a word or phrase (called an “ngram”) into the Viewer’s search bar, and the tool generates a line graph charting that ngram’s level of usage within the corpus across the two-hundred-year timeframe the study samples.
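For readers curious about what "level of usage" might mean in practice, here is a minimal sketch of one way to compute it: for each year, count how often a phrase appears and divide by the total number of phrases of the same length published that year. This is an illustration under my own assumptions, not the Ngram Viewer's actual code. The function names `ngrams` and `yearly_frequency` and the tiny corpus are invented for the example, and the text is split on whitespace only, which is far cruder than what a real corpus would require.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as one space-joined string."""
    return (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def yearly_frequency(corpus, phrase):
    """Return {year: share of that year's n-grams that match `phrase`}.

    `corpus` is an iterable of (year, text) pairs. Matching is
    case-insensitive and tokens are split on whitespace (a simplification).
    """
    target_tokens = phrase.lower().split()
    n = len(target_tokens)
    target = " ".join(target_tokens)

    matches, totals = Counter(), Counter()
    for year, text in corpus:
        for gram in ngrams(text.lower().split(), n):
            totals[year] += 1
            if gram == target:
                matches[year] += 1

    # Years with no n-grams of this length never enter `totals`,
    # so there is no division by zero here.
    return {year: matches[year] / totals[year] for year in sorted(totals)}


# Toy corpus, invented for illustration (not real Google Books data).
toy_corpus = [
    (1860, "abraham lincoln was elected president"),
    (1860, "the election of abraham lincoln"),
    (1900, "the new century began quietly"),
]

freq = yearly_frequency(toy_corpus, "Abraham Lincoln")
for year, share in freq.items():
    print(year, share)
```

Plotting those yearly shares against the years would give a line graph of the same general kind the Viewer draws. Dividing by each year's total matters because far more books were published in 2000 than in 1800, so raw counts alone would make nearly every word appear to grow more popular over time.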

Sample Ngram Viewer study tracing the name "Abraham Lincoln"

As seen in the sample Ngram Viewer study above, users can also use the data in the line graph to link particular peaks in ngram usage to significant historical events or cultural movements. Using the name of one of our nation’s most renowned leaders, “Abraham Lincoln,” as an example, we can see that the initial spike in usage of his name in published works falls (predictably) within the period of his election, his presidency, and the Civil War. Subsequent spikes occur in the years following World War I (a period of intense nationalism, during which Lincoln and other figures came to be looked upon as national heroes) and during the time surrounding the Civil Rights Movement, when the Emancipation Proclamation and the ending of slavery in the U.S. were strongly linked to the mid-20th century struggle for racial equality.

While Michel and company make a strong case for the benefits of using the quantitative methods of culturomics to gain fuller insight into traditionally humanist topics, it is clear that the field still has a long way to go. A glance at the Culturomics FAQ page set up by the project’s participants shows that there are still several kinks to be worked out with this method of study, particularly in relation to the quality of the data.

Despite the undoubtedly large size of the Google Books corpus, the 5.2 million digitized works still make up only around four percent of all published materials. Similarly, the study focuses primarily on the years between 1800 and 2000 (despite the presence of materials dating as far back as the 16th century), since the data from that period has proven most reliable.

Do these constraints undermine the quality of data produced by the Ngram Viewer? What can be done to widen these parameters? What should we as historians bear in mind while using the culturomic approach in our own work?

Comments on: "Quantifying Culture? Culturomics and the Google Books Corpus" (5)

  1. Alex, what an interesting post! I agree the ngram seems to offer a lot of great possibilities for historical research. However, I think that historians also need to be aware of the limitations that you mentioned and remember that while culturomics can be a valuable tool, it should be used alongside other forms of interpretation in order to better understand what we are looking at and what can account for the variations and patterns that we see in the ngram.

  2. Alex, I think you raise a great question about the constraints of the Ngram Viewer. I feel like any studies or analyses that derive from the Ngram tool will have to be published with acknowledgement of the constraints of the corpus. It’s difficult to tell if this will hinder the legitimacy of studies done in the future. However, I agree that historians will need to think of ways to expand the corpus.

  3. […] discussed issues of fair use and taken a look at the Google Books corpus and the new trend in culturomic analysis. Now, let’s do a mash-up of the two as we examine the legality of the TIME Magazine Corpus […]

  4. […] few posts ago, we examined the concept of culturomic analysis and explored the Google Ngram Viewer, a digital research tool that uses a document’s text […]

  5. As we heard from Trevor Owens, any use of data by humanists will need to take into account the limits and constraints of that data. So their discussion of those constraints is less a problem than a model for how we can move forward using data in our interpretations.
