In previous posts, we’ve discussed issues of fair use and taken a look at the Google Books corpus and the new trend in culturomic analysis. Now, let’s do a mash-up of the two as we examine the legality of the TIME Magazine Corpus of American English:
Working with Mark Davies, a corpus linguistics professor at Brigham Young University, TIME Magazine has put together its own text database through which users can:
…quickly and easily search more than 100 million words of text of American English from 1923 to the present, as found in TIME Magazine. You can see how words, phrases and grammatical constructions and see how words have changed meaning over time.
Compared with the size and scope of the Google Books corpus project (5.2 million digitized books spanning several hundred years), the 100 million words and eighty-year timespan (1923–2006) of the TIME corpus appear positively minuscule. This small scale is not necessarily a setback, however, particularly when it comes to copyright and fair use.
The TIME corpus is smaller in part because all of its data is drawn from the TIME Magazine archives. As a result, the entity maintaining the corpus also owns all of the data within it. TIME can therefore share this data without worrying about infringing anyone else's copyright, and it can also let users read highlighted text in its original context by offering the full articles as they were first published.
Because TIME owns all of its content and can therefore offer further access to its earlier publications, the TIME corpus stays safely clear of fair use concerns. This benefits corpus operators and users alike, allowing the corpus to provide a meaningful research experience for everyone involved.