Online Musings of a Public Historian

Posts tagged ‘digitization’

Archiving the Web: A Question of Content

From my understanding of Abbie Grotke’s “Web Archiving at the Library of Congress,” the main idea behind archiving online content is “to reflect the true evolution of society, government, and culture online…[in order to] ensure that a representative sample of the web is preserved.”  While this seems a simple enough notion, the actual implementation of this preservation process carries with it a number of challenges and questions along the way.

The largest of these challenges is the question of what exactly should be preserved in order to fully realize the goal of presenting the changing ideas, values, and beliefs of contemporary society for future study. In some cases, such as Roy Rosenzweig’s case study of the online collection effort following the terrorist attacks of September 11th, 2001, the answer is obvious. Events carrying such heavy global, national, and historic significance clearly require documentation, as does any digital content associated with them. The question of what sort of content is worthy of preservation is where the proverbial waters tend to get murky. Functioning as a simultaneously public and personal space, the web serves as a platform not only for “official” responses to such high-profile events (in the form of online newspapers, web pages for government organizations, and the like), but also for the reactions of everyday individuals through social media networks, interpersonal messages, and online journals and blogs.

This combination of professional and recreational content offers historians a vast array of material from which to gather a well-rounded snapshot of the core ideals, beliefs, and values of contemporary society. As Jinfang Niu points out in “An Overview of Web Archiving,” this snapshot is not limited to what can be considered the “better” aspects of society (“literature, scientific publishing”); it also represents some of the “worst” (“advertising, pornography”). Such an array of material, ranging from the ephemeral to the long-lasting and from the enlightening to the unsavory, forms the basis for the challenge of deciding which content to preserve within the digital archiving system. This initial question spawns further issues, such as those of copyright, ownership, and fair use.

How far should an institution go when it comes to archiving online material? Should a certain type of content take priority for preservation over others (scholarly vs. personal, for example)? If so, does this skew the “well-rounded” snapshot of society that historians are striving to preserve?

Check out some video interviews for the Association of Research Libraries’ Code of Practices in Fair Use here. Does this Code of Practices ease some of the challenges faced in web archiving?

Preserving the Digital


Today, the amount of information available online is astonishing. Social media networks allow anyone who uses them (arguably a significant portion of the population) to leave a digital record of themselves through online photo albums, text posts sharing internal thoughts, feelings, and beliefs, and conversations with one another, both public and private. Research databases such as those offered by JSTOR and the Library of Congress allow users easy access to scholarly articles and primary sources alike, thanks to the transference of physical items, papers, and photographs to a digital medium through scanning and the like. In fact, one could argue that this ability to transform the physical into the digital is itself a significant innovation in terms of preserving historic material.

But how do we go about preserving this digital historic material?

In his essay, “Scarcity or Abundance? Preserving the Past in a Digital Era,” Roy Rosenzweig addresses precisely this question.  Asserting that with the digital boom of recent decades historians have, in a way, fallen victim to the misconception “that we have [reached] a golden age of preservation in which everything of importance was saved,” Rosenzweig highlights the significant absence of any substantial tool or methodology for the preservation of online materials. Urging historians not to take a backseat to the proceedings concerning the development of more advanced digital preservation, Rosenzweig sounds a call to action of sorts for those in the field to take on the “social, economic, legal, and organizational” hazards that come along with digital documents, particularly in issues of authenticity and ownership.

Written in 2003, Rosenzweig’s article is, admittedly, a bit dated.  However, simply looking at the vast changes within the digital world over the course of the past decade serves to reinforce his primary points.  As these rapid technological changes take place, how can a single system of preservation for digital materials be developed and employed? Have we made any progress since the time of Rosenzweig’s writing? What more can we do? How do you think the digital sphere is going to look a decade from now?

As a postscript of sorts, the Bert is Evil website, whose deletion from the digital world following the 9/11 attacks Rosenzweig lamented, appears today to be alive and well on the web. Additionally, it seems to now have a sister site, Barby is Bad, chronicling the nefarious deeds of another American children’s icon, the Barbie doll. While I am unsure if these sites were created by the initial Bert is Evil web designer, I still encourage you to check them out; they are pretty entertaining!