Muppet News Flash: Digital Libraries cost a lot of money (but don’t tell the general public)

A recent piece in the New York Times reminds me why people don’t understand the enormous costs (let alone the time and effort) associated with digitizing a world’s culture.  Natasha Singer’s January 8 article does a great job of helping the public imagine the possibility of a Great American Digital Library, and she even quotes Benjamin Franklin to lend her argument the authority that comes from linking an idea to the nation’s forefathers.  What the news piece is really light on, however, are financial figures.  Check it out and see.

The article neatly summarizes the digitization efforts of certain national governments, compares them to the Library of Congress’s American Memory project and then to the Google Books project.  We learn that the LoC project has no formal connections to any public library projects and that several leading figures and organizations would like to collaborate on one giant digitization venture.  Wouldn’t it be great if we could coordinate our efforts, standardize systems and processes, and make the results accessible to archivists, researchers, and the public?  Yes, Virginia, there is a Santa Claus.

My problem lies with the way the article gives short shrift to the costs of such an effort.  After telling us that Harvard University’s Berkman Center for Internet & Society would like to develop a “digital public library of America,” Singer tells her readers that:

Of course, practical matters — like cost, copyright issues and technology — would need to be resolved first.

“The crucial question in many ways is, ‘How do you find a common technical infrastructure that yields interoperability for the scholar, the casual inquirer or the K-12 student?’” Dr. Billington says.

The New York Times does precious little in this article to break down the costs associated with a digitization project, let alone one of the magnitude to which it alludes.  What we’ve got listed are “copyright issues” and “technology”, which don’t touch the human capital required to develop and then maintain this digital archive.  And when Singer says technology, I don’t know if she means hardware, software, maintenance, preservation, or all of these things together.  And the piece says nothing of the physical plant required to house these servers, because even computers must be stored somewhere.  A digital library on any scale is expensive, but this article doesn’t explain why.

Now consider this second quotation, which weighs the value of the Google Books project and then notes that in the future, copyright costs may have to be settled by universities, research facilities, and other PSEs:

People can read out-of-print items at no cost on Google Books, if those works are no longer subject to copyright protection. But if a judge approves a settlement between Google and copyright holders, subscription fees to access scans of out-of-print books still covered by copyright will have to be paid by universities and other institutions.

An American digital public library would serve as a nonprofit institutional alternative to Google Books, Professor Darnton says.

Now we have an example that raises the spectre of “subscription fees” without explaining the burden these fees place on universities.  I have no doubt that when most people read “subscription fees . . . will have to be paid by universities”, they don’t have a sense of the business models and financing at play; for a lot of people, what will really matter is that the buck stops somewhere but thankfully not with them.  As librarians, we know that our electronic resources, as valuable and cost-effective as they are, eat up a large part of our (largely taxpayer-funded) budgets.  These “subscription fees” are not at all like the fees people pay for cable TV or an Internet connection at home.  PSEs and their libraries pay through the nose to large, for-profit organizations for electronic access to materials that are often funded by the PSEs themselves.  And even in the case of non-profit organizations like JSTOR, the fees remain costly.  So a “non-profit institutional alternative” that seeks to facilitate digitization and access to a nation’s cultural heritage at reasonable rates could still leave a collections librarian bruised.

Digital Preservation is a labor of love: this machine cost nothing to Penn State and this man is happy to volunteer his time and expertise.

We need to get real when we talk about digitization projects to the public, especially when we talk about mega-projects like a mass digitization of American cultural history.  Articles like this New York Times piece do nothing to explain the real costs involved in digitization, collections, and electronic access.  And cost is where it counts.  Too often at the reference desk do I find myself explaining to students that material found on the Internet is not free and that the dollars they pay for access on their smartphones or at home cover the cost of transmission but not the cost of “content.”  We need to start educating people so they understand that a monthly data plan or Internet bill pays only for the pipes through which content is downloaded to their devices and not for the actual development and maintenance of the content they are retrieving, let alone the infrastructure (human and physical) required to maintain it.

I apologize if I sound like a cranky curmudgeon here.  Like most librarians, I fully believe that information wants to be free.  But that’s only a desire.  Information may want to be free, but right now it isn’t.  And it’s up to people like us to explain to the world the real costs associated with our information landscape.