Muppet News Flash: Digital Libraries cost a lot of money (but don’t tell the general public)

A recent piece in the New York Times is reminding me why people don’t understand the enormous costs (let alone the time and effort) associated with digitizing the world’s culture.  Natasha Singer’s January 8 article does a great job of helping the public imagine the possibility of a Great American Digital Library, and she even quotes Benjamin Franklin to lend her argument the kind of weight an idea gains when it is linked to the nation’s forefathers.  What the piece is remarkably light on, however, is financial figures.  Check it out and see.

The article neatly summarizes the digitization efforts of certain national governments, compares them to the Library of Congress’s American Memory project and then to the Google Books project.  We learn that the LoC project has no formal connections to any public library projects and that several leading figures and organizations would like to collaborate on one giant digitization venture.  Wouldn’t it be great if we could coordinate our efforts, standardize systems and processes, and make the results accessible to archivists, researchers, and the public?  Yes Virginia, there is a Santa Claus.

My problem lies with the way the article pays short shrift to the costs of such an effort. After telling us that Harvard University’s Berkman Center for Internet & Society would like to develop a “digital public library of America,” Singer tells her readers that:

Of course, practical matters — like cost, copyright issues and technology — would need to be resolved first.

“The crucial question in many ways is, ‘How do you find a common technical infrastructure that yields interoperability for the scholar, the casual inquirer or the K-12 student?’” Dr. Billington says.

The New York Times does precious little in this article to break down the costs associated with a digitization project, let alone one of the magnitude to which it alludes.  What we’ve got listed are “copyright issues” and “technology”, neither of which touches on the human capital required to develop and then maintain this digital archive.  And when Singer says technology, I don’t know if she means hardware, software, maintenance, preservation, or all of these things together.  The piece also says nothing of the physical plant required to house these servers, because even computers must be stored somewhere.  A digital library on any scale is expensive, but this article doesn’t explain why.

Now consider this second quotation, which weighs the value of the Google Books project and then notes that in the future, copyright costs may have to be paid by universities, research facilities, and other post-secondary institutions (PSEs):

People can read out-of-print items at no cost on Google Books, if those works are no longer subject to copyright protection. But if a judge approves a settlement between Google and copyright holders, subscription fees to access scans of out-of-print books still covered by copyright will have to be paid by universities and other institutions.

An American digital public library would serve as a nonprofit institutional alternative to Google Books, Professor Darnton says.

Now we have an example that raises the spectre of “subscription fees” without explaining the burden these fees place on universities.  I have no doubt that when most people read “subscription fees . . . will have to be paid by universities”, they don’t have a sense of the business models and financing at play; for a lot of people, what really matters is that the buck stops somewhere, but thankfully not with them.  As librarians, we know that our electronic resources, as valuable and cost-effective as they are, eat up a large part of our (largely taxpayer-funded) budgets.  These “subscription fees” are not at all like the fees people pay for cable TV or an internet connection at home.  PSEs and their libraries pay through the nose to large, for-profit organizations for electronic access to materials whose creation the PSEs themselves often funded.  And even in the case of non-profit organizations like JSTOR, the fees remain costly.  So a “nonprofit institutional alternative” that seeks to facilitate digitization and access to a nation’s cultural heritage at reasonable rates could still leave a collections librarian bruised.

Digital Preservation is a labor of love: this machine cost Penn State nothing, and this man is happy to volunteer his time and expertise.

We need to get real when we talk about digitization projects to the public, especially when we talk about huge mega-projects like a mass digitization of American cultural history.  Articles like this New York Times piece do nothing to explain the real costs involved in digitization, collections, and electronic access.  And cost is where it counts.  Too often at the reference desk I find myself explaining to students that material found on the Internet is not free, and that the dollars they pay for access on their smartphones or at home cover the cost of transmission but not the cost of the “content.”  We need to start educating people so they understand that a monthly data plan or Internet bill pays only for the pipes through which content is downloaded to their devices. It does not pay for the actual development and maintenance of the content they are retrieving, let alone the infrastructure (human and physical) required to maintain it.

I apologize if I sound like a cranky curmudgeon here.  Like most librarians, I fully believe that information wants to be free.  But that’s only a desire.  Information may want to be free, but right now it isn’t.  And it’s up to people like us to explain to the world the real costs associated with our information landscape.


Google v. Blekko v. The Librarian. (The librarian wins.)

In the past week I’ve heard three different librarians say something like, “We lost to Google years ago”.  We know that this sort of statement isn’t complete hyperbole.  When it comes to discovering or verifying quick facts, people turn to search engines faster than they ever turned to an encyclopedia at home or a reference collection at the library.  There are many things librarians can do better than Google, like helping people find the needle in the information haystack or teaching them how to make wise, informed decisions when researching.  But when it comes to ready reference, Google has got us beat most of the time.

The big thing librarians still have over Google, though, is criticism and control.  We not only know how to quickly manipulate Google’s search engine (and other companies’ engines) to discover decent results, but we are pretty good at separating the wheat from the chaff.  I notice this especially with government documents and government data on the web: people who visit me at the reference desk looking for government data have a hard time finding information and then verifying its authority.  There are no second readers on the web – people have to rely on their own experience and understanding of information organization and information architecture to locate documents, and then be willing to use them with confidence.  Librarians, however, can help people locate information sources, draw relationships between items, and determine the value of this knowledge to their own work.  For these reasons alone, we’re kind of a big deal and shouldn’t be afraid to say so.

Especially in this so-called digital age, our ability to help people choose information sources makes us essential to information management and research services.  For all of our complaints about people’s reliance on the Google search engine and index, we can at least take comfort knowing that our “editorial” function vis-à-vis the Internet is still necessary and valued.  What’s a curator but a selector of items of value?  I’m not saying that librarians curate the web, but on the whole, we certainly have a broad understanding of the tools and resources people need to find the data they’re looking for or to take their work to the next level.

But now, Internet, Inc has developed the latest, greatest search engine that apparently should leave us shaking in our boots: Blekko.  Blekko is receiving a lot of new-startup PR this month because it is doing what librarians have done for ages (and what Google doesn’t bother to do) – it separates the good from the downright ugly on the Internet.  Although Blekko has indexed over 3 billion webpages, it lists only top results in order to cut down on website “pollution” from content farms and simple, dirty spam.  I’ll let the New York Times take over from here:

People who search for a topic in one of seven categories that Blekko considers to be polluted with spamlike search results — health, recipes, autos, hotels, song lyrics, personal finance and colleges — automatically see edited results.

And furthermore, their comparative example:

In some cases, Blekko’s top results are different from Google’s and more useful. Search “pregnancy tips,” for instance, and only one of the top 10 results […] is the same on each site. Blekko’s top results showed government sites, a nonprofit group and well-known parenting sites while Google’s included […]

“Google has a hard time telling whether two articles on the same topic are written by Demand Media, which paid 50 cents for it, or whether a doctor wrote it,” said Tim Connors, founder of PivotNorth Capital and an investor in Blekko. “Humans are pretty good at that.”

Blekko's logo - featuring a real live person (a librarian, no doubt)

Blekko’s founders are basically looking Google in the eye and saying the Internet isn’t going to be the Wild West anymore, that editorial control (if not authority control, too?) is required to organize all the information available to anyone ready to jack into the web.

This is verging on librarians’ territory.  Should we be concerned?  I don’t think so.  If Blekko succeeds at helping the entire world discern what is valuable and critical from what is a bottle of plonk on the Internet, then I think we’ve got a problem.  But given that information is synthesized into knowledge at the local level, I think we still have something on these apparent new search engine masters.  And I don’t feel like I’m sticking my head in the sand by saying that, either.  Sure, the Internet can give us a run for our money at times, but if anything it’s made the work we do all the more important to the people we serve.  With so much information available since the development of the web, it’s useful to have other people (i.e., us) close at hand to help determine their particular information needs and then help meet them.

Librarians - We are electronic performers (apologies to Air)

Blekko won’t know, for instance, what titles our local public library holds, nor will it know which electronic databases our local universities subscribe to.  And I can pretty much guarantee it won’t have any Canadian socio-economic data (longform or no longform) and very few government documents.  This is where the person on the ground – the librarian – can step in and act as an intermediary between our patrons and what the Internet has to offer.

Funny.  I nearly called the Internet an “Interblob” just now.  Because that’s what it is – a big doughy blob of information.  But because I’m a librarian, I can help you find what you’re looking for on it – Google or no Google, Blekko or no Blekko.

The iPad is great. Scholarly e-Book interfaces on an iPad are awful.

Last night I borrowed an iPad from my library/place of work to see how our vendors’ e-reader platforms stack up. In a word, the interfaces the vendors provide are not iPad- or tablet-friendly at all. EBL, ebrary, and MyiLibrary all show content in their framed pages, i.e., what we’re used to seeing on our desktops and PCs. This may be acceptable on a widescreen monitor, but it doesn’t work well at all on a tablet. It is terribly difficult even to zoom in far enough to tap the vendors’ own zoom controls, which hampers the reading experience.

Obviously, it’s still quite early in the game, but I think the vendors could learn a little from the e-book platforms used on devices such as the Kindle, the Sony e-reader, etc. Books on these devices are stored in comparable digital formats, but it is far, far easier to scroll through, zoom, and annotate them than it is with vendor e-book interfaces. This became as clear as day once I tried out the iPad’s own iBooks bookshelf: this different piece of software – used on the same device I was trying to read our e-books on – gave me so many more functions than the vendors’ software could.

I’m not writing off tablets in the least.  I adore the iPad and will buy one shortly.  I also think that there will be a time when most textbooks will be purchased and read on them, and I think that time is much closer than we expect. We’re at a point where the hardware exists to support the idea, but the software interfaces still need to catch up.  Apple does have a fine product; I’m curious to see how our vendors will react to it.

In the meantime, check out an iPad if you can and compare vendor-supplied e-books to books on the Apple iBooks bookshelf (some are pre-loaded for free), and then check out other books – also on PDF – on the Project Gutenberg website. You’ll see the difference in spades.

N.B. I am referring to browser-based e-reader interfaces in this post, which are substantially different from the Apple iBooks bookshelf.  But that’s my point – we need to see great software from vendors to really make the e-book work.

e-books and the humanities

Inside Higher Ed published an article this week on the recent controversy surrounding the decision by the Bird Library at Syracuse University to store rarely used texts at a site 250 miles away from campus, a decision that has stirred debate in LIS and scholarly circles. I’ve been reading commentary in my Twitter stream and RSS feeds that considers many of the subjects touched on in the article, from the role of the library and the librarian (book depository or learning commons?  Book lover or knowledge and asset manager?) to the role of the book in the academy itself (essential to the programme, or redundant in the wake of digitization?).

There are a lot of subjects to tease out of this one post, especially the profession’s ability to promote its mission to the wider public.  Face it, we don’t know what to call ourselves, we don’t properly and consistently explain what we do to the public, and people often don’t understand the role we play in their institutions and in society at large.  Although the subject of identity and promotion is dear to my heart, the Inside Higher Ed article touches on an undercurrent always topical in LIS circles, which is the place of the monograph in contemporary scholarship.  As we see in the original post (and as also witnessed in the always-superb Little Professor blog), there is genuine concern for the role and the place of the book in humanities libraries (let alone the scholarship!) today.  As a one-time arts student, I appreciate this concern; I spent many days and nights leafing back and forth through texts in order to immerse myself in them and learn how a writer’s language and rhetoric toyed with her – and my own – understanding of the text.  So much of our literary and intellectual culture exists in a paradigm that demands individual and constant reflection on the words on the physical page, and the interaction with the text that e-books offer the reader is a poor substitute for the relationship we have with the words we find in print.

That the printed word is vital to humanities research is a truth.  That the printed word is being replaced by its digital cousin, however slowly, is a fact as well.  Economic models and, more importantly, our culture’s interactions with the word are changing, or have already changed, the way in which books are published, collected (or licensed/accessed) and read.  But I think it’s still far too soon to hold a wake for the monograph.  So long as e-book readers remain prohibitively expensive, DRM continues to offer few benefits to the end user, and e-book platforms such as MyiLibrary and ebrary refuse to reach consensus on a common look and utility, the e-book will remain secondary to the printed text.

I’m not suggesting that the e-book will forever be a poor cousin to the printed and bound copy of a text – far from it.  I’m merely contending that we are still a few years away (maybe as few as 2 or 3, maybe as many as 5) from the point when the hand-held e-book reader reaches a critical mass in the marketplace and eclipses the print edition as the format students turn to first. Until a plurality of the public carries their own e-book readers, the printed copy will remain the main source for the humanities.

But what of the day when the e-book does assert dominance over the printed text?  Will we dispose of all our bound originals?  Will scholarship on the author’s interaction with the physical object, or the study of book history, fall by the wayside? Likely not.  These and other disciplines are strong, and I don’t think the humanities will allow them to wither on the vine. Scholarship in the humanities and the tools of the scholar may change, but they will not disappear. On the contrary, our study of the actual physical text will be more important than ever, especially after the monumental shift in reading culture that the move to e-readers will bring.