Research Data Management Highlights: Digital Infrastructure Summit 2014 Summary Document

This week, a very significant document regarding the future of research data management and digital stewardship landed on my desktop. This is a PDF all academic librarians in Canada must read – whether or not you are tasked with RDM. If you are in IT, Research Facilitation, REB, Industry Compliance, or are a researcher or an administrator, then you should read this, too. It conveys the pressing importance of RDM to the profession, and it shows that we have an opportunity at hand if we take it – or a storm brewing if we turn it away.

The document is the Summary Report for the Digital Infrastructure Summit 2014. This conference was hosted by the Leadership Council for Digital Infrastructure in January 2014. Group representation included CARL, CRKN, CANARIE, TC3+, and CUCCIO; in all, 140 participants took part (p. 1). This document outlines the outcomes of the summit, which argued that RDM is lacking in Canada, that a sincere commitment to digital stewardship and not just technology is required to move forward, and that the time to act is now (p. 1). If you are a Canadian academic librarian, download the document and read it now.

Note: I was not a participant in this summit and am only summarizing the PDF with regard to RDM in Canada for librarians. I’m standing on the shoulders of giants when I write this post.

This document asks “What is Digital Infrastructure (DI)?”, considers the existing problems that are hampering the development of an effective DI in Canada, and traces a clear path forward on which the Canadian research enterprise should move. Research Data Management and the people involved in it are front and centre in this document, and this means academic librarians and preservationists. The library has a significant role to play, and we are expected to contribute.

Digital Infrastructure and Soft Skills

One of the document’s biggest takeaways – and what I argue should be one of the first talking points you use when discussing research data management – is that digital infrastructure (DI) is far more than technology alone. The executive summary states in clear, plain language that digital infrastructure includes “our ability to capture, manage, preserve, and use data . . . data are infrastructure, as are the highly skilled personnel who facilitate access to data, computational power and networks” (p. 1). DI requires “skilled knowledge management personnel” (p. 1) who have technical capacity but who, as we see elsewhere in the document, can also participate in local and national policy formulation and interpretation, understand project management, and have the capacity to collaborate and lead in their own field and in others. This is a suite of advanced “soft skills” that is concomitant with IT knowledge and experience, and these skills are bound together with other essential criteria such as sustained funding and ongoing government and industry support, which allow research data management to flourish rather than wither on the vine. A successful solution that addresses near-term and long-term RDM issues requires skilled, committed resources on the ground who are leading the way. DI cannot be left to colleagues on limited term appointments or to our grad students. It demands institutional memory and it requires organizational vision.

I’ve mentioned the argument in the above paragraph in a post long ago, but I’ll take this opportunity to link out again. Chuck Humphrey states this in clear terms when he explains that RDM is the “what” and the “how”, and digital stewardship is the “who”, and both are necessary requirements in RDM infrastructure. If you are a librarian, then read Chuck’s website. If you are a Canadian librarian, then read it again. And again.

What’s wrong with Canada’s Digital Infrastructure?  

The Leadership Council has cut right to the chase in their document. They want you, the reader, to know right away that there are real issues affecting digital scholarship in Canada:

  1. Our research data are a national asset, and they are not stewarded properly (p. 1). Canada needs to get up to speed, quickly. It needs RDM and it needs it now. It requires data storage infrastructure it doesn’t have at present. It requires better skills training. It requires better software development. (Fellow Librarians: This is all about us.)
  2. There is very little governance and coordination (p. 1-2). There are many, many players, from funding agencies to libraries to standards organizations to researchers themselves. We are all trying hard to fix this, but we’re not working together. Our governance model is weak. Time is lost, efforts are duplicated, and we are spinning our wheels. (Fellow Librarians: This is very much about us. Get in there and make it happen.)
  3. There is very little federal policy regarding DI (p. 2). This is related to but distinct from the second point. With little direction from government, the community is looking in all directions all at once. Greater coordination, planning, and sustained, reliable investment would be beneficial to the national research enterprise. (Fellow Librarians: This, too, is about us. Do. Take part. Take charge.)

I suppose it's not a blog post if you don't add a Wordle.

How to act. How to improve RDM. How to solve this crisis.

Note: I am focusing mainly on RDM and digital stewardship in this post; the original document gives equal attention to other areas such as governance, policy, and funding.

Research data management/stewardship is as yet the weakest link in the Canadian DI landscape, despite the massive increases in the amount of data being created daily through the research process. There is currently no agreed-upon strategy and/or the capacity to protect this valuable public asset, with little capacity to support access, use and reuse by a wide range of users. (p. 6)

The document makes a strong case not just for increased technical infrastructure but for greater knowledge management, project management, and policy analysis. We simply cannot allow ourselves to dump data files one after another onto a server and then hope that serendipity or an as-of-yet uncoded search algorithm will help us organize, preserve, and provide access to these files in the future. Research data – especially publicly funded research data – are a public good, and they require maintenance, management, and care.

The document highlights significant RDM gaps in Canada that must be addressed. These are:

  • Lack of a core RDM resource (p. 8)
    • Canada requires a national data service, which can lead in stewardship, policy, and education. RDC, CARL, and CRKN all have assets to contribute in this regard; RDC has shown incredible strength in this area already
  • Lack of strategy (p. 9)
    • Canada has no high-level strategy framework guiding debate and decisions on standards, infrastructure and distribution access networks, obligations to existing international agreements, and funding
  • Lack of Policy leadership (p. 9)
    • Tri-Council should take the next step and implement the RDM policy under consideration.
  • Weak RDM culture (p. 9-10)
    • The benefits that RDM brings must be better articulated.
  • Lack of understanding of Digital Infrastructure (p. 10)
    • It is incumbent that stakeholders demonstrate to the greater community that digital infrastructure necessarily includes the data, and the professionals who steward them
  • Lack of training (p. 10)
    • RDM training is inconsistent at present and must be improved in the short-term for practitioners and researchers alike
  • Weak policy on long-term data lifecycle management (p. 11)
    • Like any collection, data must be managed in part because its supports are not without cost. Management will include asking tough questions like what should be preserved, if we have the means and capacity to preserve it, and for what length of time. I recommend that we all have discussions about data collection policies as soon as possible. Locally, in our consortia, and nationally.
  • Lack of Storage (p. 11)
    • Storage capacity for all disciplines must be addressed. RDM is in no way an “X not Y” proposition. We must serve all disciplines, departments, faculties, and researchers.
  • Means to foster acceptance (p. 11)
    • This is a tricky issue. We need our researchers to accept and be a part of RDM. Compliance should be required, but strict policies at the outset may provoke too much pushback. There will be give-and-take in the beginning.
    • Note: The original document refers to “compliance” here. I don’t want to use that term. Do we need sticks? Yes. Do we want to use them? Only if we have to. But from the outset, we must have the attitude that everyone is a partner in this venture.

Good data stewardship is not just a researcher’s responsibility, but is also needed at institutional, organizational, national, and disciplinary levels. (p. 10)

Making things happen and getting things done.

The LC provides a roadmap for action and results in its summary report from its 2014 Digital Infrastructure Summit. I am focusing on RDM-related activities and policy in this post since they are both so important to me, so I do encourage you to read the entire document yourselves to see the entire action plan.

The LC’s ways forward for RDM and policy include:

  • Maintain the Leadership Council and analyze its organizational structure (p. 17-18)
    • A steering committee is required and the LC has done a good job thus far. That said, there are clamours and a need for greater representation. Consider increasing membership, developing an executive committee, forming working groups, and establishing a Charter and Secretariat
  • Engage government (p. 19)
    • The LC has developed a strong community-driven response to RDM challenges. That said, push government – again – for improved coordination of policy and funding
  • Establish a national RDM network (p. 20)
    • Building on CARL and CRKN’s leadership and experience in this area, establish a network focusing on services, tools and tech, and education
  • Create an RDM pilot (p. 21)
    • Develop pilot discipline-based RDM programmes in three domains: astronomy, social sciences, and medical genomics
  • Coordinate with CRKN’s Integrated Digital Scholarship Ecosystem (IDSE) (p. 22)
    • Engage with this initiative that will enable next-gen library collaboration for seamless access, and improved infrastructure
  • Develop an RDM metrics pilot (p. 22-23)
    • For assessment and for understanding performance

If you have made it this far in the post, then I offer you my congratulations. There is a lot of information to synthesize, but it is vital that academic librarians in Canada understand what is on the horizon for our profession, and what role will be expected of us. As this post shows, the work that follows – the opportunity we can take hold of – is as much resource-related and people-related as it is tech-related. To discuss digital infrastructure is to discuss the people who make it happen. Research Data Management doesn’t happen on its own. RDM requires careful planning, policy interpretation, technical capacity, and a thorough understanding of resource management.  

And yes, this is an opportunity for us. But we must be ready for what is to come. RDM will soon become the coordinated response to big data in Canada as it is elsewhere in the developed world, and it will mean work. But this is our work. It is our field. Take heed, take note, ask questions, and get set. Plan for this, and get set to play a leading role, because things are going to get busy.

tl;dr : read this now.  apply it to your work.

 

One does not simply walk into an RDC

It’s that time of year when more and more students are asking about accessing datasets for their research through our local Research Data Centre. And a couple of times now, I’ve found myself having to explain that one does not simply walk into an RDC.

One Does Not Simply Walk Into an RDC

Reflecting on 2012

2012 has come and gone, and it’s been quite a year. If you’ve been following along on this blog or elsewhere, then you probably know that my theme for these past twelve months has been “Planes, Trains, and Automobiles.” Since starting a term position as Government Information Librarian at Wilfrid Laurier University, I split my time between Halifax, Nova Scotia, and Waterloo, Ontario. So, not only do the students at the Library’s Second Cup know my name and face, but so do some of the stewards and other professionals at Porter Airlines in Toronto. I’m now part of the jet-set, and I can also rhyme off CANSIM tables to you like nobody’s business.

Taking on a new position in a new city (and new province) means that there has been a lot of learning and adjustment. A new job brings new duties and new work cultures. And a new city means new roads and neighbourhoods, new cafés and pubs, and new local cultures. I’ve traded in a Maritime hospitality built on lobster, rum, and sea shanties for Kitchener-Waterloo’s beer, schnitzel, and breads. (And I love bread. Not kidding.) Waterloo has pockets of cool, and I’m getting on quite well here.

I love my job. It has met – and exceeded – my expectations. As the Government Information Librarian, I help the university community access and use government-produced materials in their research. All of last spring’s cuts to the federal government, and especially to Statistics Canada, LAC, and to libraries within federal ministries definitely dampened the spirits of Canadian GovDoc librarians in 2012, but I’m still happy that I’ve been able to help my library’s patrons understand what the cuts mean for them and their research – today and in the future. If anything, these cutbacks have increased the need for local government publications expertise at Canadian universities, and I think the government information librarian’s role on campus is now more important than ever.

My favourite part of this position has been my work with statistics and data. Like many university libraries across Canada, responsibility for socio-economic data at the Laurier Library lies largely with the Government Information Librarian since so many of our statistical resources come from Statistics Canada. (You can read more about the relationship between StatCan and academic libraries here. This paper by Wendy Watkins and Ernie Boyko should be required reading at library schools in Canada.) I’ve long wanted to practice in this field, and I saw this posting as my opportunity to work regularly with the data skills I’ve developed through the years, and to learn even more from a whole new group of data librarians. Nearly all my favourite interactions with faculty, students, and other stakeholders in 2012 are data-related, from helping students acquire data on migration to the far north, to meeting with community members and legislators to explore nation-wide open data initiatives. These are the moments where I see my skills and expertise in librarianship put to action, and the positive contribution I make on campus puts a spring in my step. Data librarianship is an essential part of the academic enterprise; I’ve given a lot of effort in this area, worked with and learned from the right people, and made gains for the library and the university. So, I’m willing to smile and say “yeah, I did that, but with the help of my friends, too.”

When it comes to adjustments, I have to say that the thing that took the longest to get used to was the new jurisdiction. I say this to all librarians, young and old, green and experienced: you will never really know how important your consortium is to your daily work until you join a new one. When I moved from Nova Scotia to Ontario, I left the Council of Atlantic University Libraries, ASIN, and NovaNet, and I joined forces with the Ontario Council of University Libraries, Scholars Portal, and TUG. Now, my online resources are different. The OPAC is different. ILL is different. Committees are different. Organizational cultures and funding are different. Conferences and workshops are different. Support channels are different. Let me be clear: everything changes when your work takes you to a new consortium. Libraries really do things better when they work together. We’re stronger this way. But it’s not until you shift to a new jurisdiction that you’ll be reminded several times daily just how much effort colleagues at your library and at other institutions have put into making things work better, faster, and cheaper for everyone. We stand on the shoulders of giants.

The best example I can give to demonstrate this is <odesi>. Built and managed by Scholars Portal, ODESI is an essential part of socio-economic data discovery at Ontario universities. It is a repository of StatCan DLI-restricted surveys, and it also houses extensive polling data that stretches back decades in some cases. Using the Nesstar data dissemination platform, it helps novice and experienced users find information from these surveys and polls, right down to the variable, and it also helps new users perform some statistical functions they may not otherwise have the knowledge to do. ODESI is a vital part of my work and I use it to access survey data almost daily during the school term. But prior to taking this position last winter, I had no access to it since most university libraries in Nova Scotia rely on the Equinox data delivery system out of Western Libraries. Moving to a new jurisdiction meant that not only did my committees and consortial colleagues change, but so too did my tools and resources, and I had to learn how to use new ones – fast. Today, I don’t know how I ever got on without ODESI. But last winter, ODESI was completely new to me because I hadn’t ever worked at an OCUL university. I have great colleagues at Laurier, and they gave me time to get to know this vital tool, but until I moved to Ontario and joined a new consortium, this was a foreign resource.

(For what it’s worth, ODESI and the people behind it at Scholars Portal have done so much heavy lifting for students and faculty at Ontario university libraries, and I’m grateful I can use this resource and learn from their expertise. I’m also grateful that I can lean on province-wide and regional data committees for help and advice. This is a big shout-out and thanks to some great people out there – you know who you are.)

This is where the post peters out into vague resolutions and outlooks for the new year. How will 2013 differ from 2012? Well, I hope to not fly so much (the lustre wears off quickly), and I hope to get involved in more professional activities again. I also plan on finding new ways to up my game at work. This will involve taking some courses and hopefully using more streaming communications tools to meet with students and faculty. We’ll see where it goes. Happy 2013!

Food for thought: important links on StatCan and longitudinal surveys in Canada

This week, Statistics Canada publicly announced the cancellation of the longitudinal portion of the Survey of Labour and Income Dynamics (SLID). If you look at this issue of The Daily, in the very last paragraph under the Note To Readers, you’ll find text that reads:

This is the last release of longitudinal data from the Survey of Labour and Income Dynamics. Effective with next year’s release of 2011 data, only cross-sectional estimates will be available. [source] [PDF]

There has been a flurry of comments in various corners of the Internet about this cancellation. Some people see this as an outright cost-cutting measure, while others consider it in terms of a cost-benefit analysis, e.g., where should StatCan put its limited resources, staff, and funds? I have my opinions – it’s not a good idea to let this wither on the vine – but I’ll leave it up to you to decide how to consider this action.

I will, however, draw your attention to two posts by Canadian academics who know a thing or two about socio-economic data and the mechanics of longitudinal surveys:

Miles Corak writes a nice eulogy for the SLID, but his main point lies in the constraints that StatCan’s longitudinal surveys face: long-term funding for such surveys is not always assured, since they are administered by a creature of government:

At Statistics Canada funding is annual, subject to the trade-offs in managing a whole portfolio of statistical products. It is also dependent upon financial support and direction from particular government departments whose interests and priorities ebb and flow, and are tied to broader government objectives.

. . .

In a recent interview the current Chief Statistician of Australia, Brian Pink, made a revealing and important comment: “Neither the Treasurer nor Prime Minister can tell me how to go about my business. They can tell me what information to collect, but they can’t tell me how to do it, when to do it or how often to do it.”

But it is telling that the Australian longitudinal labour market survey—The Household, Income and Labour Dynamics in Australia Survey—which was started in 2001 and has guaranteed funding for 12 years, is not being run by the Australian Bureau of Statistics but rather by an institute at the University of Melbourne.

The current Chief Statistician of Canada is in a more challenging position. He also has the responsibility to manage surveys that form no part of Mr. Pink’s mandate, surveys whose value is in the long-term, much longer than a fiscal year, and even longer than an electoral cycle.

As Canadians embark on another experiment in longitudinal survey taking they should have confidence that Statistics Canada will design and manage the technical details in an efficient, effective, and indeed innovative way; but past experience, both here and abroad, may also make them wonder if the managerial structure and financial responsibility is designed to match the long-term horizon these data require. [source]

Corak, I think, is asking us to consider if it’s time for other agencies to administer longitudinal surveys in Canada (at the very least, he’s making the observation that things are done differently in other countries).  Blayne Haggart puts it in very plain terms:

[Corak] argues that the real problem may be that, as a government agency with a one-year budget horizon subject to political whims, Statistics Canada isn’t the best placed agency to handle projects with time horizons that stretch beyond electoral cycles into decades. This means that even throwing the bums out wouldn’t solve the underlying problem. New boss, meet old boss and all that. [source]

As for me, it’s too early to decide. I’m not yet sure what to think. I definitely have dogs in this race, but I’m not yet in a position to agree or disagree with Corak and Haggart. My preference is to have well-funded government statistical agencies who collect and disseminate socio-economic data, and to have well-funded government knowledge centres (e.g., LAC) that can improve the preservation of and access to government information. But on the question of longitudinal studies, perhaps Corak and Haggart’s opinions have enough merit for us to have a discussion on whether Canadian not-for-profits and university research centres should make a big step forward and take a decisive lead in the future.