Plenary 1: Data from national surveys: access, analysis and sharing (Room: James Bay)
Speaker: Dr. Anthony C. Masi, Provost, McGill University
Speaker: Dr. Denise Lievesley, National Health Service
Central to the National Programme for IT is the establishment of a patient care record whereby every interaction a person has with the health system in England is entered into that individual's record. The pooling of these records creates a rich data resource which could provide remarkable research opportunities. There is also significant information potential from integrating data from other sources (administrative and survey) with the data from these patient care records. However, these developments also raise significant IT and information governance issues, especially relating to confidentiality and the use of sensitive material. Denise will talk about how the aim of maximising the use of these data could be achieved whilst
Plenary 3: In Pursuit of Statistical Literacy: Two National Examples (Room: James Bay)
Speakers: Reija Helenius, Head of Development
National statistical agencies produce a wealth of data and information which may or may not be understood by their intended audiences. Without statistical literacy, data will remain a closed box. This session will explore the activities undertaken by two national agencies, Statistics Finland and Statistics Canada, and outline the programs, progress and challenges experienced by each.
A1: Self Archiving or Self-Storage: Which is it to be? (Room: Lachine)
Session Chair: Sharon Neary, University of Calgary
StORe Wars: May the Source and its Outputs be with
Abstract: The area of interaction between output (research publication) repositories and source (primary research data) repositories was the principal focus of the StORe project. The main aim of the project was to identify options for increasing the value of using both source and output repositories by improving the linkages between them, thereby increasing the potential from significantly enhanced information access and dissemination. A key deliverable from the StORe Project is the pilot demonstrator. It consists of a set of middleware designed to demonstrate the function of bi-directional links between source and output repositories. This middleware was developed to meet the specific needs of the social science e-research community, but is based on the underlying general requirements as defined from the StORe survey of the behaviours of researchers within seven scientific disciplines represented by the project. The pilot demonstrates the implementation of enhanced functionality within a test environment and the potential for a generic solution across the UK’s broader e-research community. A short time hence, in a hyperspace not so far away, researchers will be able to instantly move from any publication to the data on which its findings were based, to instantly be linked to all publications that have resulted from a particular dataset and to move seamlessly around the research environment of data and its outputs.
Abstract: On the 1st of February, DANS launched its new on-line archiving system: EASY. What distinguishes this system from other archiving systems is the possibility for researchers to deposit datasets themselves, to customize the presentation of the dataset in the system and to control who will get access to the datasets. This presentation will present the results of the system's usage evaluation, which took place in June of 2006, from which we can draw conclusions on the feasibility of having researchers deposit and archive data themselves. Moreover, it will go into the numerous ways in which this web application can be modified and customized without having to change the original program's source files, making it relatively simple to implement for organizations with very specific archiving system requirements.
Bargain bookmarks and priceless tags: Socially organizing social data.
Abstract: The "openness" of data can result in confusion. There are so many data resources and they change every day! We try to keep up by bookmarking, but traditional bookmarks are private and immobile. We share data resources on listservs, but listservs impose a certain linearity that fails to teach us about what data resources really matter and to whom. Social bookmarking tools, like Del.icio.us and Furl, provide a way to liberate your bookmarks from your computer, allowing you and others to access them. Social bookmarking tools extend the possibilities of sharing resources through "tagging," allowing users to organize resources and contribute to a system that reveals how exactly users think about the resources at hand. In this session, hear about current experiments with social bookmarks and tags and find out about how IASSIST can use these technologies to create a more interactive community of data providers, users, and seekers.
A2: Open Data and the Common Good: Technology Solutions for Difficult Challenges (Room: Saint Lawrence)
Moderator: Ernie Boyko, Statistics Canada, Retired
Data confidentiality and the Common Good.
Abstract: This talk will focus on the issues faced in making sensitive data available to the research community. It will emphasize the importance of respecting respondents' privacy and examine how proper data archiving and statistical disclosure techniques can be used to address the issues of accessibility.
The Open Data Environment.
Abstract: This talk will focus on metadata and open data environments. It will highlight the value of metadata and outline the challenges of raising support and investment for metadata management frameworks. It will also relate the importance of high quality metadata in the context of the international research communities and the need to coordinate efforts in the development of new tools for the social sciences.
Walking the Wire: How Technology Helps Us Achieve the Correct Balance (Open Data Foundation).
Abstract: This presentation examines how the adoption of relevant metadata specifications, the use of open tools and standard techniques, and respect for statistical principles and methodologies can be at the foundation of an open framework for data management that answers the needs of the various communities. It summarizes the panel and provides a vision of standard technology being created within a framework which recognizes the complexities of the issues surrounding Open Data, and provides a new definition of the problem which takes into account the realities of data confidentiality within an Open Data community.
(Room: Mont Royal)
Session Chair: Bo Wandschneider, University of Guelph
Appraisal and Selection of Scientific Data for the Long-Term Archive: A Case Study.
Abstract: A long-term archive offers the potential to provide future knowledge communities with capabilities to discover, access, and use scientific data and research-related information. Given the potential costs and commitments to preserve digital information, those responsible for long-term preservation of data can develop resources for the appraisal and selection of data being considered for accession into the long-term archive. A case study describes developments enabling the appraisal and selection of data for accession into a long-term archive including decision-maker roles and responsibilities, the process for nominating data as candidates for accession, criteria for the appraisal of candidate data, choices for levels of preservation and dissemination services to be designated for approved data, and the process to facilitate the appraisal and selection of data for accession into the long-term archive.
The 2004 Canadian National Consultation on Access to Scientific Research Data (NCASRD): recommendations and implementation of a national strategy on data access.
Abstract: In mid-June 2004, an expert Task Force, appointed by the National Research Council Canada (NRC) and chaired by Dr. David Strong, came together in Ottawa to plan a National Forum as the focus of the National Consultation on Access to Scientific Research Data. The Forum brought together more than seventy leaders Canada-wide in research, data management, administration, intellectual property and other pertinent areas. This presentation will be a comprehensive review of the issues, the opportunities and the challenges identified during the Forum. Complex and rich arrays of scientific databases are changing how research is conducted, speeding the discovery and creation of new concepts. Increased access will accelerate these changes even further, creating a whole new world. With the combination of databases within and between disciplines and countries, fundamental leaps in knowledge will occur that will transform our understanding of life, the world and the universe. The Canadian research community is concerned by the need to take swift action to adapt to the substantial changes required by the scientific enterprise, and since no national data preservation organization exists, it is felt that a national strategy on data access or policies needs to be developed. It is also recommended that a Task Force be created to prepare a full national implementation strategy. Once such a national strategy is broadly supported, it is proposed that a dedicated national infrastructure, tentatively called Data Canada, be established to assume overall leadership in the development and execution of a strategic plan.
Threats to Open Data: Implications for Library Services and
Abstract: The concept of open data offers clear, acknowledged benefits for research and scholarship. However, a number of threats to the success of open data remain, including economic, cultural and technical issues. As key stakeholders in the development of open data, libraries must address these challenges and their implications for services and collections. Major obstacles include infrastructure challenges; integration of metadata creation into the research process; commodification of data; barriers in academe and publishing to sharing research; and legislative constraints. This presentation will examine each of these threats and how libraries can anticipate and prepare for the emerging importance of open data.
Session Chair: Meredith Krug, Federal Reserve Bank
A Digital Preservation Response to Technological
Abstract: Digital curation ensures that digital content will remain accessible and meaningful to users over time. To achieve this objective, digital preservation strategies must continually evolve as information technology evolves. Responding effectively to new technologies requires the development of new skills, knowledge, tools, and perspectives. The digital preservation community has largely lacked access to information about technology developments presented in understandable and meaningful ways. Developing the means for the digital preservation community to continually respond to evolving technology would establish the educational and informational foundation for sustained digital preservation research and development. This paper considers the needs of the digital preservation community in responding to technology, the scope of interest in technology developments for digital preservation, the means for prioritizing responses to technological change as it occurs based on digital preservation requirements, and the range of possible responses to technology by the digital preservation community.
Archiving Multi-media and Web-based Data: Issues of Representation and Sustainability.
Abstract: Increasingly, the UK Data Archive is seeing resources and potential deposits being created that move beyond raw data. For social scientists this is a relatively new phenomenon, although the arts and humanities have been creating such resources for some time. Researchers and teachers are creating value-added resources to accompany data that are not just hard publications. They include web sites, interactive front-ends to data and user guides, and opportunities to comment on or annotate data. This provides a new challenge for us as social science data archivists. Recent work on the representation of social science research utilising hypermedia and research participants' input is highlighting some innovative ways of publishing and disseminating research outputs. But what should be archived and how? This paper will address some of the key issues that archives might consider.
Renewal of the 1956 Institute Website and Connecting it to the National Digital Database (NDD).
Abstract: The 1956 Institute has renewed its website. A new design was made and implemented using the Oracle Portal (www.rev.hu). The following questions were raised at the demonstration. Further possibilities: Connection of the various language versions of the pages and moves between them (from the editors' and users' points of view); Search facilities (setting of data sources) with the aid of Oracle Ultrasearch; The hardware and software environment for developing and changing over to the new system. The 1956 Institute joined the NDD in 2006, so that its photo documentary database will now also become accessible through the NDD Internet search program (kereso.nda.hu). The data link will be made over XML data paths as recommended by Dublin Core. Further plans: Connection of further data sources to the NDD (digital content developments, chronology, etc.).
B2: Quantitative Literacy: Assessing Needs, Developing Tools and Delivering the Goods. (Room: James Bay)
Session Chair: Wendy Watkins, Carleton University
Capacity Building for Quantitative Methods in
Abstract: The Economic and Social Research Council and the Scottish Funding Council have jointly funded a scoping study to identify the capacity building needs for quantitative teaching and research in Scotland. The findings of the study, along with several other studies and pilots across the UK, will be used to develop a strategy to improve the supply of quantitatively trained social science researchers. The study is being undertaken by a multi-disciplinary team at the University of Edinburgh and involves a survey of both social science lecturers and researchers, and managers of information services within Scottish Universities, to identify barriers and needs. The scoping study is being conducted against a backdrop of longstanding and widespread concern about a UK-wide deficit in quantitative skills amongst social scientists. The funders recognise the need to develop quantitative skills amongst social scientists; and that this needs to take place during the earliest stages of career development.
Numeracy and Quantitative Reasoning Initiative at the University of Guelph.
Abstract: Numeracy has been identified as a long-standing learning objective for the University of Guelph from 1987 through to 2006. A successful application to the Learning Enhancement Fund program at the University of Guelph has led to the creation of a multidisciplinary group composed of individuals from Teaching Support Services, the Mathematics and Statistics department, the Data Resource Centre, the Library, the Learning Commons and Computing and Communications Service to spearhead the initiative. The group will be responsible for the development of a repository to collect and disseminate learning objects that will build new opportunities for students to improve their numeracy and quantitative reasoning skills. This paper will outline the purpose of the initiative and will share development strategies and progress to date.
Incorporating Statistical Competencies into University-Level Information Literacy Programs in the Social Sciences.
B3: Care and Maintenance of a Global Knowledge Community (Room: Saint Lawrence)
Moderator: San Cannon, Federal Reserve Board
Panelists: Mary McGrath (Bank of Canada),
Abstract: A panel whose experience covers a range from fledgling networks to well-established communities will discuss the processes and problems associated with the following:
Moderator: Julia Lane, National Opinion Research Center
The role of the International Household Survey Network and the Accelerated Data Program.
The IHSN Microdata Management Toolkit: 2007
Country experiences in setting up a national data archive.
Abstract: Establishing a data archive is a challenging task for any organization. In developing countries the fundamental issues are the same as everywhere else, but additional problems emerge to make the task more daunting. For this presentation, a representative from a national statistical agency will share experiences in setting up a national data archive using the DDI and drawing on best practices and tools like the IHSN Microdata Management Toolkit.
C1: Considering the data management plan: More than Window Dressing? (Room: Saint Lawrence)
Session Chair: Gretchen Gano, New York University Library
Over the past year, discussions continued about the merits of open data and about requiring data sharing plans in grant proposals for publicly funded research. Many are skeptical that such requirements will significantly impact the competition for research funds without new models or incentives. However, regardless of how firm policies or incentives encouraging data sharing may become in the future, researchers may turn to data managers with questions about how to strategize, present, and execute a data management plan.
Preparing public use data files
Abstract: Cleaning and preparing data files for public release are tasks that many researchers and survey professionals feel are unnecessary in the era of computer assisted and web based interviewing. The addition of automation, in principle, should have resulted in a dataset that is ready immediately upon data collection. In practice, however, the automation necessary to collect data does not always produce data files that are in fact designed to be used by the uninitiated (ergo "public") user. In this part of the session, I will discuss how to approach data set cleaning and documentation from a user's perspective.
Approaches to data dissemination and preservation
Abstract: This presentation describes approaches to dissemination and preservation, including dissemination and preservation through major data-archives, self-archiving, preservation strategies and agreements, DATA-Pass, and related technologies.
Formalising data management plans for large scale
Abstract: This presentation describes work undertaken by UKDA to formalise data management and archiving for a cross-Research Council research programme. The funding of a dedicated data support service for the multimillion dollar, interdisciplinary Rural Economy and Land Use Programme has enabled the creation of a formal Data Management Policy and Plan. The concise plan must be completed by PIs and signed off by the UKDA before programme contracts are issued. A glossy brochure on Guidance on Best Practice in Data Management, primarily authored by the late Alasdair Crockett, has also been produced for the programme.
C2: DDI in Canada – Where are we at? (Room: James Bay)
Moderator: Michel Seguin, Data Liberation Initiative, Statistics Canada
An Update on DDI Working Groups at Statistics Canada
Ontario Universities moving forward with DDI.
Abstract: The art of creating dataset codebooks using the Data Documentation Initiative (DDI) in Canada has been slowly gaining ground. The University of Guelph has been marking up codebooks for the past 5 years and is now involved in a collaborative effort with partner universities in Ontario and Statistics Canada to determine a core set of tags and a best-practice procedure for completing the tags. This paper will discuss why some Ontario universities have chosen to move away from their home-grown systems and implement DDI-compliant systems, and how working with Statistics Canada will benefit the Canadian data community.
Capitalising on metadata: tool development
Abstract: The Research Data Centre Network recently received funding to develop metadata tools to facilitate research using the Statistics Canada confidential data in these Centres. The metadata and the tools to exploit the metadata were identified specifically to solve three related problems for researchers. This presentation will discuss the solutions to these problems from the life cycle perspective of conducting research in Canada's Research Data Centres.
C3: New Discovery Tools: Thinking Outside the Catalogue (Room: Lachine)
Session Chair: Anna Bombak, University of Alberta
Searching for Data: Powered by Google.
Abstract: Even for the experienced person, finding data and statistics can be a daunting, if not very frustrating, task. For the novice, finding the right data can be virtually impossible. The typical approach of googling it has about as much chance of finding the right information as winning the lottery. As powerful as the Google search engine is, it will only bring up the most popular sites, not necessarily the best sites. Moreover, novices cannot always differentiate between a reliable source and one that is questionable. Google Custom Search Engines (CSE) can combine the expertise of the data librarian with the power of Google searching. By restricting or boosting the search to specific sites or pages, the results returned are fine-tuned toward data and statistics from reliable sources. This presentation will demonstrate the difference between a standard Google search and a CSE, show how to set up a CSE, and highlight some of its more useful features and some of its drawbacks.
Snippets of Data at a Glance: Using RSS to deliver
Abstract: For most researchers, more data are always better. But for some number watchers, especially in the economic and financial realm, an observation or two is all that is needed but it is needed the moment it is available. How can data providers serve these clients as well as those who want every observation even if it isn’t readily available? A group of central banks (US, Canada and Mexico) as well as some international institutions (the European Central Bank and the Bank for International Settlements) are developing a specification for RSS that will allow such immediate delivery as well as adding extra metadata for those users who are compiling a more complete dataset. This paper will discuss the origins and specifications of RSS-CB1.0 as a data transmission mechanism and explore the success of the current pilot implementations.
Multilingual Web Services - Possibilities and
Abstract: Experiences and ideas from the Finnish Social Science Data Archive, which provides web services in three languages but with somewhat different goals for each. The presentation focuses on what to take into account when planning multilingual services, and what kind of pitfalls or good practices there are. Should the existing web site and services just be reproduced in another language, with the same content? If not, then what? It is only after the goals, potential service users and available resources have been outlined that we have a basis from which to plan the web design. The presentation includes advice on some aspects of web design, and an introduction on how to use the DDI to facilitate multilingual data documentation.
University Information System RUSSIA: Bilingual (Russian - English) Search Tools to Integrate Data and
Abstract: University Information System RUSSIA (uisrussia.msu.ru) is an electronic library serving RF social research and education. The system has been in operation since 2000. Currently up to 3 million documents from 60+ holdings are in the system. The UIS RUSSIA is maintained as an integrated resource with content-based searching across collections. The technology for automatic linguistic text processing (ALTP, special software-lingware-knowledgeware complex) was designed, developed and implemented within the framework of the project. The technology is customized to process all main types of business prose documents. Documents in English may also be processed and cross-searched, exploiting bilingual (Russian-English) search tools. A short summary in Russian accompanies a document in English. Journal of Economic Literature (JEL)-based searching is implemented and is currently being tested on the full-text SocioNet module which maintains publications on economic and social sciences in English available via links in the Research Papers in Economics (RePEc; www.repec.edu) electronic library; the collection covers 30000+ articles. In 2007, the UIS RUSSIA team plans to integrate foreign universities' and think-tank publications in English. Search in English across UIS RUSSIA holdings (in Russian) is completed.
C4: Data Services mash-ups: Maps, Research and Everything! (Room: Mont Royal)
Session Chair: Richard Boily, Université du Québec à Rimouski
Business Data and Challenges for Reference and Collection
Abstract: With the demand for business data rising exponentially, business librarians face different issues and challenges than data librarians. While data librarians already face extraordinary issues and challenges, those facing business librarians are even greater, to the point that data librarians often shy away from dealing with business data-related issues. This paper assesses the current situation in selected Canadian libraries, including trends in the demand for business data and what it means to librarians working with business schools as their programs gain popularity. It emphasizes the unique nature of business data used in traditional scholarly research as well as in professional fields which are unique from a data service point of view. It includes a brief demonstration of the most popular business data sources and their use. The paper explores the detachment between traditional data library services and business data services, examines the implications of this disconnect for reference and collection development, and discusses viable options for bridging the divide.
Maps that mash: daring, dangerous, or
Creating historical digital census boundary maps for Canada - a pilot project.
Abstract: While vector-based boundary files are available from Statistics Canada for census boundaries for most levels of geography from the 1986 census through 2006, almost no comparable files are available prior to 1986. Maturing GIS software, meanwhile, has made aggregate statistics in computer-readable form from 1961 onward more widely used for research than ever before. There is clearly demand for digital boundary files to allow the spatial display and analysis of pre-1986 census aggregate statistics. The objectives of this pilot project were to (1) assemble the existing resources for creating digital enumeration area (EA) boundary files for 1976 and 1981 census geography; (2) devise and test methodologies for re-creating historical EA data based on currently available ancillary digital and analog spatial datasets and supporting Statistics Canada materials; and, as the ultimate objective, (3) evaluate the feasibility of creating national coverage of historic census maps.
Session Chair: Vince Gray, University of Western Ontario
Project to set up service for secure remote access to files from data file linkages.
Abstract: With a view to promoting research while ensuring privacy, the Institut de la statistique du Québec, the Québec government's official statistics agency, has undertaken, in partnership with researchers and other government agencies, to set up a service that will provide researchers with secure remote access to microdata files resulting from the linkage of various administrative or survey data files already available. This new service features a number of advantages for researchers, privacy authorities and the agencies holding administrative data files: easier and faster information searching using a web-accessible integrated data dictionary; a standardised procedure for authorising and obtaining data files that is recognised by all interested parties; linkage of administrative data files that do not have the same identification number using proven statistical methodologies; masking of data files from the linkage to prevent spontaneous recognition; remote access by researchers using a highly secure mechanism, which ensures
Opening up access to birth cohort study data: A UK Medical Research Council pilot project.
Abstract: The UK Medical Research Council (MRC) policy on data sharing recognises the value of making scientific data (e.g. population and clinical trials) more widely available across the research community. If not creating true open data resources, the MRC's policy involves opening up access to previously (relatively) inaccessible life-course data. To this end, the MRC is currently running a pilot project designed to test the feasibility of preparing and making data more widely available from the accomplished 1946 British Birth Cohort Study, and the National Survey of Health and Development (NSHD). Using Nesstar as the data sharing tool, the pilot aims to evaluate the costs and benefits of opening up access to the NSHD data and to document lessons learned that may inform similar activities in the bio-medical area undertaken in the future. Special attention is given to the particularities of opening up access to a long-running longitudinal study which, due to the age of the study, has traditionally been poorly documented, and to balancing likely increased scientific outputs with risk of disclosure. Among the key scientific outputs that the project is interested in examining is the feasibility of researchers sharing new variables derived from the study.
Medical research and data sharing - how open can we
Abstract: Scotland intends to introduce an Electronic Health Record in the near future to provide an open data system throughout the health service, to improve access to a current record and to prevent the multiple-blood-test syndrome caused by lack of communication between health care providers. This vision, however, is accompanied by various legal and ethical problems. One such problem is the question of access. While the use of open data within the health service offers considerable advantages, issues such as confidentiality need to be taken into account. Many patients still expect the traditional model of care with doctor-patient confidentiality; however, this is hard to reconcile with the reality of extended health care teams, where the need for data sharing and open access to patient data has grown considerably. This paper will provide an overview of the problems of how open the data sharing can be, using several scenarios to provide examples and highlight some of the options being considered.
D2: Presenting the New DDI 3.0: What Can it Do for You? (Room: Saint Lawrence)
Moderator: Mary Vardigan, Director, DDI Alliance
Panel: Wendy Thomas (Minnesota Population Center),
This session is sponsored by the Structural Reform Group of the DDI Alliance, and its purpose is to present DDI 3.0 to the IASSIST community, which has nurtured its development. Over the past year DDI 3.0 has gone through extensive internal and public reviews and is now being prepared for publication, pending approval by the DDI Alliance Expert Committee following IASSIST. While we've been promising wonderful things for you -- the data archivist, producer, user, and programmer -- here is where it all comes together. This session provides an overview and real examples of DDI 3.0 and what it can do. We haven't lost any functionality; rather, we've improved what we had before by making it more machine-actionable and flexible, and we've added new features that will enhance its utility and expand its coverage. Come and find out what the new DDI 3.0 has for you.
D3: Data Access Questions: Open and Shut (Room: Mont Royal)
Session Chair: Maxine Tedesco, University of Lethbridge
Licensed to Distil - Data of Course.
Abstract: Archives have fought many battles establishing their presence in the research world; they are welcomed by academics and policy makers in their quest to ask new questions of existing data, yet they tread on uncertain ground with data creators and government departments, where concerns about disclosure and anxiety associated with statistical comparisons can loom large. The UK Data Archive (UKDA) celebrates its 40th anniversary in 2007: four decades of acquiring and sharing key social and economic data. This paper will outline its position within the UK research arena, discussing how, by establishing a tiered licensing model, the UKDA has ensured that the concerns of depositors and data creators and the needs of its user communities are met. Two major issues will be discussed: the need for establishing networks and relationships of trust and symbiosis; and building and nurturing the expertise and support required to establish a workable legal framework under which data curation and sharing can operate.
When Data Aren’t Open: Restricted-Use Data: Trials, Tribulations and Triumphs.
Abstract: The Population Research Institute is well-known inside Penn State (and increasingly outside the university) as the place to go for access to restricted-use data and for help with developing applications for obtaining contractual data. The purpose of this paper is to share our experiences and perspectives in helping researchers obtain, house, and maintain access to restricted-use data. We hope that sharing some of our trials, tribulations and triumphs will help inform those in the data community who are considering offering restricted-data support. In six years with PRI as Data Archivist, Jennifer Darragh helped the number of contracts grow from four in 2000, to sixteen by 2006. In just three short months, PRI's new Data Archivist, Kiet Bang, has had to learn how to manage these contracts (additions, removals, renewals, custodial transfers) and coordinate with other essential PRI staff to keep restricted data projects running smoothly.
Creative Commons and Data Dissemination at an Academic Data Center: Issues and Potential Benefits.
Abstract: With multiple objectives, the Creative Commons (CC) licensing movement is making inroads into the academic community. In parallel, efforts are under way to investigate the characteristics and potential for creating a science commons based on open access to data and advanced information technology. CIESIN, the Center for International Earth Science Information Network, through its NASA-funded Socioeconomic Data and Applications Center (SEDAC), is experimenting with incorporating Creative Commons licensing into its publicly available data and information products. This paper will describe the issues CIESIN has encountered in implementing CC licensing as part of a data center dissemination strategy. Particular focus will be on the challenges of using a Creative Commons approach within an academic context and on the potential benefits for end-users.
Accessing Eurostat Data.
E1: Government Data in Legacy Formats: Approaches in Ensuring Access and Preservation (Room: Lachine)
Session Chair: Tess Trost, Texas Tech University
Ensuring Long-Term Access to Government Documents Through
Abstract: A partnership has been formed between Indiana University Bloomington and the U.S. Government Printing Office (GPO) to make the documents distributed to federal depository libraries on floppy disks available on the Internet. In a pilot study for this project, the CIC floppy disk project (FDP), I have developed an emulation infrastructure to enable access to these documents. I will discuss the technical issues addressed and the results of this pilot study. In addition, I will discuss the immediate issues of providing network access to the digital archive of the GPO CD-ROMs. For example, using many of these documents requires the ability to "mount" and "execute" a (networked) CD-ROM image.
Migrating Government Information from CD-ROMs: Scaling a Pilot
Abstract: Government information originally distributed on CD-ROMs is now posing challenges for long-term access. In 2006, librarians at Yale University Library’s Government Documents & Information Center undertook a pilot project migrating sample CD content to a server. The librarians normalized file formats, produced multiple forms of metadata for the CD-ROMs, and documented the methodology, workflow, and costs of this work.
Virtual Machines in the Data Lab
Abstract: This presentation will describe methods developed at U.C. Berkeley Library to allow an end-user to access a CD-ROM image repository from a networked workstation and install applications in a controlled "virtual machine" environment. This approach provides an immediate solution to most of the problems associated with legacy software installation under a modern operating system.
E2: The CESSDA Experience: a Royal Mountain Road to Success (Room: Mont Royal)
Session Chair: Hans Jørgen Marker, Danish Data Archive
Strengthening the Infrastructure - CESSDA Incorporated (Sub Title: You Got to Have Friends)
Abstract: The ambitious plans for the CESSDA portal require some essential strengthening of the existing infrastructure. A new project within the EU FP7 ESFRI roadmap research infrastructure programme will aim to do just that. Its objectives are to set in place the governance, legal, financial and management groundwork and ground rules that are needed before the CESSDA future is possible.
Door of Perception - The CESSDA Portal (Sub Title: Break on through to the Other Side)
Abstract: This paper will provide an overview of the new CESSDA portal. Since the 1970s a major aim of the Council of European Social Science Data Archives (CESSDA) has been to improve researchers' and students' access to data. The long-awaited new portal is an important step towards that goal, providing a gateway to data materials and information about CESSDA. It enables users to locate datasets, questions and variables in datasets from all over Europe. This presentation will also describe the challenges of developing a website for multiple distributed organizations from varying cultures.
The Essence of the Net - CESSDA future (You Ain't Seen Nothing
E3: Strength in Numbers: Building Collaborative Services for Users (Room: Saint Lawrence)
Session Chair: Diane Geraci, Harvard University
"The data is plentiful and easily available" -- H. A. Gleason, Jr. Cross-Pollenization of Collections, Skills, and Service Philosophies among Data Archives and Libraries
Abstract: Researchers, faculty, graduate students and even undergraduates are working with statistical materials within many disciplines. This creates a need for collaborative partnerships among data archives and campus libraries, especially those serving the social sciences and humanities. We will demonstrate our strategy for creating a data and statistical service for a research university environment which accommodates users with a wide range of statistical knowledge and needs. Discussion will include details on developing a multiple point of service/access model for data and statistical resources, specifically covering: staff, organizational policy, varying levels of user needs and sophistication, and incorporating information and statistical literacy programs. The paper will discuss tools or methods that will enable campus departments to assess data and statistical resource needs and create a data/statistical services program that will meet immediate and long-term goals.
Please Use our Data
Abstract: At the outset of the Data Liberation Initiative (DLI), there were only 9 Data Centres in Canada with experienced staff. A Training Module was developed in 1997 but it is now outdated. Today, there are over 70 Data Centres in Canada. The staff who manage them have varying job descriptions, but the new generation, as well as those who have been there longer, need to be able to find information about DLI quickly for their clients, as well as for themselves. The DLI wants to inform its communities of the content of its holdings and then help them to access the data. And thus, the Compleat DLI Survival Kit was born. This presentation will give the background of the Compleat DLI Survival Kit and look at each of the chapters in some detail. We will also show how this tool will prove beneficial to all Canadian Data Centres and to other users of Statistics Canada data.
The World on a Plate: Making Data Digestible.
Abstract: Intergovernmental data is delivered in many different formats and this is a major barrier to use. Data aggregators, national data services and data librarians all have a role in making data available on the web. ESDS International hosts over 30 databases from many different intergovernmental agencies including the World Bank, IMF, UN, IEA and OECD. We deliver the data through a single web interface, and this has proved a very successful strategy in the development of a new user community. In this paper we describe the underlying structure of the international databanks and how the data is converted into Beyond 20/20 format and made available on the web; that is, we describe our data-loading and quality assurance procedures, and integration with access management systems. We also examine the role data librarians can play in liaising with software developers and data providers to create more intuitive and effective interfaces for our user communities.
E4: Prospects for DDI - What the Evidence and Experience Tell Us (Room: Mont Tremblant)
Session Chair: Ron Nakao, Stanford University
New Frontiers: Can Panel Studies Go DDI? First Experiences in Documenting the German Socio-Economic Panel Study with DDI 3.0.
Abstract: DDI Version 3.0 provides new features to document more complex studies, such as collections of datasets and the relationships among them. The presentation will focus on the documentation of the central structure of the German Socio-Economic Panel Study (SOEP), especially the linkage between the panel waves and the household/person relationship. The application of DDI 3.0 will be illustrated by selected documentation examples. As DDI 3.0 will be published in the first half of this year, this paper can only be a report on an early attempt to document complex panel studies like SOEP with DDI 3.0. The study we use is the German SOEP, a representative longitudinal study of private households in Germany which currently consists of more than 260 single datasets. It provides information at the household level and at the personal level over time. The Panel was started in 1984 in West Germany and was expanded as early as June 1990 to include the states of the former German Democratic Republic, even before reunification. Some of the many topics include household composition, occupational biographies, employment, earnings, health, and satisfaction indicators. The data are available to researchers in Germany and abroad.
Documenting, Maintaining, and Sharing Standard Variables with DDI Version 3.0: the ISCO Example.
Abstract: Standard variables can be documented and maintained by the new DDI 3.0 in resources separate from an actual study. This approach enables efficient sharing of metadata within one organization or across several organizations. DDI 3.0 resources can form the basis for a variable registry, serving as persistent sources of metadata. The documentation structure of a variable in DDI 3.0 is related to the metadata registry standard ISO 11179. Using standard variables in DDI resources can facilitate comparison across collections of studies. The ISCO variable is used to illustrate these new possibilities of DDI. DDI Version 3.0 will be published in the first half of this year. The International Standard Classification of Occupations (ISCO) is a standard variable for organizing jobs into a clearly defined set of groups according to the tasks and duties undertaken in the job. The classification has four nested levels of codes representing the categories of occupations. ISCO is used in major studies like ISSP, GSS, ESS, and Eurobarometer. ISCO is developed by the International Labour Organization (ILO).
Whither DDI - Status and Prospects in Canada.
Part 1 provides an introduction to the DDI for researchers and knowledge workers. It describes the current state of the standard, its implementation in software tools, and the adoption of such tools in Canada's universities and governments. It introduces Version 3 and associated issues. Although there are some encouraging signs, the evidence suggests that the DDI is fragile, has not yet achieved much return on investment, and is taking on considerable risk. Part 2 introduces some of the difficulties in implementing standards for statistical metadata. It outlines the essential elements for a successful DDI: a) the standard, and provision of metadata and content according to that standard; b) provision of sustainable, industrial-strength metadata management, web server, and 'killer' end-user applications to make it so; and c) close co-ordination among the elements to focus limited resources on the achievement of strategic goals. Part 2 provides an assessment of the status of the DDI for each element.
The paper concludes that investments in the various components that comprise the DDI are asynchronous with the realities of the standard's overall implementation, take-up and payoffs. There appears to be almost no coordination between the elements, and little recognition among those working in each area of the interdependence of their efforts. The paper proposes that a meeting take place between key actors to initiate better coordination processes, and makes interim suggestions for action by each.
F1: Data-PASS: Collaborating to Preserve At-Risk Data. (Panel Discussion) (Room: Saint Lawrence)
Moderator: Amy Pienta, Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan
Panelists: Darrell Donakowski, University of
The Data Preservation Alliance for the Social Sciences (Data-PASS) is a partnership of five major social science archives in the U.S. supported by an award from the Library of Congress. The goal of Data-PASS is to acquire and preserve data at risk of being lost to the research community, including opinion polls, voting records, large-scale surveys, and other social science studies. This panel will discuss the successes and barriers encountered in the first three years. A particular focus will be given to the relationships we are building with private sector and non-profit data organizations, as well as individual researchers, to preserve their data and make it available to the research community. We will also present our efforts to design a syndicated storage system across the archives.
F2: Data Beyond Numbers: Using Data Creatively for Research (Room: Lachine)
Session Chair: Mary Luebbe, University of British Columbia
The data is out there. Analyzing from electronic tracks of
Abstract: Traditional research data is primary data captured through the deliberate process of data collection, through an instrument built from themes of research, narrowing in on the relevant theories, and concluding with operationalizations of concepts and measurement of the variables that then will constitute the data foundation for the research analysis. Most often the research investigations within the broad area of social science have as their normal research objects the human subjects. The validity of the results obtained through this design is greatly affected by the coloring in obtaining socially desirable answers as well as by an often ignored bias present in low response rates. This presentation will exemplify how the electronification of human behavior supplies an added complete, reliable, and easily accessible resource for analysis both through regular content statements (exemplified by e-mails and blogs) as well as the electronic traces of behavior (exemplified by click-streams of web behavior).
The Importance of Data Visualization in Data
Abstract: Data visualization is an important means by which we analyze information, particularly in the current social climate where visual media are so prevalent. Although some disciplines, such as statistics and computer science, have invested considerable resources in studying and improving data visualization techniques, social scientists have not fully embraced this trend. This paper argues that data visualization is an important aspect of quantitative reasoning and data literacy. It presents Edward Tufte's principles of design and illustrates how they can be applied to empirical research results using an example from criminology. It discusses the skills and software needed to effectively implement these design principles and addresses the technical limitations of applying these techniques to typical social science analyses.
Punishment and Reward for Research Data Sharing.
Abstract: Publicly funded research data deposited into data archives are public goods. Under voluntary contribution, data are under-deposited and under-prepared. Either punishment or reward should be available to motivate researchers to deposit data and to spend more effort on data preparation. This paper builds a simple mathematical model to analyze the effects of punishment and reward. We hope it can help policy makers decide on incentive mechanisms for data sharing.
F3: Extending IASSIST through Outreach (Room: Mont Royal)
Session Chair: Ernie Boyko, Statistics Canada, Retired
Outreach to Schools of Information Science
Abstract: This is an update on efforts to get library schools to understand the value of data librarianship. The speakers will report on their efforts in this important area (including the results of IASSIST 2006 poster session taped interviews).
Measuring IASSIST Against 'Science's Sine Qua Non: Making Scientific Knowledge Understandable, Relevant and
Abstract: This presentation reports on IASSIST's outreach efforts to the natural science community at the CODATA conference, held in Beijing, China, in October 2006. In a keynote address at CODATA 2006, Dr. Jane Lubchenco, a professor of marine biology and zoology at Oregon State University, challenged the scientific community to make scientific knowledge more understandable, relevant, and useful by improving communication between the scientific researcher and the public at large. The presenters, Paula Lackie and William Block, responded to this clarion call by tailoring their CODATA presentation on Social Science Data Archives to issues raised in Lubchenco's keynote speech. Our argument: the social science data community already fosters communication between (social) scientific research and the public via the close interaction of social scientists, data professionals, and computing specialists that make up the IASSIST community. In that regard, the social science data community--as exemplified by IASSIST--was held up as a model to the natural science community for achieving the objectives outlined in the CODATA keynote.
IASSIST Outreach Activities in Russia
Abstract: Russia has cooperated with IASSIST since the 1990 conference in Poughkeepsie, NY. The University Information System RUSSIA team initiated the contacts. In 2001 other Russian colleagues joined: the Independent Institute for Social Policy and the Russian Academy of Sciences Institute of Sociology.
Preparing Datasets for the National
Abstract: Created in 2002 by Act 613, the Ghana AIDS Commission established Monitoring & Evaluation Focal Persons at all levels to facilitate the coordination of the national response against HIV/AIDS. This decentralised monitoring mechanism effectively addresses the complex nature of the determinants of HIV transmission. The Focal Persons ensure the implementation of Local M&E Plans. They do this by collecting, collating and compiling routine data for assessing the local response to HIV/AIDS. Various open data storage and retrieval tools (CRIS, EpiInfo) are available to assist in this work. Although information may be collected at all levels, lapses in the standardisation of reporting formats may lead to decision-making vacuums because some agencies have different M&E targets. Despite enormous challenges, the HIV/AIDS M&E experience in Ghana has been successful because community, district, regional, national and global monitoring are linked to a common goal set out in the national M&E plan.
G1: Harmonizing Data and Documentation: Best Practice Examples (Room: James Bay)
Moderator: Mary Vardigan, Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan
Background: Harmonizing separate datasets and their documentation for purposes of comparison provides new perspectives and extends the analytic value of existing data resources. However, the harmonization task is complex and must be approached with care, even when datasets were designed to be comparable. This panel provides insights into three harmonization projects with information on how the harmonization task was conducted, tools employed in the project, data characteristics taken into account, and the resulting data and documentation products.
Ex Ante Harmonization Across 30 Countries: Lesson
Abstract: This presentation will discuss the lessons learned from ex ante harmonization and documentation of a large scale cross-national epidemiological study of mental health conducted in more than 30 countries, in 35 languages with more than 200,000 respondents. Cross-national survey research is inherently more difficult than research conducted in one nation. With cross-national surveys, layers of complexity are added with variations in sample design, survey content and concept comparability, translation approaches, human subject and ethics review and oversight, interviewer staffing and training, quality control processes and procedures, and local conditions, customs, and context. These issues will be explored within the framework of the harmonization and documentation of the World Mental Health Initiative with commentary on lessons learned and recommended approaches. Discussion and examples will also focus on the levels and types of documentation needed, including questionnaire versions, language translations, and details of the data collection life-cycle.
Harmonization of the Collaborative Psychiatric Epidemiology
Abstract: The three component studies of the National Institute of Mental Health Collaborative Psychiatric Epidemiology Surveys (CPES) -- the National Comorbidity Survey Replication (NCS-R), the National Latino and Asian American Study (NLAAS), and the National Survey of American Life (NSAL) -- have both overlapping and unique content, similarities and differences in variable naming conventions, and differences in question wording and response options on some common measures. This presentation describes challenges faced in harmonizing the three datasets, clarifying similarities and differences, and developing interactive Web-based documentation to facilitate understanding and analysis of the merged CPES public use data. It also describes the processes and tools developed to assist in harmonization, which we view as first steps in the development of best practices for harmonization of datasets at SRC and ICPSR.
Aging in Three Countries: A New Data Resource for Comparative
Abstract: The Health and Retirement Study (HRS) is an ongoing longitudinal study that began in 1992 and documents the changing social, economic, health, and psychological experiences of men and women aged 50+. Since 1992, sister survey data collection efforts have been undertaken in other countries so that the data might be compared with the United States and with each other. ICPSR has begun a pilot project to demonstrate the feasibility of organizing, harmonizing, and presenting these data in a way that facilitates research and demonstrates the extent to which the data are comparable (or not). Metadata from the HRS, the English Longitudinal Study of Ageing, and the Mexican Health and Aging Study are combined into an interactive tool for researchers. Results of the harmonization and this tool will be demonstrated.
Session Chair: Gail Curry, University of Northern British Columbia
Improving Data Services by the Creation of a Question
Abstract: This paper will demonstrate and discuss how a question database will be of great value both for the social science research community and for Data Archives. The question database will hold information on every question from the questionnaires deposited in the archive. The database will be searchable from the Web via a dedicated search interface. Having access to the full and original question wording will improve data services in several ways. The paper will present the new advantages and possibilities, such as: 1) an additional method for users to identify relevant surveys more precisely, with links to all the surveys using a given question; 2) the possibility of comparative research across the surveys using a given question; 3) support in the construction of new questions and questionnaires; new research projects, as the question wordings themselves reflect the time and circumstances in which they were designed; and more open access to and insight into the surveys and the data via the question database. The paper will discuss issues involved in designing, creating and maintaining a question database and the benefits it creates.
Displaying Survey Questionnaires to Data Users, Accuracy versus
This session will provide a practical demonstration of a web resource, the ESRC Question Bank (Qb), which displays survey questionnaires and methodologies. In the past year the Qb has undergone extensive redesign to improve its usefulness. This demonstration will have two key aims: firstly to examine what the Qb is and how it may be used in conjunction with other data resources, secondly to show how it may be used to examine different questionnaire documentation styles and make comparisons between them. Two particular issues arising from the facilities in CAPI programs (mainly Blaise in the UK) will be discussed: routing and control checks. The variety of solutions to the problems of reporting these functions adopted by survey agencies will be highlighted. This will be interpreted from the viewpoint of the secondary analyst, making clear the need for DDI standards in this area.
G3: Towards a National Infrastructure for Community Statistics: Local data sharing issues and resources (Room: Mont Royal)
Session Chair: Rebecca Blash, The Brookings Institution
Guide to Administrative Data Records.
Abstract: This paper begins with a brief section on recent developments in the work on neighborhood indicators, followed by a discussion of some of the practical and methodological challenges of using administrative records data for indicators. The main body of the monograph is a catalog that describes the types of administrative records that are being used to craft neighborhood indicators. The descriptions are brief and, where possible, the reader is referred to sources for additional information.
NNIP Data Sharing Guide
Abstract: In cities across the country, organizations belonging to the National Neighborhood Indicators Partnership (NNIP) have assembled recurrently-updated neighborhood level data in order to promote the use of information in community building and neighborhood development. The Partnership is currently developing a guidebook based on the partners' experiences in data-sharing to assist other intermediary organizations. This presentation will give a preview of the guide by suggesting strategies for political negotiations in requesting data from agency officials; describing common elements in memoranda of agreement; and recommending procedures for responsibly handling confidential data.
Statistical Metadata and NICS: Issues and
Abstract: Statistical metadata is commonly defined as data about data. In fact, it is much more than the catchy phrase implies. The metadata related to a particular statistical dataset identify the data and describe its content and quality so that data users can retrieve, use, and process the data appropriately. Metadata document a statistical dataset’s background, purpose, content, collection, processing, quality, and related information that an analyst needs to find, properly understand, and manipulate the data. The information in a statistical metadata system is essentially a reference library about a dataset. As such, the metadata for a statistical dataset broadens the number and diversity of people who can successfully use a data source once it is released. The purpose of this paper is to discuss issues related to the development and use of statistical metadata and to describe resources to standardize and automate statistical metadata. While there are many types of metadata, this paper is concerned only with statistical metadata.
G4: New Archives (Room: Saint Lawrence)
Session Chair: Kathleen Matthews, University of Victoria
Qualitative Data Archiving in the Czech
Abstract: The aim of the paper is to present the qualitative data archive Medard. The archive has been based upon other similar institutions in the world, particularly upon the British Qualidata archive, and it was founded in order to provide archiving of qualitative data generated by research in the social sciences and humanities. The archive acquires, deposits and provides data exclusively in digital form. It is the only qualitative data archive not only in the Czech Republic, but in the whole region. However, many obstacles impede the use of qualitative data archiving in practice among researchers. The obstacles as well as forthcoming challenges will be discussed in the presentation.
Opening Access to Indigenous Data in
Abstract: The Australian Social Science Data Archive has decentralised its national role and developed a distributed model, with archive nodes established across our country. This distributed model allows nodes to specialise in data where access may be blocked, but there exists a critical mass of researchers who are able to open data, add value and ensure sustainability. One such data type is that of Indigenous data. The Indigenous population in Australia is small and highly disadvantaged. Data collected by government are treated with the utmost sensitivity to protect both the population and the government from criticism. But even within those groups working to improve the conditions of Australia's Indigenous population, there is a level of protectiveness and restrictiveness which sees data blocked to those outside the immediate network. This paper will investigate methods used in ASSDA's pilot project to get stakeholder buy-in and establish an open source of Indigenous data.
Open data: new possibilities for knowledge
Abstract: Making data open is a particularly important task in Russia. The Russian Sociological Data Archive, which recently celebrated six years of work, considers this its main mission. Making data open stimulates several important processes that support the development of knowledge communities in Russia. 1. It encourages horizontal links between researchers. Russian scholars have long suffered from a lack of regular interpersonal connections. Sharing data breaks down the fragmentation of the Russian scientific community and stimulates the emergence of research networks. 2. It increases the quality of research. Researchers who expect to share the data they collect aim to follow high methodological standards. In addition, analysis of trusted open data provides a better base for the further development of research. 3. It provides empirical materials for teaching students in social and political science. Easily accessible, trusted data improve the analytical skills students will need as future scholars.