Web and Data Sources for Social Policy Research
Wednesday, September 14, 2011
Susan Czarnocki (Char-not-ski)
Social Policy Research: International Focus
United Nations Research Institute for Social Development
Institute for Health and Social Policy
The Spirit Level: why more equal societies almost always do better
Social Policy Research: Canadian Focus
Caledon Institute of Social Policy
Council on Social Development
(free statistics on poverty, welfare recipients,
Millennium Development Goals:
The Centre for the Study of Living
Social Policy Simulation Database and Model
The Conference Board of Canada
The Fraser Institute
Links and Updates:
How is Canada doing overall?
Finding and Using MicroData (see discussion of Data-types, below)
FINDING AND USING CANADIAN NUMERIC DATA
CANSIM: Canadian Socio-Economic Information and Management Database
- CANSIM I: the 'original' version: about 8 million
series; no longer being updated
- CANSIM II the "new, improved" version: about 25 million
series, and counting....
Three Different Access Points:
- CANSIM II via E-STAT
- CANSIM I via CHASS (UofToronto)
- CANSIM II via CHASS (UofToronto)
Searching CANSIM II via E-Stat:
- Reduced set of Tables: easier to find main indicators for
major subject areas.
- Drawback: does not contain the more esoteric topics or the smaller levels of geography.
Searching CANSIM I (the 'original' CANSIM):
- More series from the 1960s-1970s;
- Many series end between 1995 and 2000.
Click on "search and retrieve CANSIM" -->
3 SEARCH OPTIONS:
- MATRIX INDEX:
- searches the "high level headings"
- many tables are then listed under each heading
- MAIN INDEX
- searches title of every individual vector: comprehensive
- FULL TEXT INDEX
- searches detailed descriptions:
Searching CANSIM II (now contains both time-series and vectors of
Canadian Census Data:
- For Urban (CMA) data from E-Stat:
- getting Census Tract id-numbers
- getting Census Tract data
- Using Geo-Search and Getting Census-tract Data through E-Stat: Brief Guide
Tabular Census data; Geography files: 2006 and previous
- Select under Data: Basic Cross-Tabulations (BST)
- using Beyond20/20
Search the Census with E-Stat:
- From the left-hand list, select "Search Census": Gives access to data since 1986
Data Jargon: What Type of Data is it?
GENERATING DATA: Research involves collecting classifiable information or quantifiable measurements on phenomena of interest.
Data are the results of such research. Two primary dimensions for classifying data types are:
- Units of measurement
Units of measurement:
A microdata file consists of a matrix of rows and columns, where each row represents one case, and each column represents a variable [one measurement or response for each case].
ID       Sex  Vote     Usually converted     ID  Sex  Vote
person1  M    PQ       to numbers:           1   1    1
person2  F    Lib                            2   2    2
person3  M    ADQ                            3   1    3
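The conversion to numbers shown in the table can be sketched in Python. This is only an illustration: the code values (M=1, F=2; PQ=1, Lib=2, ADQ=3) are read off the table, while the names `SEX_CODES`, `VOTE_CODES`, and `encode` are hypothetical and not part of any statistical package.

```python
# Hypothetical code lists, mirroring the example table above.
SEX_CODES = {"M": 1, "F": 2}
VOTE_CODES = {"PQ": 1, "Lib": 2, "ADQ": 3}

# Each row is one case; each position is one variable (ID, Sex, Vote).
raw_rows = [
    ("person1", "M", "PQ"),
    ("person2", "F", "Lib"),
    ("person3", "M", "ADQ"),
]

def encode(rows):
    """Replace labels with numeric codes; IDs become sequential case numbers."""
    encoded = []
    for case_number, (_, sex, vote) in enumerate(rows, start=1):
        encoded.append((case_number, SEX_CODES[sex], VOTE_CODES[vote]))
    return encoded

numeric_rows = encode(raw_rows)
for row in numeric_rows:
    print(*row)
```

In practice the mapping between labels and numbers is recorded in the codebook that accompanies the file, so the numbers can always be translated back.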
Examples of Microdata:
- Census Public Use MicroData Files (PUMFs)
- Public Opinion Polls
- Social Surveys
A macrodata (or aggregate data) file contains records that describe not the individual but a group of respondents.
The unit of observation is a group of units which might in themselves have been units of observation.
Such a file might contain variables such as: number of persons in local households,
average income of all members in the household, vote totals for candidates in an election,
total enrolment in various educational institutions, etc.
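The relationship between the two unit types can be illustrated by collapsing microdata into macrodata. The sketch below is a minimal example, assuming made-up respondent records; the field names (`riding`, `vote`) and the `vote_totals` helper are hypothetical.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (illustrative values only).
respondents = [
    {"id": 1, "riding": "A", "vote": "PQ"},
    {"id": 2, "riding": "A", "vote": "Lib"},
    {"id": 3, "riding": "A", "vote": "PQ"},
    {"id": 4, "riding": "B", "vote": "ADQ"},
]

def vote_totals(records):
    """Collapse individual records into one count per (riding, vote) group."""
    return Counter((r["riding"], r["vote"]) for r in records)

# Each entry in `totals` is a macrodata record: it describes a group, not a person.
totals = vote_totals(respondents)
for (riding, vote), count in sorted(totals.items()):
    print(riding, vote, count)
```

Note that the step only works in one direction: vote totals can be computed from individual records, but individual records cannot be recovered from the totals.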
Examples of Macrodata:
- National Accounts Data
- Financial Statements; Annual Reports
- Most tables in newspaper articles, institutional publications, etc.
Cross-sectional data is collected during one time-period. This is the usual time-frame for social surveys.
In the 'ideal case', all respondents would be interviewed simultaneously, but surveys often take several
months to complete. Nonetheless, the survey would still be taken as reflecting a "snap-shot" of a 'single' period.
In time-series data, the “case” or “unit of observation” is a period of time: months, years, etc. Columns are still variables, but each row holds the values of those variables for one entity (e.g. a country, province, organization) for a given period.
Canada, annual:                              GDP, cross-national:
Year  GDP     Unemp  CPI   Investment        Year  Canada  UK
2000  1038.8  7.7    2.2   1200600           2000  1038.8  943.41
1999  957.9   8.1    0.97  1345300           1999  957.9   901.27
1998  901.8   9.2    1.2   1567300           1998  901.8   859.81
1997  877.9   6.5    1.3   1100500           1997  877.9   811.07
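Because each row of a time series is a period, a typical first step is to compare one period with the one before it. The following sketch computes the year-over-year percentage change in the GDP column of the "Canada, annual" example. The GDP figures are copied from the table (the source does not state their units); the function name `yearly_change` is hypothetical.

```python
# GDP values copied from the "Canada, annual" example table (units not stated in the source).
gdp_by_year = {2000: 1038.8, 1999: 957.9, 1998: 901.8, 1997: 877.9}

def yearly_change(series):
    """Return {year: percent change from the previous year}, in chronological order."""
    years = sorted(series)
    changes = {}
    for previous, current in zip(years, years[1:]):
        changes[current] = 100 * (series[current] - series[previous]) / series[previous]
    return changes

growth = yearly_change(gdp_by_year)
for year, pct in growth.items():
    print(year, round(pct, 1))
```

The first year in the series (1997) has no predecessor, so it yields no change value; this loss of one observation is normal when differencing a time series.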
To add the time dimension to surveys, various special types of surveys are used: panel, or longitudinal, surveys attempt to follow the same sample of respondents over time.
Also, surveys which ask the same or similar questions of different respondents can be analyzed as a cross-sectional time series to provide some over-time comparison.
ASCII: American Standard Code for Information Interchange
- generic [can have 'any'
- does not require 'translation' by special
- can be viewed in
Data-relevant plain-text [ASCII] format
FREE format: variables do not always
end on the same column
- commas: use the
- tabs: referred to
as "tab-delimited" file
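Delimited plain-text files of this kind can be read with Python's standard `csv` module. The sketch below is a minimal example, assuming a header row and the three example cases used earlier; the sample text and the `read_delimited` helper are hypothetical.

```python
import csv
import io

# The same three hypothetical cases, once comma-delimited and once tab-delimited.
comma_text = "ID,Sex,Vote\n1,1,1\n2,2,2\n3,1,3\n"
tab_text = "ID\tSex\tVote\n1\t1\t1\n2\t2\t2\n3\t1\t3\n"

def read_delimited(text, delimiter):
    """Parse delimited plain text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

comma_rows = read_delimited(comma_text, ",")
tab_rows = read_delimited(tab_text, "\t")

# Only the delimiter differs; the parsed content is identical.
print(comma_rows == tab_rows)
```

The values come back as text strings, so numeric variables still need to be converted before analysis.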
FIXED format: variables always
end on the same column
- with delimiters: read like 'free format'
- no delimiters
Data in fixed format without delimiters
must be accompanied by a codebook.
The codebook indicates which questions/variables
occupy which columns.
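The role of the codebook can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the column positions in `CODEBOOK`, the sample records, and the `parse_fixed` helper are hypothetical, and a real codebook would also document the meaning of each code.

```python
# Hypothetical codebook: variable name -> (first column, last column), 1-based inclusive.
CODEBOOK = {"id": (1, 3), "sex": (4, 4), "vote": (5, 5)}

# Fixed-format records without delimiters: every variable ends on the same column.
fixed_lines = ["00111", "00222", "00313"]

def parse_fixed(line, codebook):
    """Slice one record into variables using the column positions in the codebook."""
    return {name: int(line[start - 1:end]) for name, (start, end) in codebook.items()}

records = [parse_fixed(line, CODEBOOK) for line in fixed_lines]
for record in records:
    print(record)
```

Without the codebook, a line such as "00313" is just a string of digits; nothing in the file itself says where one variable stops and the next begins.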
(.doc, .ppt, .xls, etc.)
- can be read directly only by the software which
- all other programs need a 'translator' [built-in
or external] to read them
Examples from the world of data:
- Statistical Packages:
[SPSS] .dta [Stata]