Web and Data Sources for Social Policy Research
Wednesday, September 14, 2011


Susan Czarnocki   (Char-not-ski)
(514) 398-1429

 Social Policy Research: International Focus
  The Spirit Level: why more equal societies almost always do better
Social Policy Research: Canadian Focus
Millennium Development Goals:

Finding and Using MicroData

(see discussion of Data-types, below)


CANSIM: Canadian Socio-Economic Information and Management Database

  • CANSIM I:   the 'original' version:  about 8 million series;  no longer being updated
  • CANSIM II:  the "new, improved" version:  about 25 million series, and counting....

Three Different Access Points:

  • CANSIM II  via  E-STAT
  • CANSIM I   via CHASS (UofToronto)
  • CANSIM II   via CHASS (UofToronto)

    Searching CANSIM II via E-Stat:

    • Reduced set of Tables: easier to find main indicators for major subject areas.
    • Drawback: does not contain the more esoteric topics or the smaller levels of geography.

    Searching CANSIM I (the 'original' CANSIM):

    • More series from the 1960s-1970s;
    • Many series end between 1995 and 2000.

    Search strategy:
    Click on "search and retrieve CANSIM"    -->    Search CANSIM Catalogue Files


    • MATRIX INDEX:   
      • searches the "high level headings"
      • many tables are then listed under each heading
      • searches title of every individual vector:   comprehensive but time-consuming
      • searches detailed descriptions: 

    Searching CANSIM II  (now contains both time-series and vectors of cross-sectional data):


    Canadian Census Data:

    Census Geography: key to the Census

    Using GeoSearch --

  • For Urban (CMA) data from E-Stat:
    • getting Census Tract id-numbers
    • getting Census Tract data
      • Using Geo-Search and Getting Census-tract Data through E-Stat:  Brief Guide

    EDRS website:  Tabular Census data;   Geography files: 2006 and previous

    Search the Census with E-Stat: 

    • From the left-hand list, select "Search Census": Gives access to data since 1986



    Data Jargon: What Type of Data is it?

    Basic definitions

    GENERATING DATA:  Research involves collecting classifiable information or quantifiable measurements on phenomena of interest. Data are the results of such research. Two primary dimensions for classifying data types are:

    • Units of measurement
    • Time-frame

    Units of measurement:
        • Microdata:
      A microdata file consists of a matrix of rows and columns, where each row represents one case, and each column represents a variable [one measurement or response for each case].
      ID        Sex  Vote                             ID  Sex  Vote
      person1   M    PQ       Usually converted        1   1    1
      person2   F    Lib         to numbers:           2   2    2
      person3   M    ADQ                               3   1    3
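
      The conversion to numbers shown above can be sketched in a few lines of Python. This is only an illustration: the coding rule (1, 2, 3, ... in order of first appearance) and the names raw_rows and build_codes are assumptions made for the example, not the scheme of any particular survey.

```python
# Hypothetical raw microdata: one row per case, one column per variable.
raw_rows = [
    {"ID": "person1", "Sex": "M", "Vote": "PQ"},
    {"ID": "person2", "Sex": "F", "Vote": "Lib"},
    {"ID": "person3", "Sex": "M", "Vote": "ADQ"},
]


def build_codes(rows, variable):
    """Assign 1, 2, 3, ... to each distinct value, in order of first appearance."""
    codes = {}
    for row in rows:
        codes.setdefault(row[variable], len(codes) + 1)
    return codes


# Assumed coding rule; a real survey documents its own codes in a codebook.
sex_codes = build_codes(raw_rows, "Sex")
vote_codes = build_codes(raw_rows, "Vote")

# Replace each text value by its numeric code; IDs become 1, 2, 3.
coded_rows = [
    {"ID": case_number, "Sex": sex_codes[row["Sex"]], "Vote": vote_codes[row["Vote"]]}
    for case_number, row in enumerate(raw_rows, start=1)
]

for row in coded_rows:
    print(row["ID"], row["Sex"], row["Vote"])
```

      The dictionaries sex_codes and vote_codes play the role of the value labels that must be kept with the file so the numbers can be read back as categories.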

      Examples of Microdata:

      • Census Public Use MicroData Files (PUMFs)
      • Public Opinion Polls
      • Social Surveys

        • Macrodata:
      A macrodata (or aggregate data) file contains records that describe not an individual but a group of respondents. The unit of observation is a group of units, each of which might itself have been a unit of observation.

      Such a file might contain variables such as: number of persons in local households, average income of all members in the household, vote totals for candidates in an election, total enrolment in various educational institutions, etc.

      Examples of Macrodata:

      • National Accounts Data
      • Financial Statements;   Annual Reports
      • Most tables in newspaper articles, institutional publications, etc.
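
      To make the microdata/macrodata distinction concrete, the sketch below aggregates a small, made-up microdata file (one record per voter) into a macrodata file (one record per riding and party, holding a vote total). The ridings, parties and field names are hypothetical.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent.
microdata = [
    {"id": 1, "riding": "A", "vote": "PQ"},
    {"id": 2, "riding": "A", "vote": "Lib"},
    {"id": 3, "riding": "A", "vote": "PQ"},
    {"id": 4, "riding": "B", "vote": "ADQ"},
    {"id": 5, "riding": "B", "vote": "Lib"},
]

# Count respondents in each (riding, vote) group.
totals = Counter((record["riding"], record["vote"]) for record in microdata)

# Macrodata: one record per group; the individuals are no longer visible.
macrodata = [
    {"riding": riding, "vote": vote, "total": count}
    for (riding, vote), count in sorted(totals.items())
]

for record in macrodata:
    print(record["riding"], record["vote"], record["total"])
```

      Note that the step cannot be reversed: from the totals alone there is no way to recover which individual gave which answer.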

    Time-frame:
      • Cross-sectional Data:
    Cross-sectional data is collected during one time-period. This is the usual time-frame for social surveys. In the 'ideal case', all respondents would be interviewed simultaneously, but surveys often take several months to complete.   Nonetheless, the survey would still be taken as reflecting a "snap-shot" of a 'single' period.
      • Time-series Data:
    In time-series data, the “case” or “unit of observation” is a period of time: months, years, etc. Columns are still variables, but each row holds the values of those variables for one entity (e.g. a country, province, organization) for a given time-period:
    Canada, annual:                              GDP, cross-national:
    Year     GDP    Unemp   CPI   Investment     Year    Canada     UK
    2000   1038.8    7.7    2.2     1200600      2000    1038.8   943.41
    1999    957.9    8.1    0.97    1345300      1999     957.9   901.27
    1998    901.8    9.2    1.2     1567300      1998     901.8   859.81
    1997    877.9    6.5    1.3     1100500      1997     877.9   811.07
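
    Because each row of a time-series is a period, a typical first calculation is the change from one period to the next. The sketch below uses the Canadian GDP column from the table above; the units are not stated in the table, so the code only computes percentage change, which does not depend on them. The names gdp_by_year and percent_change are illustrative.

```python
# GDP values for Canada copied from the table above (units not stated there).
gdp_by_year = {1997: 877.9, 1998: 901.8, 1999: 957.9, 2000: 1038.8}


def percent_change(series):
    """Return {year: percentage change from the previous year in the series}."""
    years = sorted(series)
    return {
        year: 100.0 * (series[year] - series[previous]) / series[previous]
        for previous, year in zip(years, years[1:])
    }


growth = percent_change(gdp_by_year)
for year, change in growth.items():
    print(year, round(change, 2))
```

    The first year has no predecessor, so the result has one row fewer than the original series.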

    To add the time-dimension to surveys, special types of surveys are used. Panel (or longitudinal) surveys attempt to follow the same sample of respondents over time. Alternatively, surveys which ask the same or similar questions of different respondents in different periods can be combined to provide some over-time analysis, for example through cross-sectional time-series analysis.
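
    The difference between the two designs shows up in what can be computed from them. In the sketch below (all respondent IDs, years and answers are made up), both files support a year-by-year summary, but only the panel file supports measuring change within the same person.

```python
from collections import defaultdict

# Each record: (respondent id, survey year, answer coded 0 or 1). Hypothetical data.
panel = [  # the same respondents are interviewed in both years
    (1, 2000, 1), (1, 2001, 0),
    (2, 2000, 0), (2, 2001, 0),
]
repeated = [  # a fresh sample answers the same question each year
    (10, 2000, 1), (11, 2000, 0),
    (20, 2001, 1), (21, 2001, 1),
]


def yearly_share(records):
    """Share of respondents answering 1 in each year (works for both designs)."""
    by_year = defaultdict(list)
    for _respondent, year, answer in records:
        by_year[year].append(answer)
    return {year: sum(answers) / len(answers) for year, answers in sorted(by_year.items())}


def individual_changes(records, first=2000, last=2001):
    """Change in answer for respondents observed in both years (panel only)."""
    by_person = defaultdict(dict)
    for respondent, year, answer in records:
        by_person[respondent][year] = answer
    return {
        respondent: answers[last] - answers[first]
        for respondent, answers in by_person.items()
        if first in answers and last in answers
    }


print(yearly_share(repeated))
print(individual_changes(panel))
print(individual_changes(repeated))  # empty: nobody was interviewed twice
```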

    Data-file formats:

    1.    Non-proprietary:    (ASCII,    'text-file',    .txt)

    ASCII: American Standard Code for Information Interchange:

      • generic   [can have 'any' file-extension, traditionally  .txt ]
      • does not require 'translation' by special software;
      • can be viewed in Notepad

    Data-relevant plain-text [ASCII] format types:

    FREE format:   variables do not always end on the same column; values are separated by a delimiter:

      1. space[s]   
      2. commas:   use the extension  .csv   [comma-separated values]
      3. tabs:         referred to as "tab-delimited" file
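
    As a rough illustration, the same three-case file can be written with each of the three delimiters and read back with Python's standard library. The sample text and the helper read_delimited are made up for the example.

```python
import csv
import io

# The same hypothetical data written with three different delimiters.
comma_text = "ID,Sex,Vote\n1,1,1\n2,2,2\n3,1,3\n"
tab_text = "ID\tSex\tVote\n1\t1\t1\n2\t2\t2\n3\t1\t3\n"
space_text = "ID Sex  Vote\n1  1    1\n2  2    2\n3  1    3\n"


def read_delimited(text, delimiter):
    """Split each line of text into fields on a single-character delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = read_delimited(comma_text, ",")
tab_rows = read_delimited(tab_text, "\t")

# Space-separated files often pad with several spaces; str.split() with no
# argument treats any run of whitespace as one separator.
space_rows = [line.split() for line in space_text.splitlines()]

print(comma_rows == tab_rows == space_rows)
```

    All values come back as text; converting them to numbers is a separate step.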

    FIXED format:   variables always end on the same column

      1. with delimiters: read like 'free format'
      2. no delimiters
        • Data in fixed format without delimiters must be accompanied by a codebook.
          The codebook indicates which questions/variables occupy which columns.
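
    The role of the codebook can be sketched as follows. The column layout below is invented for the example (a real codebook would supply it), and parse_fixed is an illustrative helper, not part of any package.

```python
# Hypothetical codebook: variable name -> (first column, last column),
# counted from 1 and inclusive, as codebooks usually list them.
codebook = {"ID": (1, 3), "Sex": (4, 4), "Vote": (5, 5)}

# Hypothetical fixed-format records: no delimiters, every line the same width.
records = ["00111", "00222", "00313"]


def parse_fixed(line, layout):
    """Cut one record into variables using the column positions in the layout."""
    return {
        name: int(line[start - 1:end])  # convert 1-based columns to a Python slice
        for name, (start, end) in layout.items()
    }


cases = [parse_fixed(line, codebook) for line in records]
for case in cases:
    print(case)
```

    Without the codebook, the line "00313" is just five digits; with it, the line reads as case 3, Sex 1, Vote 3.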

    2. Proprietary:    (.doc, .ppt, .xls, etc.)

    • can be read directly only by the software which generated it
    • all other programs need a 'translator' [built-in or external] to read them

    Examples from the world of data:

    • Spreadsheets:              .xls,   .wks
    • Statistical Packages:    .sav  [SPSS]    .dta [Stata]
    • Databases:                 .dbf

    Prepared by Susan Czarnocki
    January 14, 2010