Census Microdata Overview


This page describes the public use microdata sample (PUMS) products that accompany the population and housing summary files produced from either Census 2000 or the American Community Survey (ACS). It is meant to help you decide which product to use. Much of the information was taken from the Census Bureau's Public Use Microdata Sample 2000 Census of Population and Housing (large file). The available products include:
 

Public Use Microdata
   Census 2000
          1% Sample
          5% Sample
   American Community Survey
          Annual 2000-forward

Restricted Access Files
   Census Research Data Centers (CRDC)

An Introduction to Microdata

The following questions and answers introduce microdata.

  1. What is microdata?
     
    Microdata are the individual records that contain information collected about each person and housing unit. They are computerized versions of the questionnaires collected from households, as coded and edited during census processing.
  2. How does microdata relate to Summary File Data?
     
    The Census Bureau uses these confidential microdata to produce the summary data that go into the various reports, summary files, and special tabulations. The individual response data are tabulated, and often cross tabulated based on the values of more than one variable, and the totals are statistically adjusted to give counts that are representative of the entire population. These tabulated results are what appear in the Summary Files.
  3. Why would you use microdata?
     
    Microdata samples are useful to users whose research does not require the identification of specific small geographic areas or detailed cross tabulations for small populations. Use microdata to study relationships among census variables not shown in existing census tabulations. This is often done when studying the characteristics of specially defined population groups other than the race, Hispanic origin, and age groups analyzed in the summary files.
  4. What is the difference between the PUMS files and restricted files?
     
    The restricted data files contain all the survey questionnaire responses, while the PUMS are subsets of the survey responses selected to represent people and housing units (for the 2000 data there is a 1% and a 5% sample). For PUMS files the subsetting and reporting are done in a manner that avoids disclosure of information about households and individuals. The techniques used to accomplish this are: a unique geographic reporting structure based on relatively large reporting areas called PUMAs and super-PUMAs, the use of reporting thresholds, and a variety of statistical procedures to mask identifiable persons or households. With restricted files, microdata are not modified. Confidentiality is ensured by screening access.
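
As a minimal sketch of the relationship described in questions 2 and 3, the Python below tabulates a handful of made-up person records into an unweighted and a weighted cross tabulation. The field names (age_group, tenure, weight) and every value are hypothetical, chosen only for illustration; they are not taken from any PUMS record layout.

```python
from collections import defaultdict

# Hypothetical person records; names and values are invented for illustration.
records = [
    {"age_group": "18-34", "tenure": "renter", "weight": 20},
    {"age_group": "18-34", "tenure": "owner", "weight": 25},
    {"age_group": "35-64", "tenure": "owner", "weight": 18},
    {"age_group": "18-34", "tenure": "renter", "weight": 22},
    {"age_group": "35-64", "tenure": "renter", "weight": 19},
]


def cross_tabulate(rows, row_var, col_var, weight_var=None):
    """Return {(row value, column value): count}.

    Each record counts as 1 unless weight_var is given, in which case
    it counts as the value of that weight field.
    """
    table = defaultdict(int)
    for rec in rows:
        table[(rec[row_var], rec[col_var])] += rec[weight_var] if weight_var else 1
    return dict(table)


# Sample counts: how many records fall in each cell.
unweighted = cross_tabulate(records, "age_group", "tenure")
# Population estimates: the sum of the weights in each cell.
weighted = cross_tabulate(records, "age_group", "tenure", weight_var="weight")
```

Because the records are individual responses, any pair of variables can be crossed this way, including combinations that no published summary table reports.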

Public Use Microdata Samples (PUMS)

Public Use Microdata Sample (PUMS) files contain records representing samples of the occupied and vacant housing units and the people in the occupied units. Persons in group quarters are also included. The files contain individual weights for each person and housing unit, which, when applied to the individual records, expand the sample to the relevant total.

For the 2000 data, the 1% file provides a fuller range of detailed characteristics and the 5% file provides greater geographic detail but less characteristic detail.

Below is a summary of the characteristics of each file and a review of what options there are for access. Although the detailed information refers to Census 2000 products, much also applies to ACS, particularly with files produced starting with 2006.

Levels of geographic reporting
  • 1% PUMS
    • lowest level is super-PUMA with a minimum population threshold of 400,000
    • super-PUMA boundaries encompass one or more contiguous PUMA areas (no PUMA codes on 1% file)
    • super-PUMAs are defined within states and state codes are reported
    • codes to show relationship to MSAs are reported
  • Census 2000 5% and ACS PUMS
    • lowest level is a PUMA with a minimum population threshold of 100,000
    • super-PUMA and state codes are reported
    • codes to show relationship to MSAs are reported
    • for NYC, PUMA boundaries approximate Community District boundaries
  • More Information
    • Files are hierarchical: each housing unit record is followed by one or more person records, one for each occupant.
    • The serial number on both record types affords the option of processing the data either sequentially or hierarchically.
    • For each state there is a geographic equivalency file, PUMEQ1-XX.TXT or PUMEQ5-XX.TXT, that shows the relationship of PUMS geography to standard census geography.
    • The MABLE/Geocorr2K: Geographic Correspondence Engine with Census 2000 Geography is another way to look up geographic equivalents for the 5% file.
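
The hierarchical structure noted above can be sketched in Python as follows. This is only an illustration under an assumed layout: it supposes that column 1 of each line holds a record type ('H' for housing unit, 'P' for person) and that columns 2-8 hold the serial number. Those positions, the function name, and the sample lines are assumptions, so check the data dictionary for the file you are using before relying on them.

```python
def group_by_household(lines):
    """Group a hierarchical file into housing units with their person records.

    ASSUMED layout (hypothetical, verify against the data dictionary):
    column 1 is the record type ('H' or 'P') and columns 2-8 are the
    serial number shared by a housing unit and its occupants.
    """
    households = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rectype, serial = line[0], line[1:8]
        if rectype == "H":
            households.append({"serial": serial, "housing": line, "persons": []})
        elif rectype == "P":
            # A person record must follow the housing record with the same serial.
            if not households or households[-1]["serial"] != serial:
                raise ValueError(f"person record {serial} has no matching housing record")
            households[-1]["persons"].append(line)
    return households


# Made-up lines; the text after the serial number stands in for the data fields.
sample = [
    "H0000001 housing fields",
    "P0000001 person fields",
    "P0000001 person fields",
    "H0000002 housing fields",  # a vacant unit has no person records
    "H0000003 housing fields",
    "P0000003 person fields",
]
households = group_by_household(sample)
person_counts = [len(h["persons"]) for h in households]
```

Reading the file in order like this is the sequential approach. Because the serial number appears on both record types, the same key could instead be used to load housing and person records into separate tables and join them afterwards.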

Data Variables

  • 1% PUMS
    • maximum amount of social, economic, and housing information available
    • the only threshold for the identification of variable categories is a national minimum population of 8,000 for race and Hispanic origin
  • Census 2000 5% and ACS PUMS
    • a minimum threshold of 10,000 nationally is set for the identification of variable categories within categorical variables
  • More Information

Data Availability

Via FTP

  • Census 2000 1% PUMS
  • Census 2000 5% and ACS PUMS
  • More Information
    • Note: files are published by state.
    • Requires handling large (sometimes zipped) files and using statistical software.
    • For 2000, DataGate has NYC subsets in Stata and SPSS formats.
       

Via Online Extraction

The Integrated Public Use Microdata Series (IPUMS) USA is a free web site.

  • Easy-to-use web interface to U.S. Census microdata for years 1850-current
  • Decennial Census data and American Community Survey data are included.
  • More Information
    • An extraction creates a subset of individual microdata records based on your selection criteria, together with program code (SPSS or Stata) that reads the data.
    • Use the documentation at this site, since IPUMS assigns uniform codes across all the samples and brings relevant documentation into a coherent form to facilitate analysis of social and economic characteristics over time.
    • You can obtain data across states in one download.
       

Data available on DVD (2000 only)

  • 1% PUMS
    • ask for the DVD to use in the DSSC Data Service; application runs from the disc, no installation necessary.
  • Census 2000 5% and ACS PUMS
    • ask for the DVD to use in the DSSC Data Service; application runs from the disc, no installation necessary.
  • More Information
    • Beyond 20/20 software is designed to perform basic cross tabulations of any desired set of variables on the PUMS file.
    • Easy to use; no software skills needed.
    • Only the data for one cross tabulation can be extracted at a time.
    • You can use geographic codes for cross tabulations allowing you to analyze data from multiple geographic areas, including totals for the nation, in one pass.
    • Extracts done with B20/20 software are reported as tabulated results, not as records of individual responses.
    • You can choose to produce weighted or unweighted extracts.

PUMS Weight Variables

Weight variables are applied to the records in the PUMS file during data analysis to create results that are representative of the population. There is a person weight variable for use with person characteristics and a housing weight variable for use with housing characteristics. Using the weights within a software application usually requires only that you designate which weight variable you want to use. An understanding of how they are applied will help you choose, and the examples below serve that purpose.
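
As one hedged illustration of how the two weights are applied, the Python sketch below sums the housing weight to estimate housing units and the person weight to estimate people. The field names (housing_weight, person_weight, tenure, age) and all values are made up for this example and do not come from a PUMS data dictionary.

```python
# Hypothetical sample: two housing units and their occupants.
housing_units = [
    {"serial": "0000001", "tenure": "owner", "housing_weight": 20},
    {"serial": "0000002", "tenure": "renter", "housing_weight": 24},
]
persons = [
    {"serial": "0000001", "age": 41, "person_weight": 21},
    {"serial": "0000001", "age": 9, "person_weight": 19},
    {"serial": "0000002", "age": 67, "person_weight": 25},
]


def weighted_total(records, weight_var, predicate=lambda rec: True):
    """Sum the chosen weight over the records that satisfy predicate."""
    return sum(rec[weight_var] for rec in records if predicate(rec))


# Housing characteristics use the housing weight.
estimated_units = weighted_total(housing_units, "housing_weight")
estimated_renter_units = weighted_total(
    housing_units, "housing_weight", lambda rec: rec["tenure"] == "renter"
)

# Person characteristics use the person weight.
estimated_people = weighted_total(persons, "person_weight")
estimated_adults = weighted_total(
    persons, "person_weight", lambda rec: rec["age"] >= 18
)
```

In this toy data, 2 sampled units stand for an estimated 44 units and 3 sampled people stand for an estimated 65 people. The choice of weight follows the unit being counted: a question about renters as households calls for the housing weight, while a question about people living in rented units calls for the person weight.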

Restricted Access Files

Access to all the responses from the long form, rather than a subset of them, is available only at one of the Census Bureau's Census Research Data Centers (CRDC).

The CRDCs are locations that provide a secure environment where researchers have limited access to confidential economic and demographic microdata, with appropriate safeguards to protect data confidentiality. Researchers work on site in a controlled environment, which ensures that the Census Bureau's standards for maintaining the confidentiality of data are rigorously upheld. Users must submit their research design for approval before being granted access and must use the data resources on-site at one of the Census RDC sites. CU is affiliated with two CRDCs (affiliation means standard usage fees are waived).