SPSS: What You Need to Know to Write an SPSS Program


This is a brief guide to the basics of writing an SPSS program. SPSS programs are text files that have an .sps extension. The Windows and Mac versions of SPSS have pull-down menus that let you work with the data without having to worry about the syntax of commands in an SPSS program. The occasions when you will need to work with an SPSS program file are:

  • when using SPSS on CUNIX, where you must use the syntax described here;
  • on Windows/Mac, when the data you have is in ASCII text format with an accompanying SPSS syntax file. (Note: if you have only a text data file and documentation, you must write your own SPSS program file.)

For help with an SPSS program file, you can visit the Data Service in Lehman Digital Social Science Center or email dssc.data@columbia.edu.

Basic Rules for all Syntax

  • A command must begin in column 1.
  • Continuation lines must be indented at least one space.
  • A command ends with a "." (period).
  • Case does not matter except in the name of the file.
  • Quotes can be single or double but they must match.
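
As a small illustration of these rules, the sketch below reuses the file and variable names from the examples later in this guide (file1.sav, sex, zodiac). The STATISTICS subcommand is included only to show a continuation line. The commands are indented here for display; in a real program file each one starts in column 1.

    * Keywords can be upper or lower case, and quotes can be single or double.
    get file='file1.sav'.
    GET FILE="file1.sav".
    * A long command continues on indented lines and ends with a period.
    FREQUENCIES VARIABLES=sex zodiac
        /STATISTICS=MODE.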

Four Parts of an SPSS Program

  1. Defining and reading the data
    (FILE HANDLE and DATA LIST or GET FILE).
  2. Selecting and/or modifying the data
    (SELECT IF, RECODE, COMPUTE, etc.).
  3. Statistical procedure(s)
    (FREQUENCIES, CROSSTABS, REGRESSION etc.).
  4. Saving a "save file"
    [optional but recommended for repeated runs].

     Other Commands and a Sample Complete Program

Part 1 - Defining and Reading in the Data

SPSS data files normally have a .sav extension but may also have the less common .por extension (an ASCII-based format that facilitates transporting files to other computers and operating systems without problems).

In the Windows/Mac versions, the OPEN command in the File menu will read SPSS formats as well as SAS, Stata, Excel, and dBase formats. The SAVE and SAVE AS commands can write files out in all these formats.

If you have raw data, you will have to do all the work of defining the variables yourself, i.e., write a program. When reading or writing a file in an SPSS program, you must use the name of the file and its location (path). Examples are:

  • file1.dat
    raw data file located in your directory
  • file1.sav
    SPSS data file located in your directory
  • surveys/file1.sav
    SPSS data file located in a subdirectory of your home directory
  • /u/9/s/me2000/surveys/file1.sav  
    SPSS data file located in a subdirectory of your home directory, with a full path name

SPSS program files cannot read files in SAS, Stata, Excel, and dBase formats. They can read SPSS formats and ASCII fixed-format or delimited files.
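
For a delimited file, DATA LIST can be told that the values are separated by a delimiter instead of sitting in fixed columns. The following is only a sketch: the file name file1.csv is hypothetical, the values are assumed to be comma-separated in this order with one case per line, and the details should be checked against the manual for your version of SPSS.

    * LIST format reads one case per line, and the delimiter is given in quotes.
    DATA LIST FILE="file1.csv" LIST (",")
        / personid sex birthyr income state (A2).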

If you are reading an SPSS .sav file you need

  1. Documentation listing the variables you want and their mnemonic names in the system file.
  2. An SPSS program with
    • a GET FILE = command with the name of the save file in quotes,
    • (optional) a /KEEP (or /DROP) subcommand with the names of the variables you
      want to use (or not use). If you need all the variables, leave this out.
    • a period to end the command.

Example:

GET FILE="/u/9/s/me2000/surveys/file1.sav"
    /KEEP zodiac sex.

If you are reading raw (ASCII) data you need

  1. Documentation describing the variables.
  2. Mnemonic names for the variables you want (no more than 8 characters each) with
    • Each variable's column position(s) in the file, e.g. 1-4, and
    • Each variable's type, i.e., integer, decimal, or alphanumeric.
    You make up the names. You can use V[n] to V[m], e.g., V1 to V100, if you want,
    but using mnemonic names is a lot easier in the long run. You don't have to define all
    the variables in your file, just the ones you need.
  3. The length of the records in the file (the LRECL).
  4. An SPSS program with
    • A FILE HANDLE command with
      • a "handle" (MYFILE in the example below),
      • a "/" (slash),
      • a NAME= subcommand with the name of the raw data file in quotes,
      • the LRECL= subcommand, and
      • a period to end the command.
    • A DATA LIST command with
      • a FILE=handle subcommand,
      • a "/" (slash), and
      • then the list of the mnemonic variable names, each followed by its column positions, and its type if it has a decimal place or is an alphanumeric, and
      • a period to end the command.
      Example:

              FILE HANDLE MYFILE/NAME="file1.dat" /LRECL=1200.
              DATA LIST FILE=MYFILE /
                      PERSONID 1-4
                      SEX 6
                      BIRTHYR 7-10
                      INCOME 15-21 (2)
                      STATE 55-56 (A).

      In the example above, INCOME has 2 decimal places and STATE is a 2 column character variable. Note that you don't have to define all the variables in your file, just the ones you need.
    • Some raw data files have multiple lines of data for each case. This frequently happens with opinion surveys where the responses from one respondent are reported on two or three lines (often referred to in the documentation as "records" or "cards"). In that case the DATA LIST command also needs
      • a records=# subcommand placed after the file handle, with # = the
        number of records per case,
      • a "/" (slash) marking the start of each record followed by an
        integer that indicates which record it is.

      Example:

              FILE HANDLE MYFILE/NAME="file3.dat".
              DATA LIST FILE=MYFILE records=3
                      /1
                      PIDREC1 1-4
                      SEX 6
                      BIRTHYR 7-10
                      INCOME 15-21 (2)
                      STATE 55-56 (A)
                      /3
                      PIDREC3 1-4
                      industry 5-8
                      occup 9-11.

      In the example above, there are three records per case. Note that no variables
      are defined for record type=2. You only need to define the variables you need.

If you are reading an SPSS portable file you need

  1. Documentation listing the variables you want and their mnemonic names in the portable file.
  2. An SPSS program with
    • an IMPORT FILE= command with the name of the portable file in quotes,
    • (optional) a /KEEP (or /DROP) subcommand with the names of the variables
      you want to use (or not use). If you need all the variables, leave this out.
    • a period to end the command.

Example:

    IMPORT FILE="/eds/datasets/gss/data/gss94-all.por"
                        /KEEP zodiac sex.

Part 2 - Selecting and Modifying the Data

This part is optional. You may not need to select cases or modify or create new variables. But if you do, these are the most common commands.

  1. SELECT IF
    This command selects whole CASES, usually people.

    Examples:

            SELECT IF (sex = 1).
            SELECT IF (STATE = "NJ").
            select if (any(racegrp,4,5,6,8)).

    Warning!
    The effect of multiple SELECT IF statements is cumulative. See the manual on using the TEMPORARY command if you don't want this.
  2. COMPUTE
    Create a new variable.

    Examples:

            COMPUTE NEWAGE=0.
            COMPUTE YRRETIRE=BIRTHYR+65.
            COMPUTE INCOME=salary+interest+divdnds. 
  3. RECODE
    Change the values of a variable. It is best to do this on a new variable created from an old one so you don't lose the old values. You never know when you may have to back up and use them again. The default format for new numeric variables is F8.2. It's worth changing this to something more compact with the FORMATS command.

    Examples:

            RECODE AGE (MISSING=9)(18 thru HI=1)(LOW thru 18=0) into VOTER.
            RECODE PLACE (1=1)(2 thru 7=2)(else=0) into CITYTOWN.
            RECODE MONTH (" "=99) (CONVERT) ("-"=11)("&"=12) into NEWMONTH.
            FORMATS VOTER CITYTOWN (F1.0) NEWMONTH (F2.0).
  4. IF
    Conditional change. This is useful for cleaning data as well as recoding (third example below, "NJ" for "JN").

    Examples:

            COMPUTE WORKWK=0.
            IF (WORK GT 0 and WORK LE 35) WORKWK=1.
            COMPUTE PLRTY = 1.
            IF RANGE(VALUE(PLURALTY),2,8) PLRTY = 2.
            IF (STATE EQ "JN") STATE="NJ".
  5. MISSING VALUES
    Declare some values of a variable "missing" so they won't be used in statistical calculations.

    Example:

            MISSING VALUES AGE (0)
                    Score1 to Score10 (999)
                    STATE ("XX").

    Warning! Missing Values affect RECODE and COMPUTE statements and can have unexpected results. When you create or modify any variable, be sure to check very carefully what happened with the Missing Values. For example:

            MISSING VALUES PLURALTY (2 THRU 8).
            COMPUTE PLRTY=PLURALTY.
            RECODE PLRTY (2 THRU 8 = 2).

    won't work. Cases coded 2 through 8 are Missing and won't be recoded. (See the Manual for the VALUE function to get around this.)
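
The two warnings in this part can be illustrated with a short sketch. It reuses variable names from the examples above (sex, zodiac, PLURALTY, PLRTY) and is meant as an illustration under those assumptions, not a complete program. Check the manual for the exact behavior in your version of SPSS.

    * TEMPORARY limits the SELECT IF to the next procedure only.
    TEMPORARY.
    SELECT IF (sex = 1).
    FREQUENCIES VARIABLES=zodiac.
    * This second procedure runs on all cases again.
    FREQUENCIES VARIABLES=zodiac.

    * VALUE() returns the stored code even when it is declared missing.
    MISSING VALUES PLURALTY (2 THRU 8).
    COMPUTE PLRTY = VALUE(PLURALTY).
    RECODE PLRTY (2 THRU 8 = 2).

Because PLRTY is built from the stored codes, values 2 through 8 are copied into it as ordinary values and the RECODE can collapse them.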

Part 3 - Statistical Procedures

  1. Very Important!!! Before you do any other analysis, run FREQUENCIES on all the variables you are going to use in your analysis so you know what your data looks like. Check the FREQUENCIES output for mis-codings and unusual outliers. (Hints: Be careful about running FREQUENCIES on variables with unique or nearly unique values, e.g., ID or INCOME. Use the subcommand /FORMAT=ONEPAGE to save space.)
  2. Decide what statistical procedures are appropriate for your research. You and your advisor/statistician have to do this. EDS doesn't provide statistical consulting.
  3. Look up the particular procedure command in the manual and choose the subcommands you need.
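
A first pass over the data might look like the sketch below. The variable names are taken from the raw-data example in Part 1 and are assumptions about your own file. For a nearly unique variable such as INCOME, the sketch suppresses the frequency table with /FORMAT=NOTABLE and asks for summary statistics instead.

    * Frequency tables for variables with a small number of values.
    FREQUENCIES VARIABLES=sex birthyr state.
    * Summary statistics only for a variable with many unique values.
    FREQUENCIES VARIABLES=income
        /FORMAT=NOTABLE
        /STATISTICS=MINIMUM MAXIMUM MEAN.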

Part 4 - Saving the SPSS Save File

  1. Decide where you want to save the file, e.g., your home directory or a subdirectory.
  2. Pick a name. Do not use punctuation other than an underscore ("_") in the name. The file extension is .sav.
  3. To create a save file, add the command SAVE OUTFILE= at the end of your program.

    Examples:

        SAVE OUTFILE="april.sav".
        SAVE OUTFILE="surveys/april.sav".
        SAVE OUTFILE="/u/9/s/me2000/surveys/april.sav".
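
If only some variables are needed in later runs, a /KEEP subcommand on SAVE can limit what is written. The sketch below assumes the variable names from the raw-data example in Part 1. The second command would go at the top of a later program in place of FILE HANDLE and DATA LIST.

    * At the end of the first program, write only the variables needed later.
    SAVE OUTFILE="surveys/april.sav"
        /KEEP=personid sex birthyr income.

    * At the start of a later program, read the save file back in.
    GET FILE="surveys/april.sav".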

Some other nice (but optional) commands

  • TITLE
    Puts a title line on your output.
  • SET WIDTH 80
    Narrow the width of the output so you can easily read it on a computer screen.
  • SET HEADER NO
    Turn off the page headings after page 1.
  • N OF CASES x
    Run on x number of cases to test the program.
  • SAMPLE [percent]
    Take a percentage sample of cases.
  • SAMPLE [n from m]
    Take a sample of n cases from m cases.
  • COMMENT
    Write comment lines in your program. Highly recommended. The comment can extend for many lines until it ends with a period.
  • An * (asterisk) at the start of a line is an alternative to the COMMENT command.
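
The commands above could be combined at the top of a test run along these lines. The title text and the case counts are made up for illustration. Note that the percentage for SAMPLE is written as a decimal fraction.

    TITLE "Test run on a subset of cases".
    SET WIDTH 80.
    COMMENT This comment runs over
        two lines and ends with a period.
    * Use only one of the following three commands in a given run.
    N OF CASES 100.
    SAMPLE .10.
    SAMPLE 50 FROM 500.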

Example of a Complete Program

TITLE "A very simple program".
set width 80.
file handle in/name="/p/us/sue/zspssx/famous.dat".
DATA LIST FILE=in /
        idnum 1-3 (N)
        fname 4-15 (A)
        lname 16-27 (A)
        age 28-29
        sex 30
        byear 31-34
        dyear 35-38
        status 39.
VAR LABELS
        idnum "Case Number"
        fname "First Name"
        lname "Last Name"
        age "Age at Death"
        sex "Sex"
        byear "Year of Birth"
        dyear "Year of Death"
        status "Status".
VALUE LABELS
        sex
                1 "Male"
                2 "Female" /
        status
                1 "Real"
                2 "Fictional"
                3 "Possibly Real".
missing values
        status (3).
select if (sex eq 2).
freq vars=status.
save outfile="famous_females.sav".