Stata for a PC or Mac: Brief Information


Stata is a general-purpose statistical package. This is a brief introduction to using Stata, with information that applies to both PCs and Macs.

Stata is available to Columbia University students, faculty, and staff at a significant discount through a site license.

Stata Interactive Help

Stata has extensive interactive help.

  • The Help pull-down menu at the right end of the menu bar. If you know the name of a command, choose "Stata Command" to see an explanation of what it does, the options that work with it, and often examples. Use the "search" or "contents" features if you don't know the command's name.
  • The describe command. This tells you about your active dataset and its variables. Use describe, short if you don't want to see the list of variables. If you want to find a variable but are unsure of its name, use the wildcard * symbol, e.g., describe in* lists all the variables whose names start with "in" and describe *9 lists all the variables whose names end in 9.
  • The lookfor command. Use this to search variable names and variable labels for a string, e.g., lookfor in will find the variable income as well as variables whose labels are "ethnic minority" and "moves since age 16".
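A short do-file sketch of these lookup commands. It assumes the auto.dta example dataset that ships with Stata (loaded with sysuse) rather than any dataset named in this guide, so the variable names below are specific to that file.

```stata
* Load the example dataset that ships with Stata (an assumption for illustration)
sysuse auto, clear

* File-level summary only: number of observations, variables, and size
describe, short

* Wildcards: variables whose names start with "m" (in auto.dta: make and mpg)
describe m*

* Variables whose names end in "t"
describe *t

* Search variable names and variable labels for a string
lookfor weight
```

Because auto.dta is installed with Stata, this can be run as-is to see what each command prints before trying it on your own data.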

The Stata Interface

On the PC and Mac, Stata has a number of windows. You can use the Window pull-down menu or simply click the one you want.

  • Command Window
    Where you type Stata commands.
  • Results Window
    Where the results of your commands appear.
  • Review Window
    A running list of the commands you've used. You can click on old commands to re-issue them or cut-and-paste to the Command Window for editing.
  • Variables Window
    A list of variables in the current data set. You can click on their names to insert them into commands in the Command Window.
  • Viewer Window
    Displays log files and information from help requests.
  • Graph Window
    Graphing output.
  • Data Editor Window
    A spreadsheet-style display of your data in which you can change values. You must close this window before you can do anything else.
  • Do-file Editor
    This is a separate built-in editor program that is started from within Stata but opens as a separate window. You can use it to create, modify, and run Stata do-files.
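As a minimal sketch of what a do-file written in the Do-file Editor might contain: the file name myanalysis and the use of the bundled auto.dta dataset are illustrative assumptions, not part of this guide.

```stata
* myanalysis.do: a hypothetical example do-file
capture log close                   // close any log left open by an earlier run
log using myanalysis, text replace  // record all output in myanalysis.log

sysuse auto, clear                  // example dataset shipped with Stata
summarize price mpg
tab1 foreign

log close
```

Save the file as myanalysis.do and run it from the Do-file Editor, or type do myanalysis in the Command window. Note that // comments like the ones above work in do-files but not when typed directly into the Command window.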

You can close any window except the Command and Results windows. Windows can overlap as you use them, and you can lose track of them, particularly on the Mac, where they are all free-floating. Use the Window menu to find them or to move from one to another.

Many of the most common commands can be entered either by typing the correct syntax into the Command window or by choosing them from the program's pull-down menus. This guide gives the command syntax and, for some commands, the menu path for selecting them.

Using and Saving Stata Datasets

A "Stata Dataset" is one in the special Stata format. Stata datasets have the extension .dta. To open a Stata Dataset, either use the Open option in the File menu or, in the Command window, type the use command followed by the name of a Stata dataset.

use survey1

If you only need some of the variables from a Stata Dataset, you can just read in those variables with this variant of the use command:

use age sex status using survey1

To save a Stata dataset, either use the Save or Save As option in the File menu or, in the Command window, type the save command followed by a filename. If the file already exists, add the replace option. You do not have to type the file extension; it is .dta by default.

save survey2, replace
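A do-file sketch of the use and save commands, assuming the bundled auto.dta example dataset and a hypothetical output file myauto.dta. It also shows the clear option on use: if the data in memory have changed since they were last saved, use refuses to overwrite them unless you add clear.

```stata
* Start from the bundled example data and save a copy as myauto.dta
sysuse auto, clear
save myauto, replace                      // .dta is added automatically

* Read back only three variables; clear discards the data currently in memory
use make price mpg using myauto, clear
describe, short                           // confirms 3 variables were loaded
```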

Note: To create a file that can be read by Stata version 8 or 9, use the saveold command.

        saveold survey2

If you are using Stata/SE and want to save the dataset for use in the smaller, Intercooled version of Stata, use the option intercooled on the save command:

        save survey1, intercooled

String variables must be less than 80 characters to be saved in Intercooled.

Some Useful Stata Commands

  • describe
    Describes the currently active data file, showing the number of observations and variables, size of file, and names and types of variables. describe, short gives information about the file but not the variables.
  • summarize
    Gives summary statistics. You can follow it with a list of variables to limit the output to those variables.
  • codebook
    Creates a simple codebook describing your data.
  • clear
    Clear everything from memory, including your data, value labels, equations, etc. In effect, it resets Stata.
  • memory
    Check memory allocations.
  • list
    Lists all or part of the currently active data. The command can be quite complex: you can restrict it to a list of variables, a range of observations (in), and observations that meet a condition (if). For example, the command:

        list age status in 1/100 if age>14

    lists the variables age and status for those aged over 14 in the first 100 observations. Be careful about using list. If you have a very large dataset, the listing will go on and on.
  • browse
    Opens the Data Browser window and displays all or part of the currently active data. As with the list command, you can restrict it to a list of variables, a range of observations, and a condition. For example, the command:

            browse age status in 1/100 if age>14

    opens the browser window and displays the variables age and status for those aged over 14 in the first 100 observations.
  • tab1 [varname]
    Produces a simple frequency table for each variable listed. It is always worth knowing what the data you're working with look like.
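A do-file sketch that runs several of these commands on the auto.dta example dataset bundled with Stata (an assumption for illustration; substitute your own variables). browse is left out because it opens an interactive window.

```stata
sysuse auto, clear

summarize price mpg                     // N, mean, std. dev., min, max
codebook rep78                          // type, range, and missing values
list make mpg in 1/20 if mpg > 25       // same in/if pattern as the list example above
tab1 rep78 foreign                      // one frequency table per variable
```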

Working with Large Files & Increasing Memory

If you get the message "no room to add more observations" (as opposed to variables), you don't have enough memory to read in your entire Stata dataset. The memory command reports how memory is being used. To increase memory, give the command:

        set memory #m

where "#" is a number and "m" is megabytes.

You may be able to reduce your memory requirements by storing your data more efficiently. Numeric variables are often stored as double (8 bytes) or float (4 bytes, Stata's default type), which is unnecessarily large for most social science data: integer codes such as 1-5 ratings fit in a 1-byte byte or a 2-byte int. Use Stata's compress command to convert your variables to their most efficient types, and then resave your file.

Note: The compress command does not create a compressed version of your file the way compression utilities such as gzip or pkzip do. Rather, it changes each variable's storage type to the smallest type that can hold all of its values without loss. See the Stata manual for more information on Stata's storage types.
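A do-file sketch of compress, using the bundled auto.dta dataset and a deliberately oversized copy of one variable (price_dbl and myauto_small are hypothetical names). On Stata 12 and later, set memory is no longer needed because memory is allocated automatically, but compress still reduces how much space your data take.

```stata
sysuse auto, clear

* Deliberately store a copy of price as an 8-byte double
generate double price_dbl = price
describe price price_dbl

* compress demotes each variable to the smallest type that holds its values exactly
compress
describe price price_dbl            // price_dbl now uses the 2-byte int type

save myauto_small, replace          // resave so the smaller types are kept
```

Because every value of price is a whole number well within the int range, compress can demote the double copy to int without changing any value.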

Further note: If you get the message "no room to add more variables" (as opposed to observations), the file has more variables than the current maximum allows. To raise the maximum, use the set maxvar command, but first find out how many variables are on the file by running describe on the file without opening it (e.g., describe using "c:/myfile.dta", short).

Intercooled Stata has an absolute limit of 2,047 variables (2^11 - 1). Stata/SE and Stata/MP (the version on CUIT workstations) have a limit of 32,767 (2^15 - 1). If you know the names of the variables, you can read in only the ones you need. Because Stata works almost entirely in memory, the fewer variables (and observations) you load, the faster it runs.
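A do-file sketch of inspecting a file before loading it, then reading only the variables you need. The file myauto.dta is a hypothetical stand-in created from the bundled auto.dta dataset. The set maxvar line is commented out because it applies only to Stata/SE and Stata/MP and must be run while no data are in memory; the value 10000 is an arbitrary example.

```stata
* Create a file to inspect (stand-in for your own large dataset)
sysuse auto, clear
save myauto, replace
clear

* Count observations and variables without loading the data
describe using myauto, short

* Stata/SE and MP only, with no data in memory: raise the variable limit
* set maxvar 10000

* Or load just the variables you need
use make price mpg using myauto
```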

Reading ASCII Data into Stata

The two most common commands to read data from an ASCII file into Stata are insheet and infile:

  1. insheet
    Use insheet if the file was created by a spreadsheet or database program with one observation per line and the variables separated by commas or tab characters. If you are coming from Excel, save the sheet as a .csv file. The first line can contain the variable names. A period (.) is read as a numeric missing value, and empty double quotes ("") as a missing string value.
  2. infile is used to read fixed-format raw data without delimiters by means of a dictionary file. See the example below.

The Command window syntax and examples are given below. The commands can also be started from the File menu using the Import option.

Insheet

Syntax

          insheet using filename, options

where filename is the name of the ASCII file created by the spreadsheet or database program. If the file contains no variable names, Stata assigns the names v1, v2, ..., vn. If you saved the spreadsheet file with the names of variables in the first row, insheet can usually detect them on its own; specifying the names option tells Stata explicitly to use them, for example:

insheet using mystuff.dat, names
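A do-file sketch that writes a tiny comma-delimited file and reads it with insheet. The file mystuff.csv and its contents are made up for illustration; the second row uses a period to show how a numeric missing value is read. In Stata 13 and later, import delimited is the documented replacement for insheet, though insheet still runs.

```stata
* Write a small comma-delimited file (stand-in for a .csv saved from Excel)
file open fh using mystuff.csv, write replace text
file write fh "id,age,sex" _n
file write fh "1,34,M" _n
file write fh "2,.,F" _n
file write fh "3,27,M" _n
file close fh

* Use the first row as variable names; clear replaces any data in memory
insheet using mystuff.csv, names clear
describe
list
```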

If you didn't save the spreadsheet file with the names of variables, you can rename the variables later with the rename command and attach descriptive labels with the label variable command. If you have a lot of variables, put all of these commands in a .do file.
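A do-file sketch of naming and labeling variables after reading a file that has no header row. The file nonames.csv, its contents, and the names and labels chosen are hypothetical. The nonames option tells insheet not to treat the first line as names, so the variables arrive as v1, v2, and v3; rename then gives them names and label variable attaches descriptive labels.

```stata
* A data file saved without variable names in the first row
file open fh using nonames.csv, write replace text
file write fh "1,34,M" _n
file write fh "2,27,F" _n
file close fh

insheet using nonames.csv, nonames clear    // variables arrive as v1, v2, v3

* Give the variables names, then descriptive labels
rename v1 id
rename v2 age
rename v3 sex
label variable id  "Respondent ID"
label variable age "Age in years"
label variable sex "Sex (M/F)"
describe
```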

Infile

Syntax: 

        infile using dictionary-file

If the variables in your data file are not delimited, you need a dictionary file that tells Stata where each variable is located; dictionary-file in the syntax above is the name of that file. Here's an example:

        dictionary using dump.dat {
                _column(1) id %5f
                _column(6) age %2f
                _column(8) str1 sex %1s
                _column(9) str1 status %1s
          }
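A do-file sketch that writes a two-record fixed-format file and the dictionary shown above, then reads the data with infile. The records in dump.dat are invented for illustration: id in columns 1-5, age in 6-7, sex in column 8, and status in column 9. When the dictionary file name is given without an extension, infile assumes .dct.

```stata
* Hypothetical fixed-format data: id cols 1-5, age 6-7, sex 8, status 9
file open fh using dump.dat, write replace text
file write fh "0000134MS" _n
file write fh "0000227FM" _n
file close fh

* Save the dictionary as dump.dct
file open fh using dump.dct, write replace text
file write fh "dictionary using dump.dat {" _n
file write fh "    _column(1) id %5f" _n
file write fh "    _column(6) age %2f" _n
file write fh "    _column(8) str1 sex %1s" _n
file write fh "    _column(9) str1 status %1s" _n
file write fh "}" _n
file close fh

* Read the data described by dump.dct, replacing any data in memory
infile using dump, clear
list
```

In practice you would type the dictionary in the Do-file Editor or any text editor; the file write lines here just keep the example self-contained.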


You can also read in ASCII datasets using the pull-down File menu under the "Import" option. If the data isn't delimited, you will still need a dictionary file or the information on the column location and data type of your variables.