Stata on CUNIX


Stata is a general-purpose statistical package. This document offers a brief introduction to Stata on CUNIX, with several examples of reading ASCII data. At the CUNIX prompt, type stata; Stata starts and reports which version is available.

Stata Help

Stata has extensive interactive help.

  • help
    Use help when you know the Stata word or phrase you need help on.
  • search
    Use search when you are not sure of the name of the command or are looking for information on a topic. It searches a keyword database and the Internet.
  • findit
    This is like search but searches for information on a topic across all sources including the online help, the FAQs at the Stata web site, the Stata Journal, and all Stata-related internet sources including user-written additions. From findit, you can click to go to a source or to install additions.
  • describe
    This tells you about your active dataset and its variables. Use describe, short if you don't want to see the list of variables. If you want to find a certain variable but are unsure of its name, use the wild card * symbol, e.g., describe in* will list all the variables starting with "in" and describe *9 will list all the variables ending in 9.
  • lookfor
    Use this to search through variable names and variable labels for a string, e.g., lookfor in will find the variable income as well as variables whose labels are "ethnic minority" and "moves since age 16".
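
As a rough illustration, the help commands above might be used like this in a session. The dataset survey1 and the variable names beginning with "in" are assumptions carried over from the examples in this document; findit needs an Internet connection.

        * exact command name known
        help summarize
        * topic known, command name not known
        search regression
        findit regression
        * look around the active dataset
        use survey1
        describe, short
        describe in*
        lookfor in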

Using and Saving Stata Datasets

A "Stata Dataset" is one in the special Stata format. Stata datasets have the extension .dta. To open a Stata Dataset, type the use command followed by the name of the dataset.

use survey1

If you only need some of the variables from a Stata Dataset, you can just read in those variables with this variant of the use command:

use age sex status using survey1

To save a Stata Dataset, type the save command plus a filename. If the file already exists, you will need to add the replace option. You do not have to type the file extension; it will be .dta by default.

save survey2, replace

Note:  To create a file that can be read by version 8 or 9 of Stata, use the saveold command.

        saveold survey2

If you are using Stata/SE and want to save the dataset for use in the smaller, Intercooled version of Stata, use the option intercooled on the save command:

        save survey1, intercooled

String variables must be no longer than 80 characters to be saved in Intercooled format.
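
Putting use and save together, a minimal sketch of a round trip might look like the following. The dataset and variable names are the illustrative ones used above, and the age cutoff is only an assumption for the example.

        * read three variables only, replacing any data in memory
        use age sex status using survey1, clear
        * keep respondents over 14; "age < ." excludes missing values,
        * which Stata otherwise treats as larger than any number
        keep if age > 14 & age < .
        * write the subset to a new file, overwriting survey2.dta if it exists
        save survey2, replace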

Some Useful Commands

  • describe
    Describes the currently active data file, showing the number of observations and variables, size of file, names and types of variables. describe, short gives information about the file but not the variables.
  • summarize
    Gives summary statistics. You can give it an argument of a list of variables.
  • codebook
    Creates a simple codebook describing your data.
  • clear
    Clear everything from memory, including your data, value labels, equations, etc. In effect, it resets Stata.
  • memory
    Check memory allocations.
  • list
    Lists all or part of the currently active data. The command can be quite complex: you can restrict it to a list of variables, a range of observations, and observations that match a condition. For example, the command:

        list age status in 1/100 if age>14

    lists the variables age and status for those aged over 14 among the first 100 observations. Be careful about using list. If you have a very large dataset, the listing will go on and on.
  • tab1 [varname]
    Do simple frequencies on a variable. It is always worth knowing what the data you're working with looks like.
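
A first look at a new dataset could combine these commands roughly as follows. This is only a sketch; survey1 and its variables age, sex and status are the illustrative names used elsewhere in this document.

        use survey1
        * size of the file, variable names and types
        describe
        * summary statistics for one variable
        summarize age
        * codebook entries for two variables
        codebook sex status
        * simple frequencies, one table per variable
        tab1 sex status
        * only the first 10 observations, to keep the listing short
        list age status in 1/10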

Reading ASCII Data into Stata

The two most common commands to read data from an ASCII file into Stata are insheet and infile:

  1. insheet
    Use insheet if the file was created by a spreadsheet or a database program with one observation per line and the variable delimiter is a comma or a tab character. If you are coming from Excel, create a .csv file. The first line can be a list of variables. A period (.) is understood to mean a numeric missing value; double quotes ("") to mean a missing string variable.
  2. infile
    Use infile to read fixed-format raw data without delimiters, using a dictionary file. See the example below.

The syntax for the command window and some examples are given below. The commands can also be run from the File menu using the Import option.

Insheet

Syntax

          insheet using filename, options

where filename is the name of the ASCII file created by the spreadsheet or database program. By default, Stata will assign the names v1,v2,...,vn to the variables. If you saved the spreadsheet file with the names of variables in the first row, Stata can use them if you specify the option names, for example:

insheet using mystuff.dat, names

If you didn't save the spreadsheet file with the names of variables, you can name them later with the rename command and describe them with the label variable command. If you have a lot of variables, make up a .do file with all the rename and label commands.
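
For a file read without names, a small .do file along these lines is one way to name and label the variables afterwards. The column meanings (v1 as an ID, v2 as age) and the label text are hypothetical.

        * read the file; variables arrive as v1, v2, ...
        insheet using mystuff.dat, clear
        * give the variables real names
        rename v1 id
        rename v2 age
        * attach descriptive labels
        label variable id "Respondent ID"
        label variable age "Age in years"

If these lines are saved in a file such as names.do (a made-up name), typing do names runs them all at once.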

Infile with a Dictionary File

Syntax: 

        infile using dictionary-file

If the variables in your data file are not delimited, you need a dictionary file to describe the positions of your variables to Stata. In the syntax above, dictionary-file is the file containing the specifications for reading the variables. Here's an example.

        dictionary using dump.dat {
                * each line: start column, optional storage type, variable name, input format
                _column(1) id %5f
                _column(6) age %2f
                _column(8) str1 sex %1s
                _column(9) str1 status %1s
        }
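
To show how the pieces fit, here is a sketch that assumes the dictionary above has been saved under the hypothetical name dump.dct (.dct is the extension infile looks for by default) and that dump.dat holds records laid out the way the dictionary describes.

        * a made-up record in dump.dat might look like:  0000134M1
        *   columns 1-5 id, columns 6-7 age, column 8 sex, column 9 status
        clear
        infile using dump.dct
        * check that the variables were read as intended
        describe
        list in 1/5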

Working with Large Files & Increasing Memory

If you get the message "No more room for observations" (as opposed to variables), you don't have enough memory to read in your entire Stata dataset. The command memory gives a report on memory usage. To increase memory, give the command:

set memory #m

where # is a number and m stands for megabytes.
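
For example, assuming a version of Stata that uses set memory and an arbitrary figure of 200 megabytes, the sequence might be:

        * memory cannot be changed while data are loaded
        clear
        set memory 200m
        * confirm the new allocation
        memory
        use survey1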

You may be able to reduce your memory requirements by saving your data more efficiently. Numeric variables are often stored in a larger type (float, 4 bytes, or double, 8 bytes) than their values require. This is unnecessarily large for most social science data. Use Stata's compress command to reduce your data to its most efficient format and then resave your file.

Note: The compress command does not create a compressed version of your file in the way that compression utilities such as gzip or pkzip do. Rather, it changes each variable to the smallest storage type that can hold its values without loss of information. See the Stata Manual for more information on Stata variable types.
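
A minimal sketch of the compress-and-resave step, using the illustrative dataset survey1:

        use survey1
        * note the size before
        describe, short
        * convert each variable to the smallest type that holds its values
        compress
        * note the size after
        describe, short
        * resave so the smaller types are kept on disk
        save survey1, replace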

Further note: If you get the message "No more room for variables" (as opposed to observations), you have too many variables on the file for the default maximum allowed. To override the default, use the set maxvar command, but first determine how many variables are on the file by running the describe command on the unopened file (e.g., describe using "c:/myfile.dta", short).

Intercooled Stata has an absolute limit of 2,047 (2**11 - 1) variables. Stata/SE has a limit of 32,767 (2**15 - 1). If you know the names of the variables, you can read in only the ones you need. Since Stata works almost entirely in memory, the fewer the variables (and observations) the faster it runs.
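
The two approaches can be sketched as follows, assuming Stata/SE, the illustrative dataset survey1, and an arbitrary new limit of 5000 variables:

        * count the variables without opening the file
        describe using survey1, short
        * raise the limit; no data may be in memory when it is changed
        clear
        set maxvar 5000
        use survey1

        * alternatively, read in only the variables you need
        use age sex status using survey1, clear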

The log File

To start logging your session on CUNIX, give the Stata command:

log using filename

where filename is the name of the file. It can be any name; the file extension will be .smcl, a Stata-proprietary format. To log to an ordinary ASCII file, use the t option:

log using filename, t

This will save the log to filename.log.

If the file already exists, add the append option to add to the end of it, or the replace option to overwrite it. Logging can be turned on and off any number of times during a Stata session. The log file closes automatically when you exit Stata. If you use the t option, the log file is an ASCII file so you can edit it with any Unix editor (pine, emacs, vi).
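
A short session with logging switched off and on again might look like this. The log name mysession and the dataset survey1 are placeholders.

        * start an ASCII log in mysession.log
        log using mysession, t
        use survey1
        summarize
        * stop logging temporarily; the listing below is not recorded
        log off
        list in 1/5
        * resume logging
        log on
        tab1 sex
        * close the log file
        log close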

Stata in Batch Mode

It is possible to run Stata in batch mode on CUNIX. Prepare a file with the Stata commands you want to be executed. Be sure to turn paging off. Save the file with the extension .do. Then run Stata in the background with this command:

stata -q -b do mybatch > /dev/null &

where mybatch is the name of the file containing your Stata commands. Output will be in a file with the same name as the input do file and with the file extension .log. (The > /dev/null discards the "running on computer" message.)
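
As an illustration only, a batch file saved as mybatch.do could contain something like the following. The dataset and variable names are the made-up ones used earlier in this document.

        * mybatch.do: a small batch job
        * turn paging off so the job never waits for a keypress
        set more off
        use survey1
        describe
        summarize
        tab1 sex status

When the job finishes, the results of these commands will be in mybatch.log.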