The Basics of Stata
Methods and Statistics in Political Science
Political Science 221

This guide explains the basic operations of Stata, a computer program for statistical analysis.  You will need to use this program in order to complete problem sets for the course.  The guide focuses on the aspects of Stata listed below.  For further assistance with calculating statistics in Stata, please see Statistical Commands in Stata.

  1. Accessing Stata in the campus computer labs

  2. Calling up the survey data set from the public server

  3. Calculating the mean of a variable

  4. Saving and printing the output from your analysis

  5. Exiting Stata

  6. Answering questions about Stata

I. Accessing Stata in the campus computer labs

1. Turn on the computer, and log in by typing in your userid and password.

2. When the main screen appears, use the mouse to move the highlighted arrow on top of Start in the lower left-hand corner of the screen.  Click the left mouse key once.

3. A new set of options will appear.  Move the mouse arrow to All Programs, and click the left mouse key.

4. When another set of options appears, move the mouse arrow to Stata (or Intercooled Stata 9), and click the left mouse key.  This command directs the computer to open the program Stata, which you will use to analyze the data set.

II. Calling up the survey data set from the public server

1. When you open Stata, a new window will appear, containing several windows inside it.  Of these inside windows, the two most important are the Stata Command window and the Stata Results window.  The command window contains a blinking cursor.  You type in any Stata commands here.  Currently, the Results window is largely blank, but it will present the results of your analysis and any commands that you enter.

2. As you work through this guide, you will use a data set containing voters' evaluations of U.S. Senate candidates in 1988.  To open this data set in Stata, type use "P:\Political Science\Pol221 - Data\ses1988.dta" , clear and press Enter.  The portion of this command inside the quotation marks gives the location and name of the data set that you wish to use.  The final part of the command (, clear) tells Stata to clear any data sets currently in its memory, in order to make room for the new data set.

3. You can also bring up data sets using the drop-down menus at the top of the overall Stata window.  With this approach, opening a data set is exactly like opening a Word document; you must browse through various subdirectories until you find the relevant file.  Most of the commands below, however, use the Stata Command window.  This guide therefore explains how to run every relevant command from the command window.

4. With the data set loaded, you are now ready to begin analyzing it.  It is organized as follows.  In the 1988 survey, respondents were asked a series of questions about the two major party Senate candidates in their state in 1988 (respondents with no Senate race were excluded).  For example, respondents were asked to rate each candidate on a 100-point thermometer scale, with 0 indicating an extremely unfavorable rating and 100 an extremely favorable rating.  In the data set that you will use for this guide (ses1988.dta), the survey responses have been restructured, so that the unit of observation is a respondent's rating of a candidate.  Since each respondent rated two candidates, each respondent is included twice in the data set, once for each candidate rating that the respondent provided.  Thus, the data set looks like this:

Respondent number    Thermometer rating of candidate               Other variables ...
respondent #1        respondent rating of Democratic candidate
respondent #1        respondent rating of Republican candidate
respondent #2        respondent rating of Democratic candidate
respondent #2        respondent rating of Republican candidate
respondent #3        respondent rating of Democratic candidate
respondent #3        respondent rating of Republican candidate
....                 ....                                          ....
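
If you would like to check this layout yourself, the following is a minimal sketch of one way to do so.  It uses the file path given in step 2 together with two standard Stata commands, describe and list.  The sketch assumes that the rows are sorted by respondent, which this guide does not confirm; if they are not, the first six rows will not necessarily show three respondents with two ratings each.

```stata
* Load the course data set, clearing any data currently in memory
use "P:\Political Science\Pol221 - Data\ses1988.dta", clear

* Show the name, storage type, and label of every variable
describe

* Display the first six observations
* (assumption: rows are sorted by respondent, two ratings per respondent)
list in 1/6
```

Because list prints every variable by default, the output may be wide; you can follow list with one or more variable names (for example, list therm in 1/6) to narrow it.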

III. Calculating the mean of a variable

1. Most analysis of the data set will require you to examine the answers that respondents provided to survey questions.  These answers are captured in variables, with one variable per question.  (For more information on the variables, see the codebook for the ses1988.dta data set, which is also available from the class web page.)  For a particular question (i.e., variable), you can obtain the average value (mean) of the respondents' answers, by using the sum command.

2. Imagine that you want to know the mean thermometer rating of candidates in the 1988 survey.  According to the codebook of variables, the name of this variable is therm.  To obtain the mean value of this variable, type sum therm in the Stata command window, and press Enter.

3. The results window will then display several numbers involving therm.  The first (Obs, or 2818) is the number of observations used to calculate the average.  Because each respondent appears in the data set once for each candidate rated, these observations are candidate ratings rather than individual respondents.  The second number is the actual average (56.39461).  The third number is the standard deviation of the variable (25.04371).  The last two numbers are the minimum and maximum values of the variable for observations in this data set.

4. You may wish to calculate the mean for therm and another variable such as contactc (the number of contacts that a respondent has with a candidate).  At the command window, type sum therm contactc ; you can extend this format for as many variables as you would like.  To obtain the means of all variables in a data set, simply type sum .

5. You may also wish to learn the frequency distribution for a variable, i.e., the number of respondents who fall into each category of the variable.  To obtain these values, type tab contactc and press Enter.  This command produces for each category of contactc the number of respondents in that category, the percentage of all respondents in that category, and the cumulative percentage of respondents across the categories of contactc.

6. Another possible calculation is the average value of the thermometer ratings (therm) for respondents with different levels of contact with the candidate.  To obtain these means, type tab contactc, summarize(therm) and press Enter.  This command directs Stata to summarize the therm variable (i.e., calculate means, etc.) for different values of the contactc variable.

7. This last table indicates substantial variation in thermometer scores across levels of candidate contact.  If a respondent has had no contact with a candidate, the average thermometer rating of that candidate is about 51.  Among respondents with the maximum number of contacts (8), the average thermometer rating is about 75.  
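
The commands from this section can be collected into one short sequence, sketched below.  The sketch assumes that ses1988.dta is already loaded, as described in Section II.  It writes summarize and tabulate in full; sum and tab, used in the steps above, are Stata's accepted abbreviations of these two commands, so either form should give the same output.

```stata
* Observations, mean, standard deviation, minimum, and maximum of therm
summarize therm

* The same statistics for two variables at once
summarize therm contactc

* Frequency distribution of contactc:
* count, percentage, and cumulative percentage for each category
tabulate contactc

* Mean (and standard deviation) of therm within each category of contactc
tabulate contactc, summarize(therm)
```

Typing the full command names takes a little longer, but it can make a saved log easier to read later.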

IV. Saving and printing the output from your analysis

1. When you finish your analysis, you often will want a hard copy of any results that you produced.  You can then use this hard copy to produce tables in Word that you will turn in with your assignments.  Stata makes it easy to save and print the results of your analysis.

2. After figuring out all the commands for creating the needed results, type log using temp.log, replace and press Enter.  This command directs Stata to create a log (or record) of all commands that you subsequently type and their corresponding output.  The commands and output will be located in a file called temp.log.  The , replace option tells Stata to overwrite any previous version of this file and create a new version with the most recent set of commands.

3. Next, type in the commands needed to create the results.  Stata offers two useful shortcuts for repeating previously typed commands.  First, you can use the Page Up key to have previous commands displayed in the command window.  Try pressing the Page Up key several times to illustrate this shortcut.  Second, the Review window contains all the commands that you have typed thus far in your current Stata session.  Double-click on one of these commands, and Stata will execute it again.  These two shortcuts can save valuable time when you recreate your output for the log file.

4. When you have finished producing the necessary output, type log close and press Enter.  This command directs Stata to close the log file and to cease putting a copy of commands and results into that file.

5. You can then edit or print the log file (temp.log) just as you would any Word file.
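
Put together, the logging steps above might look like the following sketch.  It assumes that ses1988.dta is already loaded, and the two analysis commands in the middle are only placeholders taken from Section III; substitute whatever commands your assignment requires.

```stata
* Start recording commands and output in temp.log,
* overwriting any earlier file with the same name
log using temp.log, replace

* Placeholder analysis commands (replace with your own)
summarize therm contactc
tabulate contactc, summarize(therm)

* Stop recording and close the log file
log close
```

Stata saves temp.log in its current working directory.  If you cannot locate the file afterward, typing pwd in the command window displays that directory.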

V. Exiting Stata

When you have finished your analysis and printed the results, you can exit Stata by typing exit in the command window and pressing Enter.

VI. Answering questions about Stata

When you need help in figuring out a particular command in Stata, you can turn to several sources.  First, you can type help in the command window.  Stata will then display a wide range of help options available in the program.  Second, you can type help and then the name of a specific Stata command, if you are uncertain about how to use it.  For example, the command help sum produces directions on how to use the sum command.  Finally, if you still are unable to figure out a particular command, email the professor who teaches your class.