Statistical Commands in Stata

The table below lists all the statistical commands in Stata that you will need to complete the daily problems and problem sets for POL221.  The first column gives the generic form of each command, using placeholder names (such as "datasetname" or "varname").  The second column explains what each command does.  The third column gives an example of each command, using an actual dataset and the variables in it.  To access that dataset, click on P:\Political Science\Pol221 - Data\euro.dta.  For a description of the variables in the dataset, click on Codebook for euro.dta.

Please note that each command must be typed on a single line at the Stata Command Prompt.  Varying the size of this web page may force a command to appear on more than one line.  For an initial introduction to using the Stata program, please see The Basics of Stata.

| Actual Command | Description | Example |
| --- | --- | --- |
| use "datasetname", clear | load a data set called "datasetname" into Stata (make sure that you use the correct name and address for the data set) | use "P:\Political Science\Pol221 - Data\euro.dta", clear |
| sum varname | for a variable called "varname," calculate the mean, standard deviation, minimum and maximum values, and number of cases | sum age |
| sum varname, detail | for a variable called "varname," calculate the mean, standard deviation, minimum and maximum values, and number of cases, plus percentile distribution (including median) | sum age, detail |
| list varname1 varname2 ... varname* | list the values (across all observations) for "varname1," "varname2," ... "varname*" | list age north employed |
| command if varname1==value | perform any command (shown here as "command") only for observations where "varname1" equals the given value; type a numeric value as is, and enclose a string value in double quotation marks | list age north employed if country==1 |
| tab varname | for a variable called "varname," calculate the frequency distribution | tab country |
| tab varname1 varname2 | for two variables called "varname1" and "varname2," calculate a crosstabulation or table | tab country north |
| tab varname1, summarize(varname2) | for two variables called "varname1" and "varname2," split data into groups, using varname1; for each group, calculate average of varname2 | tab country, summarize(employed) |
| ttest varname1, by(varname2) | for two variables called "varname1" and "varname2," calculate an independent samples difference of means test (samples grouped by varname2; calculate means of varname1); assumes equal variances | ttest age, by(north) |
| ttest varname1, by(varname2) unequal | for two variables called "varname1" and "varname2," calculate an independent samples difference of means test (samples grouped by varname2; calculate means of varname1); assumes unequal variances | ttest age, by(north) unequal |
| ttest varname1==varname2 | for two variables called "varname1" and "varname2," calculate a non-independent (i.e., dependent or paired) samples difference of means test (compare values for varname1 and varname2 across same set of cases) | not possible with euro.dta dataset |
| corr varname1 varname2 ... varname* | for variables called "varname1," "varname2," up to "varname*," calculate the correlation coefficient for each possible pair of all variables listed (from varname1 to varname*) | corr age educ country |
| tab varname1 varname2, chi | calculate a chi-squared test for "varname1" and "varname2" | tab employed north, chi |
| tab varname1 varname2, taub | calculate Kendall's tau-b for "varname1" and "varname2" | tab employed north, taub |
| reg dv iv1 iv2 ... iv* | estimate a regression model where "dv" is the dependent variable, and the independent variable(s) are "iv1", "iv2", up to "iv*" | reg educ age employed country |
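
As a rough illustration, the commands above might be combined into a short session like the one sketched below.  It uses only examples already listed for euro.dta; the order of the steps is an assumption for illustration, not a required workflow.  Lines beginning with * are comments, which Stata ignores.

```stata
* Load the course dataset (path as given in the table above)
use "P:\Political Science\Pol221 - Data\euro.dta", clear

* Describe one variable in detail, then compare group averages
sum age, detail
tab country, summarize(employed)

* Restrict a command to a subset with an if qualifier
* (country is compared to a number here, so no quotation marks)
list age north employed if country==1

* Difference of means test without assuming equal variances
ttest age, by(north) unequal

* Chi-squared test for two categorical variables
tab employed north, chi

* Regression with educ as the dependent variable
reg educ age employed country
```

Each command still occupies a single line, so the same lines can be typed one at a time at the Stata Command Prompt.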