Basic Stata Use

How do I set my working directory?

Your working directory is the folder in which Stata looks for files, and saves them, when you use a command that reads from or writes to disk (such as use or save) without giving a full path.

  • To see what your current working directory is, type "pwd"
  • To see the files and folders in your working directory, type "ls"
  • To move up one level in your directory tree, type "cd .."

For instance, if you are in /Home/Users/johndoe/Stata and you type "cd .." then you will be in /Home/Users/johndoe.

If your current working directory has a folder called "myfolder" in it and you want to make "myfolder" your working directory, type "cd myfolder". This, for example, would move you from /Home/Users/johndoe/Stata to /Home/Users/johndoe/Stata/myfolder.

Important note! If your folder name has a space in it, you have to enclose the name in quotes. For example: cd "My Folder"

You can use multiple "cd .." and "cd NAMEHERE" commands to move anywhere you want on your hard disk (NAMEHERE refers to a folder into which you want to move; remember to enclose NAMEHERE in quotes if it contains a space).
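A minimal do-file sketch of these commands follows. It only moves up one level and then back, so it should run from any starting folder except the top of a drive; the local macro name start is just an illustrative choice.

```stata
* Show the current working directory
pwd

* Save it in a local macro; c(pwd) always holds the working directory
local start "`c(pwd)'"

* Move up one level and list the files and folders there
cd ..
ls

* Move back using a full path; the quotes protect folder names with spaces
cd "`start'"
pwd
```

A full path can also be typed directly, as in cd "/Home/Users/johndoe/Stata/myfolder" (the example path used earlier in this article).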

How do I calculate means, variances, and standard deviations?

Use the command "summarize". If you simply type "summarize", you will get the number of observations, mean, standard deviation, minimum, and maximum for every variable in memory. Or, you can type

summarize VARNAME

which will give you a summary of the variable VARNAME only. Add the detail option, as in "summarize, detail" or

summarize VARNAME, detail

to also get the variance, various percentiles, skewness, and kurtosis.
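As a sketch, the lines below use Stata's bundled auto dataset (loaded with sysuse) in place of your own data. They also show that summarize saves its results, so the variance can be displayed directly from r(Var).

```stata
* Load the example dataset of 1978 automobiles that ships with Stata
sysuse auto, clear

* Observations, mean, standard deviation, minimum, and maximum of price
summarize price

* summarize saves its results in r(); r(Var) holds the variance
display "mean = " r(mean) "   sd = " r(sd) "   variance = " r(Var)

* detail adds the variance, percentiles, skewness, and kurtosis to the output
summarize price, detail
```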

How do I delete observations from a data set?

Use the "drop" command. Suppose that a data set has 10 observations. If you type "drop in 5", the 5th observation will be deleted. Similarly, you can type "drop in 1/3" to drop the first three observations. Another way to delete observations is to use an "if" clause.

For example, "drop if VARNAME<4" will drop all observations that have VARNAME<4. You can also use more complicated expressions, such as "drop if VARNAME1<5 & VARNAME2>5".
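Here is a short sketch of both forms on the bundled auto dataset, which is used only as stand-in data. It also guards against a common surprise: Stata treats a missing value as larger than any number, so a "greater than" condition also matches missing values unless they are excluded.

```stata
sysuse auto, clear       // 74 observations

drop in 5                // delete the 5th observation; 73 remain
drop in 1/3              // delete the first three observations; 70 remain
drop if mpg < 15         // delete every car with mpg below 15

* Missing values count as larger than any number, so "rep78 > 3" is also
* true when rep78 is missing; !missing() keeps those observations
drop if rep78 > 3 & !missing(rep78)

* Report how many observations are left
count
```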

How do I have Stata report normal tail areas and inverse normal tail areas?

To compute the left tail area for a given z value, use the following command:

display normal(z)

where z is the value of interest.

To find the z value whose left tail area equals p (the inverse of normal()), use the following command:

display invnormal(p)

Here z and p are placeholders; replace them with numbers, as in "display normal(1.96)", or with any valid expression.
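For instance, with the familiar 1.96 cutoff (chosen only as an illustration), the two functions undo each other, and the right tail area follows from the left one by subtraction or by symmetry:

```stata
* Left tail area below z = 1.96 (about 0.975)
display normal(1.96)

* Right tail area above z = 1.96: subtract from 1, or use symmetry
display 1 - normal(1.96)
display normal(-1.96)

* Inverse: the z value whose left tail area is 0.975 (about 1.96)
display invnormal(0.975)
```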

How do I use Stata to calculate tail areas and critical values for the t distribution?

Use the function ttail(n,t), where n is the degrees of freedom and t is the value of interest; it returns the area to the right of t. Also, use the function invttail(n,p), where p is a right tail area from a t distribution with n degrees of freedom; it returns the value that has that area to its right. For example,

display ttail(5,2)

will return the upper tail area (to the right of 2) of a t distribution with 5 degrees of freedom. Similarly,

display invttail(5,.05)

will return the critical value from a t distribution with 5 degrees of freedom such that the area to the right of the value is 0.05.

NOTE: Stata is very picky about spaces in function calls. Do not put a space between a function's name and its opening parenthesis. In other words, "ttail (5,2)" will generate an error but "ttail(5,2)" will not.
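A few more sketches with the same functions follow. The two-sided cases are standard uses (double the one-sided area, or split the tail area in half), not anything specific to this article; the value -2 is an arbitrary example statistic.

```stata
* Right tail area beyond t = 2 with 5 degrees of freedom
display ttail(5,2)

* Critical value with right tail area 0.05 and 5 degrees of freedom (about 2.015)
display invttail(5,.05)

* Two-sided p-value for an observed t statistic of -2: double the tail area beyond |t|
display 2*ttail(5,abs(-2))

* Critical value for a two-sided 95% interval puts 0.025 in each tail
display invttail(5,.025)
```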

How do I calculate the correlation between two variables?

Use the command "correlate" as in

correlate VARNAME1 VARNAME2

which will produce a 2 x 2 correlation matrix. One can also type

correlate VARNAME1 VARNAME2 ... VARNAMEk

which will produce a k x k correlation matrix. Typing "correlate" without any arguments produces a correlation matrix for all variables in memory.
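A sketch on the bundled auto dataset, used here only as example data, shows both forms and where the coefficient is saved:

```stata
sysuse auto, clear

* 2 x 2 correlation matrix of price and mpg
correlate price mpg

* The correlation coefficient itself is saved in r(rho)
display r(rho)

* 3 x 3 matrix; only observations with no missing values in any listed variable are used
correlate price mpg weight
```

If your variables are missing in different observations, pwcorr computes each correlation from all available pairs instead of dropping an observation whenever any listed variable is missing.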

How do I generate a list of random numbers from a uniform distribution?

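The command "generate x = runiform()" will draw a random value from the unit interval [0,1) for each observation in memory (older versions of Stata call this function uniform()). If no data are in memory, first create empty observations with "set obs", for example "set obs 100". These values can be scaled to other intervals; e.g., after creating x as before, "generate x1 = x*10" will generate x1 from a uniform distribution on the interval from zero to ten.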
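A minimal sketch, assuming Stata 10.1 or later (where runiform() exists). The 100 observations, the seed 12345, and the interval from 5 to 8 are arbitrary illustrative choices; the last line shows the general rescaling a + (b - a)*x.

```stata
clear
* generate fills existing observations, so create 100 empty ones first
set obs 100

* Fix the seed so the same draws come out on every run
set seed 12345

* Uniform draws on [0,1), stored in double precision
generate double x = runiform()

* Rescale to [0,10)
generate double x1 = x*10

* Rescale to a general interval [a,b), here with a = 5 and b = 8
generate double x2 = 5 + (8 - 5)*x

summarize x x1 x2
```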

How do I calculate confidence intervals?

Use the command "ci". For example, to compute a 98% confidence interval for the mean of a continuous variable called VARNAME, enter "ci VARNAME, level(98)". Note the use of the level option to specify the level of the desired confidence interval. If your variable is binary as opposed to continuous, i.e., consists of zeroes and ones, then add the option "binomial", as in "ci VARNAME, binomial level(98)". In Stata 14 and later, ci takes a subcommand instead: type "ci means VARNAME, level(98)" for a continuous variable and "ci proportions VARNAME, level(98)" for a binary one. There are various methods for calculating binomial confidence intervals; type "help ci" for details.
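This sketch assumes Stata 14 or later, where ci takes the subcommands means and proportions. It uses the bundled auto dataset as stand-in data: mpg plays the continuous variable and foreign (coded 0 for domestic, 1 for foreign cars) the binary one.

```stata
sysuse auto, clear

* 98% confidence interval for the mean of a continuous variable
ci means mpg, level(98)

* foreign is coded 0/1, so it is binary: 98% interval for the proportion of foreign cars
ci proportions foreign, level(98)

* Same interval computed with the Wilson method instead of the default exact method
ci proportions foreign, level(98) wilson
```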

How do I calculate fitted values and residuals from a regression?

After a successful "regress" command, the command

predict r, residual

will create a new variable r that contains residual values. Any variable name may be used. Similarly,

predict yhat

will create a new variable yhat that contains fitted values. As before, any variable name may be used.
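A worked sketch on the bundled auto dataset; the regression of price on mpg and weight and the names yhat, resid, and gap are only illustrations. The last step checks that each fitted value plus its residual gives back the observed outcome (up to rounding, since new variables are stored as float by default).

```stata
sysuse auto, clear

* Fit a regression of price on mpg and weight
regress price mpg weight

* Fitted values; xb (the linear prediction) is what predict returns by default
predict yhat, xb

* Residuals: observed price minus the fitted value
predict resid, residuals

* The two pieces add back up to the observed price
generate gap = price - (yhat + resid)
summarize price yhat resid gap
```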

How do I divide a variable by a constant (or multiply it or add to it)?

If the variable is called var1, then the command

replace var1 = var1 / k

changes var1 to var1 / k where k is a specified constant. You could also use

replace var1 = k * var1

or

replace var1 = k + var1

if you wanted to multiply var1, add to it, and so forth.
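Since k stands for a number, here is a sketch with actual constants on the bundled auto dataset. Keeping the constant in a scalar (the name k is illustrative) and writing scalar(k) avoids any clash with a variable whose name starts with k. The new variable weight_tons is likewise a hypothetical name.

```stata
sysuse auto, clear

* Keep the constant in a scalar; scalar(k) refers to it unambiguously
scalar k = 1000

* Divide: express price in thousands of dollars
replace price = price / scalar(k)

* Multiply: double every value of mpg
replace mpg = 2 * mpg

* Add: increase every value of turn by 5
replace turn = 5 + turn

* To leave the original untouched, create a new variable with generate instead
generate weight_tons = weight / 2000
```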

I want to calculate a chi-squared statistic for some tabulated data. I have the table but do not have the data that generated it. How can I have Stata calculate the statistics I need?

Use the "tabi" command. For example, the command

tabi 9 7 \ 13 7 \ 29 26 \ 50 47 \ 41 49, chi2

will make a 5 x 2 table using the provided counts and will calculate the chi-squared statistic.
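The same table again as a sketch, with one extra standard tabulate option: expected prints the expected count in each cell next to the observed count, which is useful for checking that the chi-squared approximation is reasonable. The statistic and its p-value are then read from the saved results.

```stata
* Enter the counts row by row; a backslash starts a new row
* expected also prints the expected count in each cell
tabi 9 7 \ 13 7 \ 29 26 \ 50 47 \ 41 49, chi2 expected

* The statistic and its p-value are saved in r()
display "chi2 = " r(chi2) "   p = " r(p)
```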

Details

Article ID: 64700
Created: Tue 10/9/18 12:22 PM
Modified: Tue 11/12/19 10:12 AM