Preliminary plan
Notes, Data sets and programs
for
Postgraduate course
in
Linear and logistic regression

Edited November 28th, 2011
By Morten Frydenberg
morten@biostat.au.dk



If you are using your own laptop for the exercises, download the data sets and do-files.
A list of all data sets: All data
If you want the data in SAS or R format, go here.

Where to store downloaded and homemade Stata programs (so-called ado-files):
By default Stata assumes that ado and help files are located in
C:\ado\personal (for your personal/homemade programs),
C:\ado\plus (for the ones you download from the net) and
C:\ado (ado-files made many years ago).

You can check the settings with the command sysdir in Stata.

If you want to store the programs in another location, say on your R drive,
you can do this with the commands

sysdir set PERSONAL "R:\ado\personal\"
sysdir set PLUS     "R:\ado\plus\"
sysdir set OLDPLACE "R:\ado\"

These lines should be placed either in your profile.do file or at the beginning of every do-file you use.

User-written Stata procedures you need during the course:
The .ado and the .sthlp files should be placed in your personal ado-folder (see above).
  • Confidence interval for the standard deviation in a regression model
    (This will now work with Stata 9, 10 and 11):
    cisd.ado and cisd.sthlp
  • Save the coefficients and write the equation after a regression model
    (Version 1.12 will now work on complicated models with many parameters, and _const is saved as b0):
    regeq.ado and regeq.sthlp
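
One way to confirm that Stata can find the two procedures once the files are in place is sketched below. It uses only built-in commands (adopath, which, help) together with the file names listed above. The output depends on your own folder settings.

```stata
* List the folders Stata searches for ado-files, in search order
adopath

* Report where each procedure was found; an error message means the
* file is not in any folder on the search path
which cisd
which regeq

* Open the accompanying help files
help cisd
help regeq
```

If which reports that a command is not found, check the PERSONAL folder shown by sysdir against the folder where you saved the files.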

Day 1: Monday November 7th 2011
9.15 - 10.30 Lecture: Simple linear regression - 1.
The model, the parameters, estimation and inference.

All Stata code used at the lecture.
Data set used at the lecture: lung
10.30 - 12.00 Exercises.
Data set used at the exercises: lung.
12.00 - 13.00 Lunch break
13.00 - 14.30 Lecture: Simple linear regression - 2.
Checking the model, residuals, leverage, diagnostic plots, transformation of variables.

Most of the Stata code used at the lecture.
Data sets used at the lecture: lung and gfrdata.
14.30 - 16.00 Exercises.
Data sets used at the exercises: gfrdata and glyco.


Day 2: Wednesday November 9th 2011

9.15 - 9.30 Summarizing Monday's exercises.
9.30 - 10.30 Lecture: Multiple linear regression - 1 (note updated May 10th 2011).
The model, the parameters, estimation and inference.
Checking the model.

All of the Stata code used at the lectures today (note updated May 10th 2011).
Data set used at the lecture: fram200.
10.30 - 12.00 Exercises.
Data sets used at the exercises: lung and fram200.
12.00 - 12.30 Lunch break
12.30 - 14.00 Lecture:
Prior to Stata 11: Multiple linear regression - 2
Stata 11: Multiple linear regression - 2
Working with categorical explanatory variables
Interaction/effect modification
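
Stata 11 introduced factor-variable notation for categorical variables and interactions, which is presumably why the note comes in two versions. The sketch below shows the two styles side by side. It assumes the auto data set that ships with Stata, not one of the course data sets, with rep78 as the categorical variable and weight as the continuous one.

```stata
* Example data shipped with Stata (an assumption, not course data)
sysuse auto, clear

* Prior to Stata 11: the xi: prefix creates the indicator variables
xi: regress price i.rep78 weight

* Stata 11: factor-variable notation, no prefix needed
regress price i.rep78 weight

* Interaction between the categorical and the continuous variable
xi: regress price i.rep78*weight
regress price i.rep78##c.weight

* Stata 11: joint Wald test of all the interaction terms
testparm i.rep78#c.weight
```

Each pair of commands fits the same model. Only the way the indicator variables are created and named differs.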
14.00 - 15.30 Exercises.
Data sets used at the exercises: lung and fram200.


Day 3: Friday November 11th 2011

9.15 - 12.00 Exercises.
Data set used at the exercises: serumchol.
12.00 - 12.30 Lunch break
12.30 - 13.30 Exercises continued.
13.30 - 15.30 Lecture: Linear regression, collinearity, splines and extensions (note updated May 12th 2011)
Collinearity
Restricted cubic splines
Clustered data

Some of the Stata code used at the lecture.
Data sets used at the lecture: serumchol194, Framingham and FEV.

Homework
The homework for the last week is to go through the lecture on logistic regression, day 7 of the Basic Statistics course: Day7.pdf (Day7.do).
After that you should complete the exercises in Exercise7.pdf with the data sets postterm and tatsoib.
SPSS users should replace exercise 7.1 with SPSSday7_1.pdf.


Day 4: Monday November 21st 2011

9.15 - 10.00 Discussing the homework.
10.15 - 12.00 Lecture:
Prior to Stata 11: Logistic regression
Stata 11: Logistic regression
Odds ratios via logistic regression
Continuous independent variables
Categorical independent variables
Interactions
Wald and likelihood ratio tests
The logistic regression model in general
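
A minimal sketch of the Wald and likelihood ratio tests listed above. It assumes the auto data set shipped with Stata rather than the course data sets (obese, case_control), whose variable names are not given here.

```stata
* Example data shipped with Stata (an assumption, not course data)
sysuse auto, clear

* Full model: odds ratios for weight and mpg
logistic foreign weight mpg

* Wald test of the mpg coefficient
test mpg

* Keep the full model for the comparison below
estimates store full

* Reduced model without mpg, fitted on the same observations
logistic foreign weight
estimates store reduced

* Likelihood ratio test of the reduced model against the full model
lrtest full reduced
```

The two tests address the same hypothesis, and lrtest requires that both models are fitted on the same observations.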

Most of the Stata code used at the lectures.
Data sets used at the lecture: obese and case_control.
12.00 - 12.30 Lunch break.
12.30 - 14.00 The lecture continued.
14.00 - 15.30 Exercises
Data set used at the exercises: obese.
A short sketch of a solution: RegDay4Ex.do (updated November 28th 2011)


Day 5: Wednesday November 23rd 2011

9.15 - 10.00 Exercises: Monday afternoon continued
10.15 - 12.00 Lecture: Model building in regression models
Model building: things to consider
Confounding and adjustment
Model selection and its consequences
Over-fitting
A strategy
12.00 - 12.30 Lunch break
12.30 - 15.30 Exercises
Data set used at the exercises: coffee.


Day 6: Friday November 25th 2011

9.15 - 12.00 Working with Wednesday's exercise
12.00 - 12.30 Lunch break
12.30 - 13.30 Discussing Wednesday's exercise
13.30 - 15.00 Lecture: Working with logistic regression models and extensions.
Diagnostics for logistic regression
Conditional logistic regression
Models for relative risk or risk differences
Missing data
Binary data with several random components

Some of the Stata code used at the lectures.
15.00 - 15.30 Course evaluation