Simple Linear Regression

Multiple Regression

Correlation Coefficient

Regression Equation

Least Squares

Residual

Multiple Regression Correlation Coefficient

Stepwise Regression

Simple Linear Regression

Simple linear regression aims to find a linear relationship between a response variable and a possible predictor variable by the method of least squares.
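
As a minimal sketch of what this definition means in practice, the closed-form least-squares formulas for one predictor can be coded directly. The function name and the data below are made up for illustration; the data lie exactly on y = 2 + 3x, so the fit should recover those values.

```python
def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from y = 2 + 3x with no noise
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
intercept, slope = fit_simple_linear(x, y)
print(intercept, slope)
```

With real data the points will not fall exactly on a line, and the same formulas give the line that minimises the sum of squared vertical distances.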

Multiple Regression

Multiple linear regression aims to find a linear relationship between a response variable and several possible predictor variables.
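
The following is a rough sketch of a multiple regression fit, assuming the usual least-squares approach of solving the normal equations. The helper names (solve, fit_multiple_linear) and the data are hypothetical; statistical packages use more numerically careful methods. The data are generated exactly from y = 1 + 2*x1 - 3*x2, so the fit should return those coefficients.

```python
def solve(a, b):
    """Solve the square linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def fit_multiple_linear(rows, y):
    """Return least-squares coefficients [intercept, b1, ..., bk] for the predictor rows."""
    design = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_linear(rows, y)
print(coefs)
```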

Correlation Coefficient

A correlation coefficient is a number between -1 and 1 that measures the degree to which two variables are linearly related. If there is a perfect linear relationship with positive slope between the two variables, the correlation coefficient is 1; more generally, a positive correlation means that when one variable has a high (low) value, the other tends to have a high (low) value as well. If there is a perfect linear relationship with negative slope between the two variables, the correlation coefficient is -1; a negative correlation means that when one variable has a high (low) value, the other tends to have a low (high) value. A correlation coefficient of 0 means that there is no linear relationship between the variables, although they may still be related in a non-linear way.
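
The three reference cases described here can be checked with a short sketch. The function name pearson_r and the data are illustrative assumptions; the formula is the standard sample (Pearson) correlation.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient between the paired values x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])      # perfect line, positive slope
r_neg = pearson_r(x, [10 - 3 * v for v in x])     # perfect line, negative slope
r_zero = pearson_r(x, [(v - 3) ** 2 for v in x])  # symmetric curve, no linear trend
print(r_pos, r_neg, r_zero)
```

The last case is worth noting: y is completely determined by x, yet the correlation is zero because the relationship is not linear.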

Regression Equation

A regression equation allows us to express the relationship between two (or more) variables algebraically. It indicates the nature of the relationship between two (or more) variables. In particular, it indicates the extent to which you can predict some variables by knowing others, or the extent to which some are associated with others.

A linear regression equation is usually written

Y = a + bX + e

where

Y is the dependent variable

a is the intercept

b is the slope or regression coefficient

X is the independent variable

e is the error term

The slope b specifies the average magnitude of the expected change in Y for a one-unit change in X.

The regression equation is often represented on a scatterplot by a regression line.
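
As a small illustration of how the equation is used, the sketch below evaluates the fitted part a + bX for two values of X. The values of a and b are hypothetical, and the error term e is left out because it cannot be predicted.

```python
def predict(a, b, x):
    """Fitted value a + b*x of the regression equation (error term omitted)."""
    return a + b * x


a, b = 10.0, 2.5  # hypothetical intercept and slope

# When X increases by 3 units, the predicted Y changes by b * 3
change = predict(a, b, 7.0) - predict(a, b, 4.0)
print(change)
```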

Least Squares

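The method of least squares is a criterion for fitting a specified model to observed data: the model parameters are chosen to minimise the sum of the squared differences between the observed values and the values given by the model. For example, it is the most commonly used method of defining a straight line through a set of points on a scatterplot.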
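
To make the criterion concrete, the sketch below computes the sum of squared errors for the least-squares line and for two other lines through the same made-up points. The data and the amounts by which the line is shifted and tilted are arbitrary choices for illustration.

```python
def sum_squared_errors(a, b, x, y):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data that roughly follow a straight line
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares slope and intercept from the closed-form formulas
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
b_ls = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a_ls = mean_y - b_ls * mean_x

sse_ls = sum_squared_errors(a_ls, b_ls, x, y)
sse_shifted = sum_squared_errors(a_ls + 0.5, b_ls, x, y)  # same slope, higher line
sse_tilted = sum_squared_errors(a_ls, b_ls + 0.1, x, y)   # same intercept, steeper line
print(sse_ls, sse_shifted, sse_tilted)
```

Any line other than the least-squares line gives a larger sum of squared errors, which is exactly what the criterion requires.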

Multiple Regression Correlation Coefficient

The multiple regression correlation coefficient, R² (also called the coefficient of determination), is a measure of the proportion of variability in the response that is explained by, or due to, the regression (linear relationship) in a sample of data. It is a number between zero and one, and a value close to zero suggests a poor model.

A very high value of R² can arise even though the relationship between the variables is non-linear. The fit of a model should never be judged from the R² value alone.
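
The warning about non-linear relationships can be illustrated with a sketch. Here R² is computed as 1 - SSE/SST for a straight-line fit, and the data are deliberately curved (y = x squared for x = 1 to 10, a made-up example). The function name r_squared is an assumption for illustration.

```python
def r_squared(x, y):
    """R-squared of the least-squares straight line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)                    # total variation
    return 1 - sse / sst


x = list(range(1, 11))
y_curve = [v ** 2 for v in x]  # clearly non-linear relationship
r2_curve = r_squared(x, y_curve)
print(round(r2_curve, 3))
```

The straight line explains about 95% of the variation even though the true relationship is a curve, so a plot of the data or of the residuals is still needed.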

Residual

A residual (or error) represents the unexplained variation left after fitting a regression model. It is the difference between the observed value of the response variable and the value predicted by the regression model.
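
A short sketch of the calculation, using made-up data and the closed-form least-squares fit for one predictor. Each residual is the observed value minus the fitted value; for a least-squares line with an intercept, the residuals sum to zero.

```python
# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.5, 7.5, 9.0]

# Least-squares line through the data
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Residual = observed value - value predicted by the model
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(residuals)
```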

Stepwise Regression

A "best" regression model is sometimes developed in stages. A list of several potential explanatory variables is available, and this list is repeatedly searched for variables that should be included in the model. The best explanatory variable is used first, then the second best, and so on. This procedure is known as stepwise regression.
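
The staged search can be sketched as a simplified forward selection. This is an illustration under stated assumptions, not the procedure a statistics package runs: here a variable is added only if it raises R² by at least min_gain (an arbitrary threshold chosen for this sketch), whereas packages typically use significance tests to enter and remove variables. All names and data are hypothetical; y is built exactly from area and age, and "noise" is an unrelated column.

```python
def solve(a, b):
    """Solve the square linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def r_squared(columns, y):
    """R-squared of the least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def forward_select(predictors, y, min_gain=0.01):
    """Add, one at a time, the predictor that raises R-squared most; stop when the gain is small."""
    chosen, best_r2 = [], 0.0
    remaining = dict(predictors)
    while remaining:
        scores = {
            name: r_squared([predictors[c] for c in chosen] + [col], y)
            for name, col in remaining.items()
        }
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        chosen.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return chosen, best_r2


# Hypothetical data: y depends on area and age only
predictors = {
    "area": [1, 2, 3, 4, 5, 6, 7, 8],
    "age": [3, 1, 4, 1, 5, 9, 2, 6],
    "noise": [2, 7, 1, 8, 2, 8, 1, 8],
}
y = [10 * a - 2 * g for a, g in zip(predictors["area"], predictors["age"])]
chosen, best_r2 = forward_select(predictors, y)
print(chosen, best_r2)
```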

Example on simple linear regression


Several college students applied their recently obtained statistical knowledge to their hunt for the best deal on an apartment. One of several determinants of monthly rent was apartment size. The students collected a sample of 20 apartments with the following monthly rental prices and square footage. Conduct a simple linear regression analysis.

 
   

Example


The coefficient of correlation, r = 0.786, confirms the positive linear relationship that we observed in the scatter plot.
The next step is to perform the regression analysis.

 

 
   

Example


  • Click Stat>Regression>Regression
  • Select Rent in Response
  • Select Footage in Predictors
  • Click Results and choose the second Display
  • Click OK

 

 
   

 

 

Example


Regression Analysis: Rent versus Footage
The regression equation is
Rent = 184 + 0.314 Footage
Predictor        Coef     SE Coef          T        P
Constant       183.70       51.12       3.59    0.002
Footage       0.31364     0.05823       5.39    0.000
S = 17.60       R-Sq = 61.7%     R-Sq(adj) = 59.6%
Analysis of Variance
Source            DF          SS          MS         F        P
Regression         1      8986.6      8986.6     29.01    0.000
Residual Error    18      5575.1       309.7
Total             19     14561.7
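
The summary figures in this output can be reproduced from the analysis of variance table. The sketch below only re-does the arithmetic with the numbers printed above; the variable names are chosen for illustration.

```python
import math

# Values copied from the Analysis of Variance table
ss_regression, df_regression = 8986.6, 1
ss_error, df_error = 5575.1, 18

ss_total = ss_regression + ss_error             # Total SS, 14561.7
ms_regression = ss_regression / df_regression   # mean square for regression
ms_error = ss_error / df_error                  # mean square for residual error
f_stat = ms_regression / ms_error               # F statistic
s = math.sqrt(ms_error)                         # S, the standard error of the estimate
r_sq = ss_regression / ss_total                 # R-Sq as a proportion

print(round(ms_error, 1), round(f_stat, 2), round(s, 2), round(100 * r_sq, 1))
```

The small difference between the F value computed here and the F printed by Minitab comes from the rounding of the sums of squares in the output.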

 

 
   
Example


The first part of the Minitab output gives the regression equation:
The regression equation is
Rent = 184 + 0.314 Footage
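
As an illustration of how the fitted equation might be used, the sketch below predicts the rent for an apartment of 1,000 square feet. That footage is a hypothetical value, not one taken from the sample, and the coefficients are the unrounded ones from the Coef column of the output.

```python
def predicted_rent(footage):
    """Predicted monthly rent from the fitted equation Rent = 183.70 + 0.31364 Footage."""
    return 183.70 + 0.31364 * footage


rent_1000 = predicted_rent(1000)  # hypothetical apartment of 1,000 square feet
print(round(rent_1000, 2))
```

The slope says that each additional square foot is associated with an increase of about 0.31 in predicted monthly rent.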

 

 
   

Example


S = 17.60       R-Sq = 61.7%     R-Sq(adj) = 59.6%
About 62% of the total variation in the monthly rents (the total sum of squared deviations of the rents about their mean) can be explained by the regression equation.
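
Both R-Sq and R-Sq(adj) can be recovered from the sums of squares in the output. The sketch assumes the usual adjusted R² formula, which divides each sum of squares by its degrees of freedom; n = 20 apartments and k = 1 predictor come from the example.

```python
n, k = 20, 1                            # sample size and number of predictors
ss_error, ss_total = 5575.1, 14561.7    # from the Analysis of Variance table

r_sq = 1 - ss_error / ss_total
# Adjusted R-squared penalises the number of predictors through the degrees of freedom
r_sq_adj = 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1))

print(round(100 * r_sq, 1), round(100 * r_sq_adj, 1))
```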

 

 

 

Example on multiple linear regression


  • When a house needs to be appraised for a mortgage or property taxes, the appraiser typically approaches the problem by selecting four to six comparable homes in the area that have sold recently.
  • The price is then adjusted up or down to reflect differences between the comparable homes.

 

 
   

Example


Suppose a homeowner in a residential area is interested in predicting the value of her home, and has gathered the following data on homes for sale in her area.
The objective is to develop a useful regression model.

 

 
   

The Data


 

 
   
Scatter Plot


To develop the model, we construct scatter plots to study the relationship between the response variable (y) and each independent variable, and calculate the correlation between all pairs of variables.

 

 

 
   
Scatter Plot


Consider the following model,
y = b0 + b1 x1 + b2 x2 + b3 x3 + e
with y = price, x1 = bedrooms, x2 = area and x3 = age.
 
   
Scatter Plot

 

  • Click Graph > Matrix Plot
  • Select Price, Bedrooms, SqFtArea and Age in Graph Variables
  • Click Options, choose Upper right Matrix Display
  • OK

 

 
   

Scatter Plot


  • What can we say about the scatter plot?
  • The first row of the matrix shows the relationships between the response variable and the independent variables.

 

 

 
   
 
 
 
 

Scatter Plot

  • The first scatter plot indicates a positive relationship between price and number of bedrooms.
  • The second scatter plot shows a positive relationship between price and area of the home.
  • The last scatter plot shows a negative relationship between price and age of the home.
  • The other three scatter plots show the relationships between pairs of independent variables.
  • The next step is to find the correlation between the response and the independent variables.
 
   
 
Correlation
  • Click Stat>Basic Statistics>Correlation
  • Select Price, Bedrooms, SqFtArea and Age in Variables
  • OK

 

 

 
   

Correlation


The evidence found in the scatter plots was supported by the correlation values in the output.
The highest correlation is between sale price and area of the home.
The moderately high correlation between number of bedrooms and area indicates possible multicollinearity.

 

 

 
   
Regression Equation


  • Click Stat>Regression>Regression
  • Select Price in Response
  • Select Bedrooms, SqFtArea and Age in Predictors
  • Click Results and choose the second Display
  • OK

 

 

 
   
   
Least Squares Regression Equation


The following model,
y = b0 + b1 x1 + b2 x2 + b3 x3 + e
with y = price, x1 = bedrooms, x2 = area and x3 = age, is fit to the data.

 

 

 

 

 

   
Least Squares Regression Equation


The first part of the Minitab output gives the regression equation:

The regression equation is
Price = 54686 + 3232 Bedrooms + 33.4 SqFtArea - 672 Age
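
A brief sketch of using this equation for prediction. The home described below (3 bedrooms, 2,000 square feet, 10 years old) is hypothetical and is not taken from the data set; the function name is also an assumption.

```python
def predicted_price(bedrooms, sqft_area, age):
    """Predicted sale price from the fitted equation in the Minitab output."""
    return 54686 + 3232 * bedrooms + 33.4 * sqft_area - 672 * age


# Hypothetical home: 3 bedrooms, 2,000 square feet, 10 years old
price = predicted_price(3, 2000, 10)
print(round(price))
```

Each coefficient gives the change in predicted price for a one-unit change in that variable while the other two are held fixed; for example, one more year of age lowers the prediction by 672.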
 

 

 

 

 

Coefficient of Determination


The coefficient of determination is R² = 68.4%.
This means that approximately 68% of the total variation in home prices is explained by the regression model.
The remaining 32% is not explained by the regression model.

 

 

 
   
Testing The Usefulness Of The Model


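Some hypothesis testing must be performed to determine whether the model is useful in predicting sale price.
To test whether the overall model is useful, the null and alternative hypotheses are:
H0: the coefficients of Bedrooms, SqFtArea and Age are all zero
Ha: at least one of these coefficients is not zero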

 

 
   
 Hypotheses Testing


The test statistic F = 23.07 and the p-value = 0.000 are given in the analysis of variance table.
Since the p-value = 0.000, we would reject H0 at any commonly used significance level.

 

 
   
   
 
 
Hypotheses Testing


We have strong evidence to conclude the model is useful for predicting the sale price of residential property.
The next step is to test the usefulness of the predictors.

 

 
   
Usefulness Of The Predictors


The least useful predictor is the one with the highest p-value, which in this example is the number of bedrooms.

 

 

 
   
Usefulness Of The Predictors
From the regression coefficient table,
Predictor        Coef     SE Coef          T        P
Constant        54686       13821       3.96    0.000
Bedrooms         3232        5151       0.63    0.535
SqFtArea       33.419       5.474       6.11    0.000
Age            -672.2       258.9      -2.60    0.014
The p-value for Bedrooms is 0.535, so we do not reject the null hypothesis that its coefficient is zero. There is not sufficient evidence that the number of bedrooms is a useful predictor.
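
The T column in this table is each coefficient divided by its standard error, and the least useful predictor is the one with the largest p-value. The sketch below re-does that arithmetic with the values printed in the table; the dictionary layout is just one convenient way to hold them.

```python
# (Coef, SE Coef, P) copied from the regression coefficient table
table = {
    "Constant": (54686, 13821, 0.000),
    "Bedrooms": (3232, 5151, 0.535),
    "SqFtArea": (33.419, 5.474, 0.000),
    "Age": (-672.2, 258.9, 0.014),
}

# t statistic = coefficient / standard error of the coefficient
t_stats = {name: coef / se for name, (coef, se, p) in table.items()}

# Predictor (excluding the constant) with the highest p-value
predictor_p = {name: p for name, (coef, se, p) in table.items() if name != "Constant"}
least_useful = max(predictor_p, key=predictor_p.get)

for name, t in t_stats.items():
    print(name, round(t, 2))
print(least_useful)
```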

 

 
   
New Model


Since we do not have enough evidence that the number of bedrooms is a useful predictor, try to make a new model by excluding the number of bedrooms.
Run the regression analysis again using area and age as predictors.

 

 
   
 

New Model


  • Click Stat>Regression>Regression
  • Select Price in Response
  • Select SqFtArea and Age in Predictors
  • Click Results and choose the second Display
  • OK

 

 

 
   

Residual Analysis


Residual analysis is used to determine whether the regression model is misspecified and whether there are unusual observations or outliers.
The model assumes that the errors are independent and that the probability distribution of the error term e is normal with zero mean and constant variance.
 

 

 

 

 Residual Plots

 

  • Click Stat>Regression>Regression
  • Select Price in Response and SqFtArea and Age in Predictors
  • Click Storage, choose Residuals and Fits
  • Click Results and choose the second Display
  • OK

 

 
 

Residual Plots


  • Click Stat>Regression>Residual Plot
  • Select RESI1 in Residuals
  • Select FITS1 in Fits
  • Enter a Title
  • OK

 

 

 
 

Residual Analysis


Normally distributed residuals would plot as a straight line on the normal plot and as a mound-shaped histogram.
Both plots indicate a normal distribution.

 

 

 
 
Residual Analysis


  • The I Chart and the Residuals vs. Fits plot show a random pattern in the residuals.
  • The successful tests given with the output indicate that the errors are independent and that there are no outliers or unusual residuals.

 

 

 
 
Stepwise Regression


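Stepwise regression is a method of selecting, from a set of independent variables, those that produce the best equation.
After the first variable has been entered, it then selects, at each step, the remaining independent variable that has the highest partial correlation with the response.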
 

 

 

 

 

Stepwise Regression


Consider the data given previously.
A homeowner in a residential area is interested in predicting the value of her home and has gathered data on homes for sale in her area.
Use stepwise regression to identify significant variables.

 

 

 
 

Stepwise Regression


  • Click Stat>Regression>Stepwise
  • Select Price in Response
  • Select Bedrooms, SqFtArea and Age in Predictors
  • OK

 

 

 
 
Stepwise Regression


At Step 1, SqFtArea was selected as the most useful predictor. The model at Step 1 is
Price = 49046 + 35.0 SqFtArea
with R² = 61.7%.
 

 

 

 

 

 

Stepwise Regression


At Step 2, Age was added as a useful predictor. The model at Step 2 is
Price = 60794 + 35.4 SqFtArea - 645 Age
with R² = 68.0%.

 

 

 
 

Stepwise Regression


Stepwise regression has selected the variables SqFtArea and Age, the same variables that we selected in the regression analysis of the previous example.