# What Simple Linear Regression Is and How It Works

## A Basic Statistics Approach to Analyzing Quantitative Data

Linear regression models are used to show or predict the relationship between two variables or factors. The factor that is being predicted (the factor that the equation *solves for*) is called the dependent variable. The factors that are used to predict the value of the dependent variable are called the independent variables.

Good data does not always tell the complete story. Regression analysis is commonly used in research because it can establish that a correlation exists between variables. But correlation is not the same as causation. Even a line in a simple linear regression that fits the data points well may not say anything definitive about a cause-and-effect relationship.

In simple linear regression, each observation consists of two values. One value is for the dependent variable and one value is for the independent variable.

- **Simple Linear Regression Analysis:** The simplest form of a regression analysis uses one dependent variable and one independent variable. In this simple model, a straight line approximates the relationship between the dependent variable and the independent variable.
- **Multiple Regression Analysis:** When two or more independent variables are used in regression analysis, the model is no longer a simple linear one.

### Simple Linear Regression Model

The simple linear regression model is represented like this:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

By mathematical convention, the two factors that are involved in a simple linear regression analysis are designated $x$ and $y$. The equation that describes how $y$ is related to $x$ is known as the **regression model**. The linear regression model also contains an error term that is represented by $\varepsilon$, or the Greek letter epsilon. The error term is used to account for the variability in $y$ that cannot be explained by the linear relationship between $x$ and $y$. There are also parameters that represent the population being studied. These parameters of the model are $\beta_0$ and $\beta_1$, which appear in the expression $\beta_0 + \beta_1 x$.
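To make the role of the error term concrete, here is a minimal Python sketch that draws values of y from the model for a single fixed x. The parameter values, the normally distributed error with mean zero, and the helper name `simulate_y` are all assumptions made for illustration; none of them come from the text above.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA_0 = 2.0   # y intercept
BETA_1 = 0.5   # slope
SIGMA = 1.0    # spread of the error term (an assumption, not from the text)

def simulate_y(x, rng):
    """Draw one y from the model y = beta_0 + beta_1 * x + epsilon."""
    epsilon = rng.gauss(0.0, SIGMA)  # assumed: normal error with mean zero
    return BETA_0 + BETA_1 * x + epsilon

rng = random.Random(42)  # fixed seed so the run is repeatable
x = 10.0
draws = [simulate_y(x, rng) for _ in range(5)]

# Each draw differs because of epsilon, but all of them scatter around
# beta_0 + beta_1 * x = 2.0 + 0.5 * 10.0 = 7.0.
for y in draws:
    print(round(y, 2))
```

The same x yields a different y on every draw. That spread is the variability in y that the straight-line part of the model does not explain.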

The simple linear regression equation is represented like this:

$$E(y) = \beta_0 + \beta_1 x$$

The simple linear regression equation is graphed as a straight line.
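As a small illustration of why the graph is a straight line, the sketch below evaluates the regression equation at a few values of x. The parameter values and the function name `expected_y` are hypothetical, picked only so that the arithmetic is easy to follow.

```python
def expected_y(beta_0, beta_1, x):
    """Return E(y) = beta_0 + beta_1 * x for a given x."""
    return beta_0 + beta_1 * x

# Hypothetical parameter values, chosen only for illustration.
beta_0, beta_1 = 2.0, 0.5

xs = [0, 1, 2, 3, 4]
means = [expected_y(beta_0, beta_1, x) for x in xs]
print(means)  # [2.0, 2.5, 3.0, 3.5, 4.0]

# On a straight line, each one-unit step in x changes E(y) by the same amount.
steps = [b - a for a, b in zip(means, means[1:])]
print(steps)  # [0.5, 0.5, 0.5, 0.5]
```

The constant step between neighboring values is what makes the plotted points fall on one line.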

$\beta_0$ is the $y$ intercept of the regression line.

$\beta_1$ is the slope.

$E(y)$ is the mean or expected value of $y$ for a given value of $x$.

A regression line can show a positive linear relationship, a negative linear relationship, or no relationship. If the graphed line in a simple linear regression is flat (not sloped), there is no relationship between the two variables. If the regression line slopes upward with the lower end of the line at the $y$ intercept (axis) of the graph, and the upper end of the line extending upward into the graph field, away from the $x$ intercept (axis), a positive linear relationship exists. If the regression line slopes downward with the upper end of the line at the $y$ intercept (axis) of the graph, and the lower end of the line extending downward into the graph field, toward the $x$ intercept (axis), a negative linear relationship exists.

### Estimated Linear Regression Equation

If the parameters of the population were known, the simple linear regression equation (shown below) could be used to compute the mean value of $y$ for a known value of $x$.

$$E(y) = \beta_0 + \beta_1 x$$

However, in practice, the parameter values are not known, so they must be estimated by using data from a sample of the population. The population parameters are estimated by using sample statistics. The sample statistics are represented by $b_0$ and $b_1$. When the sample statistics are substituted for the population parameters, the estimated regression equation is formed.
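The idea that sample statistics stand in for unknown population parameters can be sketched with a small simulation. Everything here is an assumption made for illustration: the "true" parameter values, the normally distributed error, the sample size, and the helper `fit_line`, which uses the ordinary least squares formulas as one common way to compute the sample statistics.

```python
import random

def fit_line(xs, ys):
    """Return (b0, b1) computed with the ordinary least squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated y intercept
    return b0, b1

# Hypothetical "population" parameters; in practice these are unknown.
BETA_0, BETA_1 = 2.0, 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = []
for _ in range(3):
    # Draw a fresh sample of 50 observations from the assumed model.
    xs = [rng.uniform(0, 20) for _ in range(50)]
    ys = [BETA_0 + BETA_1 * x + rng.gauss(0, 1) for x in xs]
    estimates.append(fit_line(xs, ys))

# Each sample gives slightly different estimates of the same parameters.
for b0, b1 in estimates:
    print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

Each sample produces its own pair of estimates, and under these assumptions they land near, but not exactly on, the population values.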

The estimated regression equation is shown below.

$$\hat{y} = b_0 + b_1 x$$

$\hat{y}$ is pronounced *y hat*.

The graph of the estimated simple regression equation is called the estimated regression line.

$b_0$ is the $y$ intercept.

$b_1$ is the slope.

$\hat{y}$ is the estimated value of $y$ for a given value of $x$.

**Important Note:** Regression analysis is not used to interpret cause-and-effect relationships between variables. Regression analysis can, however, indicate how variables are related or to what extent variables are associated with each other. In so doing, regression analysis tends to make salient relationships that warrant a knowledgeable researcher taking a closer look.

**Also Known As:** bivariate regression, regression analysis

**Examples:** The **Least Squares Method** is a statistical procedure for using sample data to find the estimated regression equation, that is, the values of $b_0$ and $b_1$. The Least Squares Method was proposed by Carl Friedrich Gauss (1777-1855) and is still widely used.
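For simple linear regression, the least squares estimates have a standard closed form, where $\bar{x}$ and $\bar{y}$ are the sample means:

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

The sketch below applies these formulas to a tiny made-up data set and checks that nudging the fitted line in any direction increases the sum of squared errors. The data values and the helper name `sse` are hypothetical and serve only as an illustration.

```python
# Small made-up sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n  # 3.0
y_bar = sum(ys) / n  # 4.0

# Closed-form least squares estimates of the slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.20, b1 = 0.60

def sse(intercept, slope):
    """Sum of squared differences between observed y and the line's y hat."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

best = sse(b0, b1)
print(f"SSE at the fitted line: {best:.2f}")  # SSE at the fitted line: 2.40

# Any other line gives a larger sum of squared errors on this sample.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(b0 + d0, b1 + d1) > best

# Estimated value of y (y hat) for a new x, using the fitted equation.
y_hat = b0 + b1 * 6.0
print(f"y hat at x = 6: {y_hat:.2f}")  # y hat at x = 6: 5.80
```

The name "least squares" refers to this property: among all straight lines, the fitted one has the smallest sum of squared vertical distances to the sample points.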
