Linear Regression and Correlation

# Linear Equations

OpenStaxCollege

Linear regression for two variables is based on a linear equation with one independent variable. The equation has the form:

*y* = *a* + *bx*

where *a* and *b* are constant numbers.

The variable *x* is the **independent variable**, and *y* is the **dependent variable**. Typically, you choose a value to substitute for the independent variable and then solve for the dependent variable.

The following examples are linear equations.

Is the following an example of a linear equation?

*y* = –0.125 – 3.5*x*

yes

The graph of a linear equation of the form *y* = *a* + *bx* is a **straight line**. Any line that is not vertical can be described by this equation.

Graph the equation *y* = –1 + 2*x*.
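To graph the line by hand, it helps to tabulate a few points first. The following is a minimal sketch in Python; the helper name `line` and the chosen range of *x* values are illustrative assumptions, not part of the original exercise.

```python
def line(a, b):
    """Return the function x -> a + b*x for intercept a and slope b."""
    return lambda x: a + b * x

# y = -1 + 2x, so a = -1 and b = 2
f = line(-1, 2)

# Tabulate a handful of integer x values; any two points determine the line.
points = [(x, f(x)) for x in range(-2, 4)]
for x, y in points:
    print(f"x = {x:2d}, y = {y:2d}")
```

Plotting any two of these points and drawing the straight line through them gives the graph; the point with *x* = 0 sits at the *y*-intercept.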

Is the following an example of a linear equation? Why or why not?

No, the graph is not a straight line; therefore, it is not a linear equation.

Aaron’s Word Processing Service (AWPS) does word processing. The rate for services is 💲32 per hour plus a 💲31.50 one-time charge. The total cost to a customer depends on the number of hours it takes to complete the job.

Find the equation that expresses the **total cost** in terms of the **number of hours** required to complete the job.

Let *x* = the number of hours it takes to get the job done.

Let *y* = the total cost to the customer.

The 💲31.50 is a fixed cost. If it takes *x* hours to complete the job, then (32)(*x*) is the cost of the word processing only. The total cost is: *y* = 31.50 + 32*x*
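The same equation can be written as a small function to check the cost for specific job lengths. This is only a sketch: the names `FIXED_CHARGE`, `HOURLY_RATE`, and `total_cost` are made up for illustration, and the sample hours are arbitrary.

```python
FIXED_CHARGE = 31.50  # one-time charge in dollars (the constant a)
HOURLY_RATE = 32.00   # dollars per hour (the constant b)

def total_cost(hours):
    """Total cost y = 31.50 + 32x for a job that takes `hours` hours."""
    return FIXED_CHARGE + HOURLY_RATE * hours

# Evaluate the equation for a few hypothetical job lengths.
for hours in (0, 1, 2.5, 4):
    print(f"{hours} hours -> {total_cost(hours):.2f} dollars")
```

A job of zero hours still costs the fixed charge, and each additional hour adds exactly the hourly rate.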

Emma’s Extreme Sports hires hang-gliding instructors and pays them a fee of 💲50 per class as well as 💲20 per student in the class. The total cost Emma pays depends on the number of students in a class. Find the equation that expresses the total cost in terms of the number of students in a class.

*y* = 50 + 20*x*

# Slope and *Y*-Intercept of a Linear Equation

For the linear equation *y* = *a* + *bx*, *b* = slope and *a* = *y*-intercept. From algebra recall that the slope is a number that describes the steepness of a line, and the *y*-intercept is the *y* coordinate of the point (0, *a*) where the line crosses the *y*-axis.
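As a reminder of the algebra, the slope can be computed from any two points on a non-vertical line, and the intercept follows from it. The sketch below assumes a hypothetical helper named `slope_intercept`; the two sample points are taken from the line *y* = –1 + 2*x*.

```python
def slope_intercept(p1, p2):
    """Return (a, b) for the line y = a + b*x through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line has no slope and cannot be written as y = a + bx.
        raise ValueError("vertical line: slope is undefined")
    b = (y2 - y1) / (x2 - x1)  # slope: change in y per unit change in x
    a = y1 - b * x1            # y-intercept: the value of y when x = 0
    return a, b

a, b = slope_intercept((0, -1), (3, 5))
print(f"y-intercept a = {a}, slope b = {b}")
```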

Svetlana tutors to make extra money for college. For each tutoring session, she charges a one-time fee of 💲25 plus 💲15 per hour of tutoring. A linear equation that expresses the total amount of money Svetlana earns for each session she tutors is *y* = 25 + 15*x*.

What are the independent and dependent variables? What is the *y*-intercept and what is the slope? Interpret them using complete sentences.

The independent variable (*x*) is the number of hours Svetlana tutors each session. The dependent variable (*y*) is the amount, in dollars, Svetlana earns for each session.

The *y*-intercept is 25 (*a* = 25). At the start of the tutoring session, Svetlana charges a one-time fee of 💲25 (this is when *x* = 0). The slope is 15 (*b* = 15). For each session, Svetlana earns 💲15 for each hour she tutors.

Ethan repairs household appliances like dishwashers and refrigerators. For each visit, he charges 💲25 plus 💲20 per hour of work. A linear equation that expresses the total amount of money Ethan earns per visit is *y* = 25 + 20*x*.

What are the independent and dependent variables? What is the *y*-intercept and what is the slope? Interpret them using complete sentences.

The independent variable (*x*) is the number of hours Ethan works each visit. The dependent variable (*y*) is the amount, in dollars, Ethan earns for each visit.

The *y*-intercept is 25 (*a* = 25). At the start of a visit, Ethan charges a one-time fee of 💲25 (this is when *x* = 0). The slope is 20 (*b* = 20). For each visit, Ethan earns 💲20 for each hour he works.

# References

Data from the Centers for Disease Control and Prevention.

Data from the National Center for HIV, STD, and TB Prevention.

# Chapter Review

The most basic type of association is a linear association. This type of relationship can be defined algebraically by the equations used, numerically with actual or predicted data values, or graphically from a plotted curve. (Lines are classified as straight curves.) Algebraically, a linear equation typically takes the form ***y* = *mx* + *b***, where *m* and *b* are constants, *x* is the independent variable, and *y* is the dependent variable. In a statistical context, a linear equation is written in the form ***y* = *a* + *bx***, where *a* and *b* are the constants. This form is used to help readers distinguish the statistical context from the algebraic context. In the equation *y* = *a* + *bx*, the constant *b* that multiplies the *x* variable (*b* is called a coefficient) is called the **slope**. The slope describes the rate of change between the independent and dependent variables; in other words, the rate of change describes the change that occurs in the dependent variable as the independent variable is changed. In the equation *y* = *a* + *bx*, the constant *a* is called the ***y*-intercept**. Graphically, the *y*-intercept is the *y* coordinate of the point where the graph of the line crosses the *y*-axis. At this point *x* = 0.

The **slope of a line** is a value that describes the rate of change between the independent and dependent variables. The **slope** tells us how the dependent variable (*y*) changes for every one unit increase in the independent (*x*) variable, on average. The ***y*-intercept** is used to describe the dependent variable when the independent variable equals zero. Graphically, the slope appears as one of three line types in elementary statistics: a line that rises from left to right (positive slope), a line that falls from left to right (negative slope), or a horizontal line (zero slope).

# Formula Review

*y* = *a* + *bx* where *a* is the *y*-intercept and *b* is the slope. The variable *x* is the independent variable and *y* is the dependent variable.

*Use the following information to answer the next three exercises*. A vacation resort rents SCUBA equipment to certified divers. The resort charges an up-front fee of 💲25 and another fee of 💲12.50 an hour.

What are the dependent and independent variables?

dependent variable: fee amount; independent variable: time

Find the equation that expresses the total fee in terms of the number of hours the equipment is rented.

Graph the equation from [link].

*Use the following information to answer the next two exercises*. A credit card company charges 💲10 when a payment is late, and 💲5 a day each day the payment remains unpaid.

Find the equation that expresses the total fee in terms of the number of days the payment is late.

Graph the equation from [link].

Is the equation *y* = 10 + 5*x* – 3*x*² linear? Why or why not?

Which of the following equations are linear?

a. *y* = 6*x* + 8

b. *y* + 7 = 3*x*

c. *y* – *x* = 8*x*²

d. 4*y* = 8

*y* = 6*x* + 8, 4*y* = 8, and *y* + 7 = 3*x* are all linear equations.
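One informal way to sanity-check this answer is to solve each equation for *y* and see whether the slope between consecutive sample points stays constant. The sketch below does that in Python; the helper `has_constant_slope` and the sample *x* values are assumptions made for illustration, and a numerical check on a few points is a plausibility test, not a proof of linearity.

```python
import math

def has_constant_slope(f, xs):
    """Return True if f has the same slope between every pair of consecutive xs."""
    pairs = [(x, f(x)) for x in xs]
    slopes = [(q[1] - p[1]) / (q[0] - p[0]) for p, q in zip(pairs, pairs[1:])]
    return all(math.isclose(s, slopes[0], abs_tol=1e-9) for s in slopes)

# Each equation from the exercise, solved for y.
equations = {
    "a": lambda x: 6 * x + 8,       # y = 6x + 8
    "b": lambda x: 3 * x - 7,       # y + 7 = 3x
    "c": lambda x: x + 8 * x ** 2,  # y - x = 8x^2
    "d": lambda x: 2,               # 4y = 8
}

xs = [-2, -1, 0, 1, 2, 3]
results = {label: has_constant_slope(f, xs) for label, f in equations.items()}
for label, linear in results.items():
    print(label, "constant slope" if linear else "slope changes")
```

Only equation c, which contains a squared term, fails the check; equation d passes with a slope of zero because its graph is a horizontal line.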

Does the graph show a linear equation? Why or why not?

[link] contains real data for the first two decades of AIDS reporting.

| Year | # AIDS cases diagnosed | # AIDS deaths |
|---|---|---|
| Pre-1981 | 91 | 29 |
| 1981 | 319 | 121 |
| 1982 | 1,170 | 453 |
| 1983 | 3,076 | 1,482 |
| 1984 | 6,240 | 3,466 |
| 1985 | 11,776 | 6,878 |
| 1986 | 19,032 | 11,987 |
| 1987 | 28,564 | 16,162 |
| 1988 | 35,447 | 20,868 |
| 1989 | 42,674 | 27,591 |
| 1990 | 48,634 | 31,335 |
| 1991 | 59,660 | 36,560 |
| 1992 | 78,530 | 41,055 |
| 1993 | 78,834 | 44,730 |
| 1994 | 71,874 | 49,095 |
| 1995 | 68,505 | 49,456 |
| 1996 | 59,347 | 38,510 |
| 1997 | 47,149 | 20,736 |
| 1998 | 38,393 | 19,005 |
| 1999 | 25,174 | 18,454 |
| 2000 | 25,522 | 17,347 |
| 2001 | 25,643 | 17,402 |
| 2002 | 26,464 | 16,371 |
| Total | 802,118 | 489,093 |

Use the columns “year” and “# AIDS cases diagnosed.” Why is “year” the independent variable and “# AIDS cases diagnosed” the dependent variable (instead of the reverse)?

The number of AIDS cases depends on the year. Therefore, year becomes the independent variable and the number of AIDS cases is the dependent variable.

*Use the following information to answer the next two exercises*. A specialty cleaning company charges an equipment fee and an hourly labor fee. A linear equation that expresses the total amount of the fee the company charges for each session is *y* = 50 + 100*x*.

What are the independent and dependent variables?

What is the *y*-intercept and what is the slope? Interpret them using complete sentences.

The *y*-intercept is 50 (*a* = 50). At the start of the cleaning, the company charges a one-time fee of 💲50 (this is when *x* = 0). The slope is 100 (*b* = 100). For each session, the company charges 💲100 for each hour they clean.

*Use the following information to answer the next three questions*. Due to erosion, a river shoreline is losing several thousand pounds of soil each year. A linear equation that expresses the total amount of soil lost per year is *y* = 12,000*x*.

What are the independent and dependent variables?

How many pounds of soil does the shoreline lose in a year?

12,000 pounds of soil

What is the *y*-intercept? Interpret its meaning.

*Use the following information to answer the next two exercises*. The price of a single issue of stock can fluctuate throughout the day. A linear equation that represents the price of stock for Shipment Express is *y* = 15 – 1.5*x* where *x* is the number of hours passed in an eight-hour day of trading.

What are the slope and *y*-intercept? Interpret their meaning.

The slope is –1.5 (*b* = –1.5). This means the stock is losing value at a rate of 💲1.50 per hour. The *y*-intercept is 💲15 (*a* = 15). This means the price of stock before the trading day was 💲15.
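To see this interpretation in numbers, the equation can be evaluated at each hour of the eight-hour trading day. This is a minimal sketch; the function name `stock_price` is hypothetical, and the values are outputs of the exercise's model equation, not real market data.

```python
def stock_price(hours):
    """Model price y = 15 - 1.5x after `hours` hours of trading."""
    return 15 - 1.5 * hours

# Evaluate the model at the start of the day and after each full hour.
prices = [stock_price(x) for x in range(0, 9)]
for x, price in enumerate(prices):
    print(f"hour {x}: {price:.2f} dollars")
```

The first value is the *y*-intercept, and each later value is 1.50 lower than the one before it, which is the slope expressed as a change per hour.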

If you owned this stock, would you want a positive or negative slope? Why?

# Homework

For each of the following situations, state the independent variable and the dependent variable.

- A study is done to determine if elderly drivers are involved in more motor vehicle fatalities than other drivers. The number of fatalities per 100,000 drivers is compared to the age of drivers.
- A study is done to determine if the weekly grocery bill changes based on the number of family members.
- Insurance companies base life insurance premiums partially on the age of the applicant.
- Utility bills vary according to power consumption.
- A study is done to determine if a higher education reduces the crime rate in a population.

- independent variable: age; dependent variable: fatalities
- independent variable: # of family members; dependent variable: grocery bill
- independent variable: age of applicant; dependent variable: insurance premium
- independent variable: power consumption; dependent variable: utility bill
- independent variable: higher education (years); dependent variable: crime rates

Piece-rate systems are widely debated incentive payment plans. In a recent study of loan officer effectiveness, the following piece-rate system was examined:

| % of goal reached | < 80 | 80 | 100 | 120 |
|---|---|---|---|---|
| Incentive | n/a | 💲4,000 with an additional 💲125 added per percentage point from 81–99% | 💲6,500 with an additional 💲125 added per percentage point from 101–119% | 💲9,500 with an additional 💲125 added per percentage point starting at 121% |

If a loan officer makes 95% of his or her goal, write the linear function that applies based on the incentive plan table. In context, explain the *y*-intercept and slope.