Hey. I'm in an intro stat class, and I need to know how to find the equation of a regression line. I'm not very math-minded, so could anyone give me some easy to follow steps or maybe an example problem? I'm not looking for just a textbook definition.
Will be quick to award BA, thanks in advance for any help =)
-
You have a list of n samples that look like (x[1], y[1]), (x[2], y[2]), ..., (x[n], y[n]) and you want to fit them to a line: y = m x + b.
You can't. Real data is almost never going to lie exactly on a straight line.
So you use a linear model, which is a line with an error term added to it: y[i] = m x[i] + b + e[i] where e[i], the error term, measures how much the dependent variable y[i] differs from its predicted linear value m x[i] + b.
Usually you want to define m and b so that, in some sense, they are the best possible values. One way to do this is called a least squares linear regression. Don't get freaked out by the word regression; it basically means there is an error term, e[i], in your model. A least squares linear regression says to pick m and b so that the sum of the squares of the e[i], e[1]^2 + e[2]^2 + ... + e[n]^2, has the smallest possible value.
Let's define some quantities:
xybar = average value of x[i] y[i] = (x[1]y[1] + x[2]y[2] + ... + x[n]y[n]) / n
xbar = average value of x[i] = (x[1] + x[2] + ... + x[n]) / n
ybar = average value of y[i] = (y[1] + y[2] + ... + y[n]) / n
x²bar = average value of (x[i])² = ((x[1])² + (x[2])² + ... +(x[n])²) / n
After a lot of complicated (and really cool) math, it turns out that
m = (xybar - xbar * ybar) / (x²bar - xbar²) and
b = ybar - m * xbar
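The averages and formulas above can be sketched in a few lines of Python (the sample data here is made up for illustration; it's chosen to lie exactly on y = 2x + 1 so you can check the answer by hand):

```python
# Least-squares slope m and intercept b computed from the averages above.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly 2*x + 1, so we expect m = 2, b = 1

n = len(xs)
xbar  = sum(xs) / n                            # average of x[i]
ybar  = sum(ys) / n                            # average of y[i]
xybar = sum(x * y for x, y in zip(xs, ys)) / n # average of x[i]*y[i]
x2bar = sum(x * x for x in xs) / n             # average of (x[i])^2

m = (xybar - xbar * ybar) / (x2bar - xbar ** 2)
b = ybar - m * xbar

print(m, b)  # → 2.0 1.0
```

With real, noisy data the points won't sit exactly on the fitted line, but the same four averages give you the best-fit m and b.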
-
Hi, thank you for your question.
What I would recommend is that you get yourself a TI-89 graphing calculator and use the stats function on it.
http://www.tc3.edu/instruct/sbrown/ti83/regres89.htm
This website explains how to use the stats function. Jump to step two if all you are looking for is the line.
Otherwise, the math definition is really the way to go.
Hope that helped.