#statistics

**Regression line** - a line that roughly approximates the linear [[Relationships|relationship]] between pairs of scores. ONLY USE THE [[Regression|REGRESSION]] **LINE** if the relationship is linear.

## Least squares regression line
Russian term: метод наименьших квадратов ("method of least squares").

Least squares regression equation:
$$\LARGE Y'=bX+a$$
where Y' - predicted value, X - known value, and a & b are computed as follows:
$$\LARGE b=r\sqrt{\frac{SS_Y}{SS_X}}$$
where r - [[Correlation coefficient]], $SS_X$ and $SS_Y$ - [[Measures of variability|sum of squares for X and for Y]].
$$\LARGE a=\overline Y - b\overline X$$
where $\overline Y$ and $\overline X$ - sample means for all Y and X scores.

## Predictive errors
A predictive error is the difference between an actual and a predicted score, Y-Y'. The smaller the total for all predictive errors, the more favorable will be the prognosis for our predictions. It is desirable for the regression line to be placed in a position that minimizes the total predictive error.

![[Pasted image 20230909164902.png]]

To avoid the arithmetic standoff always produced by adding negative and positive predictive errors (they cancel each other out), the placement of the regression line minimizes not the total predictive error but the total **squared** predictive error, hence the name least squares regression line (LSRL).

**Predictive errors from the mean and from the LSRL**:

![[Pasted image 20230909180959.png]]

## Standard error of estimate, $\LARGE s_{y|x}$
$$\LARGE s_{y|x}=\sqrt{\frac{SS_{y|x}}{n-2}}=\sqrt{\frac{\sum(Y-Y')^2}{n-2}}=\sqrt{\frac{SS_Y(1-r^2)}{n-2}}$$
where $SS_{y|x}$ - sum of squares for the predictive errors, Y-Y'; and the [[Degrees of freedom]] term in the denominator, n-2, reflects the loss of two degrees of freedom because **any straight line, including the regression line, can be made to coincide with two data points**.

$\LARGE s_{y|x}$ is read as "s sub y given x".

Standard error of estimate - somewhat like the [[Measures of variability|standard deviation]], a rough measure of the average amount of predictive error.
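A minimal Python sketch of the formulas above, run on made-up scores (the data and all variable names are assumptions for illustration, not part of the original notes). It assumes r is the usual Pearson coefficient, computed as the sum of cross-products divided by $\sqrt{SS_X SS_Y}$. It also checks two claims from this note: the raw predictive errors cancel out when added, and the two forms of the $s_{y|x}$ formula agree.

```python
import math

# Hypothetical paired scores, made up purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares for X and Y, plus the sum of cross-products
# (assumed here as the standard way to obtain Pearson r).
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = sp_xy / math.sqrt(ss_x * ss_y)   # correlation coefficient
b = r * math.sqrt(ss_y / ss_x)       # slope: b = r * sqrt(SS_Y / SS_X)
a = mean_y - b * mean_x              # intercept: a = mean(Y) - b * mean(X)

# Predicted values Y' and predictive errors Y - Y'.
predicted = [b * x + a for x in xs]
errors = [y - y_hat for y, y_hat in zip(ys, predicted)]

total_error = sum(errors)               # positives and negatives cancel
ss_error = sum(e ** 2 for e in errors)  # SS_{y|x}, what the LSRL minimizes

# Standard error of estimate, computed two equivalent ways.
s_direct = math.sqrt(ss_error / (n - 2))
s_shortcut = math.sqrt(ss_y * (1 - r ** 2) / (n - 2))

print(f"b = {b:.3f}, a = {a:.3f}")
print(f"sum of errors = {total_error:.3f}, sum of squared errors = {ss_error:.3f}")
print(f"s_y|x direct = {s_direct:.3f}, shortcut = {s_shortcut:.3f}")
```

For these scores the line is Y' = 0.6X + 2.2, the raw errors sum to zero (up to floating-point rounding) while the squared errors sum to 2.4, and both routes give $s_{y|x} \approx 0.894$.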
*Use of the standard error of estimate, $\LARGE s_{y|x}$, assumes that, except for chance, the dots in the original scatterplot will be dispersed equally about all segments of the regression line* - **homoscedasticity** (i.e. the variance of Y is more or less the same at every value of X).

![[Pasted image 20230909174339.png]]

## Multiple regression equation
**MRE** - a least squares equation that contains more than one predictor or X variable.

![[Pasted image 20230909182655.png]]
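As a rough illustration of an equation with more than one predictor, the sketch below fits $Y' = b_1X_1 + b_2X_2 + a$ for the two-predictor case in plain Python. It is only a sketch under stated assumptions: the data are made up (built so that Y = 2X1 + 3X2 + 1 exactly), the names `x1`, `x2`, `b1`, `b2` are hypothetical, and the slopes come from solving the two least squares normal equations on deviation scores with Cramer's rule, which is one standard route and not necessarily the one pictured above.

```python
# Hypothetical data, constructed so that y = 2*x1 + 3*x2 + 1 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [9, 8, 19, 18, 26]
n = len(y)

m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

# Deviation scores (each score minus its mean).
d1 = [v - m1 for v in x1]
d2 = [v - m2 for v in x2]
dy = [v - my for v in y]

# Sums of squares and sums of cross-products.
ss_1 = sum(u * u for u in d1)
ss_2 = sum(u * u for u in d2)
sp_12 = sum(u * v for u, v in zip(d1, d2))
sp_1y = sum(u * v for u, v in zip(d1, dy))
sp_2y = sum(u * v for u, v in zip(d2, dy))

# Solve the 2x2 normal equations with Cramer's rule.
# det is zero when the two predictors are perfectly correlated.
det = ss_1 * ss_2 - sp_12 ** 2
b1 = (ss_2 * sp_1y - sp_12 * sp_2y) / det
b2 = (ss_1 * sp_2y - sp_12 * sp_1y) / det
a = my - b1 * m1 - b2 * m2

print(f"Y' = {b1:.2f}*X1 + {b2:.2f}*X2 + {a:.2f}")
```

Because the made-up Y scores are an exact linear function of the predictors, the fit recovers b1 = 2, b2 = 3 and a = 1 with no predictive error. With real data the same steps give the coefficients that minimize the total squared predictive error.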