#statistics

**Correlation coefficient** - a number $\LARGE \in$ $\LARGE [-1; 1]$ that describes the [[Relationships|relationship]] between pairs of [[Variables in statistics|variables]].

**Pearson correlation coefficient, r** - a number between -1.00 and +1.00 that describes the linear relationship between pairs of [[Data in statistics|quantitative]] variables:
1. The sign of **r** indicates the type of linear relationship, positive or negative.
2. The numerical value of **r**, without regard to sign, indicates the strength of the linear relationship.

The value of **r** can't be interpreted as a proportion or percentage of some perfect relationship.

Provided that a reasonably large number of pairs of scores is involved, an **r** with an absolute value of 0.5 or more represents a very strong relationship in most areas of behavioral and educational research. (MOST, BUT NOT ALL)

The smaller the range of a distribution, the smaller its correlation coefficient tends to be (restricted range):
![[Pasted image 20230902233142.png]]
ALWAYS ACCOUNT FOR THAT

**A CORRELATION COEFFICIENT, REGARDLESS OF SIZE, NEVER PROVIDES INFORMATION ABOUT WHETHER AN OBSERVED RELATIONSHIP REFLECTS A SIMPLE CAUSE-EFFECT RELATIONSHIP OR SOME MORE COMPLEX STATE OF AFFAIRS**

## Formula
$$\LARGE r=\frac{SP_{xy}}{\sqrt{SS_X\cdot SS_Y}}$$
where $\LARGE SS_X$ and $\LARGE SS_Y$ are the two [[Measures of variability|sums of squares]], and $\LARGE SP_{xy}$ is the **sum of the products**:
$$\LARGE SP_{xy}=\sum(X-\overline X)(Y-\overline Y)=\sum (XY)-\frac{(\sum X)(\sum Y)}{n}$$

## Alternatives to Pearson's r
### Spearman's $\LARGE \rho$
To describe the correlation between two sets of *ranks* (e.g. ranks assigned independently by two judges to a set of science projects), substitute the numerical ranks into Pearson's r formula and solve for r.

Spearman's $\LARGE \rho$ shows how well the relationship between the ranks can be described by a [[Монотонность функции|monotonic]] function, even if it is not linear.
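The formula for r and the rank substitution behind Spearman's $\LARGE \rho$ can be sketched in plain Python. This is only an illustration: the function names (`sum_of_products`, `pearson_r`, `ranks`, `spearman_rho`) and the data are made up for this note, and giving tied values the mean of their positions is an assumption the note itself does not state.

```python
from math import sqrt


def sum_of_products(xs, ys):
    """SP_xy via the computational formula: sum(XY) - (sum X)(sum Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def pearson_r(xs, ys):
    """r = SP_xy / sqrt(SS_X * SS_Y); SS of a variable is its SP with itself."""
    sp_xy = sum_of_products(xs, ys)
    ss_x = sum_of_products(xs, xs)
    ss_y = sum_of_products(ys, ys)
    return sp_xy / sqrt(ss_x * ss_y)


def ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their positions
    (an assumed tie rule, not taken from the note)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r = pearson_r(x, y)       # below 1, because the relationship is curved
rho = spearman_rho(x, y)  # 1, because the ranks agree perfectly
print(f"r = {r:.3f}, rho = {rho:.3f}")
```

With these made-up scores r stays below 1 while $\LARGE \rho$ reaches 1, which is the difference between a linear and a merely monotonic relationship.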
### Point biserial correlation coefficient
To describe the correlation between quantitative data and *qualitative or nominal data with only two categories*, assign arbitrary numerical codes, such as 1 and 2, to the two qualitative categories, then solve Pearson's r formula.

### Cramer's $\LARGE \phi$
To describe the relationship between *two ordered qualitative variables* (e.g. attitude toward legal abortion and education level), assign any *ordered* numerical codes, such as 1, 2, 3, to the categories of both qualitative variables, then solve Pearson's r formula.

## Correlation matrix
**Correlation matrix** - a table showing the correlations for all possible pairs of variables.
![[Pasted image 20230909162635.png]]

## Squared correlation coefficient
$\LARGE r^2$ - the proportion of the total variability in one variable that is predictable from its relationship with the other variable.
$$\LARGE r^2 = \frac{SS_{Y'}}{SS_Y}=\frac{SS_Y-SS_{Y|X}}{SS_Y}$$
where $\LARGE SS_{Y'}=\sum (Y'-\overline Y)^2$ is the variability of the predicted scores $\LARGE Y'$, and $\LARGE SS_{Y|X}=\sum (Y-Y')^2$ is the variability of the actual scores around the predicted ones.
- $\LARGE r^2$ does not apply to individual scores
- $\LARGE r^2$ does not establish a cause-effect relationship
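
The two forms of the $\LARGE r^2$ formula can be checked numerically. The sketch below assumes that the predicted scores $\LARGE Y'$ come from a least squares regression line of Y on X, which the note implies but does not state. The data and variable names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sum_sq(values, center):
    """Sum of squared deviations of values from center."""
    return sum((v - center) ** 2 for v in values)


# Hypothetical paired scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
ss_x = sum_sq(x, x_bar)
ss_y = sum_sq(y, y_bar)

# Assumed source of Y': the least squares line Y' = intercept + slope * X.
slope = sp_xy / ss_x
intercept = y_bar - slope * x_bar
y_pred = [intercept + slope * a for a in x]

ss_y_pred = sum_sq(y_pred, y_bar)                            # SS_Y'
ss_y_given_x = sum((b - p) ** 2 for b, p in zip(y, y_pred))  # SS_Y|X

r = sp_xy / (ss_x * ss_y) ** 0.5
r_squared_predicted = ss_y_pred / ss_y
r_squared_residual = (ss_y - ss_y_given_x) / ss_y

print(r ** 2, r_squared_predicted, r_squared_residual)
```

All three printed values agree up to floating point rounding, so for these scores about 60% of the variability in Y is predictable from X.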