Correlation

Evan Jung January 09, 2019

1. Intro

  • Case 1: high values of X go with high values of Y; X and Y are positively correlated.
  • Case 2: low values of X go with low values of Y; X and Y are positively correlated.
  • Case 3: high values of X go with low values of Y, and vice versa; the variables are negatively correlated.

2. Key Terms

Correlation Coefficient is a metric that measures the extent to which numeric variables are associated with one another. It ranges from -1 to +1: a value of +1 means perfect positive correlation, 0 indicates no correlation, and -1 means perfect negative correlation.

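To compute Pearson’s correlation coefficient, we multiply the deviations from the mean for variable 1 by those for variable 2, sum the products, and divide by n - 1 times the product of the standard deviations:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} $$

Here n is the number of observations, \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations.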
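As a rough illustration of the computation just described, here is a minimal Python sketch using only the standard library. The function name `pearson_r` and the sample lists are made up for this example and are not from the original post; the standard deviations are assumed to be sample standard deviations (n - 1 in the denominator).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences (illustrative helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations from the mean for each variable
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Sample standard deviations (n - 1 in the denominator)
    s_x = sqrt(sum(d * d for d in dx) / (n - 1))
    s_y = sqrt(sum(d * d for d in dy) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    # Sum of the products of deviations, divided by (n - 1) * s_x * s_y
    return sum(a * b for a, b in zip(dx, dy)) / ((n - 1) * s_x * s_y)


# Hypothetical data: y_up rises with x, y_down falls as x rises
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print(pearson_r(x, y_up))    # close to +1 (perfect positive correlation)
print(pearson_r(x, y_down))  # close to -1 (perfect negative correlation)
```

Because of floating-point rounding, the printed values may differ from exactly +1 and -1 in the last digits.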



Correlation Matrix is a table where the variables are shown on both rows and columns, and the cell values are the correlations between variables.

(The interpretation of this plot is left to you!)

  1. The orientation of the ellipse indicates whether two variables are positively correlated or negatively correlated.

  2. The shading and width of the ellipse indicate the strength of the association: thinner and darker ellipses correspond to stronger relationships.
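To make the idea of a correlation matrix concrete, the following sketch builds one in plain Python for three small, made-up variables. The names `correlation_matrix`, `a`, `b`, and `c` are hypothetical and chosen only for illustration; in practice a statistics library would normally do this for you.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict: matrix[row][col] is the correlation between two variables."""
    names = list(columns)
    return {r: {c: pearson_r(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical data, not taken from the original post
data = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 4, 3, 2, 1],
}
matrix = correlation_matrix(data)

# Print the table: variables on both rows and columns
names = list(data)
print(" " * 4 + "".join(f"{name:>8}" for name in names))
for row in names:
    print(f"{row:<4}" + "".join(f"{matrix[row][col]:8.2f}" for col in names))
```

The diagonal is 1 because every variable is perfectly correlated with itself, and the matrix is symmetric, so only one triangle carries new information.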

2.1. Other Correlation Estimates

Spearman’s rho and Kendall’s tau were proposed by statisticians long ago. Both are correlation coefficients based on the ranks of the data rather than on the raw values. Because they work with ranks, these estimates are robust to outliers and can handle certain types of nonlinearities.

However, data scientists can generally stick to Pearson’s correlation coefficient, and its robust alternatives, for exploratory analysis. The appeal of rank-based estimates is mostly for smaller data sets and specific hypothesis tests.
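The robustness of rank-based estimates can be sketched as follows, assuming Spearman's rho is computed as Pearson's correlation applied to the ranks, with tied values given their average rank. The helper names and the data (a monotone series with one extreme value) are invented for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but the last value is extreme
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 1000]

print("Pearson: ", pearson_r(x, y))     # pulled well below 1 by the extreme value
print("Spearman:", spearman_rho(x, y))  # close to 1: the ranks agree perfectly
```

Only the ordering of the values matters to Spearman's rho, so replacing 1000 with any other value larger than 25 would leave it unchanged.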

Scatterplot is a plot in which the x-axis is the value of one variable and the y-axis is the value of another.

The returns have a strong positive relationship: on most days, both stocks go up or go down in tandem. There are very few days where one stock goes down significantly while the other stock goes up (or vice versa).

3. Key Ideas for Correlation

  • The correlation coefficient measures the extent to which two variables are associated with one another.

  • When high values of v1 go with high values of v2, v1 and v2 are positively associated.

  • When high values of v1 are associated with low values of v2, v1 and v2 are negatively associated.

  • The correlation coefficient is a standardized metric, so it always ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation).

  • A correlation coefficient of 0 indicates no correlation, but be aware that random arrangements of data will produce both positive and negative values for the correlation coefficient just by chance.
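
The last point, that unrelated data still produce nonzero correlations by chance, can be checked with a small simulation. This is only a sketch: the sample size, number of trials, and seed are arbitrary choices made for this example.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(0)   # arbitrary seed, only to make the run repeatable
n_points = 10    # small samples make chance correlations more visible
n_trials = 200

rs = []
for _ in range(n_trials):
    # x and y are generated independently, so any correlation is due to chance
    x = [random.random() for _ in range(n_points)]
    y = [random.random() for _ in range(n_points)]
    rs.append(pearson_r(x, y))

print("smallest r:", min(rs))
print("largest r: ", max(rs))
print("positive:", sum(r > 0 for r in rs), "negative:", sum(r < 0 for r in rs))
```

Increasing `n_points` should shrink the spread of the chance correlations toward 0, which is one reason small data sets call for extra caution.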

4. Further Reading

Statistics, 4th ed., by David Freedman, Robert Pisani, and Roger Purves (W.W. Norton, 2007), has an excellent discussion of correlation.

