Questions tagged [pearson]

In statistics, Pearson's r, the Pearson product-moment correlation coefficient, measures the strength of the linear relationship between two data sets on a scale from -1 to 1.

Overview

The Pearson product-moment correlation coefficient is given by the following equation:

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$$

where

ρ_{X,Y} = Pearson's correlation coefficient;
Cov(X,Y) = covariance of random variables X and Y;
Var(X) = variance of random variable X;
Var(Y) = variance of random variable Y.
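
As a rough illustration of the definition, that is, Cov(X,Y) divided by the square root of Var(X)·Var(Y), here is a minimal pure-Python sketch. The function name `pearson_r` is made up for this example and is not taken from any library. The error handling for constant samples is one possible convention among several.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed as Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviation products are sufficient here.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov_xy / math.sqrt(var_x * var_y)


# A perfectly decreasing linear relationship gives r = -1.
print(pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # prints -1.0
```

For real work a library routine is usually the better choice. The sketch is only meant to make the formula concrete.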


Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

155 questions
2
votes
1 answer

Pearson Correlation after Normalization

I want to normalize my data and compute a pearson correlation. If I try this without normalization it works. With normalization I get this error message: AttributeError: 'numpy.ndarray' object has no attribute 'corr' What can I do to solve this…
matthew
  • 399
  • 1
  • 7
  • 15
2
votes
0 answers

How to compute this huge Correlation Matrix?

I have a huge matrix with nrow=144 and ncol=156267 containing numbers and I would like to compute the correlation between all the columns. This can be done using the bigcor function described here:…
NKGon
  • 55
  • 8
2
votes
2 answers

Python pandas correlation corr() TypeError: Could not compare ['pearson'] with block values

one = pd.DataFrame(data=[1,2,3,4,5], index=[1,2,3,4,5])
two = pd.DataFrame(data=[5,4,3,2,1], index=[1,2,3,4,5])
one.corr(two)
I think it should return a float = -1.00 but instead it's generating the following error: TypeError: Could not compare…
MJS
  • 1,573
  • 3
  • 17
  • 26
2
votes
2 answers

Raster correlation and p-values from cor.test

I am trying to get pixel-wise correlations and significance (p-value) between two raster bricks using cor and cor.test. My data are here: Brick 1 Brick 2 They're both fairly small, less than 2MB altogether. I found the following two codes (both…
lamochila
  • 35
  • 1
  • 6
2
votes
0 answers

JAVA: How to take Pearson correlation of iris matrix?

I read the iris data set from the file "iris.txt" and store it in an ArrayList. Now I need some help with how to compute the Pearson correlation of that matrix using the formula; the matrix consists of 150 rows and 4 columns. I need just a little push so I can continue…
Junayd Khalid
  • 21
  • 1
  • 5
2
votes
1 answer

Constructing correlated variables

I have a variable with a given distribution (normal in my example below). set.seed(32) var1 = rnorm(100,mean=0,sd=1) I want to create a variable (var2) that is correlated to var1 with a linear correlation coefficient (roughly or exactly)…
Remi.b
  • 17,389
  • 28
  • 87
  • 168
2
votes
1 answer

How do scipy.stats..fit methods work?

How do distribution fitness-tests, ex. scipy.stats.norm.fit work? Investigation of scipy source code led me to rv_continuous.fit method, but it looks like beating the air. What algorithms are used, Pearson's chi-squared test or some other ones? UPD…
leventov
  • 14,760
  • 11
  • 69
  • 98
2
votes
2 answers

R: Calculating Pearson correlation coefficient in each cell along time line

I have two sets of rasters, both with same x,y,z extent. I've made two stacks: stacka and stackb. I want to calculate the Pearson correlation coefficient (PCC) in each grid cell between two stacks along the time line. I've made a simpler example…
EDU
  • 267
  • 1
  • 5
  • 13
2
votes
3 answers

k means clustering algorithm

I want to perform a k means clustering analysis on a set of 10 data points that each have an array of 4 numeric values associated with them. I'm using the Pearson correlation coefficient as the distance metric. I did the first two steps of the k…
cooldood3490
  • 2,418
  • 7
  • 51
  • 66
2
votes
1 answer

Wrong correlation result for big numbers

The cor() function fails to compute the correlation value if there are extremely big numbers in the vector and returns just zero:
foo <- c(1e154, 1, 0)
bar <- c(0, 1, 2)
cor(foo, bar) # -0.8660254
foo <- c(1e155, 1, 0)
cor(foo, bar) # 0
Although…
Ali
  • 9,440
  • 12
  • 62
  • 92
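
One plausible explanation for the symptom in this excerpt is overflow: squaring a value near 1e155 gives about 1e310, which exceeds the largest double (about 1.8e308), so the variance term becomes infinite and the ratio collapses to zero. Since Pearson's r does not change when a sample is multiplied by a positive constant, a common workaround is to rescale each sample before computing. The sketch below is in Python rather than R, and the name `pearson_r_scaled` is hypothetical. It assumes non-empty, equal-length, finite inputs.

```python
import math


def pearson_r_scaled(xs, ys):
    """Pearson's r after dividing each sample by its largest magnitude."""

    def rescale(values):
        # Dividing by a positive constant leaves r unchanged but keeps
        # the squared deviations inside the range of a double.
        biggest = max(abs(v) for v in values)
        return [v / biggest for v in values] if biggest > 0 else list(values)

    xs, ys = rescale(xs), rescale(ys)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    # Take the square roots separately to avoid forming a large product.
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))


foo = [1e155, 1, 0]
bar = [0, 1, 2]
print(pearson_r_scaled(foo, bar))  # approximately -0.8660254
```

Whether this matches what R does internally is not something the excerpt confirms. The rescaling trick itself relies only on the scale invariance of the coefficient.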
2
votes
8 answers

Pearson Similarity Score, how can I optimise this further?

I have an implementation of Pearson's Similarity score for comparing two dictionaries of values. More time is spent in this method than anywhere else (potentially many millions of calls), so this is clearly the critical method to optimise. Even the…
Andrew Ingram
  • 5,160
  • 2
  • 25
  • 37
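
For readers unfamiliar with the setup in this excerpt, a Pearson similarity over two dictionaries is typically computed on the keys both dictionaries share. The sketch below is not the asker's code, which the excerpt does not show. It is a hypothetical single-pass version using running sums, with the name `pearson_similarity` and the choice to return 0.0 for undefined cases both being assumptions made for illustration.

```python
import math


def pearson_similarity(prefs_a, prefs_b):
    """Pearson's r over the keys two dicts share, accumulated in one pass."""
    n = 0
    sum_a = sum_b = sum_aa = sum_bb = sum_ab = 0.0
    for key, a in prefs_a.items():
        if key not in prefs_b:
            continue  # only keys present in both dicts contribute
        b = prefs_b[key]
        n += 1
        sum_a += a
        sum_b += b
        sum_aa += a * a
        sum_bb += b * b
        sum_ab += a * b
    if n < 2:
        return 0.0  # assumed convention: too little overlap means no similarity
    numerator = n * sum_ab - sum_a * sum_b
    denom_sq = (n * sum_aa - sum_a ** 2) * (n * sum_bb - sum_b ** 2)
    if denom_sq <= 0:
        return 0.0  # assumed convention: constant ratings on the shared keys
    return numerator / math.sqrt(denom_sq)


a = {"x": 1, "y": 2, "z": 3}
b = {"x": 2, "y": 4, "z": 6, "w": 9}
print(pearson_similarity(a, b))  # prints 1.0
```

The running-sum form avoids a second loop over the data, but it can lose precision when values are large relative to their spread. A two-pass version that subtracts the means first is more robust.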
2
votes
2 answers

Understanding floating point variables and operators in c++ (Also a possible book error)

I am working through a beginning C++ class and my book(Starting Out with C++ Early Objects 7th edition) has a very poor example of how to check the value of a floating point variable. The book example in question(filename pr4-04.cpp): // This…
Robert
  • 123
  • 4
2
votes
1 answer

Calculate correlations based on tf-idf values?

Does it make sense to calculate pearson correlation coefficients based on a tf-idf matrix to see which terms occur in combination with other terms? Is it mathematically correct? My output is a correlation matrix with correlation coefficients in each…
user1341610
  • 21
  • 1
  • 2
1
vote
1 answer

How to get Pearson's correlation with a matrix in Matlab

I have some vectors, for example, let's call them a, b and c. All of them have the same size. I want to get the correlation between a and c, b and c. I have tried…
user1297712
  • 73
  • 2
  • 7
1
vote
1 answer

"Correlation & Significance if more than 30 pairs" using R and ddply

Part of the solution to my problem I found here: How to calculate correlation In R set.seed(123) X <- data.frame(ID = rep(1:2, each=5), a = sample(1:10), b = sample(1:10)) ddply(X, .(ID), summarize, cor_a_b = cor(a,b)) In addition to cor (which…
Thomas Langkamp
  • 153
  • 1
  • 9