Questions tagged [pearson]

In statistics, Pearson's r (the Pearson product-moment correlation coefficient) measures the strength and direction of the linear relationship between two data sets on a scale from -1 to 1.

Overview

The Pearson product-moment correlation coefficient is given by the following equation:

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$$

where,

ρ_{X,Y} = Pearson’s correlation coefficient;
Cov(X,Y) = covariance of random variables X and Y;
Var(X) = variance of random variable X;
Var(Y) = variance of random variable Y;
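
As a rough illustration of the definition above, the following Python sketch computes the coefficient directly from the covariance and the two variances. The function name `pearson_r` and the choice to return NaN for a constant input are assumptions made for this example, not part of any particular library. Population (divide-by-n) estimates are used here; the normalising factor cancels in the ratio, so sample (n - 1) estimates would give the same result.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as Cov(X, Y) / sqrt(Var(X) * Var(Y)).

    Hypothetical helper for illustration only.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Population covariance and variances (divide by n).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n

    # The coefficient is undefined when either variable is constant.
    # Returning NaN is an assumption of this sketch.
    if var_x == 0 or var_y == 0:
        return float("nan")

    r = cov / math.sqrt(var_x * var_y)
    # Guard against floating-point rounding pushing r slightly outside [-1, 1].
    return max(-1.0, min(1.0, r))

print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # approximately 0.8
```

In practice, an established routine such as `scipy.stats.pearsonr` or `numpy.corrcoef` is usually a better choice than a hand-written version.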


Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

155 questions
4
votes
4 answers

What is wrong with this python function from "Programming Collective Intelligence"?

This is the function in question. It calculates the Pearson correlation coefficient for p1 and p2, which is supposed to be a number between -1 and 1. When I use this with real user data, it sometimes returns a number greater than 1, like in this…
Hobhouse
  • 15,463
  • 12
  • 35
  • 43
3
votes
1 answer

Apache Mahout + Pearson Correlation Ignores Users With Same Preference For Every Item

I'm using Mahout with the Pearson Correlation algorithm to compare and find similar users based on their preferences for several items. The problem I'm running into is that Mahout and/or Pearson is ignoring users that select the same preference for…
SGT Grumpy Pants
  • 4,118
  • 4
  • 42
  • 64
3
votes
1 answer

Statistical correlation: Pearson or Spearman?

I have 2 series of 45 values in the interval [0,1]. The first series is a human-generated standard, the second one is computer-generated (full series here http://www.copypastecode.com/74844/). The first series is sorted decreasingly. 0.909090909…
Mulone
  • 3,603
  • 9
  • 47
  • 69
3
votes
2 answers

Efficiently calculate and store similarity matrix

For a recommender-system project in class I am currently trying to build and store an item-based similarity matrix for a dataset with about 7000 users (rows) and 4000 movies (columns). So what I have is a pivot table with UserIDs as index, MovieIDs…
kbk
  • 45
  • 1
  • 5
3
votes
3 answers

How to generate correlation plot of my data.frame in R?

It might be a simple question. I have a df and I want to generate a correlation plot for my data in R. head(df) x y 1 -0.10967469 1 2 1.06814661 93 3 0.71805993 46 4 0.60566332 84 5 0.73714006 12 6 -0.06029712 5 I've found a…
user3576287
  • 932
  • 3
  • 16
  • 30
3
votes
1 answer

R: logistic regression using frequency table, cannot find correct Pearson Chi Square statistics

I was implementing logistic regression on the following data frame and got reasonable results (the same as with Stata). But the Pearson chi-square statistic and degrees of freedom I got from R are very different from Stata's, which in turn gave me a very small…
J.Liu
  • 51
  • 1
  • 3
3
votes
0 answers

Full form of matching methods for "matchTemplate" available in OpenCV?

I know this is a noob question to ask, but I had to. I got confused about the full forms of the matching-method parameters for matchTemplate(). parameters =['cv2.TM_CCOEFF', 'cv2.TM_CCOEFF_NORMED', 'cv2.TM_CCORR','cv2.TM_CCORR_NORMED', 'cv2.TM_SQDIFF', …
Jonas
  • 375
  • 2
  • 6
  • 20
3
votes
1 answer

How to measure significance of a data point (X,Y) in Pearson's Correlation?

I had a set of data points (let's say X, Y, Z, etc.) and showed that they have a Pearson Correlation Coefficient of 0.7. Is it possible to see how each data point contributes to the correlation coefficient? I.e., be able to say point X contributes…
Ben Wong
  • 31
  • 1
3
votes
1 answer

chi square test in R when your data is a list of observations

Is it possible to calculate chi squared in R when your data is in the form of a list of observations? What I mean is, it is simple to get chi squared if you know the cross. For instance, if you have a survey and you ask for gender and a true-false…
user2047228
  • 73
  • 1
  • 3
  • 10
3
votes
2 answers

Determining Perfect Hash Lookup Table for Pearson Hash

I'm developing a programming language, and in my programming language, I'm storing objects as hash tables. The hash function I'm using is Pearson Hashing, which depends on a 256-bit lookup table. Here's the function: char* pearson(char* name,…
Imagist
  • 18,086
  • 12
  • 58
  • 77
3
votes
2 answers

Pearson Algorithm from Programming Collective Intelligence still not working

I ran the code to calculate the Pearson Correlation Coefficient and the function (pasted below) stubbornly returns a 0. In line with earlier suggestions on this issue here on SO (see #1, #2 below), I did make sure that the function is able to…
2
votes
3 answers

Error detection code for 33 bytes, detecting bit flipped in first 32 bytes

Could you please suggest an error detection scheme for detecting one possible bit flip in the first 32 bytes of a 33-byte message using no more than 8 bits of additional data? Could Pearson hashing be a solution?
robel
  • 51
  • 5
2
votes
0 answers

Correlation between continuous variables and multi class categorical variables in python

I was trying to figure out a way of finding a correlation between continuous variables and a non-binary target categorical label. The only thing I thought of is fitting the labels into Multinomial Logistic Regression and then extracting the…
zoump
  • 151
  • 2
  • 5
2
votes
1 answer

Is it possible to use pearson correlation metric in sklearn?

I have a matrix X on which I am trying to use KNN with a Pearson correlation metric. Is it possible to use the Pearson correlation as the sklearn metric? I have tried something like this: def pearson_calc(M): P = (1 - np.array([[pearsonr(a,b)[0] for…
Mike El Jackson
  • 771
  • 3
  • 14
  • 23
2
votes
1 answer

Correlation between two quantitative variables with NAs and by group

I have this dataset: dbppre dbppost per1pre per1post per2pre per2post 0.544331824055634 0.426482748529805 1.10388140870983 1.14622255457398 1.007302668 1.489675646 0.44544008292805 …