Questions tagged [pearson]

In statistics, Pearson's r, the Pearson product-moment correlation coefficient, measures the strength and direction of the linear relationship between two data sets on a scale from -1 to 1.

Overview

The Pearson product-moment correlation coefficient is given by the following equation:

$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$$

where

ρ_XY = Pearson's correlation coefficient;
Cov(X,Y) = covariance of random variables X and Y;
Var(X) = variance of random variable X;
Var(Y) = variance of random variable Y.
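
As a rough illustration of the definition above, the following Python sketch computes the coefficient directly from the covariance and the two variances. The function name `pearson_r` is hypothetical, and the sketch assumes population (divide-by-n) estimates; the normalisation cancels in the ratio, so sample (n-1) estimates would give the same value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed as Cov(X, Y) / sqrt(Var(X) * Var(Y)).

    Hypothetical helper for illustration; uses population (divide-by-n)
    estimates for the covariance and both variances.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance of X and Y
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Variances of X and Y
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        # The denominator would be zero, so the coefficient is undefined
        raise ValueError("correlation is undefined when either input is constant")
    return cov_xy / math.sqrt(var_x * var_y)
```

For real work, library routines such as `numpy.corrcoef` or `scipy.stats.pearsonr` (both mentioned in questions under this tag) are usually the better choice.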


Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

155 questions
1
vote
1 answer

Replacement length error while doing correlation in R using cor function

I am trying to use the cor function in R to compute correlations for genes across many samples. I have two input files: actual expression and predicted expression. Both files have 5 rows, which correspond to genes, and columns, which correspond to samples. When I use cor…
rheabedi1
  • 65
  • 7
1
vote
1 answer

How to slice and calculate the Pearson correlation coefficient between one big and one small array with "overlapping" window arrays

Suppose I have two very simple arrays with numpy: import numpy as np reference=np.array([0,1,2,3,0,0,0,7,8,9,10]) probe=np.zeros(3) I would like to find which slice of array reference has the highest pearson's correlation coefficient with array…
mad
  • 2,677
  • 8
  • 35
  • 78
1
vote
0 answers

stats.pearsonr unsupported operand type(s) for +: 'float' and 'numpy.str_'

I am using stats.pearsonr to find the r values, which can be used to calculate the similarities between two vectors (lists) of numbers, such as gene expression values. Both dic[gene[i]] and dic[gene[j]] are lists: dic[gene[j]] = ['145.544', '135.24',…
1
vote
1 answer

Estimate Pearson correlation coefficient from stream of data

Is there a way to estimate the correlation of two variables if the data is received in chunks without storing the received pairs? For example, we receive the pairs: [(x1, y1), (x2, y2), (x3, y3)] [(x4, y4)] [(x5, y5), (x6, y6)] and we have to…
Ramon
  • 501
  • 3
  • 13
1
vote
1 answer

Pearson correlation

I have a data frame as follows: x <- data.frame(Name=c("a", "b","c", "d", "e"),A=(1:5), B=(2:6), C=(7:11), D=c(1,1,1,1,1)) I want to get a data frame including all the Pearson coefficients of a vs b, a vs c, a vs d, a vs e, b vs a, b vs c, b vs d, b…
a83
  • 11
  • 2
1
vote
0 answers

How to get R to run bnlearn mc-x2 test?

I'm attempting to perform two versions of Pearson's X2 test on a network. Here is the code I'm running: library(bnlearn) library(Rgraphviz) library(gRain) library(graph) library(grid) library(snow) dag <- empty.graph(names(alarm)) modelstring(dag)…
1
vote
0 answers

Correlation Analysis

I want to compare data on employee performance to an engagement survey. If I have employees who are ranked 1-5 for certain performance categories, can I correlate this with engagement scores? I am using Pearson correlation, is that…
MeganMills
  • 11
  • 1
1
vote
2 answers

Finding related texts (correlation between two texts)

I'm trying to find similar articles in a database via correlation. So I split the text into an array of words, then delete frequently used words (articles, pronouns and so on), then compare two texts with a Pearson coefficient function. For some texts it works…
x2.
  • 9,554
  • 6
  • 41
  • 62
1
vote
1 answer

Why does Spearman produce different results on z-scores?

Hi, it seems that Spearman correlation should produce the same result regardless of whether it's computed on z-scores or raw values. Here are two examples.…
Ahdee
  • 4,679
  • 4
  • 34
  • 58
1
vote
1 answer

Python - vectorized function for calculating pairwise Pearson correlation coefficient

I want to calculate the correlation coefficient between the rows of the matrix X (N x k). Applying numpy.corrcoef(X) to a big matrix X is not efficient and it is slowing down my code, so I would like to make it faster. Can somebody help me how can I…
pelah
  • 11
  • 3
1
vote
1 answer

Pearson vs Euclidean vs Manhattan Results

Using Python 3.6. I am not getting logical results when using Manhattan distance for similarity measurement. Even compared to the results from Pearson and Euclidean correlation, the units for Euclidean and Manhattan look off? I am working on a…
user1940212
  • 199
  • 1
  • 4
  • 14
1
vote
0 answers

Python - Proper way to find correlations between features containing missing data

1) Most of the features are NOT normally distributed. I just realized that the SciPy pearsonr I used requires a normal distribution (which is indeed weird). The numpy.corrcoef description says nothing about such requirements. Should I use it? Other…
Acia Delilah
  • 73
  • 2
  • 6
1
vote
1 answer

R function cor.test(): How is the p-value calculated for Pearson correlation?

It's probably a very easy question. I can't find the methodology behind the p-value calculation in the cor.test() function in R.
DataAdventurer
  • 300
  • 1
  • 3
  • 10
1
vote
1 answer

Non-linear correlation formula

There are some formulas that calculate the linear correlation between two random variables, for example Pearson or Spearman correlation. My question is: is there any formula that can calculate the non-linear correlation between two random…
hrkad
  • 105
  • 8
1
vote
2 answers

Homemade Pearson's correlation implementation returning 0.999...2 when passing two of the same sets of data to it

I was getting fed up with scipy and numpy, and decided to go ahead and work on another implementation, based on a SO answer somewhere. from statistics import pstdev, mean def pearson(x, y): sx = [] sy = [] mx = mean(x) my = mean(y) …