I have a dataset with 130 variables and need to check the correlation between all of them. Is there a way to do this in one go? I am new to data science and am using pandas, so please suggest an approach. Do I have to do hypothesis....
You say you want to check "it" once, but there is no clear referent for "it". Correlation is a property of two series: a pair of variables, or two data columns. 130 variables will have 130*129 / 2 = 8,385 total correlations. – Prune Aug 27 '19 at 16:11
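As a quick check of the pair count in the comment above, here is a short Python sketch. Each unordered pair of n variables is counted once, giving n(n - 1)/2; the column names below are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

n_vars = 130
# Number of unordered pairs among n variables: n * (n - 1) / 2
n_pairs = comb(n_vars, 2)
print(n_pairs)  # 8385

# The same count obtained by enumerating the pairs explicitly
# (hypothetical placeholder column names)
columns = [f"var_{i}" for i in range(n_vars)]
pairs = list(combinations(columns, 2))
print(len(pairs) == n_pairs)  # True
```

The full matrix returned by `DataFrame.corr` is 130 × 130, so every pair appears twice and the diagonal holds 1s. Only the 8,385 off-diagonal entries in one triangle are distinct.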
Hi @Prune, I want to drop multiple variables, and I guess that to do so I have to check the correlation between variables. Please correct me if I am missing anything. – Ishwar Bisht Aug 27 '19 at 17:03
You're missing a research step: how to reduce variables (dimensions) in data preprocessing. Use search terms such as "dimension reduction" and "eliminate variables". I expect that a good starting point is "principal component analysis". – Prune Aug 27 '19 at 17:58
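To see what principal component analysis reports, here is a minimal sketch using only NumPy and pandas. It computes PCA through the singular value decomposition of standardized data. It assumes every column is numeric, with no missing values and no constant columns. The function name, the 95% target, and the demo data are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd


def pca_explained_variance(df: pd.DataFrame, target: float = 0.95):
    """Return (k, explained): the explained-variance ratio of each principal
    component, and the smallest k whose cumulative ratio reaches `target`."""
    X = df.to_numpy(dtype=float)
    # Standardize each column so variables on large scales don't dominate
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values are proportional to the component variances
    s = np.linalg.svd(X, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    k = min(int(np.searchsorted(cumulative, target)) + 1, len(explained))
    return k, explained


# Hypothetical demo data: 10 columns driven by only 3 underlying signals
rng = np.random.default_rng(0)
signals = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
demo = pd.DataFrame(signals @ mixing + 0.01 * rng.normal(size=(500, 10)))

k, explained = pca_explained_variance(demo)
print(k, np.round(explained[:4], 4))
```

A k much smaller than the number of columns suggests that many variables are redundant. Note that PCA replaces the original columns with linear combinations of them rather than telling you which original columns to drop. In practice, scikit-learn's `sklearn.decomposition.PCA` (with its `explained_variance_ratio_` attribute) is the more common route.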
Answer
You could easily have found the answer to this question elsewhere, but since you're new to data science, here is one way:
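import pandas as pd

# First read the dataset into a DataFrame
file_name = "your_data.xlsx"  # placeholder: path to your Excel file
data = pd.read_excel(file_name)
# Correlation between every pair of numeric columns (features / variables);
# method can be 'pearson', 'kendall' or 'spearman'
corr_matrix = data.select_dtypes("number").corr(method="pearson")
print(corr_matrix)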

– TheSHETTY-Paradise
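`data.corr()` returns the full 130 × 130 matrix, which is hard to read by eye. Because the goal stated in the comments is to drop variables, one common heuristic (not part of the answer above) is to list the pairs whose absolute correlation exceeds a cut-off and keep only one column from each strongly correlated group. Below is a minimal sketch. The function names, the 0.9 threshold, and the demo data are hypothetical choices. It assumes non-constant numeric columns, because a constant column has an undefined (NaN) correlation and the greedy filter below would drop it.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(data: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Return each pair of numeric columns whose |Pearson r| exceeds `threshold`."""
    corr = data.select_dtypes("number").corr(method="pearson")
    values = corr.to_numpy()
    # Indices of the strict upper triangle: every unordered pair exactly once
    rows, cols = np.triu_indices_from(values, k=1)
    pairs = pd.DataFrame({
        "var_1": corr.index[rows],
        "var_2": corr.columns[cols],
        "r": values[rows, cols],
    })
    strong = pairs[pairs["r"].abs() > threshold]
    return strong.sort_values("r", key=abs, ascending=False).reset_index(drop=True)


def drop_correlated(data: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Greedily keep a numeric column only if its |r| with every column kept
    so far is <= threshold; non-numeric columns are left untouched."""
    corr = data.select_dtypes("number").corr(method="pearson").abs()
    kept = []
    for col in corr.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    to_drop = [c for c in corr.columns if c not in kept]
    return data.drop(columns=to_drop)


# Hypothetical demo data: "b" is a near-duplicate of "a", "c" is independent
rng = np.random.default_rng(42)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
print(highly_correlated_pairs(demo))
print(drop_correlated(demo).columns.tolist())  # ['a', 'c']
```

The greedy pass depends on column order: of two correlated columns, the one that appears first is kept. Reorder the columns beforehand if you have a preference for which variable survives.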