
I have two sets of Likert-style data on a scale from 0 to 100, where 0 is strongly disagree and 100 is strongly agree. The first set consists of answers from a sample of 500 users. The second set also consists of numerical answers from the same sample of 500 users. The data sets are related in this way: the i-th user in the first set has matched with the i-th user in the second set on numerous occasions on a particular gaming platform (e.g., a party on PlayStation Network), for i = 1,...,500. The question asked of each user is: "Do you like dogs?" Here's an example of what the data looks like:

user_1_data = [100,60,98, 50,0,...,20,100]
user_2_data = [50,75,12,...,100,20]

where user_1_data[0] is the user who matched with user_2_data[0], and their responses to the question "Do you like dogs?" are 100 and 50 respectively, and so on up to i = 500. I plotted the actual data as the probability distributions below, where the x-axis is the rating from 0 to 100 and the y-axis is the probability of picking that particular rating.

[Figure: User 1 and 2 data]

Although the distributions look similar, I need some sort of statistical test to show whether there is a significant relationship between them (if any). Ultimately I'd like to answer the question: does a similar distribution of answers imply that the users will play together on different occasions?

Please feel free to edit this question for formatting and to be easier to understand.

This is a statistics question. Please use statistics terms and math language if possible. I am new to data science and would love to learn how to answer my own question in the future.

I code in Python.
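Because the responses are paired by index, one possible starting point is to test the pairing itself rather than only the two marginal distributions. The sketch below is a minimal illustration, not a definitive analysis: it uses synthetic stand-in data (an assumption, since the real responses are not shown in full), reuses the names `user_1_data` and `user_2_data` from the example above, and introduces a hypothetical `n_perm` parameter. It computes a Spearman rank correlation and a permutation test that asks whether matched users answer more alike than randomly re-paired users would.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the real responses (assumption: 500 paired
# integer ratings in 0..100; replace with the actual data).
rng = np.random.default_rng(0)
user_1_data = rng.integers(0, 101, size=500)
user_2_data = rng.integers(0, 101, size=500)

# Spearman rank correlation.
# H0: no monotonic association between matched users' ratings.
rho, rho_p = stats.spearmanr(user_1_data, user_2_data)

# Permutation test on the mean absolute difference between partners.
# H0: the pairing carries no information, i.e. matched users are no
# more similar than users paired at random.
observed = np.mean(np.abs(user_1_data - user_2_data))
n_perm = 2000  # hypothetical number of shuffles; more gives a finer p-value
shuffled = np.array([
    np.mean(np.abs(user_1_data - rng.permutation(user_2_data)))
    for _ in range(n_perm)
])

# One-sided p-value: share of random pairings at least as similar
# (difference at least as small) as the real pairing.
perm_p = (np.sum(shuffled <= observed) + 1) / (n_perm + 1)

print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
print(f"Mean |difference| = {observed:.2f}, permutation p = {perm_p:.3f}")
```

A small `perm_p` would suggest that matched users give more similar answers than chance pairing would produce. Since every pair in this data has already played together, the random re-pairing acts as the comparison group; by itself it does not show that similar answers cause users to play together.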

anthonym650
  • Does this article help? https://medium.com/@sourcedexter/how-to-find-the-similarity-between-two-probability-distributions-using-python-a7546e90a08d . Also, you can try the KS test, implemented in Python in `scipy.stats.kstest` – sin tribu Nov 19 '20 at 02:51
  • @sintribu I can see that the Jensen-Shannon divergence measures the dissimilarity between probability distributions. But what kind of conclusion can I draw from this? – anthonym650 Nov 19 '20 at 22:17
  • It gives you a metric to measure the similarity between distributions, but if you're looking for some kind of confidence interval around that metric, I'm honestly not qualified to answer. You might want to ask this question at Math Stack Exchange. – sin tribu Nov 20 '20 at 17:44
  • @sintribu Got it. Nevertheless, thank you for your help. – anthonym650 Nov 21 '20 at 03:15
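
For reference, the two measures mentioned in the comments can be sketched as follows. This is only an illustration on synthetic stand-in data (an assumption; the variable names follow the question). It uses `scipy.stats.ks_2samp`, the two-sample form of the KS test, and `scipy.spatial.distance.jensenshannon`. Note that the KS test assumes independent samples from continuous distributions, so with paired, discrete ratings its p-value is only approximate.

```python
import numpy as np
from scipy import stats
from scipy.spatial.distance import jensenshannon

# Synthetic stand-in for the real responses (assumption: integer ratings 0..100).
rng = np.random.default_rng(0)
user_1_data = rng.integers(0, 101, size=500)
user_2_data = rng.integers(0, 101, size=500)

# Two-sample Kolmogorov-Smirnov test.
# H0: both samples come from the same distribution.
ks_stat, ks_p = stats.ks_2samp(user_1_data, user_2_data)

# Empirical probability of each rating 0..100 in each set.
p = np.bincount(user_1_data, minlength=101) / len(user_1_data)
q = np.bincount(user_2_data, minlength=101) / len(user_2_data)

# Jensen-Shannon distance (square root of the divergence).
# With base=2 it lies in [0, 1]: 0 means identical distributions.
js_distance = jensenshannon(p, q, base=2)

print(f"KS statistic = {ks_stat:.3f} (p = {ks_p:.3f})")
print(f"Jensen-Shannon distance = {js_distance:.3f}")
```

Both numbers compare only the overall shape of the two rating distributions and ignore which user was matched with which, so they describe how alike the two groups are rather than how alike each matched pair is.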

0 Answers