0

I'm using MySQL & PHP to calculate Pearson's R measuring the relationship over a number of years between political donations and tenders received by a particular business.

I've run into trouble with the MySQL query that is feeding values into the algorithm. The algorithm itself is fine and evaluates correctly. The problem is in the query used to get the data.

The formula I'm using for Pearson's R is at http://www.statisticshowto.com/how-to-compute-pearsons-correlation-coefficients/

Here is the basic MySQL query that spits out values for each year:

SELECT count( distinct year) as count,name,sum(donations), sum(tenders), sum(donations * tenders) as xy,(sum(donations)*sum(donations)) as x2, (sum(tenders)*sum(tenders)) as y2 from money_by_year where name='$name' group by name,year

Here is the query WITH ROLLUP to get only the final values:

SELECT count( distinct year) as count,name,sum(donations), sum(tenders), sum(donations * tenders) as xy,(sum(donations)*sum(donations)) as x2, (sum(tenders)*sum(tenders)) as y2 from money_by_year where name='$name' group by name with rollup LIMIT 1

The problem is that the totals from the second query are wrong in the sum xy, x2 & y2. This is being caused by the query itself, probably the ROLLUP and I'd like to know what is going on with it.

You can see working examples of the code with the values resulting from both the above queries and the algorithm at https://openaus.net.au/follow_the_money.php?name=KPMG

I have tried various changes to sum(donations * tenders) as xy for example implementing it as sum(donations) * sum(tenders) as in:

SELECT count( distinct year) as count,name,sum(donations), sum(tenders), sum(donations) * sum(tenders) as xy,(sum(donations)*sum(donations)) as x2, (sum(tenders)*sum(tenders)) as y2 from money_by_year where name='KPMG' group by name with rollup LIMIT 1

however the ROLLUP totals are incorrect, much bigger than they should be. The values I want may not be possible via a single MySQL query however I would appreciate knowing why this is the case, what the ROLLUP is doing to the figures and why.

user3470715
  • 25
  • 1
  • 5
  • your query should be like following: "SELECT COUNT( DISTINCT YEAR) AS COUNT,NAME,YEAR,SUM(donations), SUM(tenders), SUM(donations * tenders) AS xy,(SUM(donations)*SUM(donations)) AS x2, (SUM(tenders)*SUM(tenders)) AS y2 FROM money_by_year WHERE NAME='$name' GROUP BY NAME,YEAR WITH ROLLUP;"..plz check it and share the issue. – Zafar Malik Feb 08 '16 at 04:42
  • That query does not produce the correct sum totals. Can you clarify what it was that is different between your suggestion and the options already posted? – user3470715 Feb 08 '16 at 05:41
  • in your query you are doing group by on single field even your where condition will fetch only single name so there is no need of rollup even no need of group by..... further if you create a sqlfiddle and let us know what output you are getting and what you need that we can help you. – Zafar Malik Feb 08 '16 at 05:51
  • 1
    The rollup is necessary to get the final values to put into the algorithm. The output has been posted in the url included in the question. It would be helpful if people visit the page with the code, values etc. That is why it is included in the question. – user3470715 Feb 08 '16 at 05:55

0 Answers0