0

I have two queries in a Microsoft Access database. They are named Average and Home_Runs. They both share the same first three columns Name, [Year] and Month.

Query: Average

Name    Year    Month   Average  
Cabrera  2013   5       .379 
Fielder  2013   5       .245 
Martinez 2013   5       .235  
Cabrera  2013   6       .378  
Fielder  2013   6       .278 
Martinez 2013   6       .240 

Query: Home_Runs

Name    Year    Month   Home Runs  
Cabrera  2013   5       12 
Fielder  2013   5       2
Martinez 2013   5       2 
Cabrera  2013   6       9  
Fielder  2013   6       4 
Martinez 2013   6       4 

I need to offset the data before I begin the calculations. I need to determine how the Home Runs from one month relate the the Average from the previous month. So it is not a direct month-to-month comparison. I need to perform a month-to-previous-month comparison.

I need to do two calculate two things from these two queries.

First: With Average being the X-axis and Home_Runs being the Y-Axis. I need to find the correlation between these data points.

Second: With Average being the X-axis and Home_Runs being the Y-Axis. I need to find the equation of the best-fit-line between all of these data points. More specifically I need to find the value of the Y variable when the X variable equals certain values.

Additional Information:

In the end I need to return a table that looks like this:

Calculation            Tier 1   Tier 2  Tier 3  Correlation 
Avgerage to Home Runs  .04      3.00    6.00    .80

What is the best way to accomplish these things?

gromit1
  • 577
  • 2
  • 14
  • 36

1 Answers1

1

Here is the SQL Fiddle example for you to play with and tweak to get it exactly right:

SELECT (Avg(A.Paverage * H.HomeRuns) - Avg(A.Paverage) * Avg(H.HomeRuns)) /     
    (StDevP(A.Paverage) * StDevP(H.HomeRuns)) AS Correlation,
    (Sum(A.Paverage * H.HomeRuns) - (Sum(A.Paverage) * Sum(H.HomeRuns) / 
    Count(*))) / (Sum(A.Paverage * A.Paverage) - (Sum(A.Paverage) * Sum(A.Paverage) / Count(*))) AS LineBestFit
FROM Averages AS A 
   INNER JOIN Home_Runs AS H 
   ON (A.Pname = H.Pname) 
   AND (A.Pyear = H.Pyear) 
   AND ((A.Pmonth - 1) = H.Pmonth)
Linger
  • 14,942
  • 23
  • 52
  • 79
  • 1
    **@gromit1**, if you provide more information on exactly what you are looking for when you say ***correlation*** and ***equation of the best-fit-line***, I can help fit that into the query. Also, if you provide the original two queries, you could probably skip creating them entirely and just use one query. – Linger Oct 16 '13 at 18:28
  • I have edited my original post. I forgot to mention that I need to offset my data by one Month. I need to compare how the Home Runs from 2013-6 compare to the average of 2013-5. How can I offset these? – gromit1 Oct 16 '13 at 18:29
  • I will work on providing more information on exactly what I mean by **correlation** and **equation of the best-fit-line**. I need to display the data from the original two queries in my program so they must exist anyways. Would it still be more efficient to just use one query? – gromit1 Oct 16 '13 at 18:34
  • 1
    One query vs three usually is more efficient as long as they don't get too complex. – Linger Oct 16 '13 at 18:41
  • 1
    I modified my answer to join home runs of 2013-6 to average of 2013-5. – Linger Oct 16 '13 at 18:51
  • This is actually an example of data used to illustrate my problem in an effort to protect sensitive information. I appreciate you trying to help my program be as efficient as possible. But I don't think I can provide a clear enough example with the data in this scenario for you to help me skip creating the original two queries entirely. And I suspect that creating one query may become too complex. So to proceed from here, I will provide more information on exactly what I mean by **correlation** and **equation of the best-fit-line**. – gromit1 Oct 16 '13 at 18:56
  • When I say **correlation** I mean that I need to calculate the Pearson correlation. The closest instructions I could find on this came from here http://www.experts-exchange.com/Microsoft/Development/MS_Access/A_2728-Covariance-Correlation-and-Simple-Linear-Regression-in-MS-Access.html – gromit1 Oct 16 '13 at 19:09
  • 1
    I modified my answer to include the correlation. – Linger Oct 16 '13 at 19:19
  • That looks great! As for **equation of the best-fit-line**, I need find the best-fit-line equation for the data points (formula here: http://hotmath.com/hotmath_help/topics/line-of-best-fit.html). Using that equation I need to find what `Home_Runs` (Y-variable) equals when `Average` (X-variable) equals .200, .250, and .300 respectively. – gromit1 Oct 16 '13 at 19:28
  • I've updated my original post to provide an example of what I am looking to have as an output. – gromit1 Oct 16 '13 at 19:44
  • 1
    I updated my answer to show you how to get **equation of the best-fit-line**. – Linger Oct 16 '13 at 19:57
  • In regards to the offset, what about when `A.Month` equals 1? I need it to equal the 12th month of the previous year. While the code that you provided compensate for that? – gromit1 Oct 17 '13 at 13:03
  • I get this error when I try to run just code for **Correlation**: `Additional information: You tried to execute a query that does not include the specified expression 'Name' as part of an aggregate function.` – gromit1 Oct 17 '13 at 14:11
  • 1
    In order to get the calculated fields to work you may have to add `GROUP BY A.Name, A.Year, A.Month, A.Average, H.[Home Runs]` to the end of the query – Linger Oct 17 '13 at 16:41
  • 1
    To link the months the way you want try changing `AND ((A.Month - 1) = H.Month` to `AND (IIF(H.Month = 1, 12, A.Month - 1))` – Linger Oct 17 '13 at 16:44
  • I'm getting this when I try and set the command to a `DataGridView`. `The provider could not determine the Double value. For example, the row was just created, the default for the Double column was not available, and the consumer had not yet set a new Double value.` – gromit1 Oct 17 '13 at 17:32
  • Any thoughts on how to get around this error: `The provider could not determine the Double value. For example, the row was just created, the default for the Double column was not available, and the consumer had not yet set a new Double value.`? Research hasn't turned up anything helpful. – gromit1 Oct 18 '13 at 17:14
  • 1
    let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/39511/discussion-between-linger-and-gromit1) – Linger Oct 18 '13 at 17:16