3

In Django I have model similar to this example:

class Currency(models.Model):
    name = models.CharField(max_length=3, unique=True)
    full_name = models.CharField(max_length=20)


class ExchangeRate(models.Model):
    currency = models.ForeignKey('Currency')
    start_date = models.DateFiled()
    end_date = models.DateField()
    exchange_rate = models.DecimalField(max_digits=12, decimal_places=4)

Let's simplify this and assume that we have only one currency and ExchangeRate table looks like:

+---------------------+-------------------+------------+------------+---------------+
| currency_from__name | currency_to__name | start_date |  end_date  | exchange_rate |
+---------------------+-------------------+------------+------------+---------------+
|        PLN          |        USD        | 2014-03-01 | 2014-08-01 |    3.00000    |
|        PLN          |        USD        | 2014-08-01 | 2014-12-01 |    6.00000    |
+---------------------+-------------------+------------+------------+---------------+

Note that this is example to simplify math operations!

In this table data density is once per month and valid record for one month is for example when start_date = 2014.03.01 and end_date = 2014.04.01, so start_date is inclusive and end_date is exclusive.

I want to calculate average exchange rate for time period:

[2014.06.01 ; 2012.09.01)

Which means: >= 2014.06.01 and < 2014.09.01

When in Django I write:

start_date = date(2014, 6, 1)
end_date = date(2014, 9, 1)

ExchangeRate.objects.all().filter(
        (
            Q(start_date__lt=start_date) & 
            Q(end_date__gt=start_date)
        ) | (
            Q(start_date__gte=start_date) & 
            Q(start_date__lt=end_date) & 
            Q(end_date__gt=start_date) 
        )
).annotate(
    currency_from_name = 'currency_from__name', 
    currency_to_name = 'currency_to__name'
).values(  # GROUP BY
    'currency_from_name',
    'currency_to_name'
).aggregate(
    F('currency_from_name'), 
    F('currency_to_name'), 
    Avg('exchange_rate')
)

After this query I'm receiving value 4.5000 which from mathematical reason is correct but wrong when You need to take care of time range.
Correct answer is 4.000.

I only came up with this solution to annotate extra column with this formula and then calculate average value from this column:

https://www.codecogs.com/eqnedit.php?latex=\inline&space;Abs&space;\left&space;(&space;\frac{months&space;\left&space;(&space;greater(ER_{start_date}\&space;,\&space;start_date),&space;smaller(ER_{start_date}\&space;,\&space;end_date)&space;\right&space;)&space;}{months(start_date\&space;,\&space;end_date)}&space;\right&space;)&space;*&space;ER_{exchange_rate}

Where:

  • Abs is function for absolute value abs()
  • months is function to calculate months between two dates months_between()
  • greater, smaller are functions to choose accordingly greater and smaller value from arguments - greatest(), least()
  • ER means column from ExchangeRate - e.g. F('exchange_rate')

I'm using 9.3 PostgreSQL DB and Django 1.8.4.

Maybe there is a simple function for that?
Maybe I'm overcomplicating this?

Community
  • 1
  • 1
WBAR
  • 4,924
  • 7
  • 47
  • 81
  • why don't you want to do this in Python? Is that table big? – Yuri Kriachko Sep 04 '15 at 18:43
  • This __MUST__ be calculated over database side and if I must create `months_between` function also I can simply create this `average_weight` but this can be also done in Django QuerySet way – WBAR Sep 04 '15 at 18:50
  • @PabTorre _" `start_date` is inclusive and `end_date` is exclusive."_ so `6` :) – WBAR Sep 08 '15 at 19:12
  • gotcha. that explains. :) I updated my answer. – PabTorre Sep 08 '15 at 19:28
  • [there are some issues with how the bounty has been handled here](http://meta.stackoverflow.com/questions/305859/should-bounties-be-used-as-a-currency-to-hand-out-work-assignments?cb=1#comment246028_305859) – jfs Sep 12 '15 at 17:28
  • I know, because I need to choose winner and no correct answer posted.. After 7 days I found solution and posted as own answer.. StackOverflow show me a warning that I need to choose a winner but not a one answer were good ( I coundn't choose my own answer).. I panic.. – WBAR Sep 12 '15 at 17:34
  • You can also vote to reopen your question. – peterh Sep 14 '15 at 00:03
  • Having done some funky DB side work with Django, this is a very in-depth question that covers a lot of ground. Why it is closed is beyond me! –  Sep 14 '15 at 00:13

3 Answers3

3

1. months_between():

create function months_of(interval)
 returns int strict immutable language sql as $$
  select extract(years from $1)::int * 12 + extract(month from $1)::int
$$;

create function months_between(date, date)
 returns int strict immutable language sql as $$
   select months_of(age($1, $2))
$$;

2. average_weight():

create function average_weight(numeric, date, date, date, date)
 returns numeric(9,2) strict immutable language sql as $$
   select abs(months_between(GREATEST($2, $4), LEAST($3, $5))/months_between($4, $5))*$1
$$;

3. AverageWeight:

from django.db.models.aggregates import Func
from django.db.models.fields import FloatField

class AverageWeight(Func):
    function = 'average_weight'

    def __init__(self, *expressions):
        super(AverageWeight, self).__init__(*expressions, output_field=FloatField())

In your view:

ExchangeRate.objects.all().filter(
        (
            Q(start_date__lt=start_date) & 
            Q(end_date__gt=start_date)
        ) | (
            Q(start_date__gte=start_date) & 
            Q(start_date__lt=end_date) & 
            Q(end_date__gt=start_date) 
        )
).annotate(
    currency_from_name = 'currency_from__name', 
    currency_to_name = 'currency_to__name',
    weight_exchange = AverageWeight(
        F('exchange_rate'),
        start_date,
        end_date,
        F('start_date'),
        F('end_date'),
    )
).values(  # GROUP BY
    'currency_from_name',
    'currency_to_name'
).aggregate(
    F('currency_from_name'), 
    F('currency_to_name'), 
    Avg('weight_exchange')
)
WBAR
  • 4,924
  • 7
  • 47
  • 81
2

The problem with your application is the way you choose to store the exchange rates. So, to answer your question: yes, you are over complicating this.

"The Math" is telling you that the average exchange rate is 4.5 because

(3 + 6) /2 == 4.5 

Regardless of what start date or end date you choose, the system will get you the same value.

In order to address the root cause, let's try a different approach. (for simplicity sake I'll leave the foreign keys and other details not relevant to getting the average inside the specific date range out, you can add them back later)

with this model:

class ExchangeRate(models.Model):
    currency1 = models.CharField(max_length=3)
    currency2 = models.CharField(max_length=3)
    start_date = models.DateField()
    exchange_rate = models.DecimalField(max_digits=12, decimal_places=4)

and this data:

INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-03-01', 3);
INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-04-01', 3);
INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-05-01', 3);
INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-06-01', 3);
INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-07-01', 3);
INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-08-01', 6);
INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-09-01', 6);
INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-10-01', 6);
INSERT INTO exchange_rate_exchangerate(currency1, currency2, start_date, exchange_rate) VALUES ('PLN', 'USD', '2014-11-01', 6);

we can perform this query:

from django.db.models import Avg
from datetime import date

first_date = date(2014, 6, 1)
last_date = date(2014, 9, 1)
er.models.ExchangeRate.objects.filter(
    start_date__gte = first_date,
    start_date__lt = last_date

).aggregate(Avg('exchange_rate'))

To get this output:

{'exchange_rate__avg': 4.0}
PabTorre
  • 2,878
  • 21
  • 30
0

You should think of this as a weighted average so what you want to do is to calculate the weight of each line and then sum it all together.

I don't know enough Django to help you there but in SQL this would be (I am not able to test this now but I think it gives the right idea):

SELECT SUM((LEAST(end_date, @end_date) - GREATEST(start_date, @start_date)) * exchange_rate) / (@end_date - @start_date) AS weighted_avg
FROM 
  ExchangeRate
WHERE
  (start_date, end_date) OVERLAPS (@start_date, @end_date)

This uses the OVERLAPS operator to see if the periods overlap. I'm not sure if there is an of by 1 mistake in the weight calculation but think that this should be handled in the definition of the input variable (@end_date = @end_date - 1)

NuLo
  • 1,298
  • 1
  • 11
  • 16