0

Assume I have a Transaction model which has following fields [token, pair_token, amount, related_transaction], I want to generate a query like this in mysql:

SELECT token_id, pair_token_id, (
  SELECT ABS(t3.amount / t2.amount) price
    FROM 
    Transaction t2 join Transaction t3 on t2.related_transaction_id=t3.id 
    WHERE t2.id=MAX(t1.id)) 
FROM 
Transaction t1 
WHERE 
t1.token_id in (1, 2, 3, 5, 6) and t2.pair_token_id in (4) and t1.timestamp > CURRENT_TIMESTAMP - interval 24 hour 
GROUP BY
token_id, pair_token_id;

This query finds the last value of two related transaction which equals the price of a combination of a token and its pair_token. To get a similar results inside django I have used Subquery as below:

trs = Transaction.objects.filter(token_id__in=[1, 2, 3], pair_token_id__in=[4], timestamp__gt=Now()-timedelta(hours=24))
last_price = Transaction.objects.annotate(price=Abs(F('related_transaction__amount') / F('amount')))
trs = trs.values('token_id', 'pair_token_id').annotate(
    price=Subquery(last_price.filter(id=Max(OuterRef('id'))).values('price'))).\
values('token_id', 'pair_token_id', 'price')

But it generates a wrong query:

SELECT
`Transaction`.`token_id`, `Transaction`.`pair_token_id`, (
        SELECT ABS((U1.`amount` / U0.`amount`)) AS `price` 
        FROM 
        `Transaction` U0 LEFT OUTER JOIN `Transaction` U1 ON (U0.`related_transaction_id` =U1.`id`) 
        HAVING U0.`id` = MAX(`Transaction`.`id`)) AS `price` 
FROM 
`Transaction` 
WHERE (`Transaction`.`pair_token_id` IN (4) AND `Transaction`.`timestamp` > (CURRENT_TIMESTAMP - INTERVAL 86400000000 MICROSECOND) AND `Transaction`.`token_id` IN (1, 2, 3, 5, 6))

Mysql generates an error for this query and it must be, I do not know how to avoid generating a having query in using Subquery. If I use this query without any Subquery a group by clause would be generated but in using Subquery group by term would be removed and having appears.

I am using django 3.1.1 and mysql 8.0.19

Update:

Transaction model:

class Token:
    name = models.CharField(max_length=20)

class Transaction:
    token = models.ForeignKey(
        Token, blank=False, null=False, on_delete=models.CASCADE
    )
    pair_token = models.ForeignKey(
        Token, blank=True, null=True, on_delete=models.SET_NULL
    )

    related_transaction = models.ForeignKey(
        Transaction, blank=True, null=True, on_delete=models.PROTECT
    )

    amount = models.DecimalFeild(
        max_digits=10, price_decimals=3
    )


t1 = Token.objects.create(name='T1')
t2 = Token.objects.create(name='T2')
t3 = Token.objects.create(name='T3')

tr11 = Transaction.objects.create(
    token=t1, pair_token=t2, amount=Decimal('2.4')
)

tr12 = Transaction.obejcts.create(
    token=t2, pair_token=t1, related_transaction=tr11, amount=Decimal('3')
)

tr21 = Transaction.objects.create(
    token=t1, pair_token=t2, amount=Decimal('1.4')
)

tr22 = Transaction.obejcts.create(
    token=t2, pair_token=t1, related_transaction=tr21, amount=Decimal('3')
)


# If I want to get price of t2 in t1 I must divide tr21.amount / tr22.amount
# Notice that there are many transactions related to each pair and I want to just list of the last price of each pairs.  
motam
  • 677
  • 1
  • 6
  • 24
  • Can you add your models and example data and output? I have found that sub queries perform terribly in MySQL, is that not an issue here? – Iain Shelvington Apr 18 '21 at 16:05
  • I updated the question and add the model code, notice that my models are more complex than this but this simple model could represent what I needed. I did not understand what is the trouble for subqueries in mysql, I get acceptable response time with this query in mysql and I am trying to generate same query in django. – motam Apr 18 '21 at 16:16
  • Does this query give the result you want? `Transaction.objects.annotate(price=Abs(F('related_transaction__amount') / F('amount'))).values('token_id', 'pair_token_id', 'price')`. Not sure you need the the sub query if you're just annotating the result of a division? – Iain Shelvington Apr 18 '21 at 16:26
  • No this does not generate the result I want. There are many transactions related to same token and pair_token and I just want to list the last price (related to max id of each token and pair_token) of each pairs. I edited the question to show more useful example – motam Apr 18 '21 at 16:40
  • Can you give an example of the result you want? Do you just want unique pairs of token and pair_token with the last price for that combination or do you want all transaction rows with the last price annotated? – Iain Shelvington Apr 18 '21 at 16:55
  • Just the last price not the transaction row associated with that price. For example something like this [{'token_id': 1, 'pair_token_id': 2, 'price': 20}, {'token_id': 3, 'pair_token_id': 2, 'price': 30}] – motam Apr 18 '21 at 17:06

1 Answers1

1

The following should give you the result you want.

First we create a subquery that calculates the price for a transaction, filters by the outer token/pair_token and is then ordered so that the max id is first

latest_prices = Transaction.objects.filter(
    related_transaction__isnull=False
).annotate(
    price=Abs(F('related_transaction__amount') / F('amount'))
).filter(
    token=OuterRef('token'),
    pair_token=OuterRef('pair_token')
).order_by('-id')

Then use this subquery to annotate each row with the latest price for the token/pair_token and use distinct to get unique values

Transaction.objects.filter(
    related_transaction__isnull=False
).annotate(
    price=Subquery(latest_prices.values('price')[:1])
).values('token_id', 'pair_token_id', 'price').distinct()
Iain Shelvington
  • 31,030
  • 3
  • 31
  • 50
  • This generates same results but not efficient as the one I have shown in my question. `Transaction` is a huge table and this causes two nested search occurs for each pair. But in my request the nested query is a simple and straight query. I try both in my table and get 100 slower query for your suggestion. – motam Apr 18 '21 at 17:57
  • 1
    100 times slower is pretty bad :D out of interest how many rows do you have? Does filtering with `related_transaction__isnull=False` help at all? This is what I mean by MySQL performing badly with the subqueries that Django generates – Iain Shelvington Apr 18 '21 at 18:14
  • It has above 10 million rows, but based on some index and some other filter it reduces a lot. I could use `explain` command to hint you more about number of rows examined before returning results. But as I said there is a significant difference between two queries which I expect than what you suggest. I must correct my previous statement it is about 20 time slower, 0.6 second vs 0.03 second. – motam Apr 18 '21 at 19:43