1

I have a table which is defined (on Azure SQL Server) as follows:

CREATE TABLE dbo.[transaction] 
(
    id INT IDENTITY(1,1) NOT NULL,
    [date] DATETIME NULL,
    amount FLOAT NULL,
    balance FLOAT NULL,
    account_id INT NULL,
    CONSTRAINT PK__transact__32F PRIMARY KEY (id)
)

I want to find the last balance for each account prior to a certain date. The columns I need returned are: account_id, date, balance.

I've tried:

select account_id, max(date) as date 
from dbo.[transaction] 
group by account_id

This works, but it does not return the balance.

Secondly, my transactions are ordered first by date, and then by id. So if multiple transactions occur on the max date, it should choose the balance on the transaction with the highest id.

My application is written in flask-sqlalchemy, so a sqlalchemy answer would be great, but I'd also be happy with an answer in SQL.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Dirk R
  • 602
  • 7
  • 14
  • 1
    This has been asked and answered hundreds if not thousands of times. You use ROW_NUMBER with a partition by account_id. – Sean Lange Dec 19 '18 at 20:41
  • **Side note:** I would recommend **not** to use `FLOAT` for any number that needs to be precise - especially when it concerns money. `FLOAT` is inherently prone to rounding error - I'd recommend using `DECIMAL(p,s)` instead which is in fact precise – marc_s Dec 19 '18 at 21:07
  • Thanks marc_s, that's very good to know! – Dirk R Dec 20 '18 at 22:07

3 Answers3

2

You could use the row_number windows function to number the rows per account id, and take the last one per account:

SELECT account_id, [date], balance
FROM   (SELECT account_id, [date], balance,
               ROW_NUMBER() OVER (PARTITION BY account_id
                                  ORDER BY [date] DESC, id DESC) AS rn
        FROM   [transaction]) t
WHERE  rn = 1
Mureinik
  • 297,002
  • 52
  • 306
  • 350
  • Brilliant! Exactly what i was looking for. I honestly could not figure out if row_number of another approach was better for my scenario especially with sorting on two different fields. Your answer was the one that finally made the penny drop on how to use row_number partition! – Dirk R Dec 20 '18 at 22:13
2

Solution: Including your Date check

CREATE TABLE #transaction (
    id int NOT NULL,
    [date] datetime NULL,
    amount float NULL,
    balance float NULL,
    account_id int NULL
) ;

Insert Into #transaction Values
(1,'2018-11-20',50,4000,100),
(2,'2018-11-21',75,2475,100),
(3,'2018-12-15',75,2400,100),
(4,'2018-11-22',25,4000,200),
(5,'2018-11-22',25,4000,300)


With CTE As
(
   Select 
      ROW_NUMBER() Over(Partition By account_id Order By [Date] Desc) As rn,
      account_id, [Date], balance 
   From #transaction
   Where [Date] < '2018-12-01'
)
Select account_id, [Date], balance From CTE
Where  rn = 1 

Result:

account_id  Date                    balance
100         2018-11-21 00:00:00.000 2475
200         2018-11-22 00:00:00.000 4000
300         2018-11-22 00:00:00.000 4000
level3looper
  • 1,015
  • 1
  • 7
  • 10
  • Nice example, with the date check and also sample data. I would like to mark both answers as the answer if I could! – Dirk R Dec 20 '18 at 22:20
1

Both the provided answers are equally brilliant.

I have taken this and converted this into a python sqlalchemy answer for reference:

from sqlalchemy import func
from datetime import datetime
import pandas as pd

start = datetime(2018,12,1)

row_number_column = func.row_number()                                    \
        .over(partition_by=Transaction.account_id,
              order_by=(Transaction.date.desc(), Transaction.id.desc()))  \
        .label('row_number')

query = DB.session.query(
            Transaction.account_id,
            Transaction.date,
            Transaction.balance,
            row_number_column)                            \
        .filter(Transaction.date < start)                 \
        .from_self().filter(row_number_column == 1)

df = pd.DataFrame(query.all(), columns=['Account Id', 'Date', 'Balance', 'RowId'])
df.drop(['RowId'], axis=1, inplace=True)
print(df)

The last three optional lines converts this into a pandas dataframe with opening balances, sorted by date and then id.

Dirk R
  • 602
  • 7
  • 14