How to organise and query SQL tables with many user entries, when each user only needs to read their own data input in the last month?

Question

I am making a budget management app which allows users to enter entries of their expenditures with the following schema:

user_id, date, item, currency, amount_of_money, category

When the app first loads, the app will only need to load the specific user’s expenditure in the previous month.

So the question comes:

Should I put all users info in one single table, or each user has an individual table, or are there better alternatives? If I put everything in one table, the app has to go through all irrelevant users info to extract the info of one user and it seems inefficient, but I also saw online making a table per user is a terrible idea too
Assume the data is appended chronologically to the table, with the first entry being the earliest date. Since we only need the last month data, there’s no need to query from the first data, but just from the bottom most till the date has passed a month ago, and stop reading the data above (as we know it’s definitely irrelevant), then just directly end the search and return the data. In SQL there is a SELECT … WHERE - but will it keep checking the top data which we know for sure they are useless? If it will, is there better methods?

*"If I put everything in one table, the app has to go through all irrelevant users info to extract the info of one user and it seems inefficient"*. What? — Joan Lara, Feb 21 '23 at 12:54
@JoanLara I am not too sure how SQL works - but if there is a big table with all users input, then does the SQL need to read each entry one by one, check if the user_id of the entry matches, then get their own data. So it might have to go through billions of other users entries to extract their own ones? — Chloe, Feb 21 '23 at 12:57
I don't know what kind of app you want to make, but I suggest you do a little study on how SQL works before you start. — Joan Lara, Feb 21 '23 at 13:06

score 1 · Answer 1 · answered Feb 21 '23 at 12:57

1

You can use the ORDER BY function, which can be using DESC and ASC on numbers or alphabetically. You can join tables using INNER JOIN to only get matching items or LEFT JOIN to use matching and nonmatching items.

answered Feb 21 '23 at 12:57

Liam Snapp

11
2

Do you mean using ORDER BY to sort the table using user_id so the same user's entry will be grouped together? – Chloe Feb 21 '23 at 13:01

Chris · Answer 2 · 2023-02-21T18:35:09.467

First, you don't need separate tables for each user. All this can go in one table just as you've designed above.

Second, never use reserved words for a column name. change the "date" column to "expenditure_date" or something like that.

Next, it's usually good to include an incrementing index of some kind in your table so that each record has a unique value. Make that your primary key. It's not always necessary but it helps to be able to refer to a specific record with a unique value.

Finally, add a non-unique index to user_id.

When you say "previous month" do you mean records for the last 30 or 31 days or do you mean if it's February, you need all the January records?

I'm assuming you mean the last 30 or 31 days...

So...

When the user successfully logs in, I assume you will have a user id which will be indicated by <loginId>. I don't know what language or environment you're using to develop this. I'm also assuming SQL server but mySQL, Oracle and PostGreSql will be nearly identical. Each has it's own function for getting dates.

Select item, currency, amount_of_money, category
From expenditure_table
Where user_id=<loginId>
    and expenditure_date >= dateadd(m,-1,getdate())

dateadd(m,-1,getdate()) means subtract 1 month from the current date.

To select last month:

declare @eom date= eomonth(dateadd(m,-1,getdate())) /*end date of last month */
declare @bom date=dateadd(d,-(datepart(d,@eom)-1),@eom) /*beginning date of last month)

Select item, currency, amount_of_money, category
From expenditure_table
Where user_id=<loginId>
    and expenditure_date between @bom and @eom

As far as mySQL, the above queries should give you a good hint. They're in Microsoft SQL but if not, let me know and I'll dummy up some mySQL code.

Thanks so much! Actually, I mean the January records rather than the last 30 days. I was just wondering does the SQL knows they don't need to check the data above automatically? — Chloe, Feb 21 '23 at 14:15
Yes. SQL, no matter the version and flavor, is incredibly efficient about how it queries data. I've worked with tables with billions of records -- phone call metadata -- and there is no problem finding what you want when you want it. SQL is not like a an excel spreadsheet, for instance. Also, remember where I said to add an index field that increments and make it your primary key? Well, that's called a "clustered index". Everytime you create a new record, a new key is generated and SQL stores the data on the harddrive in that order. What flavor of sql are you using? Micoroft, mySQL? — Chris, Feb 21 '23 at 16:04
Ok. Since you mean last month, I'll edit my answer when StackExchange is done fritzing out with editing. — Chris, Feb 21 '23 at 16:19

Neville Kuyt · Answer 3 · 2023-02-21T18:04:59.983

I'd suggest you read up on relational databases - it's a fairly dry topic, but there's a lot of good material out there that will help! I started with https://books.google.co.uk/books/about/An_Introduction_to_Database_Systems.html

To answer your question: relational databases are designed to store entities and relationships. In your model, your data ("budget item") is an entity. "User" is probably another entity (you say you have a user_id column). In nearly every case, you want to store all rows for a given entity in the same physical table.

The where clause is the common way for you to filter rows. As long as the database can use an index, that operation is blazingly fast, because relational databases are designed for this use case. Very broadly, as long as there is an index on user_id and transaction_date, you can store tens of millions of rows (probably more) in a single table and the filtering operation will take just milliseconds.

To mitigate the performance problems, RDBMS systems use indexes. An index (again, very simply!) is a copy of the columns, ordered in the way you specify, with a pointer to the rows that match the entries in those columns. For instance, the old paper phone books were indexed by "geography" (e.g. city/town), and then "surname". If you wanted to find Jones in York, you'd find the book for "york", and skip straight to "Jones", without having to page through all the other last names.

Finally, relational database design expects there to be a unique identifier on your table that uniquely identifies a given row. This is known as a primary key - for instance, user_id is probably the primary key for the users table. You want to figure out whether there is a natural primary key in your table (I don't think so, they're very rare in the wild), or create a surrogate key (essentially, a transaction_id column with some kind of mechanism of creating meaningless, guaranteed unique values such as an auto-incrementing integer, or a GUID).

Edit: this approach works for most applications, on modern hardware. However, for extreme cases - billions or more records, and this number is getting bigger every year - you may end up with performance problems.

Once you're in that position, you need more...exotic architectures. Horizontal partitioning is one such approach. However, it comes at significant cost in complexity. For instance, if you partition by "last name", a search for "first name" would have to cross every single partition.

Thank you! Just wondering how are relational databases designed for the use case above. (I thought it got to iterate item by item so I originally thought it is O(n) complexity - but I guess from what you said it used a smarter method?) Any starting point you would recommend to check out? And in particular, do you mind explaining whether the SQL query knows the topmost data are the ones not in the date range and will not scan through them? — Chloe, Feb 21 '23 at 14:14
Should I say, although user_id information is placed at different time by different users, since there is indexes, there is a copy of the column ordered probably (e.g. 1, 1, 2, 2, ...) then when I want to just find user2 detail, it will just hop to number 2 of the index column, extract info with the pointer, and then break the program? A follow up question - if this is so efficient, why are there people partitioning tables horizontally? — Chloe, Feb 21 '23 at 17:42

How to organise and query SQL tables with many user entries, when each user only needs to read their own data input in the last month?

3 Answers3