0

I have table with 35 millions rows. When I select rows as shown as that was not working and It takes a long time and eventually gives a memory error (my ram is 32GB).

I partitioned the table based on the monthly date and I have 64 table as shown as table partition

But query not working again and It takes a long time and eventually gives a memory error . How can I do solve this problem?

Tomy
  • 225
  • 5
  • 18
  • 35M isn't big. It's a moderately small fact table. If you use compression or a clustered columnstore index on its footprint is not that big. – Panagiotis Kanavos Jul 15 '20 at 10:22
  • The table gets bigger every day. – Tomy Jul 15 '20 at 10:24
  • 1
    Partitioning is *not* a performance feature either, it's a data management feature. It can actually make queries slower as the server has to search multiple tables. None of these things justifies a memory error though – Panagiotis Kanavos Jul 15 '20 at 10:24
  • What is `studentIndex`? Your query is referring to two tables. – Gordon Linoff Jul 15 '20 at 10:26
  • 1
    The problem is the bad query - subqueries aside, those `try_convert(numeric(38, 12), student.Length )` operations means no indexing on the fields can be used so the database engine has to scan the entire table. This also means the table uses the wrong types. Numbers should be stored as *numbers*, not text. If the fields meant to be used for filtering, they should be indexed. – Panagiotis Kanavos Jul 15 '20 at 10:27
  • 1
    Be clear - is the error a *SQL Server* error message, with `Msg`, `Level`, `State`, etc, or is it a *Management Studio* error complaining that you're trying to load too many rows into a grid? – Damien_The_Unbeliever Jul 15 '20 at 10:27
  • @Tomy please **DON'T** revert the formatted query. Without it it's impossible to read the query and realise what the problems are – Panagiotis Kanavos Jul 15 '20 at 10:28
  • 1
    As for the error I suspect it complains about space in general - this query would require a *lot* of tempdb usage to store the parsed values, probably reaching either tempdb's maximum size or exhausting the hard drive – Panagiotis Kanavos Jul 15 '20 at 10:30
  • If you change the types of `Length` and `Weight` - or rather, `Lat` and `Lon` to numeric types, you can add a simple index to cover both of them. The condition `where Lat between lat1 and lat2 and Lon between lon1 and lon2` is a simple range search, easily handled by an index – Panagiotis Kanavos Jul 15 '20 at 10:33
  • Even `student.id = (select max(x2.id) ...` could be replaced with a JOIN and ranking functions like `ROW_NUMBER` or `FIRST_VALUE` etc. The way the query is written now, the server has to calculate `MAX` for every matching `student` row. With the JOIN and ranking functions the server may be able to limit the result set first, then produce the rank and filter in a *single* pass – Panagiotis Kanavos Jul 15 '20 at 10:39
  • BTW removing the coordinates like this, giving meaningless names to tables, makes it a *lot* harder* to answer the question. If you just said `I want to find the latest events in a certain area` people could write queries that solve this specific problems, using eg windowing functions, ranking, better joins. Or link to the books and puzzles people have written in the last 20 years to solve spatial and temporal problems – Panagiotis Kanavos Jul 15 '20 at 10:40

1 Answers1

1

This is basically your query:

select s.*
from student s
where s.id = (select max(si.id)
              from studentIndex si
              where si.family = s.name and si.name_ship = s.name_ship
             )    and 
      try_convert(numeric(38, 12), s.Length ) between 18.485411 and 23.51031 and
      try_convert(numeric(38, 12), s.Weight ) between 21.77079 and 77.13644 and
      s.time > 1594326600;

This is basically a full table scan, so I don't see why it would run out of member. The exception is the correlated subquery. For that, you need an index on studentIndex(family, name_ship) -- and to be sure that the columns are of the same time (otherwise, the index may not be used).

Your length and width comparisons look an awful lot like geographic coordinates. I might also suggest a GIS solution. However, if those are latitudes and longitudes, the area is pretty big, so GIS might not really be that helpful.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The conversion though prevents the use of indexing. A `lat between Lat1 and Lat2 and Lon between Lon1 and Lon2` is just a range query that can easily be accelerated by an index – Panagiotis Kanavos Jul 15 '20 at 10:30
  • @PanagiotisKanavos . . . Not really. This is an inequality and needs GIS functionality for performance. But that doesn't matter. Scanning a table should not generate a memory error, so the issue is more likely the correlated subquery. – Gordon Linoff Jul 15 '20 at 12:56
  • It *does* matter, and a spatial index won't help here. That inequality is still a range search - the OP is searching for points inside a simple box, not a complex shape. Fixing this could convert the table scan to a seek returning a small number of results – Panagiotis Kanavos Jul 15 '20 at 13:01
  • 1
    I've tried such things in the past - searching for hotels in a certain radius. SQL Server's spatial indexes on a Point were no better than an index over a Lat Lon once you calculated the bounding box you needed. It would be a *very* different story if one had to search for shapes instead of points – Panagiotis Kanavos Jul 15 '20 at 13:03