
Suppose we have a payments table with 35 columns, a primary key (auto-increment bigint), and 3 non-clustered, non-unique indexes (each on a single int column).

Among the table's columns we have two datetime fields:

  1. payment_date datetime NOT NULL

  2. edit_date datetime NULL

The table has about 1,200,000 rows. Only ~1,000 rows have edit_date = NULL, about 9,000 rows have edit_date not null and not equal to payment_date, and the rest have edit_date = payment_date.

When we run the following query 1:

select top 1 *
from payments
where edit_date is not null and (payment_date=edit_date or payment_date<>edit_date)
order by payment_date desc

(execution plan screenshot for query 1)

the server needs a couple of seconds to execute it. But if we run query 2:

select top 1 *
from payments
where edit_date is not null
order by payment_date desc

(execution plan screenshot for query 2)

the execution fails with the error: "The log file for database 'tempdb' is full. Back up the transaction log for the database to free up some log space."

If we replace * with a specific column (query 3):

select top 1 payment_date
from payments
where edit_date is not null
order by payment_date desc

(execution plan screenshot for query 3)

it also finishes in a couple of seconds.

Where is the magic?

EDIT: I've changed query 1 so that it operates over exactly the same set of rows as query 2. It still returns in a second, while query 2 fills tempdb.

ANSWER: I followed the advice to add an index and did this for both date fields; everything started working quickly, as expected. The question, though, was why SQL Server behaves differently on such similar queries (query 1 vs. query 2) in this exact situation; I wanted to understand the optimizer's logic. I would understand it if both queries had used tempdb similarly, but they didn't.

In the end I accepted the first answer, which described the symptoms of my problem and was also the first to suggest how to avoid it (i.e. indexes).
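For reference, the fix described above (indexes on both date fields) might look like the following; the index names and the INCLUDE column are assumptions, since the post does not show the actual DDL:

```sql
-- Hypothetical index names; the question does not give the real DDL.
-- A descending index on payment_date lets TOP 1 ... ORDER BY payment_date DESC
-- read one row from the end of the index instead of sorting ~1.2M wide rows.
CREATE NONCLUSTERED INDEX IX_payments_payment_date
    ON payments (payment_date DESC)
    INCLUDE (edit_date);   -- lets the WHERE edit_date IS NOT NULL predicate be checked in the index

CREATE NONCLUSTERED INDEX IX_payments_edit_date
    ON payments (edit_date);
```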

horgh
  • Have you considered following the advice of the error message? Some queries will require use of tempdb, and if the log file is full, no other transactions can be performed. – Andrew Barber Aug 09 '12 at 03:11
  • @AndrewBarber I didn't.. but what I am trying to find out is why almost identical queries are executed completely differently... and how I am expected to build queries to avoid this kind of behaviour.. – horgh Aug 09 '12 at 03:46
  • Don't use `*` to select all columns, for starters. And back up that transaction log. – Andrew Barber Aug 09 '12 at 03:47
  • Did you try this, as you have 35×90,000 values to be generated? select top 1 payment_date from payments where edit_date is not null and payment_date=edit_date order by payment_date desc – NG. Aug 09 '12 at 06:14
  • Queries 1 and 3 execute instantly; only the 2nd query runs out of tempdb space – horgh Aug 09 '12 at 06:39
  • Seems strange to me that the queries with the simpler predicates get a filter operator in the plan, whereas the one with the more convoluted predicate gets the predicate pushed down to the scan fine. Without the filter you would get a TOP N sort that only has to keep track of a single row, not sort the entire result set. – Martin Smith Aug 09 '12 at 08:56
  • But that's just it, I didn't mix anything up... I was surprised myself to see no filter in the 1st query's execution plan. – horgh Aug 09 '12 at 12:06

2 Answers

5

This is happening because certain steps in an execution plan can trigger writes to tempdb, in particular certain sorts and joins involving lots of data.

Since you are sorting a table with a boatload of columns, SQL Server decides it would be crazy to perform the sort in tempdb on the key alone, without the associated data. If it did that, it would need to do a gazillion inefficient bookmark lookups on the underlying table.

Follow these rules:

  1. Try to select only the data you need
  2. Size tempdb appropriately; if you need to run crazy queries that sort a gazillion rows, you had better have an appropriately sized tempdb
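Both rules can be sketched in SQL. The second query below is an assumption about what "hand coding" around the data shuffle (mentioned in the comments) would look like, not code from this answer, and `id` is an assumed name for the bigint primary key:

```sql
-- Option 1: an index on payment_date removes the big sort entirely
-- (assumed index name).
CREATE NONCLUSTERED INDEX IX_payments_payment_date
    ON payments (payment_date DESC);

-- Option 2: sort only the narrow key, then fetch the single wide row once,
-- instead of shuffling all 35 columns through the sort.
SELECT p.*
FROM payments AS p
WHERE p.id = (SELECT TOP 1 id              -- 'id' = assumed PK column name
              FROM payments
              WHERE edit_date IS NOT NULL
              ORDER BY payment_date DESC);
```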
Sam Saffron
  • But why doesn't it do the same "crazy" operation when I try to execute the first (see the question) query? If we count the number of rows resulting from the first and the second queries, they're almost equal (about one million)... but taking `TOP 1` in the 1st one works fine, while the 2nd one fails... What in the 2nd query causes log usage? – horgh Aug 09 '12 at 04:24
  • @horgh *All transactions* cause log usage. The question you should be asking is "what causes tempdb usage". – Andrew Barber Aug 09 '12 at 04:45
  • @horgh you can see from the plan, hover over the fat line: it's moving all the data into tempdb so it can sort it (including all columns in the first and 1 column in the second)... after it moves all the data there it selects the first row. Sure, the plan is poop, but if you added an index on payment_date this would be ultra fast in both cases. Alternatively you can hand code it so it does not do the data shuffle. – Sam Saffron Aug 09 '12 at 06:04
5

Usually, tempdb fills up when you are low on disk space, or when you have set an unreasonably low maximum size for database growth. Many people think that tempdb is only used for #temp tables; in fact, you can easily fill up tempdb without ever creating a single temp table. Some other scenarios that can cause tempdb to fill up:

  • any sorting that requires more memory than has been allocated to SQL Server will be forced to do its work in tempdb;
  • if the sorting requires more space than you have allocated to tempdb, one of the above errors will occur;
  • DBCC CheckDB('any database') will perform its work in tempdb -- on larger databases, this can consume quite a bit of space;
  • DBCC DBREINDEX or similar DBCC commands with 'Sort in tempdb' option set will also potentially fill up tempdb;
  • large resultsets involving unions, order by / group by, cartesian joins, outer joins, cursors, temp tables, table variables, and hashing can often require help from tempdb;
  • any transactions left uncommitted and not rolled back can leave objects orphaned in tempdb;
  • use of an ODBC DSN with the option 'create temporary stored procedures' set can leave objects there for the life of the connection.

    USE tempdb
    GO

    SELECT name
    FROM tempdb..sysobjects

    SELECT OBJECT_NAME(id), rowcnt
    FROM tempdb..sysindexes
    WHERE OBJECT_NAME(id) LIKE '#%'
    ORDER BY rowcnt DESC

The higher rowcnt values will likely indicate the biggest temporary tables that are consuming space.

Short-term fix

DBCC OPENTRAN -- or DBCC OPENTRAN('tempdb'); shows the oldest active transaction
DBCC INPUTBUFFER(<number>) -- see what statement that SPID last ran
KILL <number> -- kill the offending SPID if appropriate

Long-term prevention

-- SQL Server 7.0, should show 'trunc. log on chkpt.' 
-- or 'recovery=SIMPLE' as part of status column: 

EXEC sp_helpdb 'tempdb' 

-- SQL Server 2000, should yield 'SIMPLE': 

SELECT DATABASEPROPERTYEX('tempdb', 'recovery')
ALTER DATABASE tempdb SET RECOVERY SIMPLE
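Beyond the recovery model, presizing tempdb's data and log files so routine sorts never hit the growth cap is the other common long-term fix. A sketch (not from the referenced article; the logical file names tempdev/templog are the defaults and the sizes are placeholders, so verify yours first):

```sql
-- Check the current tempdb file names and sizes first:
--   SELECT name, size, growth FROM sys.master_files WHERE database_id = 2;
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 2048MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, SIZE = 1024MB, FILEGROWTH = 128MB);
```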

Reference: https://web.archive.org/web/20080509095429/http://sqlserver2000.databases.aspfaq.com:80/why-is-tempdb-full-and-how-can-i-prevent-this-from-happening.html
Other references: http://social.msdn.microsoft.com/Forums/is/transactsql/thread/af493428-2062-4445-88e4-07ac65fedb76

NG.
  • Did you try this? select top 1 payment_date from payments where edit_date is not null and payment_date=edit_date order by payment_date desc – NG. Aug 09 '12 at 06:12
  • I tested all the queries provided in the question. Or what are you asking? – horgh Aug 09 '12 at 06:27
  • I wanted to know how much time this query takes to execute – NG. Aug 09 '12 at 06:36
  • It is written at the bottom of the question, just after that query and its execution plan: **it also finishes in a couple of seconds.** – horgh Aug 09 '12 at 06:38
  • So do you need all 35×90,000 values? That is why the issue arises: the query has to fetch a large number of records – NG. Aug 09 '12 at 06:51
  • Doesn't query 1 have to fetch the same number of records? – horgh Aug 09 '12 at 07:12
  • Then I would suggest you add an index on payment_date; it should show a performance improvement – NG. Aug 09 '12 at 07:46
  • Ohhh, didn't see it; anyway, glad to see that your issue is resolved – NG. Aug 09 '12 at 08:01
  • tempdb can only be in simple recovery mode. – Martin Smith Aug 09 '12 at 09:02
  • As of SQL 2014: `Option 'RECOVERY' cannot be set in database 'tempdb'.` tempdb is ALWAYS in `RECOVERY=SIMPLE` now. – Ross Presser Nov 01 '17 at 21:18