0

I have 100GB SAS dataset on LINUX. Space allocated for the dataset after deleting 50 GB data from the dataset is still 100 GB. I'm not sure if this is related to SAS or LINUX.

The below code is performing the delete:

Proc SQL; Delete * from dataset where Column=value; Quit;
Robert Penridge
  • 8,424
  • 2
  • 34
  • 55
SAS_learner
  • 521
  • 1
  • 13
  • 30
  • 2
    Deleting records does not automatically reduce the size of data: http://support.sas.com/kb/32/042.html. – Gordon Linoff Nov 29 '15 at 17:03
  • 1
    @shawn 's answer makes sense. Please show how you deleted the observations, to confirm he is correct that you did a logical delete. – Quentin Nov 29 '15 at 19:11
  • I'm just using Proc SQL; Delete * from dataset where Column=value; Quit; – SAS_learner Nov 29 '15 at 23:49
  • @Sampath_T Please edit that into the question; that should've been part of the question to begin with. – Joe Nov 30 '15 at 16:08

2 Answers2

1

The issue is that you have performed a logical delete rather than a physical delete. Gordon's correct comment pointing you to SAS Note 32042 gives the rather terse official description of this behavior. The simplest way to physically remove the deleted records will be recreate the data file, from the current file, which will cause SAS to clean up the logically deleted observations. This could be something like one of the following code patterns.

data mydata;
set mydata;
run;

proc sql;
create table mydata as
select * from mydata;
quit;
shawn
  • 56
  • 2
  • When I tried to recreate the data file from the current file, It did clean up the deleted observations and freed up the space. But, I got following Warning. WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem. – SAS_learner Nov 30 '15 at 01:05
  • The warning message is bothersome but isn't something to worry about if you are using a sql pattern like i presented (basic overwriting of a single file). – shawn Nov 30 '15 at 03:19
  • I did a proc sort after the SQL delete and it reclaimed the space allocated for deleted observations. – SAS_learner Nov 30 '15 at 19:35
  • Note that the reason proc sort performed the cleanup is due to it recreating the file. – shawn Dec 03 '15 at 23:59
0

As others noted, PROC SQL DELETE performs a logical delete, not a physical delete.

While you could do this in two steps, it will be faster to do this in the data step (so you have only one step). If your DELETE is a simple where clause:

data have;
  set have;
  where column ne value;
run;

The WHERE clause uses SQL syntax, so LIKE and BETWEEN and such are legal there. This will process the delete as a physical delete and your new dataset will be smaller.

Joe
  • 62,789
  • 6
  • 49
  • 67
  • Thanks Joe. I did a proc sort after the SQL delete and it reclaimed the space allocated for deleted observations. – SAS_learner Nov 30 '15 at 19:34
  • @Sampath_T That would also work - and you should post that as an answer! I suspect it's not as speedy as my solution, but if you need to sort anyway, it probably doesn't make too much difference. – Joe Nov 30 '15 at 20:14