
I have a table with 320+ million rows and 34 columns, all of type varchar(max), with no indexing.

I am finding it extremely time-consuming to summarize the whole table. Can anyone suggest the best way to optimize it, given what I need it for:

  • simple SELECT queries
  • SUM on a few columns

Should I create a non-clustered index on the sequential row number column that I have added?

My table size on disk is approximately 185 GB.

I am using

  • Windows Server 2008 R2
  • 2 × Xeon processors, 2.09 GHz
  • 16 GB RAM
  • SQL Server 2008 R2

Since the data has no unique column, I have already added a sequential integer column; that operation took 2 days to complete.
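For reference, the table currently looks roughly like this (column names simplified; there are 34 columns in total):

    CREATE TABLE dbo.BigTable (
        Col01 VARCHAR(MAX) NULL,
        Col02 VARCHAR(MAX) NULL,
        -- ... 32 more varchar(max) columns ...
        Id INT NULL  -- the sequential integer column I added afterwards
    );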

  • Start by fixing the datatypes. Why on earth would all 34 columns be varchar(max)? You want to do SUM operations, so there must be some that should be numeric. – Martin Smith Dec 30 '18 at 13:13
  • Each record is 578 bytes on average, so probably few, if any, of the columns actually need MAX even if they are strings. – Martin Smith Dec 30 '18 at 13:27
  • Actually, the data was loaded through BULK INSERT with a non-XML format file from more than 16,000 fixed-length files. Just to ensure all the data would load correctly, I used the varchar(max) type. – Shahbaz Khan Dec 30 '18 at 13:32
  • There are two columns that need to be converted to bigint. Let me try doing that. Do you suggest any indexing on the id column, which is already a sequential integer? – Shahbaz Khan Dec 30 '18 at 13:33
  • So that table should at most be used for staging. Then load it into a table with a sane schema and indexes for operational querying. varchar(max) can't be indexed. – Martin Smith Dec 30 '18 at 13:33
  • One more issue with the string columns: since the source files were fixed-length, all leading and trailing spaces were loaded as part of the fields. Do you have any idea how I can trim those for each column, or should I do it before loading into SQL Server? – Shahbaz Khan Dec 30 '18 at 13:37
  • @ShahbazKhan, I recommend trimming the spaces before loading into the database. You could also perform other conversions and validation in the ETL process instead of ELT (see the sketch below). – Dan Guzman Dec 30 '18 at 13:47
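For reference, a minimal T-SQL sketch of both approaches (table and column names here are hypothetical). SQL Server 2008 R2 has no TRIM() function, so LTRIM(RTRIM(...)) does the job:

    -- Option 1 (ELT): trim in place after loading; repeat for each string column
    UPDATE dbo.BigTable
    SET Col01 = LTRIM(RTRIM(Col01)),
        Col02 = LTRIM(RTRIM(Col02));

    -- Option 2 (ETL): trim while copying out of the staging table
    SELECT LTRIM(RTRIM(Col01)) AS Col01,
           LTRIM(RTRIM(Col02)) AS Col02
    FROM dbo.BigTable_Staging;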

1 Answer

  1. Create a new table with suitable data types. This is the most important part: define a proper type for each column. varchar(max) is the most generic type, so it cannot be optimized. Use int or bigint for numbers, nvarchar(N) where N is the maximum length, bit for booleans, and so on.
  2. Create a primary key and indexes for searching.
  3. Copy the data from the old table to the new one in batches of 10,000 or 100,000 rows (a sketch follows below).
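A minimal sketch of steps 1–3 (all names are hypothetical, and only a few of the 34 columns are shown), using the sequential id column from the question to drive the batches:

    -- 1. New table with sane data types
    CREATE TABLE dbo.BigTableNew (
        Id    BIGINT       NOT NULL,
        Col01 BIGINT       NULL,         -- was varchar(max), actually holds numbers
        Col02 BIGINT       NULL,
        Col03 VARCHAR(100) NULL,         -- N = real maximum length of the data
        CONSTRAINT PK_BigTableNew PRIMARY KEY CLUSTERED (Id)
    );

    -- 2. Non-clustered index to support the search queries
    CREATE NONCLUSTERED INDEX IX_BigTableNew_Col03 ON dbo.BigTableNew (Col03);

    -- 3. Copy in batches of 100,000 rows, keyed on the sequential id
    DECLARE @start BIGINT = 1, @batch BIGINT = 100000, @max BIGINT;
    SELECT @max = MAX(Id) FROM dbo.BigTable;

    WHILE @start <= @max
    BEGIN
        INSERT INTO dbo.BigTableNew (Id, Col01, Col02, Col03)
        SELECT Id,
               CAST(LTRIM(RTRIM(Col01)) AS BIGINT),
               CAST(LTRIM(RTRIM(Col02)) AS BIGINT),
               LTRIM(RTRIM(Col03))
        FROM dbo.BigTable
        WHERE Id >= @start AND Id < @start + @batch;

        SET @start = @start + @batch;
    END;

Batching keeps each transaction (and the transaction log) small; copying all 320 million rows in a single statement would be far slower to roll back if anything fails partway through.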
– Backs