0

I plan a database schema to store Amazon product attributes and extra marketplace-specific values (such as title, artist, weight, etc.)

So far there is a Products table with GTIN varchar(13) (PK) column. In my particular case GTIN may be an EAN/UPC/ISBN identifier. There is also an ASIN char(10) column in Products to associate GTIN with ASIN.

A behavior when both EAN and UPC exist for the same ASIN are programmatically caught and properly handled, so consider all ASINs unique. I defined a UNIQUE NONCLUSTERED CONSTRAINT on ASIN and related it to Products table as one-to-many.

The second table ProductsData defines ASIN char(10) (FK) and mid tinyint (Marketplace ID). All ASINs are stored together with the respective marketplace IDs:

rowid    ASIN          mid
1        B0002DB5GS    1
2        B0002DB5GS    44
3        B0002DB5GS    39
4        B0002Y4SYS    1
5        B0002Y4SYS    44
6        B0002Y4SYS    39

As you noticed there is also a rowid int IDENTITY(1,1) column which is dummy but implements uniqueness.

Assumning the follwing facts:

  • very rare updates
  • relatively rare inserts (each added product creates 3 records in a transaction)
  • no deletions
  • intensive selects on ASIN column
  • rowid is a dummy which just ensures uniqueness.

Here come three questions:

  1. Does it worth to make a composite index on ASIN and mid?
  2. If yes, clustered or non-clustered?
  3. May I get rid of a clustered index on rowid cause I don't realy need it?
Interface Unknown
  • 713
  • 10
  • 26

1 Answers1

1

From what you stated above, if performance was an issue and I felt that indexing was the solution I would implement a non-clustered covering index on ASIN and mid. Something like this:

CREATE NONCLUSTERED INDEX IX_ASIN_COVERING_mid ON ProductsData (ASIN) INCLUDE (mid)

This way when you join to the ProductsData table, you can leverage the index for performance and because the mid is 'included', it will be stored with the index and the query engine will not need to go any deeper.

There are of course many paths forward, but based on your post, this is what I would lean toward. Hope it helps!

So to sort of sum up your questions

  1. My opinion is to use a covering index rather than a composite index. This is because it sounds like your link between Products and ProductsData is the ASIN, and mid is just along for the ride. Therefore, it isn't necessary to have it be composed with ASIN in the index...including it will work great here - sort of what it is designed for in my opinion.

  2. Non-clustered as stated in 1, because Clustered indexes should be unique. Also, clustered indexes maintain an ordering of the data, so if you create a new product and its ASIN fits somewhere in the middle of the table, there is overhead here because SQL Server will need to re-order the entire table

  3. I would think that you could get rid of it...if you aren't using that column for anything and it is just a dummy value that you won't be using in any queries then if it were me I'd probably drop it.

How 'bout a Fresca
  • 2,267
  • 1
  • 15
  • 26
  • The idea of a covering index looks very nice to me. Thank you. What about `rowid` column? Do I ready need it this this schema? Can I ger rid of the PK Clustered index which is actually never used? – Interface Unknown Sep 24 '15 at 17:50
  • See updated answer...I think that you could get rid of this column altogether, since it looks like your ProductsData table is just a lookup/mapping table for ASIN and mid – How 'bout a Fresca Sep 24 '15 at 17:53
  • Thank you for such a detailed answer. Much appreciated! – Interface Unknown Sep 24 '15 at 18:22