I am writing a quick-and-dirty application to load sales plan data into SQL Server (2008 FWIW, though I don't think the specific version matters).
The data set is the corporate sales plan: a few thousand rows of Units, Dollars and Price for each combination of customer, part number and month. This data is updated every few weeks, and it's important to track who changed it and what the changes were.
-- Metadata columns are suffixed with ' ##', to enable an automated
-- tool I wrote to handle repetitive tasks such as de-duplication of
-- records whose values didn't change in successive versions of the
-- forecast.
CREATE TABLE [SlsPlan].[PlanDetail]
(
[CustID] [char](15) NOT NULL,
[InvtID] [char](30) NOT NULL,
[FiscalYear] [int] NOT NULL,
[FiscalMonth] [int] NOT NULL,
[Version Number ##] [int] IDENTITY(1,1) NOT NULL,
[Units] [decimal](18, 6) NULL,
[Unit Price] [decimal](18, 6) NULL,
[Dollars] [decimal](18, 6) NULL,
[Batch GUID ##] [uniqueidentifier] NOT NULL,
[Record GUID ##] [uniqueidentifier] NOT NULL DEFAULT (NEWSEQUENTIALID()),
[Time Created ##] [datetime] NOT NULL,
[User ID ##] [varchar](64) NULL DEFAULT (ORIGINAL_LOGIN()),
CONSTRAINT [PlanByProduct_PK] PRIMARY KEY CLUSTERED
([CustID], [InvtID], [FiscalYear], [FiscalMonth], [Version Number ##])
)
To track changes, I'm using an IDENTITY column as part of the primary key to enable multiple version with the same primary key. To track who did the change, and also to enable backing out an entire bad update if someone does something completely stupid, I am inserting the Active Directory logon of the creator of that version of the record, a time stamp, and two GUIDs.
The "Batch GUID" column should be the same for all records in a batch; the "Record GUID" column is obviously unique to that particular record and is used for de-duplication only, not for any sort of query.
I would strongly prefer to generate the batch GUID inside a query rather than by writing a stored procedure that does the obvious:
DECLARE @BatchGUID UNIQUEIDENTIFIER = NEWID()
INSERT INTO MyTable
SELECT I.*, @BatchGUID
FROM InputTable I
I figured the easy way to do this is to construct a single-row result with the timestamp, the user ID and a call to NEWID() to create the batch GUID. Then, do a CROSS JOIN to append that single row to each of the rows being inserted. I tried doing this a couple different ways, and it appears that the query execution engine is essentially executing the GETDATE() once, because a single time stamp appears in all rows (even for a 5-million row test case). However, I get a different GUID for each row in the result set.
The below examples just focus on the query, and omit the insert logic around them.
WITH MySingleRow AS
(
Select NewID() as [Batch GUID ##],
ORIGINAL_LOGIN() as [User ID ##],
getdate() as [Time Created ##]
)
SELECT N.*, R1.*
FROM util.zzIntegers N
CROSS JOIN MySingleRow R1
WHERE N.Sequence < 10000000
In the above query, "util.zzIntegers" is just a table of integers from 0 to 10 million. The query takes about 10 seconds to run on my server with a cold cache, so if SQL Server were executing the GETDATE() function with each row of the main table, it would certainly have a different value at least in the milliseconds column, but all 10 million rows have the same timestamp. But I get a different GUID for each row. As I said before, the goal is to have the same GUID in each row.
I also decided to try a version with an explicit table value constructor in hopes that I would be able to fool the optimizer into doing the right thing. I also ran it against a real table rather than a relatively "synthetic" test like a single-column list of integers. The following produced the same result.
WITH AnotherSingleRow AS
(
SELECT SingleRow.*
FROM (
VALUES (NewID(), Original_Login(), getdate())
)
AS SingleRow(GUID, UserID, TimeStamp)
)
SELECT R1.*, S.*
FROM SalesOrderLineItems S
CROSS JOIN AnotherSingleRow R1
The SalesOrderLineItems is a table with 6 million rows and 135 columns, to make doubly sure that runtime was sufficiently long that the GETDATE() would increment if SQL Server were completely optimizing away the table value constructor and just calling the function each time the query runs.
I've been lurking here for a while, and this is my first question, so I definitely wanted to do good research and avoid criticism for just throwing a question out there. The following questions on this site deal with GUIDs but aren't directly relevant. I also spent a half hour searching Google with various combinations of phrases didn't seem to turn up anything.
Azure actually does what I want, as evidenced in the following question I turned up in my research: Guid.NewGuid() always return same Guid for all rows. However, I'm not on Azure and not going to go there anytime soon.
Someone tried to do the same thing in SSIS (How to insert the same guid in SSIS import) but the answer to that query came back that you generate the GUID in SSIS as a variable and insert it into each row. I could certainly do the equivalent in a stored procedure but for the sake of elegance and maintainability (my colleagues have less experience with SQL Server queries than I do), I would prefer to keep the creation of the batch GUID in a query, and to simplify any stored procedures as much as possible.
BTW, my experience level is 1-2 years with SQL Server as a data analyst/SQL developer as part of 10+ years spent writing code, but for the last 20 years I've been mostly a numbers guy rather than an IT guy. Early in my career, I worked for a pioneering database vendor as one of the developers of the query optimizer, so I have a pretty good idea what a query optimizer does, but haven't had time to really dig into how SQL Server does it. So I could be completely missing something that's obvious to others.
Thank you in advance for your help.