2

I need a backup from a database that makes a backup from the whole database but changes a few fields in table with random data (instead of the real data).

What is a good way to do that?

Thanks in advance ;-)

jarlh
  • 42,561
  • 8
  • 45
  • 63
shwartel
  • 21
  • 1
  • "...but changes a few fields in table with random data (instead of the real data)..." -- you are probably talking about "sanitizing data". – The Impaler Aug 09 '19 at 12:49
  • Not really... the customer doesn´t allow to see real names. So I need to provide a script that makes a backup and changing some values to random data. – shwartel Aug 09 '19 at 12:52
  • This is probably closer to "anonymizing"... If you explain the reasoning, then you will get a more targeted answer, in this broad form this question is in danger of being closed. Include a specific scenario, perhaps with a mock table definition to make your question more specific. – Chris Schaller Aug 09 '19 at 12:55
  • Short answer is you can't. You can't anonymize (or obfuscate) values in tables DURING a backup. You must do that in the table before you make a backup. So the first step requires that you create a copy of the existing database before you attempt to change whatever values you desire. Once you figure out how to do that (hint - backup and restore), you can then focus on defining which columns in which tables need to be anonymized. After you have changed the desired values (and verified that you did not break anything) you can make a backup and drop the copied database. – SMor Aug 09 '19 at 13:50

1 Answers1

1

There is no silver bullet or one size fits all script for this issue, however the process itself is not a big deal, if I follow your question to the letter, you are looking for guidance on how to script the following 3 operations

  1. Backup the database
  2. Restore into a temporary location
  3. Execute scripts to anonymize the data
  4. Backup the anonymized data
    • See issue 1

This is a common development scenario, but also comes up when we want to otherwise test or demonstrate an application to prospective clients. Read over this tip for detailed information as well as a solution Automatically Create and Anonymize Downstream Databases from Azure

Scripting Backups

MSDN is a good source of scripts for common SQL Server tasks, there are many different flavours for scripting backups and many of them will be dependent on what resources your database is hosted on, this is a good start, but google or SO is your friend here: Create a Full Database Backup (SQL Server)

USE SQLTestDB;  
GO  
BACKUP DATABASE SQLTestDB  
TO DISK = 'E:\Backups\SQLTestDB.Bak'  
   WITH FORMAT,  
      MEDIANAME = 'E_Backups',  
      NAME = 'Full Backup of SQLTestDB';  
GO  

Restoring backups

You can easily use Backup to clone an existing database and restore it into the same server as long as you use a different database name, and you restore to different logical files on disk

by default a restore operation would try to use the original filenames but that will not work when we restore side-by-side the original source database, because those files are still in use by the source database!

Have a read over Restore a Database to a New Location (SQL Server)

First you need to know the names of the files stored within the backup, then you can construct a query that will restore the database mapped to new files. This answer to How do you backup and restore a database as a copy on the same server? should help a lot.

Putting that together we get:

BACKUP DATABASE SQLTestDB TO DISK = 'E:\Backups\SQLTestDB.Bak'  
GO  
-- use the filelistonly command to work out  what the logical names 
-- are to use in the MOVE commands.  the logical name needs to
-- stay the same, the physical name can change
restore filelistonly from disk='E:\Backups\SQLTestDB.Bak'
-- --------------------------------------------------
--|  LogicalName  |           PhysicalName           |
-- --------------------------------------------------
--| SQLTestDB     | C:\mssql\data\SQLTestDB.mdf      |
--| SQLTestDB_log | C:\mssql\data\SQLTestDB_log.ldf  |
-- -------------------------------------------------

restore database SQLTestDB_Temp from disk='E:\Backups\SQLTestDB.Bak'
with move 'SQLTestDB' to 'C:\mssql\data\SQLTestDB_Temp.mdf',
move 'SQLTestDB_log' to 'C:\mssql\data\SQLTestDB_log.ldf'

It is possible to put this script into a stored proc so you can reuse it, one issue is how to use the results from RESTORE FILELISTONLY, you'll find this answer will help if you want to go down that path: https://stackoverflow.com/a/4018782/1690217

Anonymizing Data

This is where things get specific, now that your database has been restored to a temporary location, you can pretty much do whatever you want to the data in a series of INSERT, UPDATE, or `DELETE' statements that you need, you could even modify the schema to remove particularly sensitive or audit or other logging tables that you may have that you don't need to distribute.

Do not leave audit tables in the database that you have anonymized, unless you plan on anonymizing the context within those logs as well! Depending on your circumstances, also consider nulling out all IMAGE and VARBINARY columns as their contents will be particularly hard to process sufficiently.

I wont go into the specifics, there are already healthy discussions on this topic on SO:

Backup the Anonymized data

When you have finished scrubbing or anonymizing your database, simply complete your script with a call to backup the temporary DB that has been anonymized.

Chris Schaller
  • 13,704
  • 3
  • 43
  • 81