
I am trying to load multiple files via PolyBase, and it throws an error while creating the view.

select * from dbo.test

fname  lname
-----  -----
a.csv  NULL
b.csv  NULL

I am able to create the data source and the external table successfully, without any issue.

ERROR

Msg 207, Level 16, State 1, Line 4
Invalid column name 'DATA_FILE_NAME'.

(the same message is repeated once for each reference to DATA_FILE_NAME in the view)

CODE

IF OBJECT_ID('dbo.TEST_V', 'V') IS NOT NULL
    DROP VIEW [dbo].[TEST_V]
GO

-- Create new view
CREATE VIEW [dbo].[TEST_V]
AS

-- Add dynamic T-SQL to temporary table

SELECT

    -- Auto-increment number
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ROW_ID,

    -- Data file name
    DATA_FILE_NAME AS DATA_FILE,

    -- Drop external table if it exists
    'IF EXISTS (SELECT * FROM sys.external_tables WHERE NAME = ' +
    CHAR(39) + SUBSTRING(DATA_FILE_NAME, 1, LEN(DATA_FILE_NAME) - 11) + CHAR(39) + ') ' +
    'DROP EXTERNAL TABLE [dbo].[' + SUBSTRING(DATA_FILE_NAME, 1, LEN(DATA_FILE_NAME) - 11) + '];' AS DROP_STMT,

    -- Create new external table
    'CREATE EXTERNAL TABLE [dbo].[' + SUBSTRING(DATA_FILE_NAME, 1, LEN(DATA_FILE_NAME) - 11) + '] ' +
    '( ' +
    'fname [varchar] (70), ' +
    'lname [varchar] (70) ' +
    ') ' +
    'WITH ' +
    '( ' +
    'LOCATION=' + CHAR(39) + '/NEW/' + DATA_FILE_NAME + CHAR(39) + ' ' +
    ', DATA_SOURCE = SIMPLE' +
    ', FILE_FORMAT = SIMPLE_HEADER' +
    ', REJECT_TYPE = VALUE ' +
    ', REJECT_VALUE = 1 ' +
    ') ' AS CREATE_STMT,

    -- Move data into staging table
    'INSERT INTO dbo.fname ' +
    'SELECT ' +
    '  fname, ' +
    '  lname ' +
    'FROM [dbo].[' + SUBSTRING(DATA_FILE_NAME, 1, LEN(DATA_FILE_NAME) - 11) + '] ' +
    ';' AS INSERT_STMT

FROM [dbo].[test];
GO

I am trying to create a procedure using the above view to load data from multiple files dynamically.

Jo Joseph
  • So the table `dbo.test` does not have a column called ... `DATA_FILE_NAME`. – wBob Feb 05 '19 at 22:34
  • dbo.test contains only fname and lname columns. – Jo Joseph Feb 06 '19 at 00:25
  • Have you considered adding the column `DATA_FILE_NAME` to your table called `dbo.test`? Or changing all the references in your view which point to `DATA_FILE_NAME` to one of the columns in your table, say `fname`? – wBob Feb 06 '19 at 01:20
  • If I change that line to `DATA_FILE_NAME AS fname`, I still get the same error. – Jo Joseph Feb 06 '19 at 02:18
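As the comments point out, the view only compiles once every column reference targets a column that actually exists in `dbo.test` (here that is `fname`, which holds the file names). A minimal sketch of the fix, assuming the `-11` substring matches your file-naming scheme:

```sql
-- Sketch: build the view on the existing fname column instead of the
-- non-existent DATA_FILE_NAME. Only DROP_STMT is shown; the other
-- generated statements change the same way.
CREATE VIEW [dbo].[TEST_V]
AS
SELECT
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ROW_ID,
    fname AS DATA_FILE,
    'IF EXISTS (SELECT * FROM sys.external_tables WHERE NAME = ' +
    CHAR(39) + SUBSTRING(fname, 1, LEN(fname) - 11) + CHAR(39) + ') ' +
    'DROP EXTERNAL TABLE [dbo].[' + SUBSTRING(fname, 1, LEN(fname) - 11) + '];' AS DROP_STMT
FROM [dbo].[test];
```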

1 Answer


Since they are all targeting the same table, you don't need this file-by-file method. PolyBase will load a wildcard-like set of files: put all your data files of the same type in one folder and set the FOLDER as the location, and all the files in it will be loaded in parallel. You will find your loads hugely faster. At the moment you're processing files in series; the method I suggest processes them in parallel, as fast as readers are available.
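A minimal sketch of this folder approach, reusing the `SIMPLE` data source and `SIMPLE_HEADER` file format from the question (the table name `ALL_FILES` is made up for illustration):

```sql
-- One external table over the whole folder: PolyBase reads every file
-- under /NEW/ in parallel, instead of one table per file.
CREATE EXTERNAL TABLE [dbo].[ALL_FILES]
(
    fname VARCHAR(70),
    lname VARCHAR(70)
)
WITH
(
    LOCATION    = '/NEW/',        -- a folder, not a single file
    DATA_SOURCE = SIMPLE,
    FILE_FORMAT = SIMPLE_HEADER,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 1
);

-- One statement then loads every file in the folder.
INSERT INTO dbo.fname
SELECT fname, lname
FROM [dbo].[ALL_FILES];
```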

Ron Dunn
  • Thanks Ron for your comment, but how do I achieve incremental loads in PolyBase? – Jo Joseph Feb 06 '19 at 03:50
  • Load into a transient table, then insert into the target table with a WHERE NOT EXISTS clause. A slightly better approach may be to CTAS into a new table from a union of the old table, and a WHERE NOT EXISTS from the new table. – Ron Dunn Feb 06 '19 at 07:59
  • You mean that if I have daily files a_02_05_19.txt and a_02_06_19.txt, I should load both files into a temporary table, then select only today's data from the external table and load it into the permanent table? – Jo Joseph Feb 06 '19 at 18:46
  • No, sorry if I wasn't clear. Put all your files into a folder "a". Set this as your external data source. One CTAS statement will load all files from this folder into a temp table. Then CTAS or INSERT the rows from the temp table into your permanent table, using a WHERE NOT EXISTS clause to ignore the rows already in that permanent table. – Ron Dunn Feb 06 '19 at 21:44
  • Thanks Ron for your response. Let's say I have 100 files and each file contains 1 million records. If I have already loaded 99 files and need to insert the 100th, do we need to load all 100 files, remove the 99 files' records from the temp table, and then load it into the permanent table? In that case, how will the performance be? – Jo Joseph Feb 06 '19 at 22:01
  • This is turning into a more complex architectural discussion than I can handle in StackOverflow comments. Please email me … put a '.' between Ron and Dunn, and add @microsoft.com on the end. We can discuss this in detail. – Ron Dunn Feb 07 '19 at 01:45
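The incremental pattern Ron describes in the comments can be sketched like this (the staging and target table names are hypothetical, and CTAS syntax assumes Azure SQL Data Warehouse / Synapse):

```sql
-- Step 1: land everything from the folder into a transient staging table.
CREATE TABLE stg_people
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT fname, lname
FROM [dbo].[ALL_FILES];   -- hypothetical external table over the folder

-- Step 2: insert only rows not already in the permanent table.
INSERT INTO dbo.people (fname, lname)
SELECT s.fname, s.lname
FROM stg_people AS s
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.people AS p
    WHERE p.fname = s.fname
      AND p.lname = s.lname
);

-- Step 3: drop the staging table once the load is done.
DROP TABLE stg_people;
```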