
Is "Derived Column" step in Data Flow of Azure Data Factory able to dynamically detect empty value and replace with NULL value while looping through several files with different numbers of columns?

What I am trying to do is copy 10 CSV files straight into 10 tables without any manipulation other than replacing empty values with NULL. Each of these 10 files contains 5-25 columns. I know I can build 10 data flows to achieve this, but I was wondering whether there is a smarter way: LOOKUP the file list and, FOREACH file, dynamically replace the empty values?

Update:

Thanks to Joseph's answer, it seems Data Flow automatically treats empty values as NULL. But I then ran into another parsing problem: date / time / datetime columns arrive in the sink as NULL.

Source: the positions of the date / time / datetime columns are not fixed across files, nor do they share the same column names. Since the source is CSV, all columns are treated as string type, so "type" cannot be used for pattern matching: https://i.stack.imgur.com/X1C7z.png

Sink: in the example above, all the highlighted columns arrive in the sink as NULL: https://i.stack.imgur.com/PecRb.png

Ben S

1 Answer


Yes, we can. In my test, I copied 2 CSV files to 2 tables with one data flow.

  1. I created an Array variable named Objects. Its format is like this: `[{"source":"nullValue1.csv","sink":"TestNull"},{"source":"nullValue2.csv","sink":"TestNull2"}]`. In your case, you can add 10 objects to the array.


  2. Then we can use a ForEach activity to loop over the Objects array:


  3. Inside the ForEach activity, we can add a Data Flow activity.


  4. We can enter the dynamic content `@item().source` to specify the filename dynamically.


  5. Here comes the key point: in the Derived Column transformation, we can add a column pattern and use `iifNull($$, toString(null()))` to detect empty values in every column and replace them with NULL (a rough equivalent of this whole setup is sketched after these steps).


  6. In the sink, we can enter the dynamic content `@item().sink` to specify the table name in SQL.

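To see the whole pattern in one place, here is a minimal sketch of the equivalent logic outside ADF, written in Python with pandas. It is only an illustration of what the pipeline does, not ADF code; the connection string is a placeholder, and the file and table names are the ones from the example array.

```python
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

# Hypothetical list of source-file / sink-table pairs, mirroring the Objects array above.
objects = [
    {"source": "nullValue1.csv", "sink": "TestNull"},
    {"source": "nullValue2.csv", "sink": "TestNull2"},
]

# Placeholder connection string for the target SQL database.
engine = create_engine(
    "mssql+pyodbc://user:password@myserver.database.windows.net/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

for item in objects:  # plays the role of the ForEach activity
    # Read every column as a string, the way a delimited-text source does;
    # the number and names of columns can differ per file.
    df = pd.read_csv(item["source"], dtype=str)

    # Equivalent of the column pattern iifNull($$, toString(null())):
    # turn empty strings into real NULLs in every column.
    df = df.replace("", np.nan)

    # Equivalent of the parameterized sink @item().sink.
    df.to_sql(item["sink"], engine, if_exists="append", index=False)
```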



Update

My original date format is like this: 11/04/2020 1:17:40.

According to this document, we can use the name option in a column pattern to specify the columns that need the date format conversion.


Here we can use the expression `toString(toTimestamp(toString($$),'MM/dd/yyyy HH:mm:ss'),'yyyy-MM-dd HH:mm:SS')` to convert the date format.


In the result we can see that Column4 has been converted.
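To make the conversion concrete, here is the same transformation expressed as a small Python sketch (not ADF code; the sample value is the one quoted above):

```python
from datetime import datetime

raw = "11/04/2020 1:17:40"  # original slash-separated format, MM/dd/yyyy H:mm:ss

# Equivalent of toTimestamp(toString($$), 'MM/dd/yyyy HH:mm:ss'):
# parse the source string into a real timestamp...
parsed = datetime.strptime(raw, "%m/%d/%Y %H:%M:%S")

# ...and of toString(..., 'yyyy-MM-dd ...'): re-emit it in a format the SQL
# datetime sink can parse, instead of letting the value fall back to NULL.
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-11-04 01:17:40
```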

Part 2
We need to replace the / with - in the date columns, picking those columns out by name.

First pattern: `1==1&&name!='ClockingDate'&&name!='CreationDate'&&name!='ClockingTime'`
The other patterns match those columns by name and apply the date format conversion.

The data preview then shows the converted values.
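Pulling the name-based patterns together, here is a hedged Python sketch of the same idea. The column names ClockingDate, CreationDate and ClockingTime are taken from the patterns above; the date format is assumed to match the earlier example, and the dummy-date trick for the time column comes from the comment thread below.

```python
import pandas as pd

# Columns singled out by name in the extra patterns; everything else is
# covered by the catch-all first pattern and left unchanged.
date_columns = ["ClockingDate", "CreationDate"]
time_columns = ["ClockingTime"]

def convert_date_columns(df: pd.DataFrame) -> pd.DataFrame:
    for col in date_columns:
        if col in df.columns:
            # Reformat MM/dd/yyyy values to yyyy-MM-dd HH:mm:ss so the SQL
            # sink can parse them; unparseable values become NULL (NaT).
            df[col] = (
                pd.to_datetime(df[col], format="%m/%d/%Y %H:%M:%S", errors="coerce")
                  .dt.strftime("%Y-%m-%d %H:%M:%S")
            )
    for col in time_columns:
        if col in df.columns:
            # Time-only values: prepend a dummy date (and append seconds, as in
            # the comment workaround) so they fit a datetime column, since a
            # bare Time type does not survive the sink.
            df[col] = df[col].map(
                lambda v: None if pd.isna(v) or v == "" else f"1900-01-01 {v}:00"
            )
    return df
```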

Joseph Xu
  • Thank you so much Joseph. I followed your suggestion and now hit another error - when I did this conversion, the datetime type got converted to NULL value regardless. Would you have any clue? – Ben S Oct 26 '20 at 07:23
  • It may be your date format that ADF can't recognize; you can reference this [question](https://stackoverflow.com/questions/63625287/azure-data-factory-data-flow-silently-nulling-date-column/63626648#63626648) to convert the date format. – Joseph Xu Oct 26 '20 at 07:40
  • Thanks Joseph, so the same question would come back to these date fields...how can we dynamically convert all date / time fields with different numbers of columns? – Ben S Oct 26 '20 at 11:18
  • Could you please show me some sample data for the datetime type? – Joseph Xu Oct 27 '20 at 02:18
  • Is the position of the date column fixed? – Joseph Xu Oct 27 '20 at 02:56
  • Hi @Ben S, please see the updated part of my answer. – Joseph Xu Oct 27 '20 at 05:45
  • Hi Joseph, I have edited my question with screenshots of the source and sink. The positions of the date columns are not fixed in each file. Also FYI, I noticed that Data Flow treats empty values as NULL by default, unlike "Copy Data". So now my problems are more around date / time / datetime conversion. – Ben S Oct 27 '20 at 11:59
  • Hi @Ben S, please see Part 2 of my updated answer. – Joseph Xu Oct 28 '20 at 05:48
  • Thank you so much Joseph. But it defeats the purpose of Foreach as each file has different header names. And not every "/" should be replaced with "-", for instance "/" in addresses; so even if there is a way to detect if a value contains "/", we cannot just simply replace it. – Ben S Oct 28 '20 at 10:52
  • The `name == 'string'` condition is used to specify the date (time, datetime) columns which will be replaced. Other columns will not be replaced. – Joseph Xu Oct 28 '20 at 12:42
  • Because we pass the files dynamically, ADF cannot detect the types of the columns; every column is string type. We must use `name` to force the conversion of the date columns. – Joseph Xu Oct 29 '20 at 01:17
  • Hi Joseph, I found a solution that works around the slash (/) problem in the datetime format: ticking the checkbox "Infer drifted column types" in the source, and Data Flow will be smart enough to convert the format properly. However, ClockingTime still turns out as NULL regardless of how I concat it. I have tried ":00.00" to "00.0000000"; none works. Any clue? – Ben S Oct 30 '20 at 02:16
  • Hi @Ben S, I was stuck on this for several hours without solving it. It was always null until I converted it to 'yyyy-MM-dd HH:mm:SS'. So I think the `Time` type is not supported in ADF; we need to convert it to Timestamp or Datetime and then sink into Azure SQL. – Joseph Xu Oct 30 '20 at 02:45
  • Cool. I've worked around it by deriving it with `concat('1900-01-01 ', $$, ':00')` – Ben S Oct 30 '20 at 04:10
  • BTW, I used `instr(name,"Time")==0` to match and pick up any column with a header that contains "Time" for deriving – Ben S Oct 30 '20 at 04:11
  • All roads lead to Rome. Nice, we solved the problems together. I spent a lot of time on your two questions, so if my answer is helpful for you, please accept (mark) it as the answer; this can be beneficial to other community members. Thanks very much! – Joseph Xu Oct 30 '20 at 05:38