1

I am facing a strange issue while using SSIS "Data Conversion component" to convert string to decimal datatype. I use SSIS 2016.

The source data input has values of mixed data types- string, integer, decimal and is defined as varchar in the flat file source. The target data type expected is numeric. When explicit type conversion happens from string to decimal, we expect the alphanumeric values to get rejected to error table and only the numeric values to pass through. Instead, we are seeing some alphanumeric values shedding the characters in the value and passing through successfully with no error.

Examples: Value "3,5" converted to 35 Value "11+" converted to 11

We do not have control over source data and will not be able to replace char data before passing data into Data conversion component. We have tried the below steps as a workaround and it has worked. i.e,

  1. First Data Conversion from DT_STR to DT_NUMERIC
  2. Capture error rows that fail the above conversion
  3. Second Data Conversion from DT_NUMERIC to DT_DECIMAL

But as the source data is not reliable, we may have to apply this workaround wherever there are numeric fields (int types & deicmals) which is not a friendly solution. So checking with you all to understand if there is an easier and better solution tried out by anyone.

Kavya
  • 21
  • 1
  • 4
  • 1
    11+ is how I've seen data stored in mainframe sources (postfix sign notation). 3,5 could be 3.5 if data was stored using say a German locale but it could also just be a thousands marker in the "wrong" place which apparently is the behavior the Data Conversion component is using. – billinkc Jul 19 '20 at 15:41
  • Can you help me understand your limitation `will not be able to replace char data before passing data into Data conversion component`? – billinkc Jul 19 '20 at 15:41
  • That doesn't sounds like SSIS behaviour, which is incredibly strict with data types and doesn't allow implicit conversion (from say a `DT_WSTR` to a `DT_STR`), and neither of those values convert to a `decimal` in T-SQL. YOu need to supply a [mre] here. – Thom A Jul 19 '20 at 16:02
  • 1
    @Larnu I just tested using an expression task, and I see the same behaviour as the OP (using standard regional settings for Australia, so nothing "unusual"). I expect all of the unexpected successful conversions are going to be like these... "arguably" valid numerics, in that they can coherently be interpreted as numeric. I think a "smarter" component might need to be manually configured prior to the conversion step, like a derived column transform, or a script task using regex. – allmhuran Jul 19 '20 at 16:08

2 Answers2

1

I did not expect this result, but I tried an expression task and it worked for DT_DECIMAL:

(DT_DECIMAL,1)"11+" -- evaluates to 11.0

But it does not work for DT_NUMERIC. SSIS won't allow a direct numeric result, but it can be nested inside a cast to DT_DECIMAL. Just to demonstrate that, in an expression task even this "numerically valid" cast would not be permitted, because the output simply can't be of type DT_NUMERIC:

(DT_NUMERIC, 3, 0)123

But this is permitted:

(DT_DECIMAL,0)((DT_NUMERIC, 3, 0)123)

So as long as you are happy to specify a precision and scale big enough to hold your data during the "validity" check done by DT_NUMERIC, and then cast it from there to DT_DECIMAL, all in a derived column transform, then DT_NUMERIC seems to enforce the strict semantics you want.

SSIS allows this:

(DT_DECIMAL,0)((DT_NUMERIC, 2, 0)"11") 

But not either of these:

(DT_DECIMAL,0)((DT_NUMERIC, 2, 0)"11+") 

(DT_DECIMAL,0)((DT_NUMERIC, 2, 0)"3,5") 
allmhuran
  • 4,154
  • 1
  • 8
  • 27
  • Thanks @allmhuran. I should have mentioned in my post earlier that we have tried this as a workaround and it has worked.(I will edit my post to add this info) i.e, 1. First Data Conversion from DT_STR to DT_NUMERIC 2. Capture error rows that fail the above conversion 3. Second Data Conversion from DT_NUMERIC to DT_DECIMAL But as the source data is not reliable, we may have to apply this workaround wherever there are numeric fields (int types & decimals) which is not a friendly solution. So checking with you all to understand if there is an easier and better solution tried out by anyone. – Kavya Jul 20 '20 at 06:24
  • @Kavya You should be able to do both conversions chained together as I have, no? That is to say, you don't need two separate conversion tasks. Just configure the task doing both conversions at once (as per my example) with an error rows path. You needed at least one conversion task anyway, right? So it's just a slightly longer expression in the task. – allmhuran Jul 20 '20 at 06:28
  • Are you referring to derived column task? We are doing this operation using the Data Conversion component available in SSIS. And in the component, we can configure only 1 datatype conversion at a time, as far as I am aware. – Kavya Jul 20 '20 at 06:49
  • @Kavya Yes, sorry, I do mean a derived column transform, I mentioned that in my original answer but then reverted to saying "conversion". – allmhuran Jul 20 '20 at 06:58
  • Thanks for your response. But unfortunately, we are required to make this work using the SSIS Data Conversion component. Appreciate if you or anyone can share any solution that can work for us. – Kavya Jul 20 '20 at 07:08
  • 1
    If your only option is to use the data conversion component - you can't change the source query, and you can't use *any* other kind of task - not even, say, a script task - then I don't think that leaves any room for anyone to suggest anything other than the workaround you are already using. – allmhuran Jul 20 '20 at 07:12
  • @Kavya which gets back to my [comment/request for clarification](https://stackoverflow.com/questions/62982300/data-conversion-text-to-numeric-in-ssis-is-removing-characters#comment111376419_62982300) on where these restrictions are coming from. There are no knobs, buttons or switches to flip on the Data Conversion component. It does what it does and if it doesn't yield the desired results, it's the wrong tool for the job. I assume a Lead or Architect is telling you that you have to make it work as is and that's an untenable position. – billinkc Jul 20 '20 at 13:27
0

@billinkc Sorry for not responding to you earlier. We are working under some restrictions: (1) All we want to do is capture datatype issues in input data, so we wanted to harness the capability of SSIS Data Conversion Component in SSIS. (2) DBA doesn't want us to use SQL for type conversions, so we are required to do these conversions between flat file source and flat file destination using SSIS. (3) We are required to capture the type conversion errors at every step of conversion into an error output file with error column name and error description, to be used later. So we cannot remove char data in the field before passing it to Data Conversion component.

@allmhuran - We have used Derived column task before Data Conversion component to replace unnecessary characters in one of the other fields, but using the same for type conversion makes achieving (3) difficult. Because error output from Derived column task and Data Conversion component cannot be redirected to the same error output file.

We can completely ignore Data Conversion component and use only Derived column task to do all type conversions, whether single or nested. I am trying this and the error descriptions do not always look good, but the cons of the former method can be overcome. I will try this out!

Kavya
  • 21
  • 1
  • 4