
We use AWS DMS to dump SQL Server databases into S3 as Parquet files. The idea is to run analytics over the Parquet files with Spark. Once a full load completes, the Parquet files cannot be read because they have UINT fields in their schema: Spark refuses to read them with `Parquet type not supported: INT32 (UINT_8)`. We use transformation rules to override the data type of the UINT columns, but they do not seem to be picked up by the DMS engine. Why?

There are a number of rules like "convert uint to int"; see below (note that UINT1 is the 1-byte unsigned type among the DMS data types):

{
  "rule-type": "transformation",
  "rule-id": "7",
  "rule-name": "uintToInt",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "%",
    "data-type": "uint1"
  },
  "data-type": {
    "type": "int4"
  }
}

The S3 endpoint settings are `DataFormat=parquet;ParquetVersion=parquet_2_0`, and the DMS engine version is 3.3.2.

However, I am still getting Parquet schemas with unsigned types. See below:

id: int32
name: string
value: string
status: uint8

Attempting to read such a Parquet file with Spark gives me:

org.apache.spark.sql.AnalysisException: Parquet type not supported: INT32 (UINT_8);
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotSupported$1(ParquetSchemaConverter.scala:100)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:136)

Why is the DMS transformation rule not triggered?

Anton
  • Is this the limitation I hit? `Changes to the source table structure during full load are not supported` https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html – Anton May 26 '20 at 10:31
  • Did you figure out how to make this work? Thank you. – jcunhafonte Jun 18 '20 at 15:53
  • @Anton are you still facing this issue? I used the solution proposed below and it worked (I applied the change to all schemas: %): { "rule-type": "transformation", "rule-id": "2", "rule-name": "uint1-to-int4", "rule-action": "change-data-type", "rule-target": "column", "object-locator": { "schema-name": "%", "table-name": "%", "column-name": "%", "data-type": "uint1" }, "data-type": { "type": "int1" } } – dp6000 Mar 14 '22 at 16:43
  • @dp6000, I didn't try it; since DMS 3.3.2 we ditched DMS in favor of another solution – Anton Mar 15 '22 at 12:50

3 Answers


Transforming the data directly from UINT to INT on DMS fixes this issue. Your mapping rules should look like the following:

{
"rules": [
    ...
    {
        "rule-type": "transformation",
        "rule-id": "2",
        "rule-name": "uint1-to-int1",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "schema",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint1"
        },
        "data-type": {
            "type": "int1"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "3",
        "rule-name": "uint2-to-int2",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "schema",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint2"
        },
        "data-type": {
            "type": "int2"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "4",
        "rule-name": "uint4-to-int4",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "schema",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint4"
        },
        "data-type": {
            "type": "int4"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "5",
        "rule-name": "uint8-to-int8",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "schema",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint8"
        },
        "data-type": {
            "type": "int8"
        }
    }
]}
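Since the four rules differ only in the integer width, they can also be generated programmatically, which avoids typos such as "unit" for "uint". A Python sketch (the schema wildcard and rule ids are arbitrary choices):

```python
import json

# Generate one change-data-type rule per unsigned width, mirroring the
# mapping above: uintN -> intN. Rule ids start at an arbitrary offset.
def uint_rules(schema_name: str = "%", first_id: int = 2) -> list:
    rules = []
    for i, width in enumerate((1, 2, 4, 8)):
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(first_id + i),
            "rule-name": f"uint{width}-to-int{width}",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": "%",
                "column-name": "%",
                "data-type": f"uint{width}",
            },
            "data-type": {"type": f"int{width}"},
        })
    return rules

print(json.dumps({"rules": uint_rules()}, indent=2))
```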

Documentation: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html#CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations

jcunhafonte
  • Can you share the table schemas you are trying to migrate and your mapping rules? – jcunhafonte Jun 23 '20 at 21:12
  • I have columns of smallint type in my SQL Server DB. I tried all kinds of combinations to convert intX to uintX (please check the example in the question), but it looks like DMS ignores them. As I mentioned above, I could be hitting this limitation from https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html: `Changes to the source table structure during full load are not supported`. BTW, your example converts uint to int, doesn't it? I need it the other way around. – Anton Jun 29 '20 at 11:21

The only way I was able to get the transformation working on the Parquet files was by specifying the exact column to transform. For example:

{
   "rules": [
   ...
   {
    "rule-type": "transformation",
    "rule-id": "2",
    "rule-name": "uint1-to-int1",
    "rule-action": "change-data-type",
    "rule-target": "column",
    "object-locator": {
        "schema-name": "acessa",
        "table-name": "<table_name>",
        "column-name": "<column_name>"
    },
    "data-type": {
        "type": "int1"
    }
   }
  ]
}

Using the wildcard % for the column name in the object locator simply did not work.
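Because each affected column has to be named explicitly, it helps to generate the rules from a mapping of tables to their unsigned columns. A sketch, assuming Python; the schema, table, and column names below are hypothetical placeholders you would replace with your own:

```python
import json

# Hypothetical mapping of table -> unsigned columns that need converting.
UINT_COLUMNS = {
    "orders": ["status"],
    "customers": ["age", "segment_code"],
}

def column_rules(schema_name: str = "acessa", first_id: int = 2) -> list:
    """Emit one exact-column change-data-type rule per (table, column)."""
    rules = []
    rule_id = first_id
    for table, columns in sorted(UINT_COLUMNS.items()):
        for column in columns:
            rules.append({
                "rule-type": "transformation",
                "rule-id": str(rule_id),
                "rule-name": f"{table}-{column}-to-int",
                "rule-action": "change-data-type",
                "rule-target": "column",
                "object-locator": {
                    "schema-name": schema_name,
                    "table-name": table,
                    "column-name": column,
                },
                "data-type": {"type": "int1"},
            })
            rule_id += 1
    return rules

print(json.dumps({"rules": column_rules()}, indent=2))
```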

ashah27

If you need to select specific columns and convert all UINTs to INTs, there is no way around naming each column you want to convert explicitly. If you use a wildcard data-type conversion, it will override your column selections and include all columns.

hoosier