0

I have an XML file that has several records. I have an implemented process that takes that file and inserts it into a table. The problem is that there is an xml record in the Description column that is almost 20000 characters long. What I wanted to do was to have a process that would open the file, look for the string longer than 255 characters between the and strings and make a left of the string for 255. And finally I saved the file with that change made.

Exemple:

I have in the file:

<Row>
    <Attributes>
        <Column>
            <Name>Column1</Name>
            <Value>111111</Value>
        </Column>
        <Column>
            <Name>Column2</Name>
            <Value>AAAAAA</Value>
        </Column>
        <Column>
            <Name>Column3</Name>
            <Value>XXXX</Value>
        </Column>
        <Column>
            <Name>Column4</Name>
            <Value>XXXXXX</Value>
        </Column>
        <Column>
            <Name>Column5</Name>
            <Value>FGGGGG</Value>
        </Column>
        <Column>
            <Name>Description</Name>
            <Value>PING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PING30,30,30,30TESTEREMOTEPING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PINGPING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PING30,30,30,30TESTEREMOTE30,30,30,30TESTEREMOTE</Value>
        </Column>
        <Column>
            <Name>COLUMN6</Name>
            <Value>XX</Value>
        </Column>
    </Attributes>
</Row>

I want:

Find big string between < Value > and < /Value > and left string to 250 characters.

< Row >< Attributes >< Column >< Name >Column1< /Name >< Value >111111< /Value >< /Column >< Column >< Name >Column2< /Name >< Value >AAAAAA< /Value >< /Column >< Column >< Name >Column3< /Name >< Value >XXXX< /Value >< /Column >< Column >< Name >Column4< /Name >< Value >XXXXXX< /Value >< /Column >< Column >< Name >Column5< /Name >< Value >FGGGGG< /Value >< /Column >< Column >< Name >Description< /Name >< Value >PING 10,10,10,10 TESTE REMOTE, PING 20,20,20,20 TESTE REMOTE, PING 30,30,30,30 TESTE REMOTE PING 10,10,10,10 TESTE REMOTE, PING 20,20,20,20 TESTE REMOTE, PING PING 10,10,10,10 TESTE REMOTE, PING 20,20,20,20 TESTE REMOTE, PING 30,30,30,30 TESTE REMOTE< /Value >< /Column >< Column >< Name >COLUMN6< /Name >< Value >XX< /Value >< /Column >< /Attributes >< /Row >

Thank You

KeithL
  • 5,348
  • 3
  • 19
  • 25
  • I guess you can simply do this in Derived column transformation. Use `LEFT` of column `Value` – Arun Palanisamy Dec 11 '20 at 12:58
  • But this has to be done inside the file, because if I make an extract flat file, it will give an error because only a column with 8000 characters is allowed and in my situation it has 22000. – user3303981 Dec 11 '20 at 13:41
  • 1
    Do you mean you are using `Flat file source`? If so, you can directly use `XML source` where you don't have such restrictions – Arun Palanisamy Dec 11 '20 at 14:08
  • Do you actually have that whitespace in your tags? – KeithL Dec 11 '20 at 14:29
  • There is not whitespace. I put it to get the tags. – user3303981 Dec 11 '20 at 14:43
  • please answer the question of Arun Palanisamy. Let's say you are using XML Source, you might worry about the warning message "No maximum length was specified for the XML-source .... The SSIS Data Flow Task data type "DT_WSTR" with a length of 255 will be used"? you can resolve it by modifying the XSD-File (adjust the maximum length of your Description column.) Then apply the suggested solution of derived column transformation. – Mathias Fogel Dec 11 '20 at 16:01

1 Answers1

0

I always defer to C# when dealing with XML. So here is that solution:

Create script component source: set your two output columns name/value both string length 50

Enter the code:

add the following namespaces:

using System.Xml.Serialization;
using System.Collections.Generic;

create a class object to let the XML serialize into

[XmlRoot(ElementName = "Column")]
public class Column
{
    [XmlElement(ElementName = "Name")]
    public string Name { get; set; }
    [XmlElement(ElementName = "Value")]
    public string Value { get; set; }
}

[XmlRoot(ElementName = "Attributes")]
public class Attributes
{
    [XmlElement(ElementName = "Column")]
    public List<Column> Column { get; set; }
}

[XmlRoot(ElementName = "Row")]
public class Row
{
    [XmlElement(ElementName = "Attributes")]
    public List<Attributes> Attributes { get; set; }
}

And in create new ouput rows add this:

    string xml = @"<Row>
                            <Attributes>
                                <Column>
                                    <Name>Column1</Name>
                                    <Value>111111</Value>
                                </Column>
                                <Column>
                                    <Name>Column2</Name>
                                    <Value>AAAAAA</Value>
                                </Column>
                                <Column>
                                    <Name>Column3</Name>
                                    <Value>XXXX</Value>
                                </Column>
                                <Column>
                                    <Name>Column4</Name>
                                    <Value>XXXXXX</Value>
                                </Column>
                                <Column>
                                    <Name>Column5</Name>
                                    <Value>FGGGGG</Value>
                                </Column>
                                <Column>
                                    <Name>Description</Name>
                                    <Value>PING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PING30,30,30,30TESTEREMOTEPING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PINGPING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PING30,30,30,30TESTEREMOTE30,30,30,30TESTEREMOTE</Value>
                                </Column>
                                <Column>
                                    <Name>COLUMN6</Name>
                                    <Value>XX</Value>
                                </Column>
                            </Attributes>
                        </Row>";

    System.IO.MemoryStream stream = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(xml));

    XmlSerializer serializer = new XmlSerializer(typeof(Row));
    var x = (Row)serializer.Deserialize(stream);

    foreach (var att in x.Attributes)
        foreach (var col in att.Column)
        {
            Output0Buffer.AddRow();
            Output0Buffer.name = col.Name;
            Output0Buffer.value = col.Value.Substring(0, Math.Min(30, col.Value.Length));
        }

Notes:

  1. I chose a length of 30 you can change that to whatever you want
  2. You can easily load a file into memorystream instead of converting a string

Final output (based on 30)

enter image description here

KeithL
  • 5,348
  • 3
  • 19
  • 25