0

I am trying to generate a text file for each XML file from a third party system using SSIS package script task and XSLT. Sometimes few special characters appear in these XML files. One such character is the right single quotation mark . Because of this the script task fails with message Invalid Character in the given encoding. XML looks like below:

<firstelement>
    <Id>1112</Id>
    <State>AP</State>
    <City>E’Godavari</City>
</firstelement>

XSLT is

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
    <xsl:output method="text" encoding="iso-8859-1"/>
    <xsl:template match="/firstelement">
        <xsl:text>|@|Id|@|State|@|City|@|</xsl:text>
        <xsl:text>&#xd;&#xa;</xsl:text>
        |@|<xsl:value-of select="Id"/>|@|<xsl:value-of select="State"/>|@|<xsl:value-of select="City"/>|@|
    </xsl:template>
</xsl:stylesheet>

The Script task code is below:

    var xmlPath = @"C:\Cities.xml";
    var xsltPath = @"C:\Cities.xslt";
    var outputTxt = @"C:\Cities.txt";
    var xslCompiledTransformObj = new XslCompiledTransform();
    xslCompiledTransformObj.Load(xsltPath);
    xslCompiledTransformObj.Transform(xmlPath, outputTxt);

I have tried changing the encoding to UTF-8 in XSLT file's xsl:output tag but i was still getting same error.

ninja_md
  • 165
  • 3
  • 15
  • So does the input XML declare its encoding? Which encoding does it declare? – Martin Honnen Mar 14 '22 at 17:29
  • SSIS has a built-in XML Task, operation XSLT transformation. – miriamka Mar 14 '22 at 20:19
  • @MartinHonnen input XML encoding looks to be ANSI – ninja_md Mar 15 '22 at 03:52
  • @miriamka I tried with XML task in SSIS, but it is also throwing same error message. – ninja_md Mar 15 '22 at 04:29
  • Does the XML input have an XML declaration in the form `` or not and if so, which exact encoding name is given there? – Martin Honnen Mar 15 '22 at 08:38
  • No such declaration is there in the input XML. I opened it with Notepad.. and did Save As.. the default encoding that was shown is ANSI – ninja_md Mar 15 '22 at 09:07
  • And which encoding or code page is ANSI exactly for you or on your localized OS? Anyway, the `Transform` method, when you pass in a string as the first argument, feeds that input file to an XML parser and an XML parser without an XML declaration defaults to UTF-8 (or perhaps, with some BOM, to UTF-16), but never to a ANSI (8bit?) code page. So you either need to fix those XML files to start with an XML declaration stating the used encoding in the `encoding` pseudo attribtue of the XML declaration or you need to pull in the files with .NET code, reading them with a TextReader for your code page – Martin Honnen Mar 15 '22 at 10:48
  • I added the XML declaration to the input file `````` and tried running the task.. it still fails with same error. Also i tried iso-8859-1 and ANSI, still fails with same error. – ninja_md Mar 15 '22 at 11:59
  • Well, we can't guess the original encoding and tell you which one to put into the `encoding` of the XML declaration so ask the author or look at the code producing the input to find out which encoding was used when creating the input. – Martin Honnen Mar 16 '22 at 16:14
  • 1
    ANSI is usually a Windows code page to be declared like `Windows-1252` (but the number can be different, it depends on the localization of the OS or the region settings, I think). – Martin Honnen Mar 16 '22 at 16:16
  • Of course, if editing the input with an editor works correctly, check whether the editor shows which encoding it used to decode. And if you hand edit the inputs anyway I would do me a favour and safe them all as UTF-8 back before feeding them to an XML parser/XSLT. – Martin Honnen Mar 16 '22 at 16:19

2 Answers2

0

Please try the following c#. It guaranties that XSLT's OutputSettings are taken into account.

c#

void Main()
{
   const string SOURCEXMLFILE = @"e:\Temp\input.xml";
   const string XSLTFILE = @"e:\Temp\process.xslt";
   const string OUTPUTFILE = @"e:\temp\output.txt";
   bool paramXSLT = true;

   try
   {
      XsltArgumentList xslArg = new XsltArgumentList();

      using (XmlReader src = XmlReader.Create(SOURCEXMLFILE))
      {
         XslCompiledTransform xslt = new XslCompiledTransform();
         xslt.Load(XSLTFILE, new XsltSettings(true, true), new XmlUrlResolver());

         XmlWriterSettings settings = xslt.OutputSettings.Clone();

         using (XmlWriter result = XmlWriter.Create(OUTPUTFILE, settings))
         {
            xslt.Transform(src, xslArg, result, new XmlUrlResolver());
            result.Close();
         }
      }
      Console.WriteLine("File '{0}' has been generated.", OUTPUTFILE);
   }
   catch (Exception ex)
   {
      Console.WriteLine(ex.Message);
   }
}
miriamka
  • 459
  • 5
  • 9
  • I used this in my SSIS's Script Task but it is still throwing same error *Invalid character in the given encoding* – ninja_md Mar 16 '22 at 08:47
0

The below code has worked for me. I used ANSI 1252 encoding and then Regex to find and replace the special character.

            var xsltPath = @"C:\Cities.xslt";
            var outputTxt = @"C:\Cities.txt";
            var xslCompiledTransformObj = new XslCompiledTransform();
            

            StreamReader streamReader = new StreamReader(xmlPath, Encoding.GetEncoding(1252));
            string content = streamReader.ReadToEnd();
            streamReader.Close();
            string xmlContent = Regex.Replace(content, "’", "'", RegexOptions.Compiled);
            XmlDocument newXml = new XmlDocument();
            newXml.LoadXml(xmlContent);
            newXml.PreserveWhitespace = true;
            newXml.Save(xmlPath);

            xslCompiledTransformObj.Load(xsltPath);
            xslCompiledTransformObj.Transform(xmlPath, outputTxt);
ninja_md
  • 165
  • 3
  • 15