The other day I asked a question and was pointed to another question, which got me partway through my issues.

My original data example was reduced and simplified to get my initial issue solved, but once I was finally able to deserialize the XML into my classes I noticed that it would error out about 10 objects into the deserialization. After some troubleshooting I played with the data to find the actual cause, and it appears that the <b></b> within the <p></p> is my issue. I have used a couple of online converters to help generate my classes, but none seem to handle that case properly.

There are a ton of XML tags prior to this in various hierarchies...

<description type="formattedtext">
    <p>Some stuff in here.</p>
    <p>Some other stuff was here.</p>
    <p><b>Title of Table</b></p>
    <table>
        <tr>
            <td><b>Size</b></td>
            <td><b>weight</b></td>
            <td><b>Length</b></td>
        </tr>
        <tr>
            <td>Tiny</td>
            <td>20</td>
            <td>18</td>
        </tr>
        <tr>
            <td>Small</td>
            <td>25</td>
            <td>16</td>
        </tr>
        <tr>
            <td>Medium</td>
            <td>40</td>
            <td>13</td>
        </tr>
        <tr>
            <td>Large</td>
            <td>50</td>
            <td>10</td>
        </tr>
        <tr>
            <td>Huge</td>
            <td>80</td>
            <td>10</td>
        </tr>
    </table>
    <p>Some extra description goes here.</p>
    <p>Some other extra stuff for describing things goes here.</p>
    <p><b>Additional notes. </b>Final stuff goes here.</p>
</description>

There are a lot of XML tags after this in various hierarchies...

I think the real issue is that last bit with the "Additional notes", where the <b> tag is a part of the <p> but the <p> also has text of its own. The deserializer treats all the tags normally associated with HTML as XML, so I have classes set up for the table.

All the rest of the classes work, but the output from the online converters for these was as follows:

[XmlRoot(ElementName="p")]
public class P { 

    [XmlElement(ElementName="b")] 
    public string B { get; set; } 
}

[XmlRoot(ElementName="description")]
public class Description { 

    [XmlElement(ElementName="p")] 
    public List<string> P { get; set; } 

    [XmlAttribute(AttributeName="type")] 
    public string Type { get; set; } 

    [XmlText] 
    public string Text { get; set; } 
}

I assume that I need to change List<string> P into List<P> P, but I am not sure what other property would be necessary. I did try adding a Text property like the one in the Description parent, but no joy.
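
For illustration, here is a minimal sketch of one way a mixed-content P could be mapped, assuming XmlSerializer is given a single object[] that accepts both text runs and <b> elements. The names Bold, Items, and MixedContentDemo are hypothetical and not from the converter output, and the table classes are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class for <b>; the name Bold is an assumption.
public class Bold
{
    [XmlText]
    public string Text { get; set; }
}

public class P
{
    // Mixed content: text runs are read as string and <b> elements as Bold,
    // both stored in one array in document order.
    [XmlText(typeof(string))]
    [XmlElement("b", typeof(Bold))]
    public object[] Items { get; set; }
}

[XmlRoot("description")]
public class Description
{
    [XmlAttribute("type")]
    public string Type { get; set; }

    [XmlElement("p")]
    public List<P> P { get; set; } = new List<P>();
}

public static class MixedContentDemo
{
    // Hypothetical helper: deserializes a <description> fragment from a string.
    public static Description Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Description));
        using (var reader = new StringReader(xml))
        {
            return (Description)serializer.Deserialize(reader);
        }
    }
}
```

Calling MixedContentDemo.Load on a fragment and walking each P.Items with a type check (`item is Bold`) should tell the bold runs apart from the plain text. This is a sketch under those assumptions, not a confirmed fix for the full data file.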

Also, a way to just ignore the structure under an element would be helpful. As far as I am concerned, the array of P under description does not need to be its own object; it could be a string. I have not found a setting or attribute that allows that, though, and I assume the deserializer would still trip over the tags underneath, so I am unsure how to escape them so that they are ignored.
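
As a rough sketch of the "just give me a string" idea, and swapping in LINQ to XML (XElement) in place of XmlSerializer for this one element, the content of description could be read either with its tags or as text only. The class and method names below are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DescriptionText
{
    // Inner markup of the element as one string, child tags included.
    public static string InnerXml(XElement element) =>
        string.Concat(element.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));

    // Text only: XElement.Value concatenates every descendant text node,
    // with no separator between paragraphs.
    public static string PlainText(XElement element) => element.Value;

    // Text only, one line per direct <p> child (table cells are skipped).
    public static string Paragraphs(XElement element) =>
        string.Join("\n", element.Elements("p").Select(p => p.Value));
}
```

With this approach the <b> inside a <p> needs no class at all, since nothing is bound to a type. The trade-off is that the formatting is either carried along as raw markup or dropped entirely.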

The overall purpose is that this data is consumed by an application, and within that application I can manually add more functionality to an element through the UI, which translates to additional elements in the XML storage. However, since there are hundreds of these across half a dozen files, I am trying to load all the pertinent items, parse them, and output the result into a new data file for consumption by the program with the UI.
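
To make that workflow concrete, here is a minimal sketch of the load, parse, and write-back loop using XDocument. The summary element and the DataFileRewriter class are purely hypothetical stand-ins for whatever new elements the application actually expects, and the sketch assumes each description has a parent element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DataFileRewriter
{
    // Returns a copy of the document with one derived element added after
    // each <description>. The original document is left untouched.
    public static XDocument AddDerivedElements(XDocument source)
    {
        var result = new XDocument(source); // deep copy

        // ToList() so the tree is not modified while it is being enumerated.
        foreach (var description in result.Descendants("description").ToList())
        {
            // Hypothetical derived value: the text of the first paragraph.
            var firstParagraph = description.Elements("p").FirstOrDefault();
            var text = firstParagraph != null ? firstParagraph.Value : string.Empty;

            // "summary" is an invented element name used only for illustration.
            description.AddAfterSelf(new XElement("summary", text));
        }

        return result;
    }
}
```

The input could come from XDocument.Load(path) and the result could be written with Save(path). Because the description subtree is never bound to classes, its mixed content is carried over as-is.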

ΩmegaMan
Slagmoth
  • You really don't want to be using data binding approaches when handling XML mixed content. It's just a poor fit. But you haven't given a high-level description of what your requirements are, so can't really advise. – Michael Kay Apr 23 '21 at 14:00
  • @MichaelKay This xml snippet is part of a data file that a system works with and I am trying to take various collections out and parse the text in a couple of fields including the description to create new elements that evaluate to additional functionality in the framework when accessing that element in the collection. – Slagmoth Apr 23 '21 at 14:13
  • @MichaelKay I have updated the question with more detail. If there is a "better" way of doing that I am all ears... I was just toying with deserialization because my new job apparently uses more xml than any of my previous. So this doubles as practice as well. – Slagmoth Apr 23 '21 at 14:19
  • @jdweng No HTML calls in this at all... just XML as a database for the application and I am trying to manipulate the data outside of the UI. Not everyone can choose the data they get to work with so I can't simply change the XML provided to JSON then back when I am done. – Slagmoth Apr 23 '21 at 16:22
  • @jdweng I am aware of what HTML is; I do API programming normally at work. I have not dealt with XML data to this degree ever. The Description tag contains a set of HTML tags for display in the UI of the program that accesses the XML data. I know that this is not a normal well-formed XML but I have to work with what I have. – Slagmoth Apr 23 '21 at 18:13
  • @jdweng If you are indeed trying to be helpful you will have to explain to me how going from XML to JSON then back to XML is in any way in my best interests. – Slagmoth Apr 23 '21 at 21:47
  • @jdweng This is not HTML, but it is HTML-like, in that it uses mixed content. XML of course uses mixed content just as HTML does. The problem is that data binding technologies, which map XML to objects in a language such as Java or C#, aren't well suited to handling mixed content, because the data structure is too flexible for strongly typed languages to deal with nicely. – Michael Kay Apr 23 '21 at 21:48
  • @Slagmoth I would be using XSLT for this job. But I've no idea whether your project has constraints that make this impossible. – Michael Kay Apr 23 '21 at 21:49
  • @MichaelKay I had considered XSLT, but as I was able to start deserializing I thought I was making progress. I tried looking through the docs on XDocument to see if there was a way to just ignore the contents of an element and have it presented as text, since for my purposes I don't really care about the formatting part, but I was unsuccessful in finding a solution that way. My original question had that as an answer posted, so I guess I will have to try to get that to fit my needs, then have another step to get it back to what it needs to be when the UI reads it. – Slagmoth Apr 23 '21 at 21:53

0 Answers