0

I am working to read in a XML file capture its elements do any transforming/calculations that need to be done for that particular file and then upload to SQL DB. right now I just need help on how I will parse and keep all the xml data so that I can put it where it goes. I keep confusing myself on how to do this with this format of XML. I can typically get the PList, Name, Desc values, but not any of the values, what i am looking for is something like this:

output like: Plist, Name, Desc, Code, Price. Where the Plist, Name, Desc will stay the same until the next is found and so on.

Sample XML

<?xml version="1.0" encoding="iso-8859-1" ?>
<CoatSchedules>
<Cot PList="02" Name="CDC" Desc="PAR/PAV/EX3/RCH/EXP">
<Color Code="ARC" Price="39.58"/>
<Color Code="BAR" Price="39.58" **Default="50.00"**/>
<Color Code="BEP" Price="58.54"/>
<Color Code="BEX" Price="51.54"/>
</Cot>
<Cot PList="0A" Name="E6C" Desc="PAR/PAV" **Deduct="2.00"**>
<Color Code="BPA" Price="24.00"/>
<Color Code="BPV" Price="24.00"/>
<Color Code="COT" Price="0.00"/>
<Color Code="PAR" Price="24.00"/>
<Color Code="PAV" Price="24.00"/>
<Color Code="UTP" Price="25.00"/>
<Color Code="UTV" Price="20.00"/>
<Color Code="UV" Price="6.72"/>
</Cot>
</CoatSchedules>

EDIT: Default and Deduct both added to show "unknown" items not to be expected but we want to take note that we saw it in the file I have tried XDocument, but i am thinking that this may need to use XmlReader since these files can be very large and i read that reader is better suited for that.

For Each element As XElement In xd.Root.Elements("Cot")
            Console.WriteLine("PList: {0}; Name: {1}; Desc:{2}; Code: {3}; Price: {4}", CStr(element.Element("PList").Value), CStr(element.Element("Name")), CStr(element.Element("Desc")), CStr(element.Element("Code")), CStr(element.Element("Price")))
            Dim PListValue = element.Attribute("PList").Value
            Dim NameValue = element.Attribute("Name").Value
            Dim DescriptionValue = element.Attribute("Desc").Value
            Dim ColorCodeValue
            Dim PriceValue

            Console.WriteLine("PList: {0}; Name: {1}; Desc:{2};", PListValue, NameValue, DescriptionValue)

            For Each child As XElement In element.Elements("Color")
                Dim ColorCodeValue = child.Attribute("Code").Value
                Dim PriceValue = child.Attribute("Price").Value
                Console.WriteLine(" Code: {0}; Price: {1}", ColorCodeValue, PriceValue)
            Next
        Next

how would i capture these elements/attributes values into just variables for now so that i can manipulate and/or send to SQL db as needed?

EDIT2: Here is a small sampling of the Large XML that i will need to read, from what i can tell i will need to use XMLReader and possibly even LINQ with it to get all the data out of this large file. an example for this would be much appreciated as well you guys are awesome!

 <?xml version="1.0" encoding="iso-8859-1" ?>
<Styles>
<Material PList="02" Code="B53">
<Style Name="ARRAY *" Fin="S" Sph="92.70" POW="012" PRS="001" BCV="002"
 COL="CLR" TNT="002" COT="R8C" EDG="602" FRM="001"/>
<Style Name="ARRAY 2 *" Fin="S" Sph="92.70" POW="012" PRS="001" BCV="002"
 COL="CLR" TNT="002" COT="R8C" EDG="602" FRM="001"/>
<Style Name="ARRAY 2 W *" Fin="S" Sph="92.70" POW="012" PRS="001" BCV="002"
 COL="CLR" TNT="002" COT="R8C" EDG="602" FRM="001"/>
<Style Name="ARRAY W *" Fin="S" Sph="92.70" POW="012" PRS="001" BCV="002"
 COL="CLR" TNT="002" COT="R8C" EDG="602" FRM="001"/>
</Material>
<Material PList="02" Code="B67">
<Style Name="ARRAY *" Fin="S" Sph="92.70" POW="013" PRS="001" BCV="002"
 COL="CLR" TNT="002" COT="R8C" EDG="604" FRM="001"/>
<Style Name="ARRAY 2 *" Fin="S" Sph="92.70" POW="013" PRS="001" BCV="002"
 COL="CLR" TNT="002" COT="R8C" EDG="604" FRM="001"/>
<Style Name="ARRAY 2 W *" Fin="S" Sph="92.70" POW="013" PRS="001" BCV="002"
 COL="CLR" TNT="002" COT="R8C" EDG="604" FRM="001"/>
<Style Name="ARRAY W *" Fin="S" Sph="92.70" POW="013" PRS="001" BCV="002"
 COL="CLR" TNT="002" COT="R8C" EDG="604" FRM="001"/>
</Material>
.
.
.
</Material>
</Styles>
Ledz3p
  • 17
  • 5
  • 1
    Huge xml files I usually use a combination of XmlReader and Xml Linq. See my c# sample here where it can easily be converted to VB.net : https://stackoverflow.com/questions/61607180/parse-big-xml-file-using-xmlreader – jdweng Apr 11 '23 at 20:53
  • @jdweng i have edited (EDIT2) a small sampling of the Large file that i would get, i checked out your answer and i do think it will work for me but i'm still having issues reading this file in a timely manner so if you could expound on this or use my sample as a reference to help me grasp how i would utilize xmlreader and linq. thanks! – Ledz3p Apr 13 '23 at 17:21
  • 1
    material is a XElement like in your code : For Each style As XElement In material.Elements("Style") – jdweng Apr 14 '23 at 02:30

3 Answers3

1

You could, should IMHO, use XElement

    Dim xe As XElement
    ' xe = XElement.Load("path here")
    ' for testing use literal
    xe = <CoatSchedules>
             <Cot PList="02" Name="CDC" Desc="PAR/PAV/EX3/RCH/EXP">
                 <Color Code="ARC" Price="39.58"/>
                 <Color Code="BAR" Price="39.58"/>
                 <Color Code="BEP" Price="58.54"/>
                 <Color Code="BEX" Price="51.54"/>
             </Cot>
             <Cot PList="0A" Name="E6C" Desc="PAR/PAV">
                 <Color Code="BPA" Price="24.00"/>
                 <Color Code="BPV" Price="24.00"/>
                 <Color Code="COT" Price="0.00"/>
                 <Color Code="PAR" Price="24.00"/>
                 <Color Code="PAV" Price="24.00"/>
                 <Color Code="UTP" Price="25.00"/>
                 <Color Code="UTV" Price="20.00"/>
                 <Color Code="UV" Price="6.72"/>
             </Cot>
         </CoatSchedules>

    Dim errs As String = VerifyXML(xe) ' <<<<<< One change  <<<<<<<

    For Each cot As XElement In xe.<Cot>
        Dim PListValue As String = cot.@PList
        Dim NameValue As String = cot.@Name
        Dim DescriptionValue As String = cot.@Desc
        For Each clr As XElement In cot.<Color>
            Dim ColorCodeValue As String = clr.@Code
            Dim PriceValue As String = clr.@Price
            Stop ' for debug
        Next
    Next

EDIT:

Verify XML. See one change to code above

Private Function VerifyXML(someXML As XElement) As String
    Dim sb As New System.Text.StringBuilder
    Dim attrs As List(Of XAttribute) = someXML.Attributes.ToList
    Select Case someXML.Name.LocalName
        Case "CoatSchedules"
        Case "Cot"
            ' PList="" Name="" Desc=""
            For Each attr As XAttribute In attrs
                Select Case attr.Name
                    Case "PList", "Name", "Desc"
                    Case Else
                        sb.AppendFormat("Cot unknown attribute {0}", attr.Name)
                        sb.AppendLine()
                End Select
            Next
        Case "Color"
            ' Code="" Price=""
            For Each attr As XAttribute In attrs
                Select Case attr.Name
                    Case "Code", "Price"
                    Case Else
                        sb.AppendFormat("Color unknown attribute {0}", attr.Name)
                        sb.AppendLine()
                End Select
            Next
        Case Else
            sb.AppendFormat("Unknown element {0}", someXML.Name.LocalName)
            sb.AppendLine()
    End Select

    For Each el As XElement In someXML.Elements
        sb.Append(VerifyXML(el))
    Next
    Return sb.ToString
End Function
dbasnett
  • 11,334
  • 2
  • 25
  • 33
  • this is great and basically what i am looking for, is there a way with this piece of code to find "unknown" elements and log those, i.e. dont know what the name of the element would be just that its not one of the Plist, Name, Desc, Color, Price elements and want to take note of it? – Ledz3p Apr 11 '23 at 20:30
  • and i guess 2nd questions about Xelement, one of the xmls that i will be getting can be upt to 200MB, will Xelement fall apart with such a big xml? thanks for your help – Ledz3p Apr 11 '23 at 20:32
  • @Ledz3p - Do you mean elements or attributes? 200MB will take a little bit to load but should be OK after that. – dbasnett Apr 12 '23 at 13:11
  • could be either i suppose so i would need to capture either if they are unknowns – Ledz3p Apr 12 '23 at 14:26
  • @Ledz3p - edit your original post to contain those things that are "unknown". – dbasnett Apr 12 '23 at 15:38
  • i added those to my edits – Ledz3p Apr 12 '23 at 19:33
  • 1
    @Ledz3p - added verification. Recursive method that returns a string of errors. – dbasnett Apr 13 '23 at 14:26
1

Try following

Imports System.Xml
Imports System.Xml.Linq

Module Module1
    Const FILENAME As String = "c:\temp\test.xml"
    Sub Main()
        Dim reader As XmlReader = XmlReader.Create(FILENAME)
        While (Not reader.EOF)
            If (reader.Name <> "Material") Then
                reader.ReadToFollowing("Material")
            End If
            If (Not reader.EOF) Then
                Dim material As XElement = XElement.ReadFrom(reader)
                Dim plist As String = material.Attribute("PList").Value
                Dim code As String = material.Attribute("Code").Value
                Console.WriteLine("Material = {0}, Code = {1}", plist, code)
            End If
        End While
        Console.ReadLine()
    End Sub

End Module
jdweng
  • 33,250
  • 2
  • 15
  • 20
  • i like this, it is short and sweet and to the point, but after getting the Material PList & Code values like you've shown above, how do i get each Style thats under that Material before going on to the next Material? Sorry still figuring out this XML thing – Ledz3p Apr 13 '23 at 20:32
0

Sorry I don't have code I can share.

My approach in the past has been to pre-parse XML documents into key-value pairs where the key is the XPATH address of the node, and the value is the contents of the node (if any). I think I used a recursive algorithm to build the XPATH from walking the System.Xml.XmlNode children.

This allows a simple linear parsing algorithm based on XPATH, and can handle unrecognized nodes.

SSS
  • 4,807
  • 1
  • 23
  • 44