Parsing with an XmlTextReader and saving the entire node

Question

I am using VB.NET and fetching an XML document from a URL with the following code:

    Dim PMIDList As String = "25241892,25451079"

    Dim sb As New StringBuilder
    Dim sw As New StringWriter(sb)
    Dim writer As JsonWriter = New JsonTextWriter(sw)

    Dim url As String = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=" + PMIDList + "&rettype=fasta&retmode=xml"
    Dim pmid As String = ""
    Dim pmcid As String = ""
    Dim nihmsid As String = ""



    Dim inStream As StreamReader
    Dim webRequest As WebRequest
    Dim webresponse As WebResponse
    webRequest = webRequest.Create(url)
    webresponse = webRequest.GetResponse()
    inStream = New StreamReader(webresponse.GetResponseStream())

    Dim response As String = inStream.ReadToEnd
    Dim pubXML As String = ""



    Using reader As XmlTextReader = New XmlTextReader(New StringReader(response))

        While reader.ReadToFollowing("PubmedArticle") 'Read till citation

I can pull out the elements that I want with

            reader.ReadToFollowing("ArticleIds") 'Go to First ArticlesId
            While reader.Read()

                If reader.Value = "pubmed" Then 'Get
                    reader.ReadToFollowing("Value")
                    pmid = reader.ReadInnerXml()
                End If

                If reader.Value = "pmc" Then
                    reader.ReadToFollowing("Value")
                    pmcid = reader.ReadInnerXml()
                End If

                If reader.Value = "mid" Then
                    reader.ReadToFollowing("Value")
                    nihmsid = reader.ReadInnerXml()
                End If
                If reader.Name = "History" Then Exit While 'Exit loop End of ArticleIds

            End While

But I also want to save the entire PubmedArticle node. I know that XmlTextReader is forward-only, but is there a way to create another reader from the pubXML string below?

     pubXML = "<PubmedArticle>" + reader.ReadInnerXml() + "</PubmedArticle>"

I ended up with a hack:

    Private Sub parseXMLPMID()
    Dim PMIDList As String = "25241892,25451079"

    Dim sb As New StringBuilder
    Dim sw As New StringWriter(sb)
    Dim writer As JsonWriter = New JsonTextWriter(sw)

    Dim url As String = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=" + PMIDList + "&rettype=fasta&retmode=xml"
    Dim pmid As String = ""
    Dim pmcid As String = ""
    Dim nihmsid As String = ""



    Dim inStream As StreamReader
    Dim webRequest As WebRequest
    Dim webresponse As WebResponse
    webRequest = webRequest.Create(url)
    webresponse = webRequest.GetResponse()
    inStream = New StreamReader(webresponse.GetResponseStream())

    Dim response As String = inStream.ReadToEnd
    Dim pubXML As String = ""
    Dim myEncoder As New System.Text.UTF8Encoding


    Using reader As XmlTextReader = New XmlTextReader(New StringReader(response))

        While reader.ReadToFollowing("PubmedArticle") 'Read till citation
            pubXML = reader.ReadOuterXml()
            Dim bytes As Byte() = myEncoder.GetBytes(pubXML)
            Dim ms As MemoryStream = New MemoryStream(bytes)
            Dim stream_reader As New StreamReader(ms)

            While stream_reader.Peek() >= 0
                Try
                    Dim line As String = stream_reader.ReadLine()
                    If line.Contains("<ArticleId IdType=""pubmed"">") Then
                        pmid = Strip_Line(line)
                    End If
                    If line.Contains("<ArticleId IdType=""pmc"">") Then
                        pmcid = Strip_Line(line)
                    End If
                    If line.Contains("<ArticleId IdType=""mid"">") Then
                        nihmsid = Strip_Line(line)
                    End If

                Catch ex As Exception

                End Try

            End While
            MessageBox.Show(pmid + " " + pmcid + " " + nihmsid + " " + pubXML)
        End While
    End Using
    End Sub

Strip_Line just pulls out the inner text. I'd rather have clean code.
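For reference, the "second reader" idea asked about above can be sketched roughly as follows. This is a minimal sketch, not a tested solution against the live service: `GetArticleId` is a hypothetical helper, the sample XML and its ID values are made up for illustration, and the element and attribute names (`ArticleId`, `IdType`) are taken from the hack above. It also assumes that taking the first matching `ArticleId` inside each article is good enough.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module SecondReaderSketch

    ' Hypothetical helper: returns the text of the first ArticleId element
    ' whose IdType attribute matches, or "" if there is none.
    Function GetArticleId(articleXml As String, idType As String) As String
        ' A second, independent reader over the saved node string.
        Using inner As New XmlTextReader(New StringReader(articleXml))
            While inner.ReadToFollowing("ArticleId")
                If inner.GetAttribute("IdType") = idType Then
                    Return inner.ReadElementContentAsString()
                End If
            End While
        End Using
        Return ""
    End Function

    Sub Main()
        ' Made-up sample data standing in for the downloaded response.
        Dim response As String =
            "<PubmedArticleSet>" &
            "<PubmedArticle><PubmedData><ArticleIdList>" &
            "<ArticleId IdType=""pubmed"">111</ArticleId>" &
            "<ArticleId IdType=""pmc"">PMC222</ArticleId>" &
            "</ArticleIdList></PubmedData></PubmedArticle>" &
            "<PubmedArticle><PubmedData><ArticleIdList>" &
            "<ArticleId IdType=""pubmed"">333</ArticleId>" &
            "</ArticleIdList></PubmedData></PubmedArticle>" &
            "</PubmedArticleSet>"

        Using reader As New XmlTextReader(New StringReader(response))
            Dim found As Boolean = reader.ReadToFollowing("PubmedArticle")
            While found
                ' Save the whole node, start and end tags included.
                Dim pubXML As String = reader.ReadOuterXml()

                Dim pmid As String = GetArticleId(pubXML, "pubmed")
                Dim pmcid As String = GetArticleId(pubXML, "pmc")
                Dim nihmsid As String = GetArticleId(pubXML, "mid")
                Console.WriteLine(pmid & " " & pmcid & " " & nihmsid)

                ' ReadOuterXml leaves the reader on the node after the end tag.
                ' If that node is already the next article, do not call
                ' ReadToFollowing, because it would read past it.
                If reader.NodeType = XmlNodeType.Element AndAlso
                   reader.Name = "PubmedArticle" Then
                    found = True
                Else
                    found = reader.ReadToFollowing("PubmedArticle")
                End If
            End While
        End Using
    End Sub

End Module
```

The trade-off is that each article is parsed twice or more (once by the outer reader, then once per ID lookup), so a single inner pass that collects all three IDs would probably be faster for large batches.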

Is the document you fetch so large that you can't use LINQ to XML or DOM to select and extract the parts you need? - Martin Honnen, Nov 29 '16 at 14:28
I've got hundreds of pulls to do, with maybe as many as 200 PMIDs in the URL string. I have the code to do it with a DOM, but now I'm working on speed. - Bill, Nov 29 '16 at 14:35
There is the `ReadNode` method (https://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode(v=vs.110).aspx) to combine XmlReader with DOM. That might help if your task is to read through until you have identified the node you want and then want a DOM representation of it. LINQ to XML has a similar method. - Martin Honnen, Nov 29 '16 at 14:42
Thanks, but I don't want to use an XmlDocument. I'd rather use a standard text reader and loop line by line if I have to. - Bill, Nov 29 '16 at 14:47
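
The LINQ to XML method alluded to in the `ReadNode` comment is presumably `XNode.ReadFrom`, which materialises only the node the reader is currently on. A minimal sketch of that approach follows, assuming one `XElement` per article is acceptable; `IdOf` is a hypothetical helper, the sample XML and its ID values are made up, and `XmlReader.Create` is used with default settings, which would probably need `DtdProcessing` adjusted if the real response carries a DOCTYPE declaration.

```vb
Imports System
Imports System.IO
Imports System.Linq
Imports System.Xml
Imports System.Xml.Linq

Module ReadFromSketch

    ' Hypothetical helper: value of the first ArticleId descendant
    ' with the given IdType attribute, or "" if there is none.
    Function IdOf(article As XElement, idType As String) As String
        Dim match As XElement = article.Descendants("ArticleId").
            FirstOrDefault(Function(e) e.Attribute("IdType")?.Value = idType)
        Return If(match Is Nothing, "", match.Value)
    End Function

    Sub Main()
        ' Made-up sample data standing in for the downloaded response.
        Dim response As String =
            "<PubmedArticleSet>" &
            "<PubmedArticle><PubmedData><ArticleIdList>" &
            "<ArticleId IdType=""pubmed"">111</ArticleId>" &
            "<ArticleId IdType=""mid"">NIHMS222</ArticleId>" &
            "</ArticleIdList></PubmedData></PubmedArticle>" &
            "</PubmedArticleSet>"

        Using reader As XmlReader = XmlReader.Create(New StringReader(response))
            reader.MoveToContent()
            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element AndAlso
                   reader.Name = "PubmedArticle" Then
                    ' ReadFrom builds a tree for this one element and
                    ' advances the reader past its end tag.
                    Dim article As XElement =
                        DirectCast(XNode.ReadFrom(reader), XElement)

                    ' The entire node as a string, without added indentation.
                    Dim pubXML As String =
                        article.ToString(SaveOptions.DisableFormatting)

                    Console.WriteLine(IdOf(article, "pubmed") & " " &
                                      IdOf(article, "pmc") & " " &
                                      IdOf(article, "mid"))
                Else
                    reader.Read()
                End If
            End While
        End Using
    End Sub

End Module
```

Only one article is held in memory at a time, so this stays close to the streaming behaviour of the plain reader while avoiding line-by-line string matching.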
