First of all, I am sorry if this is a repeated question. I have searched for several hours and found solutions for PHP and other languages, but not for R.
I am retrieving data from the last.fm website using their API. You need an API key to retrieve the data I am after, so I have simplified the example here in the hope that you can still answer my question.
Here is my problem: at a certain point while retrieving the data, I encounter an error that stops my request. I skipped it once, but it keeps coming back. The message is always the same: PCDATA invalid Char value #
Here is an example:
string = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<lfm status=\"ok\">\n<results for=\"a\" xmlns:opensearch=\"http://a9.com/-/spec/opensearch/1.1/\">\n<opensearch:Query role=\"request\" searchTerms=\"a\" startPage=\"1382\" />\n<opensearch:totalResults>212588</opensearch:totalResults>\n<opensearch:startIndex>1381</opensearch:startIndex>\n<opensearch:itemsPerPage>1</opensearch:itemsPerPage><artistmatches>\n<artist>\n <name>!B0A \0348E09;>2</name>\n <listeners>1672</listeners>\n <mbid></mbid>\n <url>http://www.last.fm/music/!B0A+%1C8E09;%3E2</url>\n <streamable>0</streamable>\n <image size=\"small\">http://userserve-ak.last.fm/serve/34/88015017.png</image>\n <image size=\"medium\">http://userserve-ak.last.fm/serve/64/88015017.png</image>\n <image size=\"large\">http://userserve-ak.last.fm/serve/126/88015017.png</image>\n <image size=\"extralarge\">http://userserve-ak.last.fm/serve/252/88015017.png</image>\n <image size=\"mega\">http://userserve-ak.last.fm/serve/_/88015017/B0A+8E092+15286997.png</image>\n </artist></artistmatches>\n</results></lfm>\n"
When I try to parse this text with the XML package, I get the following error:
library(XML)
doc = xmlParse(string, asText = TRUE)
PCDATA invalid Char value 28
Error: 1: PCDATA invalid Char value 28
I believe the error comes from this part of the string:
<name>!B0A \0348E09;>2</name>\n
But I can't be sure now.
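One way to check this suspicion would be to list the code points in that fragment and look for control characters. This is a minimal sketch using only base R; the variable names are just for illustration, and the set of allowed control characters (tab, line feed, carriage return) is taken from the XML 1.0 character range.

```r
# Fragment copied from the <name> element of the example string
name_part <- "!B0A \0348E09;>2"

# Convert the string to its integer code points
codes <- utf8ToInt(name_part)

# Below code point 32, XML 1.0 only allows tab (9), line feed (10)
# and carriage return (13); anything else is an invalid Char
bad <- codes[codes < 32 & !(codes %in% c(9, 10, 13))]

print(bad)  # [1] 28
```

The value 28 matches the number reported in the error message, which suggests that the \034 octal escape in the name is the culprit.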
What I am looking for is one of the following solutions. The first would be ideal, but any of the others would make me happy:
1 - Allow R to receive these invalid characters.
2 - Eliminate the invalid characters and continue parsing without stopping.
3 - Skip the string with the invalid characters and continue parsing.
4 - Create a function to find the invalid characters, so I can use it when retrieving the data from last.fm.
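To make options 2 and 4 concrete, here is a rough sketch of the kind of helper I have in mind. The function names are hypothetical (they are not part of the XML package), and the pattern only covers the control characters below 32 that XML 1.0 forbids, so it is an assumption that this is the only kind of invalid character last.fm returns.

```r
# Hypothetical helpers, not part of any package.
# Control characters forbidden by XML 1.0: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
invalid_xml_pattern <- "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]"

# Option 4: report whether a string contains any invalid character
has_invalid_xml_chars <- function(x) {
  grepl(invalid_xml_pattern, x, perl = TRUE)
}

# Option 2: remove the invalid characters and return the cleaned string
strip_invalid_xml_chars <- function(x) {
  gsub(invalid_xml_pattern, "", x, perl = TRUE)
}

# Shortened version of the problematic response
sample_xml <- "<artist><name>!B0A \0348E09;>2</name></artist>"

has_invalid_xml_chars(sample_xml)
clean_xml <- strip_invalid_xml_chars(sample_xml)
has_invalid_xml_chars(clean_xml)
```

The cleaned string could then be passed to xmlParse(clean_xml, asText = TRUE). For option 3, the same check could be used to skip a response instead of cleaning it.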
I hope the question is clear and that you can help me with it. Thanks in advance.