
Here's my problem:

function parse_xml_to_json($url) {
    // Fetch the raw XML from the external source
    $fileContents = file_get_contents($url);
    // Turn the XML string into an object; LIBXML_NOCDATA merges CDATA sections into text nodes
    $simpleXml = simplexml_load_string($fileContents, null, LIBXML_NOCDATA);
    // Encode the object as a JSON string
    $json = json_encode($simpleXml);
    return $json;
}
$jsonto = parse_xml_to_json('myxmlfile.html');
echo $jsonto;

Essentially I need to take an XML file from an external source and loop through it to display some of its data nicely.

I created a function that gets the content from the external URL (file_get_contents), then turns the XML string into an object (I pass LIBXML_NOCDATA because the file contains CDATA sections), then encodes the object as a JSON string, and as the very last step I echo the result.

So far so good, it works, but I'm wondering if there is anything I can do in case the XML file contains a malicious script or something similar.

Are simplexml_load_string and then json_encode enough to protect against a malicious script or invalid XML?
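For reference, a rough sketch of the kind of validation I have in mind (the function name is just illustrative): it checks the return values, so at least a failed download or invalid XML is detected instead of json_encode(false) being echoed:

// Rough sketch only: detect download failures and invalid XML before encoding.
function parse_xml_to_json_checked($url) {
    $fileContents = file_get_contents($url);
    if ($fileContents === false) {
        return false; // the HTTP request itself failed
    }

    // Collect libxml parse errors instead of emitting warnings
    libxml_use_internal_errors(true);
    $simpleXml = simplexml_load_string($fileContents, null, LIBXML_NOCDATA);
    if ($simpleXml === false) {
        foreach (libxml_get_errors() as $error) {
            error_log(trim($error->message)); // malformed / invalid XML
        }
        libxml_clear_errors();
        return false;
    }

    return json_encode($simpleXml);
}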

  • What sort of injection are you referring to? There is nothing in XML or JSON that would let you delete data through any kind of injection. It's possible to create invalid structures, but that is a different thing. – Nigel Ren Feb 23 '21 at 11:45
  • @NigelRen Yes, even just to spot an invalid structure. I'm receiving this XML file from an external server. I'm wondering what could happen if that server were hacked. – Italianspiderman80 Feb 23 '21 at 11:55

1 Answer


Your code is prone to a Denial of Service (DoS) attack.

$fileContents = file_get_contents($url);

This can blow your memory limit, or come close to it while taking a long time (the server you request the data from stalls in the middle after sending a lot of content, and then delivers only a few bytes every couple of seconds). So your script will "hang" while consuming that memory.

If the script can then be triggered multiple times with further HTTP requests, this can exhaust your server's resources (the echo statement suggests this is entirely possible).
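Just as a rough sketch (the values below are arbitrary examples, not recommendations): file_get_contents already accepts a stream context with a read timeout plus a maximum length to read, which puts a lid on both the time and the memory dimension. Note the timeout is a read timeout, so a server trickling a few bytes at a time can still stretch the total duration.

// Sketch: cap download time and size; 10 seconds / 1 MiB are example values only.
$context = stream_context_create([
    'http' => [
        'timeout' => 10, // read timeout in seconds
    ],
]);

// The 5th argument limits how many bytes are read at most (offset 0, max 1 MiB)
$fileContents = file_get_contents($url, false, $context, 0, 1048576);
if ($fileContents === false) {
    // request failed or timed out - handle the error instead of parsing
}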

hakre
  • Could you suggest a different approach for that purpose? curl? – Italianspiderman80 Jun 17 '21 at 06:45
  • Not so much a different approach, just a reminder to gain a more thorough understanding of the data handling and protocols in use. Consider the error cases, not just "it works": everything can fail, so think about where something can fail (e.g. are the return values as expected?). Another place to check is limits. There are always limits; find the existing ones. Also consider your own limits: what do you want to limit (e.g. the amount of data, or the time)? Think more low level; it's less a technical problem (unless it is, but then you know about the limitations - good!) – hakre Jun 17 '21 at 07:05
  • Ok, I get it. Could I improve the code if I got rid of the file_get_contents call and left only simplexml_load_file? The only problem I would have is the authentication: $context = stream_context_create(array( 'http' => array( 'header' => "Authorization: Basic " . base64_encode("$username:$password"), ) )); which is bound to file_get_contents. But you are suggesting limiting the request on the server side (every so many seconds), if I understand correctly – Italianspiderman80 Jun 17 '21 at 07:38
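For what it's worth, a small sketch of what I took away from the answer: the same stream context can carry both the Authorization header and a timeout, so file_get_contents can stay in place while still being limited ($username and $password as in my comment above; the limit values are just examples):

// Sketch: one stream context with both the Basic auth header and a read timeout.
$context = stream_context_create([
    'http' => [
        'header'  => "Authorization: Basic " . base64_encode("$username:$password"),
        'timeout' => 10, // example read timeout in seconds
    ],
]);

// Keep file_get_contents, but cap the bytes read (example: 1 MiB)
$fileContents = file_get_contents($url, false, $context, 0, 1048576);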