
I have a huge PHP file that contains a number of arrays:

<?php

$quadrature_weights = array();

$quadrature_weights[2] = array(
    1.0000000000000000,
    1.0000000000000000);

$quadrature_weights[3] = array(
    0.8888888888888888,
    0.5555555555555555,
    0.5555555555555555);

$quadrature_weights[4] = array(
    0.6521451548625461,
    0.6521451548625461,
    0.3478548451374538,
    0.3478548451374538);
?>

The real file contains 64 quadrature_weights entries, and the numbers inside the arrays have on the order of 200 decimals (I have reduced the number here for readability).

I would like to load this file into Python and choose how many decimals to keep. Let's say I decide to keep 4 decimals; the output should then be a dictionary (or some other container) like this:

quadrature_weights = {
    2: [1.0000, 
        1.0000],
    3: [0.8888,
        0.5555,
        0.5555],
    4: [0.6521,
        0.6521,
        0.3478,
        0.3478]
 }

I am not familiar with PHP and, quite frankly, I have no idea how to do this. I suppose it would be possible to read every single line and then do some sort of "decoding" manually, but I was really hoping to avoid that.

mortysporty
  • Option #1: somehow parse PHP code in Python to extract the data. ☹️ Option #2: modify that PHP code and let it output the data in some easy format like JSON (`echo json_encode($quadrature_weights);`), then read that into Python. ☺️ – deceze Oct 15 '21 at 08:15
  • Does this answer your question? [Read a plain text file with php](https://stackoverflow.com/questions/4103287/read-a-plain-text-file-with-php) – 9ilsdx 9rvj 0lo Oct 15 '21 at 08:15
  • @9ilsdx 9rvj 0lo it seems I need to know how to use php for that to work... so no. But thank you for the suggestion. – mortysporty Oct 15 '21 at 08:19
  • Note though that "deciding how many decimals to keep" isn't a thing with floats. You can type "1.0000" all you want, the machine is going to forget about those zeros immediately, because they're irrelevant to the *value* of the number. `1.0` is exactly the same as `1.000000000`, and floats don't store the useless information of how many zeros you typed. – deceze Oct 15 '21 at 08:19
  • @deceze Option #2 seems like the most viable route given what I know. Regarding the number of decimals, your point is valid for 1.00000 but not for the other numbers. – mortysporty Oct 15 '21 at 08:21
  • Even for the other numbers, if you truncate the number to, say, `0.3478`, whether that's the actual value that'll be stored internally is not guaranteed. What the machine will store internally is the *closest number to `0.3478` representable by floating point mechanics*, which *may* be something like `0.347799999999` in certain cases. – deceze Oct 15 '21 at 08:23
  • That is true. I'm not sure what the precision of floats is in Python. But regardless, let's say the actual precision is something like 20 digits: "rounding" to 10 digits is still possible even if the number is internally stored with 20 digits (and it isn't necessarily exact in that last digit). – mortysporty Oct 15 '21 at 08:27
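
The points raised in the comments above can be checked directly in Python. This is a minimal sketch using only the standard library: `Decimal(0.3478)` is used here purely to display the exact binary value that the float literal ends up storing, and the input to `round()` is one of the sample weights from the question.

```python
from decimal import Decimal

# Trailing zeros do not survive: both literals produce the same float.
print(1.0000 == 1.0)  # True

# Decimal(float) exposes the exact value stored for the literal 0.3478,
# which is the nearest representable double, not 0.3478 itself.
stored = Decimal(0.3478)
print(stored)
print(stored == Decimal("0.3478"))  # False

# Rounding to fewer digits still works as expected.
x = round(0.3478548451374538, 4)
print(x)  # 0.3479

# How many decimals are *shown* is a formatting decision, not a property of the float.
shown = f"{1.0:.4f}"
print(shown)  # 1.0000
```

So a float can be rounded to a chosen number of decimals, but a fixed count of displayed decimals (including trailing zeros) only exists once the value is formatted as a string.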

1 Answer


If the numbers in the PHP file have around 200 decimals, PHP will not preserve them: it parses each literal into a double-precision float, which holds only about 15 to 17 significant digits. So if you let PHP parse those arrays and output JSON to then use in Python, you are guaranteed not to get the same numbers out.
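
The same loss can be illustrated from the Python side, since Python floats are also double precision. This is only a sketch: the 200-digit literal below is a synthetic stand-in (not a value from the real file), and it assumes PHP floats behave like IEEE 754 doubles, which is the usual case.

```python
from decimal import Decimal

# Synthetic stand-in for a 200-decimal literal (not a value from the real file).
literal = "0." + "3" * 200

as_float = float(literal)      # collapses to the nearest double
as_decimal = Decimal(literal)  # keeps every digit of the string

print(repr(as_float))            # 0.3333333333333333
print(len(str(as_decimal)) - 2)  # 200 decimals retained

# The 200-digit literal and a 16-digit one are the same float.
print(as_float == float("0.3333333333333333"))  # True
```

Any route that passes the values through a float, whether in PHP or in Python, therefore discards everything beyond roughly the 17th significant digit.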

Personally, I'd read the file line-by-line and parse it in Python using regular expressions. Something along the lines of:

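import re

quadrature_weights = {}
max_decimals = 6

with open("/path/to/file.php") as phpfile:
    key = None

    for line in phpfile:
        # A line like "$quadrature_weights[3] = array(" starts a new list.
        match_key = re.search(r"\[(\d+)\]", line)
        if match_key:
            # Keys stay strings here; wrap in int() if integer keys are wanted.
            key = match_key.group(1)
            quadrature_weights[key] = []
            continue

        # Any other line containing digits and a period holds one value.
        match_value = re.search(r"([\d.]+)", line)
        if match_value and key is not None:
            quadrature_weights[key].append(
                round(float(match_value.group(1)), max_decimals)
            )

print(quadrature_weights)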

With an input file like yours, the output of this will be (indented for readability):

{
    '2': [1.0, 1.0],
    '3': [0.888889, 0.555556, 0.555556],
    '4': [0.652145, 0.652145, 0.347855, 0.347855]
}

If you want to always keep the correct number of decimals, even if the number is "1.0000000000000", then you should treat the numbers as strings:

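match_value = re.search(r"([\d.]+)", line)
if match_value and key is not None:
    value = match_value.group(1)
    # Keep everything up to max_decimals characters after the period (truncates, no rounding).
    period = value.index('.')
    max_length = period + 1 + max_decimals

    quadrature_weights[key].append(value[0:max_length])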

With this change, the dict will look like (indented for readability):

{
    '2': ['1.000000', '1.000000'],
    '3': ['0.888888', '0.555555', '0.555555'],
    '4': ['0.652145', '0.652145', '0.347854', '0.347854']
}

Then you can convert the values to float when you actually need to use the numerical values for calculations.
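
If the values need to stay numeric without passing through a float, `decimal.Decimal` is one possible alternative. The sketch below is not part of the approach above: the function name `parse_weights`, the two regular expressions, and the working precision of 300 are illustrative assumptions. It truncates (rather than rounds) and uses integer keys, which matches the output asked for in the question.

```python
import re
from decimal import Decimal, ROUND_DOWN, localcontext

# Hypothetical patterns for the file layout shown in the question.
KEY_RE = re.compile(r"\$quadrature_weights\[(\d+)\]")
NUMBER_RE = re.compile(r"\d+\.\d+")


def parse_weights(php_source, decimals):
    """Return {int key: [Decimal, ...]} with values truncated to `decimals` places."""
    quantum = Decimal(f"1e-{decimals}")  # e.g. Decimal("0.0001") for decimals=4
    weights = {}
    key = None
    with localcontext() as ctx:
        ctx.prec = 300  # assumed headroom for literals with ~200 digits
        for line in php_source.splitlines():
            key_match = KEY_RE.search(line)
            if key_match:
                key = int(key_match.group(1))
                weights[key] = []
                continue
            number_match = NUMBER_RE.search(line)
            if number_match and key is not None:
                value = Decimal(number_match.group(0))  # exact, no float involved
                weights[key].append(value.quantize(quantum, rounding=ROUND_DOWN))
    return weights


sample = """<?php
$quadrature_weights = array();

$quadrature_weights[3] = array(
    0.8888888888888888,
    0.5555555555555555,
    0.5555555555555555);
?>"""

result = parse_weights(sample, 4)
print(result)  # {3: [Decimal('0.8888'), Decimal('0.5555'), Decimal('0.5555')]}
```

For the real file, the source could be read with `open(...).read()` and passed in the same way. `Decimal` values keep their trailing zeros when printed and can be converted with `float()` when ordinary floating-point arithmetic is enough.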

rickdenhaan
  • It didn't work straight away, but I will tinker a bit with it and see if I can get it to work. Thank you. – mortysporty Oct 15 '21 at 09:10
  • @mortysporty I updated my answer so that it now contains working code. Specifically, `re.match()` should have been `re.search()` and I've added the float conversion and rounding to a specific number of decimals (as specified in `max_decimals`). – rickdenhaan Oct 15 '21 at 12:42