
I have the following polygon for a geographic area, which I fetch via a request in CAP/XML format from an API.

The raw data looks like this:

<polygon>22.3243,113.8659 22.3333,113.8691 22.4288,113.8691 22.4316,113.8742 22.4724,113.9478 22.5101,113.9951 22.5099,113.9985 22.508,114.0017 22.5046,114.0051 22.5018,114.0085 22.5007,114.0112 22.5007,114.0125 22.502,114.0166 22.5038,114.0204 22.5066,114.0245 22.5067,114.0281 22.5057,114.0371 22.5051,114.0409 22.5041,114.0453 22.5025,114.0494 22.5023,114.0511 22.5035,114.0549 22.5047,114.0564 22.5059,114.057 22.5104,114.0576 22.512,114.0584 22.5144,114.0608 22.5163,114.0637 22.517,114.0657 22.5172,114.0683 22.5181,114.0717 22.5173,114.0739</polygon>

I store the requested items in a dictionary and then work through them, transforming each into a GeoJSON list object suitable for ingestion into Elasticsearch according to the schema I'm working with. I've removed irrelevant code here for ease of reading.

# fetch and store data in a dictionary
import json

import requests
import xmltodict

r = requests.get("https://alerts.weather.gov/cap/ny.php?x=0")
xpars = xmltodict.parse(r.text)
json_entry = json.dumps(xpars['feed']['entry'])
dict_entry = json.loads(json_entry)

# transform items if necessary
for entry in dict_entry:

    if entry['cap:polygon']:
        polygon = entry['cap:polygon']
        polygon = polygon.split(" ")
        coordinates = []
        # split each point, swap its lat/lon order, and enclose the pair in its own list
        for p in polygon:
            p = p.split(",")
            p[0], p[1] = float(p[1]), float(p[0])  # GeoJSON order is [lon, lat]
            coordinates += [p]

        # more code adding fields to new dict object, not relevant to the question

The output of the `for p in polygon` loop looks like:

[ [113.8659, 22.3243], [113.8691, 22.3333], [113.8691, 22.4288], [113.8742, 22.4316], [113.9478, 22.4724], [113.9951, 22.5101], [113.9985, 22.5099], [114.0017, 22.508], [114.0051, 22.5046], [114.0085, 22.5018], [114.0112, 22.5007], [114.0125, 22.5007], [114.0166, 22.502], [114.0204, 22.5038], [114.0245, 22.5066], [114.0281, 22.5067], [114.0371, 22.5057], [114.0409, 22.5051], [114.0453, 22.5041], [114.0494, 22.5025], [114.0511, 22.5023], [114.0549, 22.5035], [114.0564, 22.5047], [114.057, 22.5059], [114.0576, 22.5104], [114.0584, 22.512], [114.0608, 22.5144], [114.0637, 22.5163], [114.0657, 22.517], [114.0683, 22.5172], [114.0717, 22.5181], [114.0739, 22.5173] ]
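For reference, the same swap can be written as a single comprehension. This is just a sketch over a shortened, hard-coded polygon string (the real data comes from the feed entry), and it is still linear in the number of characters:

```python
# Sketch: the same transform as the loop above, as one comprehension.
# The polygon string here is a shortened sample for illustration only.
polygon = "22.3243,113.8659 22.3333,113.8691 22.4288,113.8691"

coordinates = [
    [float(lon), float(lat)]  # GeoJSON order is [longitude, latitude]
    for lat, lon in (p.split(",") for p in polygon.split(" "))
]
print(coordinates)
```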

Is there a way to do this that is better than O(N^2)? Thank you for taking the time to read.

  • Why are you converting to and from JSON? – Barmar Nov 25 '21 at 00:57
  • Actually now that I'm looking at it with fresh eyes I think it might be O(n^3) due to the p.split()? – Isaac Keleher Nov 25 '21 at 00:58
  • This is not `O(N^2)` - this is `O(MxN)` because there are M entries and of those, there are N points in a polygon (if there is a polygon). – Larry the Llama Nov 25 '21 at 00:58
  • @Barmar the p in polygon loop transforms it to GeoJSON -> https://geojson.org/ – Isaac Keleher Nov 25 '21 at 00:59
  • I'm talking about `dict_entry = json.loads(json_entry)`. Why not just `dict_entry = xpars['feed']['entry']` – Barmar Nov 25 '21 at 00:59
  • @IsaacKeleher I am not sure you understand O(N^2), etc. It is only a power of N if it is actually dependent on N. It would not be O(N^3) but rather O(KxMxN) because those variables are unrelated – Larry the Llama Nov 25 '21 at 01:00
  • It's really O(N) where N is the total number of coordinates in the JSON. – Barmar Nov 25 '21 at 01:00
  • @Barmar exactly – Larry the Llama Nov 25 '21 at 01:01
  • The nested loops aren't multiplying the complexity because they're processing smaller pieces of the original data. – Barmar Nov 25 '21 at 01:01
  • @Barmar difference is that my way gets rid of additional nesting e.g. `[OrderedDict([('id', 'https://alerts.weather.gov/cap/ny.php?x=0'), ('updated', '2021-11-25T01:03:09+00:00'),` vs `[{'id': 'https://alerts.weather.gov/cap/ny.php?x=0', 'updated': '2021-11-25T01:03:09+00:00',` Just makes it easier to read when working with the raw data – Isaac Keleher Nov 25 '21 at 01:07
  • @LarrytheLlama could you please expand on O(KxMxN) or give me a link to learn more about it please? – Isaac Keleher Nov 25 '21 at 01:10
  • @Barmar I'm a bit confused now, why does processing smaller parts of it change the time complexity? :) – Isaac Keleher Nov 25 '21 at 01:10
  • Consider: `for i in range(0, 20, 5): for j in range(5): do something` versus `for i in range(0, 20):`. They both iterate 20 times, but the first one does it in 4 groups of 5. – Barmar Nov 25 '21 at 01:12
  • No matter how you organize the loops, you're just executing the inner calls to `float()` once for each number in the input data. – Barmar Nov 25 '21 at 01:13
  • Hmm, so say I have 5 entries from the API feed and each polygon point is 2 items. That would be `5 * M * 2`? Where `M` = the number of ordered pairs/coordinates? – Isaac Keleher Nov 25 '21 at 01:20
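Barmar's point can be sketched directly: partitioning the same data into nested loops does not multiply the total work.

```python
# Both variants touch 20 items in total; the nesting only groups the work.
data = list(range(20))

flat_ops = 0
for x in data:  # one flat loop: 20 iterations
    flat_ops += 1

nested_ops = 0
for i in range(0, 20, 5):  # 4 groups...
    for j in range(5):     # ...of 5 iterations each
        nested_ops += 1

print(flat_ops, nested_ops)  # both count 20
```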

1 Answer


O(KxNxM)

This process involves three obvious loops. These are:

  1. Checking each entry (K)
  2. Splitting valid entries into points (MxN) and iterating through those points (N)
  3. Splitting those points into respective coordinates (M)

The number of characters in a polygon string is ~MxN, because there are N points each roughly M characters long, so splitting the string iterates through ~MxN characters.

Now that we know all of this, let's pinpoint where each occurs.

ENTRIES (K):
    IF:
        SPLIT (MxN)
        POINTS (N):
            COORDS(M)

So, we can finally conclude that this is O(K(MxN + MxN)), which simplifies to O(KxNxM).
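As a rough sanity check (a sketch over a shortened, hypothetical polygon string), counting the work confirms it is proportional to the length of the string plus the number of coordinates:

```python
# Count the float() conversions done while parsing a short sample string.
# split() itself scans every character, so the total work is ~ len(polygon).
polygon = "22.3243,113.8659 22.3333,113.8691 22.4288,113.8691"

points = polygon.split(" ")  # one pass over the whole string
float_calls = 0
for p in points:
    lat, lon = p.split(",")  # one pass over this point's characters
    float(lat), float(lon)
    float_calls += 2

print(len(points), float_calls)  # 3 points, 6 coordinate conversions
```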

Larry the Llama
  • Thank you for taking the time to answer this Larry. – Isaac Keleher Nov 25 '21 at 01:36
  • Just a followup q: In the example data of my original post there are 32 points to be split, so 64 individual coordinates. So assuming 5 entries with polygons we have: `K * (M*N + M*N) = 5 * (32*2 + 32*2) = 640` operations. How could this be equivalent to `O(n)` where `N = 64` as discussed in the comments of the original question? – Isaac Keleher Nov 25 '21 at 02:04
  • @IsaacKeleher Yes, sort of. The M is the length of a point before it is split, so if each point is about ~15-20 characters, then M will be that, because the split iterates through _each_ character, the amount of coords in a point is negligible. – Larry the Llama Nov 25 '21 at 02:37