I'm missing something and it's driving me bananas. For some reason, the lookup in the `aliases` list of dicts is not working.

import urllib.parse


def parse_source_from_url(url):
    # Clean up the url
    parse_url = urllib.parse.urlparse(url.strip().replace('\n', ''))
    
    # Parse the interesting parts
    url = parse_url.netloc.split('.')
    
    # Words to omit
    keywords = ['feeds', 'com']
    
    # Determine which url words are not keywords to omit
    feed_source = [u for u in url if u not in keywords][0]
    
    # Map out aliases as needed
    aliases = [
        {'feed_source': 'example', 'alias': 'good'},
        {'feed_source': 'feedburner', 'alias': 'notgood'}]
    
    # Select alias if feed_source is shown in list of dict
    # This is where the problem is
    feed_source = [
        a['alias']
        if a['feed_source'] == feed_source
        else feed_source
        for a in aliases]
    
    # Return only the aliased name
    return feed_source[0]

urls = ['https://feeds.example.com/', 'https://feeds.feedburner.com/']

for u in urls:
    test = parse_source_from_url(u)
    print(test)

Results:

  • The first result is correct ("example" is aliased to "good"), but the result for the second URL is incorrect.
  • "feedburner" should be aliased to "notgood".
good
feedburner

Tried:

  • The approach here using the `next()` and `get()` methods.
  • Same issue.
SeaDude
  • I'm confused about what `feed_source` is supposed to be. I have the impression you are using the same variable name for at least 2 different things. – mkrieger1 Jan 12 '22 at 00:55
  • Initially, it's the list produced by the `.netloc.split('.')` part of the URL (e.g. `[feeds, feedburner, com]`). Then comes the alias check: if `feed_source` should be aliased, get the 'alias' from the dict with the corresponding 'feed_source'. Otherwise, leave it. – SeaDude Jan 12 '22 at 01:01

1 Answer

The `if` is misplaced in your list comprehension. As written, your `feed_source` list contains as many elements as the `aliases` list. All of them equal the original `feed_source` except possibly the one that has an alias, but you take the first element regardless.
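To make this concrete, here is a small sketch of what the original comprehension builds for each input. The helper name `original_comprehension` is made up for illustration; the `aliases` data is copied from the question.

```python
# Alias table copied from the question
aliases = [
    {'feed_source': 'example', 'alias': 'good'},
    {'feed_source': 'feedburner', 'alias': 'notgood'},
]


def original_comprehension(feed_source):
    # Hypothetical helper wrapping the question's comprehension.
    # The if/else here is a conditional expression, so it yields
    # exactly one item per alias entry instead of filtering.
    return [
        a['alias'] if a['feed_source'] == feed_source else feed_source
        for a in aliases
    ]


print(original_comprehension('example'))     # ['good', 'example']
print(original_comprehension('feedburner'))  # ['feedburner', 'notgood']
```

For "feedburner" the alias lands at index 1, so taking index 0 returns the unaliased name.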

Correct code should be:

    aliased = [a["alias"] for a in aliases if a["feed_source"] == feed_source]
    return aliased[0] if aliased else feed_source

So `aliased` now has at most one element. If it has one, that is the alias; if not (so `aliased` is empty), you just return `feed_source`.
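Put together with the rest of the function from the question, this might look like the sketch below. The intermediate names `netloc` and `parts` and the third test URL (`feeds.other.com`) are assumptions added only to show the fallback case.

```python
import urllib.parse


def parse_source_from_url(url):
    # Clean up the URL and take the host part, e.g. 'feeds.example.com'
    netloc = urllib.parse.urlparse(url.strip().replace('\n', '')).netloc
    parts = netloc.split('.')

    # Drop the words we are not interested in and keep the first one left
    keywords = ['feeds', 'com']
    feed_source = [p for p in parts if p not in keywords][0]

    aliases = [
        {'feed_source': 'example', 'alias': 'good'},
        {'feed_source': 'feedburner', 'alias': 'notgood'},
    ]

    # Trailing `if` filters: only entries matching feed_source survive
    aliased = [a['alias'] for a in aliases if a['feed_source'] == feed_source]

    # Fall back to the raw name when no alias matched
    return aliased[0] if aliased else feed_source


for u in ['https://feeds.example.com/',
          'https://feeds.feedburner.com/',
          'https://feeds.other.com/']:  # hypothetical URL with no alias
    print(parse_source_from_url(u))
```

This should print `good`, `notgood`, and `other`, one per line.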

One more hint: if each element of `aliases` always has the same two keys, you could just use a dict:

    aliases = {'example': 'good', 'feedburner': 'notgood'}
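With a dict, the whole lookup can collapse into a single `dict.get` call, whose second argument is the value returned when the key is missing. A minimal sketch, where the function name `resolve_alias` is hypothetical:

```python
aliases = {'example': 'good', 'feedburner': 'notgood'}


def resolve_alias(feed_source):
    # Hypothetical helper: return the alias if one exists,
    # otherwise return feed_source unchanged.
    return aliases.get(feed_source, feed_source)


print(resolve_alias('example'))     # good
print(resolve_alias('feedburner'))  # notgood
print(resolve_alias('other'))       # other (no alias defined)
```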
kosciej16
  • Thank you for taking a look at the code. I'm new to list comprehensions. I like your idea of renaming the variable to `aliased`. I don't quite understand why `else feed_source` belongs in the `return` rather than in the original list comprehension, but it's working! I'll study it. – SeaDude Jan 12 '22 at 01:15
  • Because when there is no `alias` for `feed_source`, the `aliased` list will be empty, and you probably don't want to return an empty list here. – kosciej16 Jan 12 '22 at 01:18
  • I mean, it depends. You can always add a record to `aliases` like {"feed_source": "without_alias", "alias": "without_alias"} – kosciej16 Jan 12 '22 at 01:20
  • Anyway, think about the last hint I edited in; it will make your code significantly simpler. – kosciej16 Jan 12 '22 at 01:21