I am using the feedparser module to create a news feed in my program.

The link element in the Yahoo! Finance RSS feed actually contains two links: the Yahoo link and the actual article link (external site/source). The two are separated by an asterisk, for example:

'http://us.rd.yahoo.com/finance/external/investors/rss/SIG=12shc077a/*http://www.investors.com/news/technology/click/pokemon-go-hurting-facebook-snapchat-usage/'

Note the asterisk between the two items.

Is there a Pythonic way to separate the two and write only the second link to a file?

Thank you for your time.

Here is my relevant code:

def parse_feed(news_feed_message, rss_url):
    ''' This function parses the Yahoo! RSS API for data of the latest five articles, and writes it to the company news text file'''

    # Define the RSS feed to parse from, as the url passed in of the company the user chose
    feed = feedparser.parse(rss_url)

    # Define the file to write the news data to the company news text file
    outFile = open('C:\\Users\\nicks_000\\PycharmProjects\\untitled\\SAT\\GUI\\Text Files\\companyNews.txt', mode='w')

    # Create a list to store the news data parsed from the Yahoo! RSS
    news_data_write = []
    # Initialise a count
    count = 0
    # For the number of articles to append to the file, append the article's title, link, and published date to the news_elements list
    for count in range(10):
        news_data_write.append(feed['entries'][count].title)
        news_data_write.append(feed['entries'][count].published)
        news_data_write.append(feed['entries'][count].link)
        # Add one to the count, so that the next article is parsed
        count+=1
        # For each item in the news_elements list, convert it to a string and write it to the company news text file
        for item in news_data_write:
            item = str(item)
            outFile.write(item+'\n')
        # For each article, write a new line to the company news text file, so that each article's data is on its own line
        outFile.write('\n')
        # Clear the news_elements list so that data is not written to the file more than once
        del(news_data_write[:])
    outFile.close()

    read_news_file(news_feed_message)
Nick

1 Answer

You can split this the following way:

link = 'http://us.rd.yahoo.com/finance/external/investors/rss/SIG=12shc077a/*http://www.investors.com/news/technology/click/pokemon-go-hurting-facebook-snapchat-usage/'

rss_link, article_link = link.split('*')

Keep in mind that this requires the link to contain exactly one asterisk. With two or more, the unpacking fails because there are too many values; with none, you'll get the following exception:

ValueError: not enough values to unpack (expected 2, got 1)

If you only need the second link, you could also write:

_, article_link = link.split('*')

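The underscore is a convention signalling that you want to discard the first value. Another alternative, which does not raise the unpacking error but still raises an IndexError if there is no asterisk, is: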

article_link = link.split('*')[1]
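If some links in the feed might not contain an asterisk at all, a more forgiving variant is to use str.partition, which never raises. The following is only a minimal sketch; the helper name extract_article_link is made up for illustration, and falling back to the whole link when no asterisk is found is an assumption about what you would want in that case:

```python
def extract_article_link(link):
    """Return the part of link after the first asterisk.

    If the link contains no asterisk, return it unchanged
    (an assumed fallback, adjust to taste).
    """
    # partition always returns a 3-tuple: (before, separator, after).
    # When the separator is missing, separator and after are empty strings.
    _, separator, article_link = link.partition('*')
    return article_link if separator else link


yahoo_link = ('http://us.rd.yahoo.com/finance/external/investors/rss/'
              'SIG=12shc077a/*http://www.investors.com/news/technology/'
              'click/pokemon-go-hurting-facebook-snapchat-usage/')
print(extract_article_link(yahoo_link))
```

Because partition splits on the first asterisk only, anything after it (including a further asterisk, should one ever appear) is kept as part of the article link.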

Regarding your code: if an exception is raised anywhere after you've opened your output file, the file won't be closed properly. Use either a with statement (open returns a context manager) or a try ... finally block to make sure the file is closed whatever happens.

Context manager:

with open('youroutputfile', 'w') as f:
    # your code
    f.write(…)

Exception handler:

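# Open the file before the try block: if open() itself fails,
# there is no file object to close in the finally clause
f = open('youroutputfile', 'w')
try:
    f.write(…)
finally:
    f.close()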
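Putting both suggestions together, the writing loop from your function could look roughly like the sketch below. The function name write_articles and its parameters are hypothetical, and it assumes each entry exposes title, published and link attributes, as the entries in your code do:

```python
def write_articles(entries, out_path, limit=5):
    """Write title, published date and article link of the first entries to out_path."""
    # The with statement closes the file even if an exception is raised inside it
    with open(out_path, mode='w') as out_file:
        # Slicing avoids an IndexError when the feed has fewer than limit entries
        for entry in entries[:limit]:
            # Keep only the text after the last asterisk;
            # a link without an asterisk is written unchanged
            article_link = entry.link.split('*')[-1]
            out_file.write(f"{entry.title}\n{entry.published}\n{article_link}\n\n")
```

You would call it with something like write_articles(feedparser.parse(rss_url).entries, 'companyNews.txt'). Note that limit defaults to 5 to match your docstring, whereas your loop currently uses range(10).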
DocZerø