
I'll start off by saying I'm a beginner. I'm scraping a website and planning to send myself a notification with a title and URL via the Pushover API when a new item is discovered.

I want to run this once an hour, compare the new list with the previous list (saved in a .txt file), then send myself a notification with the new values.

However, I'm having trouble retrieving the data from the text file in the same format it was written in. I'm sure it's a simple fix but I'm missing it. I've tried a couple of options but none of them seem to work.

Relevant Code

def filewriter(filename, filetowrite):
    with open(filename, "w", newline="") as file:
        for each in filetowrite:
            file.write(str(each)+"\n")


def filereader(filename):
    old_videos = []
    with open(filename, "r") as file:
        for each in file:
            old_videos.append(each.strip("\n"))
    return old_videos


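# current_videos is the list returned by the scraper (not shown here)
old_videos = []  # renamed from "list" so the built-in list type is not shadowed
for each in filereader("videos.txt"):
    replaced = each.replace("'", "")
    old_videos.append(replaced)


print(current_videos)
print(old_videos)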

.txt file contents

When I run the filewriter function, it writes the stripped HTML from the website in this format (taken from the .txt file): title, separator (%), link.

FREE DOWNLOAD: Gran Turismo Sport HDR Sampler%https://www.digitalfoundry.net/2017-10-17-free-download-gran-turismo-sport-hdr-sampler
FREE DOWNLOAD: Horizon Zero Dawn PS4 Pro 4K Showcase%https://www.digitalfoundry.net/2017-02-23-free-download-horizon-zero-dawn-ps4-pro-4k-showcase
Nintendo Switch OLED Model Review: A Brilliant Display Upgrade - But Is That Enough?%https://www.digitalfoundry.net/2021-10-06-nintendo-switch-oled-model-review-a-brilliant-display-upgrade-but-is-that-enough
Ride 4: PS5/Xbox Series X - Is It Really The Next Level In Photo-Realism?%https://www.digitalfoundry.net/2021-10-02-ride-4-ps5-vs-xbox-series-x-tested-a-next-gen-racing-showcase
DF Direct Weekly #31: Nintendo Switch 4K Denials, Xbox Dolby Vision & 4K Dash, Nvidia DLAA!%https://www.digitalfoundry.net/2021-10-02-df-direct-weekly-31-nintendo-switch-4k-denials-xbox-dolby-vision-4k-dash-nvidia-dlaa
DF Retro - Zool Redimensioned: The Return and Revival of a 90s Platforming Icon%https://www.digitalfoundry.net/2021-10-01-zool-redimensioned-how-a-home-computer-icon-returned
Halo Infinite Big Performance Mode Boosts For Series X/One X - But What About PC?%https://www.digitalfoundry.net/2021-10-01-halo-infinite-big-performance-mode-boosts-for-series-xone-x-but-what-about-pc
The Touryst PS5 - The First 8K 60fps Console Game!%https://www.digitalfoundry.net/2021-09-30-the-touryst-ps5-the-first-8k-60fps-console-game
Call of Duty Vanguard Beta: PS5 vs Xbox Series X/S Multiplayer + 120Hz Mode Tested!%https://www.digitalfoundry.net/2021-09-25-call-of-duty-vanguard-beta-ps5-vs-xbox-series-xs-multiplayer-120hz-mode-tested
DF Direct Weekly #30: Nintendo Direct Reaction, PS4 CBOMB Fixed!%https://www.digitalfoundry.net/2021-09-25-df-direct-weekly-30-nintendo-direct-reaction-ps4-cbomb-fixed
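
Since the notification needs a title and a URL, each stored line has to be split back into its two parts. A minimal sketch of that step, assuming the title itself never contains a % character (split_entry is a hypothetical helper, not part of the original code, and the example.com line is made up):

```python
def split_entry(line):
    """Split a 'Title%URL' line into (title, url).

    Assumes the first '%' is the separator, i.e. the title contains no '%'.
    """
    title, separator, url = line.partition("%")
    if not separator:
        raise ValueError(f"no '%' separator found in: {line!r}")
    return title, url


# Made-up line in the same Title%URL shape as the file contents
entry = "Some Video Title%https://example.com/some-video"
title, url = split_entry(entry)
print(title)
print(url)
```

If a title could ever contain %, a separator that cannot occur in either field (a tab, for instance) would be a safer choice.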

Console output

I then read this in with the filereader function. I'm planning on comparing the two lists as sets to find out whether a new video has been uploaded.

However when I print both the list that's written and the list that is read, they look like this in the console:

[FREE DOWNLOAD: Gran Turismo Sport HDR Sampler%https://www.digitalfoundry.net/2017-10-17-free-download-gran-turismo-sport-hdr-sampler, FREE DOWNLOAD: Horizon Zero Dawn PS4 Pro 4K Showcase%https://www.digitalfoundry.net/2017-02-23-free-download-horizon-zero-dawn-ps4-pro-4k-showcase, Nintendo Switch OLED Model Review: A Brilliant Display Upgrade - But Is That Enough?%https://www.digitalfoundry.net/2021-10-06-nintendo-switch-oled-model-review-a-brilliant-display-upgrade-but-is-that-enough, Ride 4: PS5/Xbox Series X - Is It Really The Next Level In Photo-Realism?%https://www.digitalfoundry.net/2021-10-02-ride-4-ps5-vs-xbox-series-x-tested-a-next-gen-racing-showcase, DF Direct Weekly #31: Nintendo Switch 4K Denials, Xbox Dolby Vision & 4K Dash, Nvidia DLAA!%https://www.digitalfoundry.net/2021-10-02-df-direct-weekly-31-nintendo-switch-4k-denials-xbox-dolby-vision-4k-dash-nvidia-dlaa, DF Retro - Zool Redimensioned: The Return and Revival of a 90s Platforming Icon%https://www.digitalfoundry.net/2021-10-01-zool-redimensioned-how-a-home-computer-icon-returned, Halo Infinite Big Performance Mode Boosts For Series X/One X - But What About PC?%https://www.digitalfoundry.net/2021-10-01-halo-infinite-big-performance-mode-boosts-for-series-xone-x-but-what-about-pc, The Touryst PS5 - The First 8K 60fps Console Game!%https://www.digitalfoundry.net/2021-09-30-the-touryst-ps5-the-first-8k-60fps-console-game, Call of Duty Vanguard Beta: PS5 vs Xbox Series X/S Multiplayer + 120Hz Mode Tested!%https://www.digitalfoundry.net/2021-09-25-call-of-duty-vanguard-beta-ps5-vs-xbox-series-xs-multiplayer-120hz-mode-tested, DF Direct Weekly #30: Nintendo Direct Reaction, PS4 CBOMB Fixed!%https://www.digitalfoundry.net/2021-09-25-df-direct-weekly-30-nintendo-direct-reaction-ps4-cbomb-fixed]
['FREE DOWNLOAD: Gran Turismo Sport HDR Sampler%https://www.digitalfoundry.net/2017-10-17-free-download-gran-turismo-sport-hdr-sampler', 'FREE DOWNLOAD: Horizon Zero Dawn PS4 Pro 4K Showcase%https://www.digitalfoundry.net/2017-02-23-free-download-horizon-zero-dawn-ps4-pro-4k-showcase', 'Nintendo Switch OLED Model Review: A Brilliant Display Upgrade - But Is That Enough?%https://www.digitalfoundry.net/2021-10-06-nintendo-switch-oled-model-review-a-brilliant-display-upgrade-but-is-that-enough', 'Ride 4: PS5/Xbox Series X - Is It Really The Next Level In Photo-Realism?%https://www.digitalfoundry.net/2021-10-02-ride-4-ps5-vs-xbox-series-x-tested-a-next-gen-racing-showcase', 'DF Direct Weekly #31: Nintendo Switch 4K Denials, Xbox Dolby Vision & 4K Dash, Nvidia DLAA!%https://www.digitalfoundry.net/2021-10-02-df-direct-weekly-31-nintendo-switch-4k-denials-xbox-dolby-vision-4k-dash-nvidia-dlaa', 'DF Retro - Zool Redimensioned: The Return and Revival of a 90s Platforming Icon%https://www.digitalfoundry.net/2021-10-01-zool-redimensioned-how-a-home-computer-icon-returned', 'Halo Infinite Big Performance Mode Boosts For Series X/One X - But What About PC?%https://www.digitalfoundry.net/2021-10-01-halo-infinite-big-performance-mode-boosts-for-series-xone-x-but-what-about-pc', 'The Touryst PS5 - The First 8K 60fps Console Game!%https://www.digitalfoundry.net/2021-09-30-the-touryst-ps5-the-first-8k-60fps-console-game', 'Call of Duty Vanguard Beta: PS5 vs Xbox Series X/S Multiplayer + 120Hz Mode Tested!%https://www.digitalfoundry.net/2021-09-25-call-of-duty-vanguard-beta-ps5-vs-xbox-series-xs-multiplayer-120hz-mode-tested', 'DF Direct Weekly #30: Nintendo Direct Reaction, PS4 CBOMB Fixed!%https://www.digitalfoundry.net/2021-09-25-df-direct-weekly-30-nintendo-direct-reaction-ps4-cbomb-fixed']

As you can see in the second print, it's adding a ' to the beginning and end of each list element.

If I iterate through the elements and print each one, the ' isn't printed, and if I try to .strip() or .replace() it, nothing happens.

Any ideas would be great :)

  • What is current_videos? You reference it in the print statement but I don't see where it is declared in your code – M. Chak Oct 07 '21 at 19:17
  • Ermm it's quite complicated moving between different packages. It's a list of strings returned from a scrape. I've removed it but it's what was used to write the .txt file in the first place. I wrote it by calling the filewriter function – TwizzleBizzle Oct 07 '21 at 19:22
  • Suppose you want to make a list of strings in Python. You could type `foo = ['hello', 'there']`. Those quotes aren't in the string, they just tell Python that it _is_ a string. Those are string literals. If you do `repr(foo[0])` Python will show you that same literal. Same thing happens with lists, Python displays each element as its `repr`. You'll get something that looks like a string literal, even though those quotes aren't really in the string. – tdelaney Oct 07 '21 at 19:23
  • If I use this ` if old_videos[1] == current_videos[1]: print("True") ` At the end of the current code, it doesn't print true, so it's almost like they're not the same :( – TwizzleBizzle Oct 07 '21 at 19:26
  • Check https://stackoverflow.com/a/2626364/4046632, especially the part **Container’s __str__ uses contained objects’ __repr__**. – buran Oct 07 '21 at 19:33
  • Thanks, will read it. If I print the list as a repr, I get this: "'FREE DOWNLOAD: Gran Turismo Sport HDR Sampler%https://www.digitalfoundry.net/2017-10-17-free-download-gran-turismo-sport-hdr-sampler'". It now has double quotes. I'm not sure it's a str vs. repr issue. – TwizzleBizzle Oct 07 '21 at 19:35
  • You were right, I was getting hung up on the quotes being there, rather than not there. I iterated through each element in the list and converted it explicitly to a string – TwizzleBizzle Oct 07 '21 at 19:50

1 Answer


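I was misunderstanding the two lists it printed. The quotes in the second list are just how Python displays str elements inside a list; they are not part of the strings. The first list printed without quotes because its elements were not plain str objects (they were whatever object type the scrape returned).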

I iterated through the list, converting each element to a string using:

current_videos = [str(e) for e in current_videos]
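
To illustrate what was going on, here is a minimal sketch. ScrapedText is a hypothetical stand-in for whatever non-str object the scraper returns (the real type isn't shown in the question), find_new_videos is an illustrative helper, and the example.com entries are made up:

```python
class ScrapedText:
    """Hypothetical stand-in for a non-str object returned by a scraper."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # The repr has no quotes, so a printed list of these shows none.
        return self.text

    # __str__ is not defined, so str(obj) falls back to __repr__.


def find_new_videos(current_videos, old_videos):
    """Return entries of current_videos that are not in old_videos, in order."""
    seen = set(old_videos)
    return [video for video in current_videos if video not in seen]


scraped = [
    ScrapedText("A%https://example.com/a"),
    ScrapedText("B%https://example.com/b"),
]
print(scraped)  # [A%https://example.com/a, B%https://example.com/b]

# Convert every element to a plain str, as in the fix above
current_videos = [str(e) for e in scraped]
print(current_videos)  # ['A%https://example.com/a', 'B%https://example.com/b']

# What filereader would return for a file written on the previous run
old_videos = ["A%https://example.com/a"]
new_videos = find_new_videos(current_videos, old_videos)
print(new_videos)  # ['B%https://example.com/b']
```

Once both lists hold plain strings, the comparison against the file contents behaves as expected, and the quotes in the printed output can be ignored.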