I want to collect comments on Reddit, and I use PRAW to get the ID of a submission, e.g. a2rp5i. For example, I have already collected a set of IDs like
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

docArr = ['a14bfr', '9zlro3', 'a2pz6f', 'a2n60r', 'a0dlj3']

my_url = "https://old.reddit.com/r/Games/comments/a0dlj3/"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
content_containers = page_soup.findAll("div", {"class": "md"})
timestamp_containers = page_soup.findAll("p", {"class": "tagline"})
time = timestamp_containers[0].time.get('datetime')
I want to use time as the filename and save the content as a .txt file:
outfile = open('%s.txt' % time, "w", encoding="utf8")
for content_container in content_containers:
    # compare the tag's text, not the Tag object itself
    if content_container.text == "(self.games)":
        continue
    outfile.write(content_container.text)
outfile.close()
This attempt works fine for saving a single URL, but I want to process every ID in docArr at the same time:
url_test = "https://old.reddit.com/r/Games/comments/{}/"
for i in set(docArr):
    url = url_test.format(i)
This builds each URL correctly, but how do I save the time and the content_containers for all of the URLs in docArr at once?
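One way to sketch this (my assumption, not part of the original attempt; the helpers build_url and save_thread are hypothetical names I introduced): wrap the single-URL scraping logic in a function and call it once per ID, so each thread is written to its own timestamp-named file.

```python
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

URL_TEMPLATE = "https://old.reddit.com/r/Games/comments/{}/"
docArr = ['a14bfr', '9zlro3', 'a2pz6f', 'a2n60r', 'a0dlj3']

def build_url(doc_id):
    # Build the old-reddit thread URL for one submission ID.
    return URL_TEMPLATE.format(doc_id)

def save_thread(doc_id):
    # Fetch one thread page and write its comment text to <datetime>.txt,
    # reusing the same selectors as the single-URL version above.
    uClient = uReq(build_url(doc_id))
    page_html = uClient.read()
    uClient.close()

    page_soup = soup(page_html, "html.parser")
    content_containers = page_soup.findAll("div", {"class": "md"})
    timestamp_containers = page_soup.findAll("p", {"class": "tagline"})
    time = timestamp_containers[0].time.get('datetime')

    with open('%s.txt' % time, "w", encoding="utf8") as outfile:
        for content_container in content_containers:
            if content_container.text == "(self.games)":
                continue
            outfile.write(content_container.text)

# To run the scrape (needs network access to reddit):
# for doc_id in set(docArr):
#     save_thread(doc_id)   # one file per thread, named by its timestamp
```

The with-block closes each file automatically, so nothing is left open between iterations of the loop.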