
I am writing a Python program that loops through Reddit submissions, pulls data from each one, and stores it as an object in a list. However, I am having trouble writing that list to a CSV file. The file is created, but each row just contains some kind of ID tag for the object instead of its data. How should I change the CSV code?

Code

```python
import praw
from datetime import datetime
import pandas as pd

class Submission:
    def __init__(self, time, score, title, text, ofReddit, serious):
        self.time = time
        self.score = score
        self.title = title
        self.text = text
        self.ofReddit = ofReddit
        self.serious = serious

data = []

reddit = praw.Reddit(client_id=id, client_secret=secret,
                     user_agent='testscript by /u/SilentButtDeadlies')
subreddit = reddit.subreddit('AskReddit')
for submission in subreddit.new(limit=50):
    time = datetime.utcfromtimestamp(submission.created_utc).hour
    score = submission.score
    title = len(submission.title)
    text = len(submission.selftext)
    if 'of reddit' in submission.title.lower():
        ofReddit = 1
    else:
        ofReddit = 0
    if '[serious]' in submission.title.lower():
        serious = 1
    else:
        serious = 0
    data.append(Submission(time, score, title, text, ofReddit, serious))
df = pd.DataFrame(data)
filename = 'AskRedditData' + str(datetime.now()) + '.csv'
df.to_csv(filename, index=False, encoding='utf-8')
```

CSV File

```
0
<__main__.Submission instance at 0x1118f6ef0>
<__main__.Submission instance at 0x1118f68c0>
<__main__.Submission instance at 0x1118f6950>
<__main__.Submission instance at 0x1118c3758>
<__main__.Submission instance at 0x11239c638>
<__main__.Submission instance at 0x11239c5f0>
<__main__.Submission instance at 0x112398908>
<__main__.Submission instance at 0x112398998>
<__main__.Submission instance at 0x112398878>
<__main__.Submission instance at 0x1123989e0>
<__main__.Submission instance at 0x112398c68>
<__main__.Submission instance at 0x11239fe18>
<__main__.Submission instance at 0x11239fe60>
<__main__.Submission instance at 0x11239fea8>
<__main__.Submission instance at 0x11239fef0>
<__main__.Submission instance at 0x11239ff38>
<__main__.Submission instance at 0x11239ff80>
<__main__.Submission instance at 0x11239ffc8>
<__main__.Submission instance at 0x112404050>
<__main__.Submission instance at 0x112404098>
<__main__.Submission instance at 0x1124040e0>
<__main__.Submission instance at 0x112404128>
<__main__.Submission instance at 0x112404170>
<__main__.Submission instance at 0x1124041b8>
<__main__.Submission instance at 0x112404200>
<__main__.Submission instance at 0x112404248>
<__main__.Submission instance at 0x112404290>
<__main__.Submission instance at 0x1124042d8>
<__main__.Submission instance at 0x112404320>
<__main__.Submission instance at 0x112404368>
<__main__.Submission instance at 0x1124043b0>
<__main__.Submission instance at 0x1124043f8>
<__main__.Submission instance at 0x112404440>
<__main__.Submission instance at 0x112404488>
<__main__.Submission instance at 0x1124044d0>
<__main__.Submission instance at 0x112404518>
<__main__.Submission instance at 0x112404560>
<__main__.Submission instance at 0x1124045a8>
<__main__.Submission instance at 0x1124045f0>
<__main__.Submission instance at 0x112404638>
<__main__.Submission instance at 0x112404680>
<__main__.Submission instance at 0x1124046c8>
<__main__.Submission instance at 0x112404710>
<__main__.Submission instance at 0x112404758>
<__main__.Submission instance at 0x1124047a0>
<__main__.Submission instance at 0x1124047e8>
<__main__.Submission instance at 0x112404830>
<__main__.Submission instance at 0x112404878>
<__main__.Submission instance at 0x1124048c0>
<__main__.Submission instance at 0x112404908>
```
Marjorie Pickard
  • What do you *expect* it to write? That is the default `__str__` implementation all objects inherit from `object`. – juanpa.arrivillaga Jun 28 '17 at 21:57
  • Also, are you using `pandas` *only* to write a CSV? That seems like overkill. You should just use the `csv` module. – juanpa.arrivillaga Jun 28 '17 at 21:58
  • Sorry, I am new to all this. Is just using `csv` better? I was hoping to write each object like: {time: ####, score: #### ...} – Marjorie Pickard Jun 28 '17 at 22:16
  • Try changing `df = pd.DataFrame(data)` to `df = pd.DataFrame([obj.__dict__ for obj in data])`. Pandas DataFrames need to be constructed from an object that pandas understands; one option is a list of dictionaries. – calico_ Jun 28 '17 at 22:16
  • @MarjoriePickard With the brackets and colons? Try out the answer I posted below. – juanpa.arrivillaga Jun 28 '17 at 22:18
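calico_'s suggestion works because `obj.__dict__` (also available as `vars(obj)`) turns each instance into a dict of its attributes, which can then be treated as one row. The same dicts can also be written with the standard-library `csv.DictWriter`, with no pandas involved. A minimal sketch, assuming a plain class like the question's `Submission`; the sample values are made up, and an in-memory buffer stands in for the real file:

```python
import csv
import io


class Submission:
    def __init__(self, time, score, title, text, ofReddit, serious):
        self.time = time
        self.score = score
        self.title = title
        self.text = text
        self.ofReddit = ofReddit
        self.serious = serious


# Hypothetical sample records standing in for the scraped Reddit data.
data = [
    Submission(21, 5, 42, 0, 1, 0),
    Submission(22, 1, 57, 120, 0, 1),
]

# vars(obj) returns obj.__dict__, e.g. {'time': 21, 'score': 5, ...}.
rows = [vars(obj) for obj in data]

# In-memory buffer for the demo; use open(filename, 'w', newline='') for a real file.
buf = io.StringIO()
fieldnames = ['time', 'score', 'title', 'text', 'ofReddit', 'serious']
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()      # header row built from fieldnames
writer.writerows(rows)    # one CSV row per dict

csv_text = buf.getvalue()
print(csv_text)
```

Listing `fieldnames` explicitly fixes the column order; `DictWriter` raises a `ValueError` if a dict contains a key that is not in that list.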

1 Answer


Your `Submission` class seems to simply function as a record type, so you could probably just use a `namedtuple`. Replace your class definition with:

```python
from collections import namedtuple
Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'ofReddit', 'serious'])
```

Now the rest of your code should just work. pandas doesn't know how to interpret the `Submission` class you originally wrote, so it simply makes a single column of `Submission` objects, and when it writes the CSV it calls `str()` on each one. Since you did not define `__str__`, that falls back to the default representation inherited from `object`, which is the `<__main__.Submission instance at 0x...>` text you see. What you really want is a sequence. The `namedtuple` function is actually a class factory: it creates a record type derived from `tuple`, so each record is a sequence with all the handy methods you need and a very convenient constructor.
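To see why the `namedtuple` version needs no other changes, here is a small sketch (the sample values are made up) showing that each record is an ordinary tuple, that the field names are stored in `_fields` (which pandas uses for column names when given a list of namedtuples), and that `_asdict()` gives a name-to-value mapping:

```python
from collections import namedtuple

Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'ofReddit', 'serious'])

# Hypothetical sample record; values are made up.
s = Submission(time=21, score=5, title=42, text=0, ofReddit=1, serious=0)

# A namedtuple instance is a real tuple, so it is a plain sequence of values.
print(isinstance(s, tuple))  # True
print(list(s))               # [21, 5, 42, 0, 1, 0]

# Field names live on the class and can serve as column headers.
print(Submission._fields)    # ('time', 'score', 'title', 'text', 'ofReddit', 'serious')

# Fields are still accessible by name, just like attributes on the original class.
print(s.score)               # 5

# _asdict() maps field names to values (an OrderedDict before Python 3.8, a dict after).
print(s._asdict())
```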

Since you are using Python 2, I didn't bother to change your use of pandas, even though it seems like overkill to use it only for writing a CSV. That said, getting the Python 2 `csv` module to play nicely with Unicode is a pain, so you might as well keep it. If you can switch to Python 3, you could simply replace the pandas code with:

```python
import csv
with open(filename, 'w', newline='', encoding='utf8') as f:
    writer = csv.writer(f)
    writer.writerow(Submission._fields)  # _fields is public; namedtuple adds the underscore only to avoid clashing with field names
    writer.writerows(data)
```
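Put together, a self-contained Python 3 sketch of that approach looks like the following. The records are hypothetical stand-ins for the Reddit data, and the file goes into a temporary directory only so the example runs anywhere; in the real script you would keep the `filename` built from `datetime.now()`:

```python
import csv
import os
import tempfile
from collections import namedtuple

Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'ofReddit', 'serious'])

# Hypothetical records; in the real script these come from the PRAW loop.
data = [
    Submission(21, 5, 42, 0, 1, 0),
    Submission(22, 1, 57, 120, 0, 1),
]

# Temporary location so the sketch is runnable anywhere.
filename = os.path.join(tempfile.mkdtemp(), 'AskRedditData.csv')

# newline='' lets the csv module control line endings (avoids blank lines on Windows).
with open(filename, 'w', newline='', encoding='utf8') as f:
    writer = csv.writer(f)
    writer.writerow(Submission._fields)  # header row from the namedtuple's field names
    writer.writerows(data)               # each namedtuple becomes one row

# Read the file back to check what was written (all values come back as strings).
with open(filename, newline='', encoding='utf8') as f:
    rows = list(csv.reader(f))

print(rows)
```

Because each namedtuple is already a sequence, `writer.writerows(data)` needs no conversion step, which is the main advantage over the original class.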
juanpa.arrivillaga
  • Thank you so much! – Marjorie Pickard Jun 28 '17 at 22:19
  • @MarjoriePickard When you find yourself writing a class that you expect to use to create a bunch of objects that essentially function as records (i.e. only data attributes, no methods), you can probably just use `namedtuple`. It will write a very efficient class for you! – juanpa.arrivillaga Jun 28 '17 at 22:20
  • @MarjoriePickard And when I say it writes the class for you, I mean that literally: it generates a class definition and executes it. To see exactly what your `Submission` namedtuple class looks like, check out `print(Submission._source)` (available in Python 2 and in Python 3 before 3.7, which removed `_source`). – juanpa.arrivillaga Jun 28 '17 at 22:29