0

I got several issues on this one. So I'm trying to use feedparser and psycopg. Probleme is, i don't want to have duplicated data.

    def dbFeed():
conn_string ="host='localhost' dbname='rss_feed' user='postgres' password='somepassword'"
print ("Connecting to dababase\n ->%s" %(conn_string))

try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    print ("Connected!\n")
except:
    print ('Unable to connect to the database')


 feeds_to_parse=open("C:\\Users\\Work\\Desktop\\feedparser_entry_tests_world.txt","r")


for line in feeds_to_parse:
    parser = fp.parse(str(line))
    x = len(parser['entries'])
    count = 0   
    while count < x:

Now i have several solutions. At first, i tried this :

cursor.execute("INSERT INTO feed (link, title, publication_date, newspaper) VALUES (%s, %s, %s, %s)",
        (parser['entries'][count]['link'], parser['entries'][count]['title'],
        parser['entries'][count]['published'],parser['feed']['title']))

But of course i had duplicated data. So i saw this post here : Avoiding duplicated data in PostgreSQL database in Python

And i tried this but I had a tuple index out of range error

cursor.execute("""INSERT INTO feed (link, title, publication_date, newspaper) SELECT %s, %s, %s, %s WHERE NOT EXISTS
              (feed.title FROM feed WHERE feed.title=%s);""",
            (parser['entries'][count]['link'], parser['entries'][count]['title'],
            parser['entries'][count]['published'],parser['feed']['title']))

But anyway, that's not the way i want to do it. I want to add a condition in my while loop that tests the existance of the data before inserting, because i don't want to test the whole database, i want to test last entries only. Once again, of course it does not work, because i guess that parser['entries'][count]['title'] is not what i think it is...

while count < x:
    if parser['entries'][count]['title'] != cursor.execute("SELECT feed.title FROM feed WHERE publication_date > current_date - 15"):

cursor.execute("INSERT INTO feed (link, title, publication_date, newspaper) VALUES (%s, %s, %s, %s)",
        (parser['entries'][count]['link'], parser['entries'][count]['title'],
        parser['entries'][count]['published'],parser['feed']['title']))




conn.commit()

cursor.close()
conn.close()
Community
  • 1
  • 1
whiplash
  • 375
  • 1
  • 3
  • 5

1 Answers1

0

You have to add second title that is used in where part, also you can add extra conditions there:

cursor.execute(
    "INSERT INTO feed (link, title, publication_date, newspaper) "
    "SELECT %s, %s, %s, %s WHERE NOT EXISTS (SELECT 1 FROM feed "
    "WHERE title = %s AND publication_date > current_date - 15);",
    (parser['entries'][count]['link'],
     parser['entries'][count]['title'],
     parser['entries'][count]['published'],
     parser['feed']['title'],
     parser['feed']['title']))
Imre L
  • 6,159
  • 24
  • 32
  • I think the last line should be replaced by parser['entries'][count]['title'] because i'm testing article's names. Anyway it seems to work except for 3 newspapers... Thank's! – whiplash Apr 30 '15 at 13:36