I have a dataset that doesn't contain movie information; I want to add movie information to it, in JSON format, from the OMDb API.

I wrote this code in Python 3.5 to do that:

import csv
import urllib.request
from collections import defaultdict

i = 0
columns = defaultdict(list)

# First pass: collect every column so item_id can be looked up by row index.
with open(r'C:\dataset\dataset.dat') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

# Second pass: copy each row and append the OMDb response as an extra column.
with open(r'C:\dataset\dataset.dat', 'r', encoding='utf-8') as csvinput:
    with open(r'C:\dataset\dataset_edited.dat', 'w', encoding='utf-8') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[0] == "user_id":
                # Header row: add the name of the new column.
                writer.writerow(row + ["movie_in_json_format"])
            else:
                movieJson = urllib.request.urlopen("http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&y=&plot=short&r=json").read()
                movieJson = movieJson.decode('utf-8')
                writer.writerow(row + [movieJson])
                i = i + 1

The JSON was written to the file in this format:

"{""Title"":""Citizen Dog"",""Year"":""2004"",""Rated"":""N/A"",""Released"":""09 Mar 2006"",""Runtime"":""100 min"",""Genre"":""Comedy, Fantasy, Romance"",""Director"":""Wisit Sasanatieng"",""Writer"":""Koynuch (novel), Wisit Sasanatieng"",""Actors"":""Mahasamut Boonyaruk, Saengthong Gate-Uthong, Sawatwong Palakawong Na Autthaya, Nattha Wattanapaiboon"",""Plot"":""Pod is a man without a dream. He's a country bumpkin who comes to work at a tinned sardine factory in Bangkok. One day, Pod chops off his finger and packs it in the can, prompting him to go..."",""Language"":""Thai, English, Mandarin"",""Country"":""Thailand"",""Awards"":""2 wins & 1 nomination."",""Poster"":""http://ia.media-imdb.com/images/M/MV5BY2VlNDQwZTctMjBlNy00ZjYyLWEwYzAtNjA1YTNjNjVlMjU1XkEyXkFqcGdeQXVyMTIxMDUyOTI@._V1_SX300.jpg"",""Metascore"":""N/A"",""imdbRating"":""7.5"",""imdbVotes"":""1,544"",""imdbID"":""tt0444778"",""Type"":""movie"",""Response"":""True""}"

whereas it should look like this:

{"Title":"Citizen Dog","Year":"2004","Rated":"N/A","Released":"09 Mar 2006","Runtime":"100 min","Genre":"Comedy, Fantasy, Romance","Director":"Wisit Sasanatieng","Writer":"Koynuch (novel), Wisit Sasanatieng","Actors":"Mahasamut Boonyaruk, Saengthong Gate-Uthong, Sawatwong Palakawong Na Autthaya, Nattha Wattanapaiboon","Plot":"Pod is a man without a dream. He's a country bumpkin who comes to work at a tinned sardine factory in Bangkok. One day, Pod chops off his finger and packs it in the can, prompting him to go...","Language":"Thai, English, Mandarin","Country":"Thailand","Awards":"2 wins & 1 nomination.","Poster":"http://ia.media-imdb.com/images/M/MV5BY2VlNDQwZTctMjBlNy00ZjYyLWEwYzAtNjA1YTNjNjVlMjU1XkEyXkFqcGdeQXVyMTIxMDUyOTI@._V1_SX300.jpg","Metascore":"N/A","imdbRating":"7.5","imdbVotes":"1,544","imdbID":"tt0444778","Type":"movie","Response":"True"}

What can I do to write this JSON to the file in the correct format?

Note that encoding='utf-8' was added to the file I/O calls because of this error:

'charmap' codec can't encode character '\xf3' in position 3152: character maps to <undefined>
Hossein
  • I guess the particular CSV dialect you're using requires escaping quotation marks with two quotation marks. Think about it: how would a CSV parser read the resulting CSV file? – roeland Jan 04 '17 at 03:57
  • @roeland I don't know that :( – Hossein Jan 04 '17 at 04:03
  • Try to read that file again with the CSV parser module; you should get the original string back. As an alternative, you could write your data file entirely as a JSON file instead of wrapping JSON in CSV, which will be more straightforward to parse later. – roeland Jan 04 '17 at 19:54
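
The round trip described in the comments can be checked with a short sketch. It uses an in-memory buffer instead of the dataset file, and the row values (`"1"`, `"0444778"`) and the shortened JSON string are made up for illustration. The point it tries to show is that the doubled quotation marks are only how the default CSV dialect stores the field; reading the row back with `csv.reader` should return the original JSON string, which `json.loads` can then parse.

```python
import csv
import io
import json

# Shortened stand-in for an OMDb response (illustrative values only).
movie_json = '{"Title":"Citizen Dog","Year":"2004"}'

# Write one row the same way the question does (default "excel" dialect).
buffer = io.StringIO()
csv.writer(buffer).writerow(["1", "0444778", movie_json])
raw_line = buffer.getvalue()
print(raw_line)  # the JSON field is stored with doubled quotation marks

# Read the row back: the reader undoes the doubling.
buffer.seek(0)
row = next(csv.reader(buffer))
movie = json.loads(row[-1])
print(movie["Title"])
```

If the file is only ever consumed through the `csv` module, the doubled quotation marks in the raw file are therefore harmless.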

2 Answers


The problem was solved with this code:

import csv
import urllib.request
from collections import defaultdict

i = 0
columns = defaultdict(list)

# First pass: collect every column so item_id can be looked up by row index.
with open(r'C:\dataset\dataset.dat', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

# Second pass: newline='' lets the csv module control line endings itself,
# and the with statement closes the output file when the loop finishes.
with open(r'C:\dataset\dataset.dat', 'r', encoding='utf-8') as csvinput, \
        open(r'C:\dataset\dataset_edited.csv', 'w', encoding='utf-8', newline='') as f_writ:
    csvReader = csv.reader(csvinput)
    # Quote fields with a single quote so the double quotes in the JSON stay as-is.
    writer = csv.writer(f_writ, delimiter=',',
                        lineterminator='\r\n',
                        quotechar="'")
    for row in csvReader:
        if row[0] == "user_id":
            writer.writerow(row + ["movie_in_json_format"])
        else:
            moviejson = urllib.request.urlopen("http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&y=&plot=short&r=json").read()
            moviejson = moviejson.decode('utf-8')
            writer.writerow(row + [moviejson])
            i = i + 1
Hossein
  • Thanks to this [link](http://stackoverflow.com/questions/25056881/write-csv-file-with-double-quotes-for-particular-column-not-working) – Hossein Jan 05 '17 at 01:37
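
To see what changing `quotechar` does, here is a small sketch that writes the same field with the default dialect and with `quotechar="'"`. The helper `write_row` and the sample strings are hypothetical and only serve the comparison. One thing it also suggests: when the JSON itself contains an apostrophe (as in the "He's" of the plot above), the writer doubles that apostrophe instead, so the field is still escaped, just with a different character.

```python
import csv
import io


def write_row(fields, **fmt):
    """Write one CSV row to an in-memory buffer and return the raw text."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmt).writerow(fields)
    return buffer.getvalue()


plain = '{"Title":"Citizen Dog","Year":"2004"}'
with_apostrophe = '{"Plot":"He\'s a country bumpkin"}'

# Default dialect: the field is wrapped in " and inner " are doubled.
default_line = write_row(["1", plain])

# quotechar="'": the field is wrapped in ' and inner " are left alone.
single_line = write_row(["1", plain], quotechar="'")

# An apostrophe inside the field is now the character that gets doubled.
apostrophe_line = write_row(["1", with_apostrophe], quotechar="'")

print(default_line)
print(single_line)
print(apostrophe_line)
```

Any program that reads the edited file back would need to pass the same `quotechar="'"` to `csv.reader` to recover the original strings.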

If nothing else works, forcefully strip the extra quotation marks:

writer.writerow([field.strip('"') for field in row+[movieJson]])
DYZ
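
Before relying on the stripping approach, it may be worth checking where the quotation marks come from. The sketch below uses an in-memory buffer and a made-up row; it suggests that the outer and doubled quotation marks are added by `csv.writer` at write time, so a field that starts with `{` and ends with `}` is left unchanged by `strip('"')` and is written exactly as before.

```python
import csv
import io

# Shortened stand-in for an OMDb response (illustrative values only).
movie_json = '{"Title":"Citizen Dog"}'
row = ["1", movie_json]

# The field has no leading or trailing quotation mark, so strip() is a no-op.
stripped = [field.strip('"') for field in row]

buffer = io.StringIO()
csv.writer(buffer).writerow(stripped)
line = buffer.getvalue()
print(line)  # the writer still wraps the field and doubles the inner quotes
```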