I am crawling data from the internet, and sometimes the URL connection is terminated, which I can't control. To avoid re-crawling data I have already obtained, I keep a cache that marks what has been crawled. The resulting data is stored in a CSV file. The first time I start the program, it writes the CSV header and then the content, like the following:
with open(outputfile, 'a', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for item in items:
        ...
I am using append mode ('a') to write content into the CSV file incrementally. The first run is fine, because the header is written first. The problem occurs when the program restarts: it writes the CSV header again because of this line:
writer.writeheader()
Is there a way to know whether a CSV file already has a header when this code executes:
with open(outputfile, 'a', encoding='utf-8') as f:
The header should be written only once, even if the program is restarted.
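One possible approach, sketched here under assumptions rather than as a definitive fix: decide whether a header is needed before opening the file, by checking that the file is missing or empty. The helper name `append_rows` and its parameters are hypothetical and not part of my original program.

```python
import csv
import os


def append_rows(path, fieldnames, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    # Decide before opening: a missing or zero-byte file has no header yet.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline='' is what the csv module documentation recommends for writer files.
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling `append_rows` twice on the same path should leave a single header line followed by the rows from both calls. This sketch assumes that a non-empty file already starts with the header; it does not inspect the file's first line.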
EDIT: I accepted the answer to the similar question linked above as the answer to this question, because I thought it should work. However, when I tested it, it did not work, as shown below:
import csv
import os
filename = '../1.csv'
with open(filename, 'a') as f:
    headers = ['a']
    writer = csv.DictWriter(f, fieldnames=headers)
    # This check runs after open() has already been called on filename.
    if not os.path.isfile(filename):
        writer.writeheader()
Even if I change '1.csv' to any other name, the check always reports that the file exists. Why doesn't the 'isfile' function work here?
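A likely explanation is that opening a file in 'a' mode creates it when it does not exist, so by the time `os.path.isfile` runs inside the `with` block the file is already there. The following minimal sketch checks existence before and after the open; the function name `exists_before_and_after_open` and the temporary path are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def exists_before_and_after_open(path):
    """Return (existed before open, exists while open) for a file opened in 'a' mode."""
    before = os.path.isfile(path)
    # Opening in append mode creates the file if it is missing.
    with open(path, 'a'):
        after = os.path.isfile(path)
    return before, after


# Use a fresh temporary directory so the file is guaranteed not to exist yet.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
first_run = exists_before_and_after_open(demo_path)
print(first_run)  # (False, True)
```

If this is the cause, moving the `isfile` check above the `open` call, and storing its result in a variable, would let the header be written only on the first run.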