Calculating the frequency of tweets containing a specific word in a single year

Question

I am trying to count, for a single year, how many tweets contain a specific word on each day, and store the results in a CSV file with the columns "Date" and "Frequency". This is my code, but after running for a while it fails with an error.

import pandas as pd
import twint
import nest_asyncio
from datetime import datetime,timedelta


bugun = '2020-01-01'
yarin = '2020-01-02'

df = pd.DataFrame(columns=("Date","Frequency"))




for i in range(365):
    
    file = open("Test.csv","w")
    file.close()
    
    bugun = (datetime.strptime(bugun, '%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')

    yarin =(datetime.strptime(yarin, '%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')

    nest_asyncio.apply()
    
    c = twint.Config()
    c.Search = "Chainlink"

    #c.Hide_output=True
    c.Since= bugun
    c.Until= yarin

    c.Store_csv = True
    c.Output = "Test.csv"
    c.Count = True 

    twint.run.Search(c)


    data = pd.read_csv("Test.csv")
    frequency = str(len(data))
    
    #d = {"Data": [bugun], "Frequency": [frequency]}

    #d_f = pd.DataFrame(data=d)
    
    #df = df.append(d_f, ignore_index=True)
    

    df.loc[i] = [bugun] + [frequency]
    df.to_csv (r'C:\Users\serap\Desktop\CRYPTO 100\Chainlink.csv',index = False, header=False)

The error I get is:

  File "C:\Users\serap\Desktop\CRYPTO 100\CODES\Binance_Coin\Binance Coin.py", line 47, in <module>
    data = pd.read_csv("Test.csv")
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 1893, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas\_libs\parsers.pyx", line 521, in pandas._libs.parsers.TextReader.__cinit__
EmptyDataError: No columns to parse from file

Thank you for the help :)

Can you rephrase the description as well? It's still unclear what you are trying to do. What's *input*, what's *output*, and *how do you think* the one is related to the other? — Wolf, Jan 27 '21 at 08:27
Did you get any further with this? I tried `twint` but didn't manage to get anything working. Could it be that the developers of this package have lost touch with Twitter? — Wolf, Jan 28 '21 at 13:08

Wolf · Answer 1 · 2021-01-27T09:36:41.050


After reading the tutorial "How to Scrape Tweets from Twitter with Python Twint" by Andika Pratama (Analytics Vidhya, Medium), I think you'd better let Twint do the iteration itself:

import twint

c = twint.Config()
c.Search = "Chainlink"
c.Since = "2020-01-01"  # dates must use plain ASCII hyphens
c.Until = "2021-01-01"
c.Store_csv = True
c.Output = "Test.csv"
c.Count = True
twint.run.Search(c)

Now you may loop over the CSV output:

import pandas as pd
data = pd.read_csv("Test.csv")
# ...
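Since the single Twint run above produces one CSV for the whole year, the per-day counts can be derived from it afterwards. A minimal sketch using only the standard library, assuming the Twint CSV has a header row and a column named `date` holding values like `2020-01-01` (check the header of your own file); `daily_frequency` and `write_frequency` are hypothetical helper names. Days without tweets get a frequency of 0, and iterating from 2020-01-01 up to (but excluding) 2021-01-01 covers all 366 days of the leap year 2020:

```python
import csv
from collections import Counter
from datetime import date, timedelta


def daily_frequency(twint_csv, start, end, date_column="date"):
    """Return [(YYYY-MM-DD, count), ...] for every day in [start, end).

    Assumption: the Twint CSV has a header row and a column (called "date"
    here) holding each tweet's date formatted as YYYY-MM-DD.
    """
    counts = Counter()
    with open(twint_csv, newline="", encoding="utf-8") as f:
        # DictReader handles quoted tweets that contain line breaks
        for row in csv.DictReader(f):
            counts[row[date_column]] += 1

    result = []
    day = start
    while day < end:
        key = day.isoformat()
        result.append((key, counts[key]))  # Counter yields 0 for days without tweets
        day += timedelta(days=1)
    return result


def write_frequency(rows, out_csv):
    """Write the per-day counts to a CSV with a Date,Frequency header row."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Date", "Frequency"])
        writer.writerows(rows)
```

With the configuration above, `write_frequency(daily_frequency("Test.csv", date(2020, 1, 1), date(2021, 1, 1)), "Chainlink.csv")` would produce the requested Date/Frequency file in one pass instead of running 365 separate searches.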

So far I haven't found this detail about CSV output documented, but the twint source code (master/twint/storage/write.py, line 58 ff.) shows that for CSV the output is appended if the file already exists. So you may have to truncate or delete an existing file first. A valid option for this is

open('Test.csv', 'w').close()

...which is basically what you already do, but without introducing another variable.
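The EmptyDataError in the question is most likely raised on a day for which Twint found no matching tweets: Test.csv was truncated beforehand, nothing was written to it, and `pandas.read_csv` cannot parse a file without columns. If write.py indeed writes the header row only when the file does not exist yet (an assumption from reading the source, not verified for every Twint version), truncating would also mean the header is never written, so the first tweet of each day would be taken as column names. A minimal standard-library sketch that deletes the file instead of truncating it and treats a missing or empty file as zero tweets; `reset_output` and `count_tweets` are hypothetical helper names:

```python
import csv
import os


def reset_output(path):
    """Delete the previous day's CSV so the next Twint run starts a new file.

    Deleting rather than truncating matters if Twint writes the header row
    only when the file does not exist yet (assumption, see above).
    """
    if os.path.exists(path):
        os.remove(path)


def count_tweets(path):
    """Return the number of tweet rows in a Twint CSV.

    A missing or empty file is treated as "no tweets that day" instead of
    letting pandas raise EmptyDataError. The first row is taken as the header.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return 0
    with open(path, newline="", encoding="utf-8") as f:
        # csv.reader counts a quoted multi-line tweet as a single row
        rows = sum(1 for _ in csv.reader(f))
    return max(rows - 1, 0)
```

In the question's loop, `reset_output("Test.csv")` would take the place of the `open`/`close` pair before `twint.run.Search(c)`, and `count_tweets("Test.csv")` the place of `len(pd.read_csv("Test.csv"))`.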

edited Jan 27 '21 at 09:36

answered Jan 27 '21 at 08:16

Wolf


The main point of Test.csv is to store that day's data, and the open/close is there to wipe its contents, so the file can be reused for the next day without overlapping with the previous days. – Cekcenk Jan 27 '21 at 08:38
What would be a better way of doing this? – Cekcenk Jan 27 '21 at 08:45
See updated answer. Do you maybe have a link to the documentation part I was not able to find (concerning *append* vs. *rewrite* for CSV output)? – Wolf Jan 27 '21 at 09:59
