I am trying to retrieve data every few hours. Since each pull will contain a lot of duplicate data, I want to remove every row that has a date which already exists and keep only the first instance. Here is my code to do this:
import pandas as pd
import datetime
from pandas_datareader import data as web
import matplotlib.pyplot as plt
from alpha_vantage.foreignexchange import ForeignExchange
import os
from os import path
from alpha_vantage.timeseries import TimeSeries
import sys
while True:
    if path.exists('stockdata.csv'):
        data1 = pd.read_csv('stockdata.csv', sep=' ', index_col=0)
        ts = TimeSeries(key='1ORS1XLM1YK1GK9Y', output_format='pandas')
        data, meta_data = ts.get_intraday(symbol='spy', interval='1min', outputsize='full')
        data = data.rename(columns={'1. open': 'Open', '2. high': 'High', '3. low': 'Low',
                                    '4. close': 'Close', '5. volume': 'Volume'})
        data1 = data1.append(data)
        data1 = data1.loc[data1.duplicated(keep='first'), :]  # This should remove the rows which have duplicate dates
        data1.to_csv('stockdata.csv', sep=' ')
        break
    else:
        data1 = pd.DataFrame()
        data1.to_csv('stockdata.csv', sep=' ')
data1 = data1.loc[data1.duplicated(keep='first'), :]

This line should remove one instance of the rows which have duplicate dates, but if I run the code twice I still end up with double the rows.
As an example to demonstrate what I am trying to accomplish:
"2019-11-20 09:57:00" 311.405 311.44 311.4 311.41 42081.0
"2019-11-20 09:56:00" 311.38 311.425 311.33 311.41 63551.0
"2019-11-20 09:55:00" 311.34 311.4525 311.34 311.39 98011.0
"2019-11-20 09:53:00" 311.55 311.59 311.46 311.4832 125541.0
"2019-11-20 09:57:00" 311.405 311.44 311.4 311.41 42081.0
"2019-11-20 09:56:00" 311.38 311.425 311.33 311.41 63551.0
"2019-11-20 09:55:00" 311.34 311.4525 311.34 311.39 98011.0
"2019-11-20 09:54:00" 311.49 311.51 311.33 311.33 85857.0
"2019-11-20 09:53:00" 311.55 311.59 311.46 311.4832 125541.0
As you can see, the data is being appended, so in the example above each row ends up pasted twice. There will, however, be a new unique entry each time new data is appended. So I am trying to remove all the duplicates while keeping the new entries.
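For reference, this is the deduplication behaviour I am after, shown on a toy frame (the column and values below are made up for illustration, not my real stock data):

```python
import pandas as pd

# Toy frame with a duplicated timestamp index, standing in for the
# appended stock data (values are made up for illustration).
df = pd.DataFrame(
    {"Open": [311.405, 311.38, 311.405, 311.38, 311.49]},
    index=pd.to_datetime([
        "2019-11-20 09:57:00",
        "2019-11-20 09:56:00",
        "2019-11-20 09:57:00",  # duplicate of the first row
        "2019-11-20 09:56:00",  # duplicate of the second row
        "2019-11-20 09:54:00",  # new, unique entry
    ]),
)

# Keep only the first instance of each timestamp.
deduped = df[~df.index.duplicated(keep="first")]
print(len(deduped))  # 3 unique timestamps remain
```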
If I run data1.loc[data1.index.duplicated(keep='last'), :] in a different cell, all the entries are removed.
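To illustrate what the boolean mask is selecting, here is a tiny sketch with dummy labels (not my data): .loc[mask] keeps the rows where the mask is True, i.e. the flagged duplicates themselves.

```python
import pandas as pd

# Minimal sketch: duplicated() flags the repeats, and .loc[mask]
# *selects* the rows where the mask is True.
s = pd.DataFrame({"v": [1, 2, 1]}, index=["a", "b", "a"])
mask = s.index.duplicated(keep="first")
print(mask.tolist())     # [False, False, True]
print(len(s.loc[mask]))  # 1 -> only the repeated "a" row is selected
```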