I have a large CSV file; an example is shown below:
data = pd.read_csv('C:/Users/Ene_E/Desktop/Data/data.csv')
data.head()
name year value
0 Afghanistan 1800 603
1 Albania 1800 667
2 Algeria 1800 715
3 Andorra 1800 1200
4 Angola 1800 618
data.tail()
name year value
46508 Venezuela 2040 17600
46509 Vietnam 2040 12300
46510 Yemen 2040 3960
46511 Zambia 2040 6590
46512 Zimbabwe 2040 3210
My CSV covers over 200 countries, with data recorded annually from 1800 to 2040. My goal is to resample this data to monthly frequency and interpolate the value column as shown below. I have used Afghanistan in the year 1800 to illustrate my desired end result.
Expected output:
name date value
Afghanistan Jan 1800 start_value
Afghanistan Feb 1800 .
Afghanistan Mar 1800 .
Afghanistan Apr 1800 .
Afghanistan May 1800 .
Afghanistan Jun 1800 .
Afghanistan Jul 1800 . (this column is interpolated smoothly)
Afghanistan Aug 1800 .
Afghanistan Sep 1800 .
Afghanistan Oct 1800 .
Afghanistan Nov 1800 .
Afghanistan Dec 1800 603 (end value in that year)
I want all my data resampled as above in Python, because this is the only way my model will work. NB: the dates should use the format shown above.
I have tried several times without success:
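To make the target concrete, here is a minimal sketch of the transformation I am describing, run on toy data rather than the real file. It rests on assumptions that are not settled above: each annual value is treated as the December reading for that year, the months in between are filled linearly, and the first year is held flat because there is no earlier value to start from. The 1801 values and the `frames` and `monthly` names are made up for illustration.

```python
import pandas as pd

# Toy data in the same shape as the CSV. The 1801 values are invented.
data = pd.DataFrame({
    "name": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "year": [1800, 1801, 1800, 1801],
    "value": [603.0, 615.0, 667.0, 679.0],
})

frames = []
for name, group in data.groupby("name"):
    # Assumption: each annual value is the reading for December of that year.
    december = pd.DatetimeIndex(pd.to_datetime(group["year"].astype(str) + "-12-01"))
    series = pd.Series(group["value"].to_numpy(), index=december)

    # One row per month, from January of the first year to December of the last.
    months = pd.date_range(
        "{}-01-01".format(group["year"].min()),
        "{}-12-01".format(group["year"].max()),
        freq="MS",
    )

    # Linear interpolation between Decembers. bfill() is an assumption that
    # holds the first year flat, since no earlier value exists.
    series = series.reindex(months).interpolate(method="linear").bfill()

    frames.append(pd.DataFrame({
        "name": name,
        "date": series.index.strftime("%b %Y"),  # e.g. "Jan 1800"
        "value": series.to_numpy(),
    }))

monthly = pd.concat(frames, ignore_index=True)
print(monthly.head(14))
```

Linear interpolation is only a stand-in for "smoothly" here; `interpolate` accepts other methods, some of which require SciPy.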
data['year'] = pd.to_datetime(data.year, format='%Y')
head(data)
Error:
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
head(data)
NameError: name 'head' is not defined
data.head()
name year value
0 Afghanistan 1800-01-01 00:00:00 603
1 Albania 1800-01-01 00:00:00 667
2 Algeria 1800-01-01 00:00:00 715
3 Andorra 1800-01-01 00:00:00 1200
4 Angola 1800-01-01 00:00:00 618
data.resample('1M', how='interpolate')
Error:
Traceback (most recent call last):
File "<pyshell#14>", line 1, in <module>
data.resample('1M', how='interpolate')
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 8145, in resample
base=base, key=on, level=level)
File "C:\Python27\lib\site-packages\pandas\core\resample.py", line 1251, in resample
return tg._get_resampler(obj, kind=kind)
File "C:\Python27\lib\site-packages\pandas\core\resample.py", line 1381, in _get_resampler
"but got an instance of %r" % type(ax).__name__)
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
data.groupby(name).resample('1M', how='interpolate')
Error:
Traceback (most recent call last):
File "<pyshell#15>", line 1, in <module>
data.groupby(name).resample('1M', how='interpolate')
NameError: name 'name' is not defined
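Reading the two tracebacks above, the problems seem to be that `resample` needs a datetime index rather than the default `RangeIndex`, and that the column name passed to `groupby` has to be the quoted string `'name'`. Below is a hedged sketch of that route on toy data, not a confirmed fix: it assumes a recent pandas (where, as far as I can tell, the `how=` argument is no longer accepted), anchors each annual value at 1 January as `pd.to_datetime(..., format='%Y')` does, and uses made-up 1801 values plus the illustrative names `annual`, `pieces` and `monthly`.

```python
import pandas as pd

# Toy data in the same shape as the CSV. The 1801 values are invented.
data = pd.DataFrame({
    "name": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "year": [1800, 1801, 1800, 1801],
    "value": [603.0, 615.0, 667.0, 679.0],
})

# Parse the year into a timestamp (1 January of that year) and use it as
# the index, so that resample() sees a DatetimeIndex.
data["year"] = pd.to_datetime(data["year"].astype(str), format="%Y")
annual = data.set_index("year")

# Group by the quoted column name, then upsample each country to
# month-start frequency and fill the new months by linear interpolation.
pieces = {
    name: group["value"].resample("MS").interpolate()
    for name, group in annual.groupby("name")
}

# Stack the per-country series back into one long table.
monthly = pd.concat(pieces, names=["name", "date"]).reset_index()
print(monthly.head(13))
```

With this anchoring the result runs from January of the first year to January of the last, so it does not by itself put the annual value in December as in the expected output.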
Thoughts?