0

I have a txt file which is tab separated. Few of the columns have date data in the format of

"Dec-2011", "Jan-1994"

etc the date ranges from "Jan-1944 to Dec-2015"

Problem is the in original data the date format is "Jan-1994" or the year is in YYYY format but when i read the data using Spyder (3.7 or 4.0.1) or Jupyter or Notepad++, everywhere the Date is read as

"Jan-94", "Dec-11" etc

I am using the following code:

import pandas as pd
Raw_Data=pd.read_csv("XYZCorp_LendingData.txt", encoding="Latin-1", sep ='\t', low_memory=False)

The Output should be:- this output is in the office system and THIS is the correct format

Raw_Data["issue_d"].head()
Out[5]: 
0    Dec-2011
1    Dec-2011
2    Dec-2011
3    Dec-2011
4    Dec-2011
Name: issue_d, dtype: object

BUT

I am getting output as below because somehow my system is reading in YY format instead of YYYY format. this output is in my laptop which is incorrect

Raw_Data["issue_d"].head()
Out[6]: 
0    Dec-11
1    Dec-11
2    Dec-11
3    Dec-11
4    Dec-11
Name: issue_d, dtype: object

This date issue is leading to multiple errors

Also this issue is not only related to Spyder but I am facing the issue with other applications also like Notepad++ and Jupyter

**Also note I made sure I cross-validated the Date and Regional settings in both the systems, both are almost same

Jason
  • 4,346
  • 10
  • 49
  • 75
Rupanjan Nayak
  • 136
  • 1
  • 7
  • have you tried opening the file with pure python? are you sure it's a 4 digit year? – Umar.H Feb 25 '20 at 15:35
  • Yes I have tried that also. Worst part is the same set of code with same set of data when I copy paste on some other colleagues laptop it run perfectly. All the libraries are updated and system almost same – Rupanjan Nayak Feb 25 '20 at 18:17
  • It's most likely local to your machine then and not a replicable issue. – Umar.H Feb 25 '20 at 18:33
  • I am doubting system Regional and Time setting but not able to figure our what? Any suggestions – Rupanjan Nayak Feb 26 '20 at 04:59

1 Answers1

0

Are you sure that the data is being read incorrectly? You can check it by writing the df to new file checking the new file.

import pandas as pd

Raw_Data=pd.read_csv("XYZCorp_LendingData.txt", encoding="Latin-1", sep ='\t', low_memory=False)
Raw_Data.to_csv('Date_check.csv')

My guess would be the data is being read correctly but is being displayed according based on some configuration preferences that you have. If the dates aren't been read correctly you can specify how dates should be parsed by pandas. Using the date_parser optional argument in pd.read_csv(), eg:

import pandas as pd

parser = pd.to_datetime
df = pd.read_csv('file_name', parse_dates=[col_num_containing_dates], date_parser=parser)

Where parser is the name of function you'd like to use (don't include .() as this will call the function) and col_num_containing_dates is the number(s) of the column(s) containing the dates to be parsed.

If you dates are being read correctly by you'd like to change how the dates are displayed you can set/change your system's locale or specific the format which you'd like explicitly using date.strftime(format) (link to docs).

from datetime import datetime

test_string = '13-02-2019'
test_date = datetime.strptime(test_string, '%d-%m-%Y') # Create a datetime object
test_date.strftime('%d-%m-%y')

# Output: '13-02-19'

test_date.strftime('%d-%m-%Y') # Note the capital 'Y'
# Output: '13-02-2019'

Finally if you'd like to change how date are written when exporting a .csv you can specific that an option to df.to_csv(date_format='%d-%m-%Y').

Jason
  • 4,346
  • 10
  • 49
  • 75