I am not sure whether I am asking this question correctly, but here is my issue:

I have a .csv file (InjectionWells.csv) that I need to split into columns on commas. When I try, it doesn't work. My only guess is that it might be an encoding problem, but I don't know how to fix it. Can someone shed some light on this?

Here are a few lines of the actual file:

API#,Operator,Operator ID,WellType,WellName,WellNumber,OrderNumbers,Approval Date,County,Sec,Twp,Rng,QQQQ,LAT,LONG,PSI,BBLS,ZONE,,,

3500300026,PHOENIX PETROCORP INC,19499,2R,SE EUREKA UNIT-TUCKER #1,21,133856,9/6/1977,ALFALFA,13,28N,10W,C-SE SE,36.9003240,-98.2182600,"2,500",300,CHEROKEE,,,

3500300163,CHAMPLIN EXPLORATION INC,4030,2R,CHRISTENSEN,1,470258,11/27/2002,ALFALFA,21,28N,09W,C-NW NW,36.8966360,-98.1777200,"2,400","1,000",RED FORK,,,

3500320786,LINN OPERATING INC,22182,2R,NE CHEROKEE UNIT,85,329426,8/19/1988,ALFALFA,24,27N,11W,SE NE,36.8061130,-98.3258400,"1,050","1,000",RED FORK,,,

3500321074,SANDRIDGE EXPLORATION & PRODUCTION LLC,22281,2R,VELMA,2-19,281652,7/11/1985,ALFALFA,19,28N,10W,SW NE NE SW,36.8885890,-98.3185300,"3,152","1,000",RED FORK,,,

I have tried both of these, and neither of them works:

1.

import pandas as pd
df = pd.read_csv('InjectionWells.csv', sep=',')
print(df)

2.

import pandas as pd
test_data2 = pd.read_csv('InjectionWells.csv', sep=',', encoding='utf-8')
test_data2.head()


  • It might help to show a few lines of the actual csv, if it doesn't break confidentiality. – Joffan May 21 '21 at 14:24
  • I included the link to the CSV file, but you are right, showing some lines may be easier to understand. There are many columns and I felt it was perhaps too much to include, but now you can see it. – Ricardo Ortega May 21 '21 at 14:30
  • I'm not able to reproduce your problem using the file in the provided link. It loads fine here both with and without the `sep` parameter. It may be a problem with your environment. Are you perhaps accidentally trying to load a zipped file? What version of Pandas and Python are you using? – jrbergen May 21 '21 at 14:35
  • I was also able to load the CSV correctly using the data and code you provided. – Cargo23 May 21 '21 at 14:38
  • @jrbergen I am loading just the CSV file, not a zipped one. I am using Python 3.8 and pandas 1.1.3. – Ricardo Ortega May 21 '21 at 14:50
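The checks raised in the comments above (is the file actually zipped, and what do its raw lines look like?) can be sketched roughly as below. This is only an illustrative helper, not something from the thread: `describe_file` is a hypothetical name, and it uses the standard library only, so it does not depend on the installed pandas version.

```python
from pathlib import Path


def describe_file(path, n_lines=3):
    """Return a small report on the raw bytes at the start of a file.

    Hypothetical helper for illustration; it does not parse the CSV.
    """
    raw = Path(path).read_bytes()
    return {
        # ZIP archives (including .xlsx files) start with the bytes "PK".
        "looks_zipped": raw[:2] == b"PK",
        # True if the file starts with a UTF-8 byte-order mark.
        "has_utf8_bom": raw.startswith(b"\xef\xbb\xbf"),
        # The first few lines as raw bytes, so stray quotes or
        # semicolons around a record are visible.
        "first_lines": raw.splitlines()[:n_lines],
    }
```

Calling `describe_file('InjectionWells.csv')` and printing the result shows the unparsed start of the file, which makes it easier to compare a misbehaving copy against a fresh download.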

1 Answer

Your CSV file appears to contain some non-ASCII characters stored as bytes that are not valid UTF-8, so decoding it as UTF-8 fails. You need to pass a different encoding.

I tried this and it works:

import pandas as pd
test_data2=pd.read_csv('InjectionWells.csv', sep=',', encoding='ISO-8859-1')
print(test_data2)
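To illustrate why the choice of encoding matters, here is a minimal sketch at the level of plain Python byte decoding. The operator name in `raw` is made up for the example and is not taken from the actual file.

```python
# Byte 0xE9 is "é" in ISO-8859-1, but followed by a space it is not a
# valid UTF-8 sequence.
raw = b"CAF\xe9 OPERATING INC"  # hypothetical value, for illustration only

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ISO-8859-1 maps every possible byte to a character, so this cannot fail.
latin1_text = raw.decode("ISO-8859-1")

# With errors="replace", the invalid byte becomes U+FFFD instead of raising.
replaced_text = raw.decode("utf-8", errors="replace")

print(utf8_ok)
print(latin1_text)
print(replaced_text)
```

Because ISO-8859-1 never raises a decoding error, it will also load a file whose real encoding is something else (for example cp1252), so it is worth checking a few non-ASCII values after loading.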

halfer
Pratap Alok Raj
  • This may be the problem, but without passing `encoding='utf-8'`, the file loads fine here, potentially with the entries containing non-utf8 compatible characters replaced by NaNs. But it does not explain his first example not loading the dataframe. – jrbergen May 21 '21 at 14:39
  • That's because when no encoding is passed to pandas.read_csv(), the default encoding is None and errors = "replace" is used, which means that if pandas is not able to decode something, it will simply put NaN there and will not fail. Please refer to this: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html – Pratap Alok Raj May 21 '21 at 14:43
  • Hey, thanks for the help. I tried your suggestion but it didn't solve the problem. I then downloaded the file again from the original source, replaced the old one with it, and it worked. Perhaps something I did with Python affected the file. This time I have posted a link to the exact CSV file I was working on and getting the error with (https://drive.google.com/file/d/1F79iKrXQTLGIaiRhdl5GXtqPMpcqcoAP/view?usp=sharing), to see if someone can clarify why I had this issue. – Ricardo Ortega May 21 '21 at 15:05
  • Let me have a look. – Pratap Alok Raj May 21 '21 at 15:28
  • @Alok Raj Exactly; that's what I was trying to say. Hence OP's example showing only `NaN`s did not make sense, as I didn't see that when loading the same file without passing an encoding. @Ricardo Ortega Glad it works now, maybe your file got corrupted whilst messing around with it indeed. – jrbergen May 21 '21 at 15:33
  • Found the difference between the file you shared through Drive and the one we downloaded from the site. In the file you shared, each record has `"` added at the start and `";` added at the end, and the file also contains `""`. But now an interesting problem has come up, so I raised a similar question here: https://stackoverflow.com/questions/67641311/pandas-csv-parser-not-working-properly-when-it-encounters – Pratap Alok Raj May 21 '21 at 17:37
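The damage described in the last comment (a `"` in front of each record, `";` at the end, and doubled `""` inside) looks like one extra layer of CSV quoting. A rough sketch of one way to undo it with the standard `csv` module follows. The function name `unwrap_records` and the sample rows are assumptions for illustration: the rows are shortened from the lines in the question and wrapped as the comment describes, not copied from the shared file.

```python
import csv
import io


def unwrap_records(text):
    """Undo one layer of CSV quoting, then parse the inner comma-separated records."""
    # Pass 1: treat each line as a ';'-separated record. The first field is
    # the original line, with doubled quotes ("") unescaped by the csv module.
    outer = csv.reader(io.StringIO(text), delimiter=";")
    inner_lines = [row[0] for row in outer if row]
    # Pass 2: parse the recovered lines as ordinary comma-separated records.
    return list(csv.reader(inner_lines))


# Hypothetical sample, wrapped the way the comment describes.
sample = (
    '"API#,Operator,PSI,BBLS,ZONE";\n'
    '"3500300026,PHOENIX PETROCORP INC,""2,500"",300,CHEROKEE";\n'
)

rows = unwrap_records(sample)
for row in rows:
    print(row)
```

Re-downloading the original file, as the asker did, remains the simpler fix. A two-pass parse like this would only be useful if the wrapped copy were the only one available, and it assumes every line is wrapped in the same way.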