
I'm using the NOAA weather dataset to build a machine learning model to predict weather data. Python cannot read in this data because a) some fields contain commas, and b) the number of commas differs from field to field.

Here are the header row and the first data line:

"STATION","DATE","SOURCE","REPORT_TYPE","CALL_SIGN","QUALITY_CONTROL","AA1","AJ1","AL1","CIG","DEW","GA1","KA1","MA1","MF1","OC1","RH1","SLP","TMP","VIS","WND"

"72503014732","2022-01-01T00:00:00","4","FM-12","99999","V020",,,,"99999,9,9,N","+0078,1","99,9,+00450,1,99,9","120,M,+0128,1","99999,9,10129,1",,,,"10141,1","+0106,1","016000,1,9,9","160,1,N,0046,1"
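A quick way to check whether the commas inside fields are really what breaks parsing is to feed the two lines above to Python's standard csv module. This is only a minimal sketch: the variable name sample is illustrative, and it assumes the real file uses the same double-quote quoting as this excerpt.

```python
import csv
import io

# The header and first data line exactly as posted above.
sample = (
    '"STATION","DATE","SOURCE","REPORT_TYPE","CALL_SIGN","QUALITY_CONTROL",'
    '"AA1","AJ1","AL1","CIG","DEW","GA1","KA1","MA1","MF1","OC1","RH1","SLP",'
    '"TMP","VIS","WND"\n'
    '"72503014732","2022-01-01T00:00:00","4","FM-12","99999","V020",,,,'
    '"99999,9,9,N","+0078,1","99,9,+00450,1,99,9","120,M,+0128,1",'
    '"99999,9,10129,1",,,,"10141,1","+0106,1","016000,1,9,9","160,1,N,0046,1"\n'
)

# csv.reader honours the double quotes, so a comma inside a quoted field
# stays part of that field; ",," simply produces an empty field.
header, row = list(csv.reader(io.StringIO(sample)))

print(len(header), len(row))            # 21 21
print(row[header.index("CIG")])         # 99999,9,9,N
```

If both lines come out with 21 fields, the quoted commas are not the problem for these lines, and the failure is more likely caused by some other line in the file.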

When I open this in Excel, this is how it looks:

[Image: rendered data in an Excel sheet]

I have tried regex, and I've tried setting the delimiter to ",", but it still doesn't work.

blackraven
  • Please post a printout of the error message. Also, it might just be a typo in your question, but are you writing the function as "pd.read_csv" with the underscore? https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html – Juancheeto Aug 19 '22 at 12:01
  • Yup, it was a typo, I wrote pd.read.csv. – Jenna Rink Aug 19 '22 at 12:13
  • Perhaps you need to provide a larger or more representative sample of the data. – blackraven Aug 19 '22 at 13:10

1 Answer


As your fields are quoted, the commas inside them are not an issue for pandas:

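import pandas as pd

# sep=',' is already the default; the quoted fields are handled by quotechar='"'
df = pd.read_csv('yourfile.csv', sep=',')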

output:

       STATION                 DATE  SOURCE REPORT_TYPE  CALL_SIGN  \
0  72503014732  2022-01-01T00:00:00       4       FM-12      99999   

  QUALITY_CONTROL  AA1  AJ1  AL1          CIG  ...                 GA1  \
0            V020  NaN  NaN  NaN  99999,9,9,N  ...  99,9,+00450,1,99,9   

             KA1              MA1 MF1  OC1  RH1      SLP      TMP  \
0  120,M,+0128,1  99999,9,10129,1 NaN  NaN  NaN  10141,1  +0106,1   

            VIS             WND  
0  016000,1,9,9  160,1,N,0046,1  

[1 rows x 21 columns]
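To reproduce this without the original file, the header and first data line from the question can be passed to read_csv through io.StringIO. This is a sketch that stands in for 'yourfile.csv'; the sample string is just the excerpt from the question, not the full dataset.

```python
import io

import pandas as pd

# Header plus the first data line from the question, in place of the real file.
sample = (
    '"STATION","DATE","SOURCE","REPORT_TYPE","CALL_SIGN","QUALITY_CONTROL",'
    '"AA1","AJ1","AL1","CIG","DEW","GA1","KA1","MA1","MF1","OC1","RH1","SLP",'
    '"TMP","VIS","WND"\n'
    '"72503014732","2022-01-01T00:00:00","4","FM-12","99999","V020",,,,'
    '"99999,9,9,N","+0078,1","99,9,+00450,1,99,9","120,M,+0128,1",'
    '"99999,9,10129,1",,,,"10141,1","+0106,1","016000,1,9,9","160,1,N,0046,1"\n'
)

# Same call as above, only reading from an in-memory buffer.
df = pd.read_csv(io.StringIO(sample), sep=',')

print(df.shape)             # (1, 21)
print(df.loc[0, "TMP"])     # +0106,1
```

Empty fields such as AA1 come back as NaN, and composite values like "99999,9,9,N" stay as single strings in their column.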
mozway
  • It's not working. I just used exactly the same code as you and got this error: ParserError: Error tokenizing data. C error: Expected 1 fields in line 10, saw 21 – Jenna Rink Aug 19 '22 at 12:18
  • @Jenna Then your example is not representative; please provide line 10. – mozway Aug 19 '22 at 12:18
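"Expected 1 fields in line 10, saw 21" means pandas worked out a single column from the start of the file, then hit a line with 21 fields. One way to locate the lines that don't fit, sketched here under the assumption that the file is double-quoted CSV, is to count the parsed fields per line with the csv module. The function name find_odd_rows is hypothetical, not part of pandas or the csv module.

```python
import csv
import io
from collections import Counter


def find_odd_rows(lines, expected=None):
    """Return (line_number, field_count) for rows whose count != expected.

    `lines` is any iterable of text lines, e.g. an open file. If `expected`
    is None, the most common field count is assumed to be the right one.
    Blank lines are skipped, as pandas.read_csv does by default.
    """
    reader = csv.reader(lines)
    counts = []
    for row in reader:
        if row:  # csv.reader yields [] for an empty line
            # line_num = number of physical lines read so far, i.e. the
            # line on which this row ends
            counts.append((reader.line_num, len(row)))
    if not counts:
        return []
    if expected is None:
        expected = Counter(n for _, n in counts).most_common(1)[0][0]
    return [(line, n) for line, n in counts if n != expected]


# Hypothetical input: a one-field title line before the real header.
demo = io.StringIO(
    'Some title line\n'
    '"A","B","C"\n'
    '"1","2,3","4"\n'
    '"5","6","7"\n'
)
print(find_odd_rows(demo))  # [(1, 1)]
```

On the real file this would be called as find_odd_rows(f) inside with open('yourfile.csv', newline='') as f:. If it reports a few lines at the top of the file, those lines are probably not data, and read_csv's skiprows parameter may be enough to skip them.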