0

When I export public weather data from https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2017/CRNS0101-05-2017-TX_Austin_33_NW.txt, as soon as solar radiation > 9, all of my data for the remaining columnsĀ gets lumped into a single column, as shown below. I have tried uploading as txt and csv and the problem still exists in excel, sheets, and dataprep.

Why is this happening?

Is there a programmatic way to fix this so that the data populates as intended, with 1 value per column?

CSV columns

atltbone
  • 285
  • 2
  • 4
  • 11

1 Answers1

1

It is likely because the initial data structure is not detected correctly. This can happen if the first rows of your dataset have a different structure than the remaining rows.

To solve this problem in Dataprep, you can indicate how the dataset should be structured by following these steps:

  1. Go to the flow view
  2. Right click on the dataset and choose "remove structure..." remove structure...
  3. Open the recipe
  4. Insert a split row step:
    • splitrows col: column1 on: '\n'
  5. Split the column using a whitespace regex (for e.g., /\s+/)
    • splitpatterns col: column1 type: on on: /\s+/ limit: 22

(you can copy and paste the following command inside the search input when you create a new step)

Here is what you should get: recipe

Note: it is also possible to prevent the initial structure detection when importing a dataset. See https://cloud.google.com/dataprep/docs/html/Remove-Initial-Structure_136154971

detect initial structure - unchecked

Hugues
  • 328
  • 2
  • 6