I uploaded the dataset to storage in Google Cloud. Next, I opened a flow in Dataprep and added the dataset to it. When I created the first recipe (before adding any steps), the dataset showed only about half of its original rows: 36,234 instead of 62,948.

I would like to know what could be causing this problem. Is there some configuration I am missing?

Thank you very much in advance

Ana

1 Answer

Here are a couple of thoughts:


Data Sampling

Keep in mind that what's shown in the Dataprep editor is typically a sample of the data, not the full data (unless it's very small). If the full file was small enough to load, you should see the "Full Data" label where the sample is normally indicated:

Google Cloud Dataprep navigation indicating that the full file has been loaded

In other cases, what you're actually looking at is a sample, which will also be indicated:

Google Cloud Dataprep navigation indicating that the current dataset has been sampled

If you haven't reviewed the documentation already, it's worth getting an idea of how Dataprep's sampling works: https://cloud.google.com/dataprep/docs/html/Overview-of-Sampling_90112099


Compressed Sources

Another issue I've occasionally noticed occurs when loading compressed CSVs. In this case, the interface has told me that I'm looking at the "Full Data", but the number of rows shown is incorrect. However, every time this has happened, the job has still processed the full number of rows.
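If you want to confirm the true row count independently of what the editor displays, one option is to count the rows in a local copy of the source file and compare that with the job output. The following is only a minimal sketch using the Python standard library. The function name `count_csv_rows` is my own, and it assumes a UTF-8 CSV that is either plain or gzip-compressed (detected by a `.gz` extension) and has a single header row.

```python
import csv
import gzip
from pathlib import Path


def count_csv_rows(path, has_header=True, encoding="utf-8"):
    """Count data rows in a plain or gzip-compressed CSV file.

    Hypothetical helper for a local sanity check; not part of Dataprep.
    """
    path = Path(path)
    # Assumption: compressed files end in .gz; everything else is plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    # newline="" lets the csv module handle quoted fields that contain line
    # breaks, so such a record is counted as one row rather than several.
    with opener(path, mode="rt", encoding=encoding, newline="") as handle:
        total = sum(1 for _ in csv.reader(handle))
    # Exclude the header row from the count when one is present.
    if has_header and total > 0:
        total -= 1
    return total
```

Calling `count_csv_rows("dataset.csv.gz")` (the filename is a placeholder) on the file from the question should return 62,948 if the source really contains that many records. If the job output matches that figure, the lower number in the editor is only a display or sampling artifact.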

justbeez