
I have a CSV file containing 1.5 million rows. I prepared a Dataprep job that parses the data and stores it in BQ (or CSV). After processing, however, nearly half of the rows are missing (around 700k). When I run this Dataprep job without any recipe steps, I get the same wrong number of rows.

I analyzed the input CSV and the data looks correct. I also filtered out a small subset of the rows that are missing from the output, and this small subset is imported correctly.

Is there some kind of sampling applied to the output? What could cause my rows to be lost?
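One cause worth ruling out (an assumption, not something confirmed for this file) is an unbalanced quote character: a CSV parser then treats the following line breaks as part of one quoted field and merges several physical lines into a single record. The sketch below is plain Python using the standard `csv` module, not Dataprep itself, and compares the number of physical lines with the number of parsed records. The function name `count_lines_and_records` and the sample data are hypothetical.

```python
import csv
import io


def count_lines_and_records(text: str) -> tuple[int, int]:
    """Return (physical line count, parsed CSV record count) for CSV text."""
    physical_lines = len(text.splitlines())
    # newline="" leaves line endings untouched so the csv module can handle
    # line breaks that occur inside quoted fields.
    records = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))
    return physical_lines, records


# Hypothetical sample: the quote opened before "alice" is never closed on
# its own line, so the parser folds the following lines into one field.
sample = 'id,name\n1,"alice\n2,bob\n3,"carol"\n'

lines, records = count_lines_and_records(sample)
print(f"physical lines: {lines}, parsed records: {records}")
```

If the two counts differ a lot on the real file, quoting or embedded line breaks are a likely suspect. For a 1.5 million row file it would be better to stream the file with `open(path, newline="")` than to load it into a string.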

y0j0
  • I would suggest taking a look at the following documentation: https://cloud.google.com/dataprep/docs/html/Find-Missing-Data_57344564 where you can find information about finding missing data. Additionally, please review and download the results from the completed job: https://cloud.google.com/dataprep/docs/html/Job-Details-Page_57344846 . Have you encountered this behavior before? Please let me know if you find something meaningful. – aga Jun 25 '20 at 12:23
  • You can try filter-based sampling: https://cloud.google.com/dataprep/docs/html/Overview-of-Sampling_90112099#filter-based-samples All you need to do is collect this sample after applying the recipe steps and look for the values that are missing. This will tell you whether the values go missing before or after writing to the final output. – Prabhakar Reddy Jul 24 '20 at 02:58
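The "look for the values that are missing" step from the comments can also be done outside Dataprep once the job output is exported as CSV. This is a minimal sketch under assumptions: both files share a unique key column (called `id` here, a hypothetical name), and the helper `missing_keys` is illustrative rather than part of any Dataprep API.

```python
import csv
import io


def missing_keys(input_file, output_file, key: str) -> list[str]:
    """Return the sorted key values present in the input but absent from the output."""
    input_keys = {row[key] for row in csv.DictReader(input_file)}
    output_keys = {row[key] for row in csv.DictReader(output_file)}
    return sorted(input_keys - output_keys)


# Hypothetical in-memory stand-ins for the source CSV and the exported job output.
source = io.StringIO("id,name\n1,alice\n2,bob\n3,carol\n", newline="")
result = io.StringIO("id,name\n1,alice\n", newline="")

lost = missing_keys(source, result, key="id")
print(lost)
```

With real files, pass handles opened with `open(path, newline="")`. Looking at the first few lost keys in the source file, together with the rows just before them, may show whether the losses cluster around particular characters or line breaks.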

0 Answers