
I am getting errors when loading my files into BigSheets, both directly from HDFS (files that are the output of Pig scripts) and from raw data on the local hard disk. I have observed that whenever I load a file and issue a row count to see if all the data was loaded into BigSheets, fewer rows appear than expected. I have checked that the files are consistent and use proper delimiters (tab- or comma-separated fields). My files are around 2 GB and I have used either the *.csv or *.tsv format.
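One thing worth checking before loading is whether every row really has the same number of delimited fields, since many loaders silently drop malformed rows. A minimal sketch of such a pre-load check (the file name and delimiter below are just illustrative assumptions, not anything BigSheets-specific):

```python
import csv

def check_rows(path, delimiter="\t"):
    """Return (row_count, set_of_field_counts) for a delimited file.

    If the returned set contains more than one value, some rows have a
    different number of fields, which can cause loaders to skip them.
    """
    field_counts = set()
    rows = 0
    with open(path, newline="") as f:
        for record in csv.reader(f, delimiter=delimiter):
            rows += 1
            field_counts.add(len(record))
    return rows, field_counts

# Hypothetical usage: compare this count against what BigSheets reports.
# rows, counts = check_rows("part-r-00000.tsv", "\t")
```

If the line count here matches the raw file but BigSheets still shows fewer rows, the problem is more likely on the loading side than in the data itself.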

Also, in some cases when I have tried to load a file directly from Windows, the file sometimes loads successfully with the row count matching the actual number of lines in the data, and sometimes with a lower row count.

Sometimes a fresh file, used for the first time, gives the correct result, but when I repeat the same operation some rows are missing.

Kindly share your experience with BigSheets and any solutions to problems where the entire dataset is not being loaded. Thanks in advance.


1 Answer


The data that you originally load into BigSheets is only a subset (a sample used for previewing). You have to run the sheet to apply your operations to the full dataset.

http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.analyze.doc/doc/t0057547.html?lang=en

  • Yes, I know that it's a simulated version of the entire dataset; the thing is, even running a full row count won't read all the data. – CodeReaper Jan 23 '15 at 12:17