CSV file not recognized as csv, reason nominal value not declared in header

Question

I am trying to load a dataset in Weka. I have tried many solutions, such as converting it to ARFF format, fixing the commas, etc., but all of them failed. Could any of you give me a working solution, or convert this dataset to the required format?

Here is a link to the dataset.

score 1 · Answer 1 · answered Jan 27 '22 at 01:20

Instead of using Weka's functionality for reading CSV files, you could use ADAMS (developed at the same university; I'm the lead developer).

Download the adams-ml-app snapshot and then use the Weka Investigator to load/save the file:

Load it as ADAMS Spreadsheets (.csv, .csv.gz)
Save it as Arff data files (.arff, .arff.gz) or Simple ARFF data files (.arff, .arff.gz)

The Reviews column contains an erroneous 3.0M, which prevents it from becoming numeric.
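If you would rather locate such values before loading the file, a short script can list every cell in a column that does not parse as a number. This is only a minimal sketch using Python's standard csv module, not part of ADAMS or Weka; the function name find_non_numeric and the sample data are made up for illustration, and the column name Reviews is taken from the dataset.

```python
import csv
import io

def find_non_numeric(csv_text, column):
    """Return (record_number, value) pairs for cells in `column` that float() rejects."""
    bad = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is record 1, so the first data record is record 2.
    for record_number, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        try:
            float(value)
        except ValueError:
            # Empty cells also end up here, because float("") raises ValueError.
            bad.append((record_number, value))
    return bad

# Hypothetical three-row sample that mimics the problem in the Reviews column.
sample = "App,Reviews\nA,159\nB,3.0M\nC,967\n"
print(find_non_numeric(sample, "Reviews"))  # prints [(3, '3.0M')]
```

For the real file you would pass in the file's contents instead of the sample string. Record numbers match line numbers only if no quoted field spans several lines.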

If you want an introduction to the Weka Investigator, take a look at my talk from the Weka User Conference 2021: Taking Weka to the next level with ADAMS.

score 0 · Answer 2 · answered Jan 25 '22 at 18:30

There are too many issues with lines in this file. In line 23, I eliminated the odd-looking brackets. I removed all single quotes (') and eliminated all repeated double quotes (""). In line 10474, the first two fields (before the number) didn't seem to be separated, so I added a comma. This allowed the file to go through initial screening, but...

The file contains a lot of odd emojis. I started to eliminate them one by one, but there are clearly more of these than I wish to deal with. Each time I got rid of one, it would read farther into the file, then stop at the next one.
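Removing the emojis by hand is tedious, so a script may be a better fit. The following is a rough sketch, not something I ran against this dataset: it assumes that dropping every character in the Unicode "symbol" and unassigned/private categories is acceptable, which also removes harmless symbols such as © or °. The function name strip_emoji is made up for illustration.

```python
import unicodedata

# Categories dropped: So/Sk (symbols, including most emojis and skin-tone
# modifiers), Cs (surrogates), Co (private use), Cn (unassigned code points,
# which covers emojis newer than Python's Unicode database).
DROP_CATEGORIES = {"So", "Sk", "Cs", "Co", "Cn"}
# Zero-width joiner and variation selector-16, which often accompany emojis.
DROP_CHARS = {"\u200d", "\ufe0f"}

def strip_emoji(text):
    """Return `text` without emoji-like characters; letters with accents are kept."""
    return "".join(
        ch for ch in text
        if ch not in DROP_CHARS and unicodedata.category(ch) not in DROP_CATEGORIES
    )

# Hypothetical app name containing a camera emoji (U+1F4F7).
print(strip_emoji("Photo Editor \U0001F4F7 Pro"))
```

To clean a whole file, read it with encoding="utf-8", pass the text through strip_emoji, and write the result to a new file so the original stays untouched.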

If I just try to read the top of the file, the first 20 lines before we get to any of these problems, it reads fine.

My partial editing can be found here: https://www.dropbox.com/s/ij707mb23dt1jvz/googleplaystore3.csv?dl=0

I think if you clear up the remaining emojis, the file should be usable.
