18

I have been desperately trying to download the Ta-Feng grocery dataset for a few days, but it appears that all the links are broken. I need it for data mining / machine learning research for my MSc thesis. I also have the Microsoft grocery database, the Belgian store dataset, and Supermarket.arff from Weka. However, the research literature says Ta-Feng is the largest and most interesting of all publicly available datasets.

http://recsyswiki.com/wiki/Grocery_shopping_datasets

I will be super thankful for any help :) Cheers!

Dragan

3 Answers

21

Whoever downvoted doesn't appreciate how hard it is to find this valuable dataset for machine learning on supermarket scenarios. It is the biggest publicly available dataset, containing 4 months of shopping transactions from the Ta-Feng supermarket. I got it from Prof. Chun Nan, who was kind enough to send it to me because the servers of his previous institute in Taiwan no longer host it. Here is a link for everybody who needs it: https://sites.google.com/site/dataminingcourse2009/spring2016/annoucement2016/assignment3/D11-02.ZIP

Dragan
  • I also added a small Ruby script that converts the file into a WEKA-readable .arff file, as well as the file itself – Dragan Aug 27 '14 at 09:06
  • 1
    Dear Dragan, first, thank you very much, this is very valuable information. I'm trying to make sense of the data and understand what each field means, but when I open it in a text editor I can see the data itself but not the headers... would you have this information available as well? Thanks a lot! – Chicoscience Sep 19 '14 at 13:37
18

Anyone who uses this "Ta Feng" dataset will run into a major problem with the column names, so I thought I'd share this. I hope it helps someone.

It contains these files

D11: Transaction data collected in November, 2000

D12: Transaction data collected in December, 2000

D01: Transaction data collected in January, 2001

D02: Transaction data collected in February, 2001

Format of Transaction Data

First line: Column definition in Traditional Chinese

Second line and the rest: data columns separated by ";"

Column definition

Transaction date and time (the time part is invalid and can be ignored)

Customer ID

Age: 10 possible values:

A: <25, B: 25-29, C: 30-34, D: 35-39, E: 40-44, F: 45-49, G: 50-54, H: 55-59, I: 60-64, J: >65

Residence Area: 8 possible values. A-F: zip code areas 105, 106, 110, 114, 115, 221; G: others; H: unknown.

Distance to store, from the closest: 115, 221, 114, 105, 106, 110

Product subclass

Product ID

Amount

Asset

Sales price
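For anyone who wants to load the files with named columns, here is a minimal Python sketch based on the format described above. The English column names, the function name read_transactions and the "age_label" field are my own choices for illustration, not part of the dataset. The sketch assumes every data row has exactly the nine fields listed above and skips anything else.

```python
import csv
from typing import Dict, Iterable, Iterator

# Hypothetical English names for the nine columns, in the order listed above.
COLUMNS = [
    "transaction_date", "customer_id", "age_group", "residence_area",
    "product_subclass", "product_id", "amount", "asset", "sales_price",
]

# Age codes as given in the column definition above.
AGE_GROUPS = {
    "A": "<25", "B": "25-29", "C": "30-34", "D": "35-39", "E": "40-44",
    "F": "45-49", "G": "50-54", "H": "55-59", "I": "60-64", "J": ">65",
}


def read_transactions(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a ';'-separated transaction file."""
    reader = csv.reader(lines, delimiter=";")
    next(reader, None)  # first line holds the Chinese column definition
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(COLUMNS, (field.strip() for field in row)))
        # Add a readable age range next to the raw one-letter code.
        record["age_label"] = AGE_GROUPS.get(record["age_group"], "unknown")
        yield record
```

To use it on a real file, something like `with open("D11", errors="replace") as f: rows = list(read_transactions(f))` should work. The errors="replace" part is an assumption on my side: it only stops the Chinese header line from raising a decoding error, and that line is skipped anyway.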

Du-Lacoste
2

The Dropbox link seems to be broken. You can still download the dataset from the following link:

https://sites.google.com/site/dataminingcourse2009/spring2016/annoucement2016/assignment3/D11-02.ZIP

Jordi Colomer