I have the following data frame df:
import pandas as pd
from datasets import Dataset
data = [[1, 'Jack', 'A'], [1, 'Jamie', 'A'], [1, 'Mo', 'B'], [1, 'Tammy', 'A'], [2, 'JJ', 'A'], [2, 'Perry', 'C']]
df = pd.DataFrame(data, columns=['id', 'name', 'class'])
> df
   id   name class
0   1   Jack     A
1   1  Jamie     A
2   1     Mo     B
3   1  Tammy     A
4   2     JJ     A
5   2  Perry     C
I would like to convert it to a Dataset object that has 2 rows, one per id. The desired output is
> myDataset
Dataset({
features: ['id', 'name', 'class'],
num_rows: 2
})
where
> myDataset[0:2]
{'id': ['1', '2'], 'name': [['Jack', 'Jamie', 'Mo', 'Tammy'],['JJ', 'Perry']], 'class': [['A', 'A', 'B', 'A'], ['A', 'C']]}
Based on the documentation here, I tried the following, but it gave me a Dataset with 6 rows instead of one with 2 rows grouped by the column id:
myDataset = Dataset.from_pandas(df)
> myDataset
Dataset({
features: ['id', 'name', 'class'],
num_rows: 6
})
> myDataset[0:2]
{'id': [1, 1], 'name': ['Jack', 'Jamie'], 'class': ['A', 'A']}
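For reference, one possible way to get the grouped shape is to aggregate in pandas first and only then build the Dataset. The following is a minimal sketch, not a confirmed solution: it assumes that collecting each column into a Python list per id is what is wanted, the name `grouped` is made up for illustration, and the `datasets` import is guarded so the pandas part can be run on its own. Note that id stays an integer here, whereas the desired output above shows strings; adding `.astype(str)` on that column would be needed if strings are required.

```python
import pandas as pd

try:
    from datasets import Dataset  # Hugging Face datasets, as in the question
except ImportError:
    Dataset = None  # lets the pandas part of the sketch run without the library

data = [[1, 'Jack', 'A'], [1, 'Jamie', 'A'], [1, 'Mo', 'B'],
        [1, 'Tammy', 'A'], [2, 'JJ', 'A'], [2, 'Perry', 'C']]
df = pd.DataFrame(data, columns=['id', 'name', 'class'])

# Collapse to one row per id; every other column becomes a list of its values.
grouped = df.groupby('id').agg(list).reset_index()

if Dataset is not None:
    # preserve_index=False keeps the pandas index out of the Dataset columns.
    myDataset = Dataset.from_pandas(grouped, preserve_index=False)
    print(myDataset)
    print(myDataset[0:2])
```

Doing the grouping before the conversion keeps the Dataset construction itself unchanged from the attempt above; only the input frame differs.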