
My question is how to take a sample from a grouped DataFrame in pandas. I grouped a dataset using pd.groupby, and the grouped dataset looks like this, where each bikeid has several trips:

 bikeid  tripid     A     B     C
    0     1       a1    b1    c1
          2       a2    b2    c2
          3       a3    b3    c3
    1     4       a4    b4    c4
          5       a5    b5    c5
    2     6      ..............
          7      ..............
    3     8      ..............
          9      ..............

What I want to do is build a sample that picks one bikeid out of every 3 bikeids. It should look like this:

bikeid  tripid  A     B     C
0        1     a1    b1    c1
         2     a2    b2    c2
         3     a3    b3    c3
3        8     a8    b8    c8
         9     a9    b9    c9
6        ..............
         ..............
9
...

However, when I tried grouped_new = grouped.sample(frac=0.3), the result was sampled per individual trip rather than per bikeid.
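A minimal sketch of why this happens, and one way to sample at the bikeid level instead. It assumes the data is indexed by (bikeid, tripid) as in the table above; the toy values and `random_state=0` are illustrative only. `DataFrame.sample` draws individual rows, so the fix is to sample the unique bikeids first and then keep every trip that belongs to them:

```python
import pandas as pd

# Toy data shaped like the question: several trips per bikeid (values are illustrative)
df = pd.DataFrame({
    "bikeid": [0, 0, 0, 1, 1, 2, 2, 3, 3],
    "tripid": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "A": [f"a{i}" for i in range(1, 10)],
}).set_index(["bikeid", "tripid"])

# DataFrame.sample draws individual rows (trips), so a bike's trips can be split up
row_sample = df.sample(frac=0.3, random_state=0)

# Sample the distinct bikeids instead...
bike_ids = df.index.get_level_values("bikeid").unique().to_series()
chosen = bike_ids.sample(frac=0.3, random_state=0)

# ...then keep every trip whose bikeid was chosen
bike_sample = df[df.index.get_level_values("bikeid").isin(chosen)]
print(bike_sample)
```

Here `frac=0.3` applies to the number of bikes rather than the number of trips, so each sampled bike comes with all of its trips.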

Can anyone help me out with this? Thank you so much!!!

  • Hi friend. I don't fully follow your question. What do the columns `A`, `B`, and `C` refer to? What do the cell values (e.g., `a1`, `b2`, etc.) refer to? How do the columns and values relate to the sample dataset you shared in your question? – Keith Dowd Feb 22 '18 at 01:58
  • Please refrain from posting input data as an image. Provide a sample of your input data in your question (as text), and what you want your output data to look like (again, as text). – Plasma Feb 22 '18 at 10:21

1 Answer


Assuming your data has a MultiIndex on (bikeid, tripid), consider using df.reindex() with the level argument, as shown below:

import io

import numpy as np
import pandas as pd

# your original dataframe (wrapping the JSON string in io.StringIO avoids the
# FutureWarning that newer pandas versions emit for literal JSON strings)
df = pd.read_json(io.StringIO('{"bikeid": {"0":0,"1":0,"2":0,"3":1,"4":1,"5":2,"6":2,"7":3,"8":3},"tripid": {"0":1,"1":2,"2":3,"3":4,"4":5,"5":6,"6":7,"7":8,"8":9},"A":{"0":"a1","1":"a2","2":"a3","3":"a4","4":"a5","5":"a6","6":"a7","7":"a8","8":"a9"},"B":{"0":"b1","1":"b2","2":"b3","3":"b4","4":"b5","5":"b6","6":"b7","7":"b8","8":"b9"},"C":{"0":"c1","1":"c2","2":"c3","3":"c4","4":"c5","5":"c6","6":"c7","7":"c8","8":"c9"}}'))

df.set_index(['bikeid', 'tripid'], inplace=True)

# df looks like the following
#                 A   B   C
# bikeid tripid
# 0      1       a1  b1  c1
#        2       a2  b2  c2
#        3       a3  b3  c3
# 1      4       a4  b4  c4
#        5       a5  b5  c5
# 2      6       a6  b6  c6
#        7       a7  b7  c7
# 3      8       a8  b8  c8
#        9       a9  b9  c9

# define the labels you want to get out of your indexing operation:
# bikeids 0, 3, 6, ..., 99 (the upper bound only needs to exceed the largest bikeid)
index_labels = np.arange(0, 100, 3)

# do the indexing on the 'bikeid' level
sampled = df.reindex(index_labels, level='bikeid')

# sampled looks like the following
#                 A   B   C
# bikeid tripid
# 0      1       a1  b1  c1
#        2       a2  b2  c2
#        3       a3  b3  c3
# 3      8       a8  b8  c8
#        9       a9  b9  c9
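If you would rather not hard-code an upper bound like 100, a possible variant (a sketch, not part of the original answer) takes every third distinct bikeid straight from the index and filters with a boolean mask. Note the subtle difference: this picks every third bikeid *by position*, whereas `np.arange(0, 100, 3)` picks bikeid *values* divisible by 3. The two agree when bikeids are contiguous integers starting at 0, as in the example data:

```python
import pandas as pd

# Same shape of data as in the answer: MultiIndex (bikeid, tripid)
df = pd.DataFrame({
    "bikeid": [0, 0, 0, 1, 1, 2, 2, 3, 3],
    "tripid": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "A": [f"a{i}" for i in range(1, 10)],
    "B": [f"b{i}" for i in range(1, 10)],
    "C": [f"c{i}" for i in range(1, 10)],
}).set_index(["bikeid", "tripid"])

# Every third distinct bikeid, in order of appearance (positions 0, 3, 6, ...)
bike_level = df.index.get_level_values("bikeid")
keep = bike_level.unique()[::3]

# Boolean mask keeps all trips belonging to the selected bikes
every_third = df[bike_level.isin(keep)]
print(every_third)
```

The mask approach also works unchanged if the bikeids are strings or non-contiguous numbers.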
jeschwar