
I want to create a dataset with 300 features whose instances are combinations of 0s and 1s (Boolean). I have to specify the 1s using some ids. How can I do this with Python? For example, one instance should have columns 4, 45, 213, 6, 48 set to 1, and likewise for the combinations of those ids.

najmath

1 Answer


I hope this is not too late and that I understood your question correctly.
You are asking for two main things:
1. Generate a two-dimensional Boolean sample set with 300 features, i.e. n rows by 300 columns.
2. Generate a dependent variable that lists, for each observation (row), the features that are set to 1.

Here is my approach:

#%% Imports
# Data manipulation
import numpy as np
import pandas as pd

#%% List columns
def list_true_columns(x):
    # Return the positions of the entries in this row that equal 1
    return [i for i, value in enumerate(x) if value == 1]

column_amount = 300
row_amount = 1000

#%% Sample dataset
# Each cell is an independent 0/1 draw that is 1 with probability 0.5
dataset = pd.DataFrame(np.random.binomial(n=1, p=0.5, size=(row_amount, column_amount)))
# Based on the sample, calculate the dependent variable
dataset['dependent'] = dataset.apply(list_true_columns, axis=1)
# Print the frame itself (dataset.head without parentheses is only the method object);
# pandas truncates the output to the first and last rows and columns
print(dataset)

Here is the truncated printout of the sample (first and last rows):

    0   1   2   3   4   5   6   7   8   9   ... 291 292 293 294 295 296 297 298 299
0   0   1   1   0   1   1   1   0   1   0   ... 1   1   0   0   0   0   0   1   1
1   1   1   0   0   0   1   0   1   1   0   ... 0   1   1   1   0   1   1   0   1
2   0   1   0   0   1   1   0   1   0   0   ... 0   1   0   1   0   0   1   1   0
3   0   1   0   1   0   0   1   1   1   0   ... 0   0   0   0   0   1   1   0   0
4   1   0   1   1   0   0   0   0   1   0   ... 1   1   1   0   0   0   1   0   1
5   0   0   1   1   1   1   0   1   0   0   ... 1   1   0   1   0   1   1   1   0
..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ... ... ... ... ... ... ... ... ... ...
994 1   1   0   1   1   0   1   1   0   1   ... 0   0   0   1   0   0   1   0   0
995 1   0   1   0   0   0   0   1   0   0   ... 1   1   0   0   0   0   1   0   1
996 1   0   1   0   1   0   0   0   0   1   ... 1   1   0   0   0   1   1   0   1
997 0   0   0   1   0   1   1   0   0   0   ... 1   0   1   1   0   0   0   1   0
998 0   0   0   0   0   1   1   1   1   0   ... 1   0   0   0   1   1   1   1   0
999 0   0   1   0   0   0   1   1   1   1   ... 1   0   0   1   1   1   1   1   1

Here is the truncated printout of the dependent variable:

                                            dependent  
0    [1, 2, 4, 5, 6, 8, 11, 15, 17, 18, 19, 20, 21,...  
1    [0, 1, 5, 7, 8, 12, 15, 16, 17, 18, 19, 20, 24...  
2    [1, 4, 5, 7, 11, 12, 15, 16, 18, 26, 27, 28, 2...  
3    [1, 3, 6, 7, 8, 11, 12, 15, 16, 23, 25, 27, 28...  
4    [0, 2, 3, 8, 13, 16, 18, 19, 20, 21, 22, 28, 2...  
5    [2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 15, 21, 24...  
..                                                 ...   
994  [0, 1, 3, 4, 6, 7, 9, 10, 11, 15, 17, 20, 21, ...  
995  [0, 2, 7, 12, 13, 14, 15, 16, 17, 19, 22, 23, ...  
996  [0, 2, 4, 9, 11, 13, 16, 17, 18, 20, 21, 23, 2...  
997  [3, 5, 6, 11, 14, 20, 21, 22, 24, 28, 30, 35, ...  
998  [5, 6, 7, 8, 13, 17, 19, 20, 22, 23, 24, 28, 3...  
999  [2, 6, 7, 8, 9, 14, 17, 18, 19, 20, 21, 22, 23...
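
If you would rather set the 1s yourself from a list of ids instead of sampling them at random, the following is a minimal sketch of one way to do it. It assumes that the ids are 0-based column positions and that "combinations" means every non-empty subset of the given ids; the names ids, subsets and data are only illustrative and are not part of the code above.

```python
from itertools import combinations

import numpy as np
import pandas as pd

n_features = 300
ids = [4, 45, 213, 6, 48]  # column ids taken from the question (assumed 0-based)

# Assumption: one row per non-empty subset of the ids (2**5 - 1 = 31 rows here)
subsets = [
    subset
    for size in range(1, len(ids) + 1)
    for subset in combinations(ids, size)
]

# Start from all zeros and switch on the chosen columns row by row
data = np.zeros((len(subsets), n_features), dtype=np.int8)
for row, subset in enumerate(subsets):
    data[row, list(subset)] = 1

dataset = pd.DataFrame(data)
```

The number of rows grows as 2**k - 1 for k ids, so for long id lists you would probably want to restrict the subset sizes in the range(...) call.
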
AChervony