How can I extract features in Python from a dataset like this:

enter image description here

I found two ways to solve this problem. 1) One is:

enter image description here

But enter image description here So it is not a good way.

2) Another is:

Search columns C and D for the top-K items and keep only those. But this leads to information loss.
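
A minimal sketch of this top-K idea, to make the information loss concrete. The original data is only shown in the image, so the sample rows, the assumption that column C holds lists of labels, and the helper names `top_k_items` and `encode_top_k` are all hypothetical:

```python
from collections import Counter

# Hypothetical sample: each row's C value is assumed to be a list of labels.
rows_c = [["c1", "c2"], ["c1", "c3"], ["c1", "c2", "c4"], ["c2"]]

def top_k_items(rows, k):
    """Return the k most frequent items across all rows."""
    counts = Counter(item for row in rows for item in row)
    return [item for item, _ in counts.most_common(k)]

def encode_top_k(rows, vocab):
    """Multi-hot encode each row over vocab; items outside vocab are dropped."""
    return [[1 if item in row else 0 for item in vocab] for row in rows]

vocab = top_k_items(rows_c, k=2)
encoded = encode_top_k(rows_c, vocab)
```

With K=2 only `c1` and `c2` survive, so the rows containing `c3` or `c4` lose that part of their content, which is the information loss described above.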

Is there a better way to solve this problem?

Anna
  • What do you mean by _index of C\D(m, n)_? – AMC Nov 25 '19 at 07:20
  • If you want to access values nested inside a cell (e.g. a list or dict), use sub-indexing: for a list, use columnname[list_index][element_index]; for a dictionary, use columnname[dict_key] or something like that. – SRG Nov 25 '19 at 07:23

1 Answer

I guess I understand your question. I am listing an approach that you can follow without any sparsity or information loss.

  1. Let's say your column C varies from c1 to c4 and you create a binary vector of c1 to c4 as you already did.
  2. Then convert the binary vector to decimal and use it as a feature (e.g. 1,1,0,0 --> 0*2^0 + 0*2^1 + 1*2^2 + 1*2^3 = 12).
  3. Apply the same approach to D, but I would suggest creating two features: one as in step 2, ignoring the values of D, and another that uses the values of D in the decimal conversion. Then decide whether to retain both based on the correlation between the two features.
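
Steps 1 and 2 could be sketched roughly as follows. This assumes each C value is a list of labels drawn from c1 to c4; the helper names `multi_hot` and `bits_to_int` are made up for illustration and are not from any library:

```python
def multi_hot(items, vocab):
    """Step 1: binary vector with a 1 for each vocab entry present in items."""
    return [1 if v in items else 0 for v in vocab]

def bits_to_int(bits):
    """Step 2: read the bits as a binary number, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

vocab_c = ["c1", "c2", "c3", "c4"]       # assumed full range of column C
bits = multi_hot(["c1", "c2"], vocab_c)  # [1, 1, 0, 0]
code = bits_to_int(bits)                 # 1*2^3 + 1*2^2 + 0*2^1 + 0*2^0 = 12
```

Python's built-in `int` has arbitrary precision, so the conversion itself does not overflow. Storing the result in a fixed-width type, such as a signed 64-bit integer column, can overflow once there are more than 63 categories.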
Dr Sudeep Ghosh
  • Thank you. Converting the binary vector to decimal is a problem if the result is bigger than max(decimal) or max(hex). – Anna Nov 25 '19 at 08:14
  • Transforming D into D-key, D-value, and D-cor requires the values to be unique. But the sub-values of list D may be repeated, such as [{'d1':'d1_value'},{'d2':'d1_value'}]. – Anna Nov 25 '19 at 08:31