Is there a way to use scikit-learn to map words to indices that start from 1 rather than 0?

Example - Pseudo code:

sequence = ['welcome', 'home', 'shimon']
# mapping_func is the hypothetical function I am looking for
word_to_index = mapping_func(sequence)

print(word_to_index['welcome'])
print(word_to_index['home'])
print(word_to_index['shimon'])

The expected output of this code is:

1
2
3

I need this in order to zero-pad sequences. If the value 0 is already assigned to a word, that word cannot be distinguished from padding, which might (and probably will) lead to faulty results.

Lior Magen
  • I don't see why it would lead to faulty results. Python arrays, lists, etc. are 0-indexed. If you explain more, we can show you how to make sure index 0 is not an issue. – ilyas patanam Dec 11 '15 at 00:33

1 Answer

If you have a list of words such as sequence = ['welcome', 'home', 'shimon'] and you pad it with 0s, you will have sequence = ['welcome', 'home', 'shimon', 0, 0]. You can then always call sequence.index('welcome') to retrieve the index of the first occurrence of a word. If you are interested in cases where a value appears at more than one index, you can use a list comprehension.

>>> sequence = ['welcome', 'home', 'shimon', 0, 0]
>>> indices = [i for i, x in enumerate(sequence) if x == 0]
>>> indices
[3, 4]
>>> indices = [i for i, x in enumerate(sequence) if x == 'welcome']
>>> indices
[0]
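
If what you need is a word-to-index mapping that starts at 1, so that 0 stays free for padding, one possible approach is a plain Python dictionary rather than scikit-learn. The following is only a minimal sketch under that assumption; the names build_vocab and encode_and_pad are made up for illustration and are not part of any library.

```python
def build_vocab(words):
    """Map each distinct word to an index starting at 1, leaving 0 unused."""
    vocab = {}
    for word in words:
        if word not in vocab:
            # The first new word gets 1, the next gets 2, and so on.
            vocab[word] = len(vocab) + 1
    return vocab


def encode_and_pad(words, vocab, length, pad_value=0):
    """Replace words with their indices, then truncate or right-pad to `length`."""
    ids = [vocab[w] for w in words][:length]
    return ids + [pad_value] * (length - len(ids))


sequence = ['welcome', 'home', 'shimon']
word_to_index = build_vocab(sequence)
padded = encode_and_pad(sequence, word_to_index, length=5)

print(word_to_index['welcome'])  # 1
print(word_to_index['home'])     # 2
print(word_to_index['shimon'])   # 3
print(padded)                    # [1, 2, 3, 0, 0]
```

Because no word is ever assigned 0, a 0 in the padded sequence can only mean padding.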
ilyas patanam