How to use Scikit for mapping words to indexes starting from index=1

Question

Is there a way to use scikit-learn to map words to indexes so that the indexes start from 1 rather than 0?

Example - Pseudo code:

sequence = ['welcome', 'home', 'shimon']
word_to_index = mapping_func(sequence)  # mapping_func is the function I am looking for

print(word_to_index['welcome'])
print(word_to_index['home'])
print(word_to_index['shimon'])

The output of this code should be:

1
2
3

I need this in order to zero-pad sequences: if the value 0 is assigned to a word, it cannot be told apart from padding, which might (and probably will) lead to faulty results.

I don't see why it would lead to faulty results. Python arrays, lists, etc. are 0-indexed. If you explain more, we can show you how to keep index 0 from being an issue. - ilyas patanam, Dec 11 '15 at 00:33

score 0 · Answer 1 · answered Dec 11 '15 at 00:38

If you have a list of words such as sequence = ['welcome', 'home', 'shimon'] and you pad it with 0s, you will have sequence = ['welcome', 'home', 'shimon', 0, 0]. You can then always call sequence.index('welcome') to retrieve the index of a word. If a value can occur at more than one index, you can use a list comprehension to collect all of them:

>>> sequence = ['welcome', 'home', 'shimon', 0, 0]
>>> indices = [i for i, x in enumerate(sequence) if x == 0]
>>> indices
[3, 4]
>>> indices = [i for i, x in enumerate(sequence) if x == 'welcome']
>>> indices
[0]
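
If the goal is a word-to-index mapping that keeps 0 free for padding, here is a minimal sketch in plain Python (not scikit-learn). The helper names build_vocab and encode_and_pad are made up for illustration, and the sketch assumes indexes are assigned in order of first appearance:

```python
def build_vocab(sequences):
    """Map each distinct word to an index starting at 1, in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for word in seq:
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # 0 is never assigned, so it stays free for padding
    return vocab


def encode_and_pad(seq, vocab, length, pad_value=0):
    """Convert words to indexes, truncate to `length`, then right-pad with pad_value."""
    ids = [vocab[word] for word in seq][:length]
    return ids + [pad_value] * (length - len(ids))


sequence = ['welcome', 'home', 'shimon']
word_to_index = build_vocab([sequence])
padded = encode_and_pad(sequence, word_to_index, length=5)

print(word_to_index)  # {'welcome': 1, 'home': 2, 'shimon': 3}
print(padded)         # [1, 2, 3, 0, 0]
```

Since no word ever gets index 0, a 0 in the padded output can only mean padding.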

I know how to do it in general. I was asking about doing it with scikit-learn, specifically with CountVectorizer. - Lior Magen, Dec 13 '15 at 07:20
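
For the CountVectorizer case, one possible approach is sketched below. As far as I know, CountVectorizer has no option for a 1-based vocabulary, so this sketch fits the vectorizer and then shifts the fitted vocabulary_ mapping by one. The name word_to_index is an assumption for illustration, not part of scikit-learn:

```python
from sklearn.feature_extraction.text import CountVectorizer

sequence = ['welcome', 'home', 'shimon']

vectorizer = CountVectorizer()
vectorizer.fit(sequence)

# vocabulary_ maps each term to a 0-based column index (terms are sorted alphabetically).
# Shift every index by one so that 0 is never assigned to a word.
word_to_index = {word: int(idx) + 1 for word, idx in vectorizer.vocabulary_.items()}

for word in sequence:
    print(word, word_to_index[word])
```

Note that the indexes follow the alphabetical order of the terms rather than their order of appearance, so 'home' gets 1 and 'welcome' gets 3. The shifted indexes also no longer match the column positions in the matrix returned by vectorizer.transform, so they should be used only for building padded index sequences.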
