How my program works: 1) From a file test.txt, I search for lines containing the word " साधु ". 2) After finding such a line, I extract the words adjacent to it on the right and on the left. 3) After appending these words to two arrays, I try to find the words common to both arrays.
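The three steps can be sketched roughly as follows. This is only an illustration written for Python 3: the helper name `neighbours` and the sample lines are hypothetical stand-ins (the real contents of test.txt are not shown), and it assumes the target is matched as a whole whitespace-separated word.

```python
def neighbours(lines, target):
    """Collect the words immediately left and right of `target` in each line."""
    left, right = [], []
    for line in lines:
        words = line.split()  # split on whitespace
        for i, word in enumerate(words):
            if word != target:
                continue
            if i > 0:
                left.append(words[i - 1])
            if i + 1 < len(words):
                right.append(words[i + 1])
    return left, right


# Hypothetical sample lines standing in for the contents of test.txt
sample_lines = [
    "एक साधु बालक",
    "बालक साधु एक",
]

left_words, right_words = neighbours(sample_lines, "साधु")

# Words that occur both to the left and to the right of the target
common = set(left_words) & set(right_words)
```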
-
Is this what you want: `{' ', '\xa4', '\xa5', '\xe0'}`? – The6thSense Aug 27 '15 at 12:44
-
The intersection is `set(array1) & set(array2)`, even with Hindi word arrays, unless I'm missing something? – thebjorn Aug 27 '15 at 12:46
1 Answer
2
You can decode your byte strings to unicode with the following code:
    mylist = map(lambda word: word.decode('utf-8'), mylist)
For intersection purposes, though, you don't need to decode them. You can just do:
    # assuming you have two lists, 'list1' and 'list2'
    intersection = set(list1).intersection(set(list2))
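As a small illustration that the set-based approach does not care about the script the words are written in, here is a sketch for Python 3. The list contents are made-up sample values, not data from the original program.

```python
# Made-up sample lists mixing numbers and Hindi words
list1 = [1, 2, 3, 4, "साधु", "बालक"]
list2 = [1, 3, 5, 6, "साधु", "बालक"]

# Two equivalent ways to compute the intersection
common_method = set(list1).intersection(list2)
common_operator = set(list1) & set(list2)

# Keep only the words, sorted for a stable order
common_words = sorted(item for item in common_method if isinstance(item, str))
```

Sets compare elements by equality and hash, so two identical strings match regardless of language, as long as both lists hold the same type (both byte strings or both text strings).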

hspandher
-
Sorry, that doesn't work for me. Can you show me how to find the intersection between these two arrays? `list1 = [1,2,3,4, ' साधु ', ' बालक ']`, `list2 = [1,3,5,6, ' साधु ', ' बालक ']` – Aug 27 '15 at 13:10
-
It works for the example you mention; you may just be getting the result as byte strings instead of unicode objects. – hspandher Aug 27 '15 at 13:15
-
@Dmitry.Samborskyi, the intersection works, but it gives me the result as a byte string. How do I get the actual string? – Aug 27 '15 at 13:22
-
@vashi Python 2.x strings are byte strings by default, not unicode. So, if it is not a problem, use Python 3; the solution would work perfectly. – hspandher Aug 27 '15 at 13:30
-
Otherwise, you need to decode the byte strings into unicode objects yourself. – hspandher Aug 27 '15 at 13:36
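A minimal sketch of that decoding step, written for Python 3 where byte strings and text are distinct types. The word lists are made-up samples, and the data is assumed to be UTF-8 encoded.

```python
# Made-up byte-string words, as they might come from a file read in binary mode
words1 = ["साधु".encode("utf-8"), "बालक".encode("utf-8")]
words2 = ["साधु".encode("utf-8"), "एक".encode("utf-8")]

# The intersection works directly on byte strings...
common_bytes = set(words1) & set(words2)

# ...and each result can then be decoded back to readable text
common_text = [word.decode("utf-8") for word in common_bytes]
```

In Python 2, printing a whole list shows the escaped form (`'\xe0\xa4...'`) because a list prints the `repr` of its items; printing each item on its own shows the actual word if the terminal uses UTF-8.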
-
Use the codecs library to open the file as UTF-8 and then use the solution described in the answer. – hspandher Aug 27 '15 at 14:56
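A rough sketch of this suggestion, assuming the file is UTF-8 encoded. The temporary file and its single line are hypothetical stand-ins for test.txt, created here only so the example can run on its own.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for test.txt, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(u"एक साधु बालक\n")

# Reading through codecs.open yields decoded text rather than raw bytes
with codecs.open(path, "r", encoding="utf-8") as f:
    lines = [line.strip() for line in f]

# Lines containing the target word, compared as text on both sides
matching = [line for line in lines if u"साधु" in line]
```

Because every line is already decoded when it is read, the words extracted from it and the resulting intersection are text strings too, so no separate decoding step is needed afterwards.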