Intersection of two Unicode Arrays in Python

Question

Working of my program:
1) From a file, test.txt, I search for lines containing the word " साधु ".
2) After finding such a line, I extract the words immediately to its right and left.
3) After appending these words to arrays, I try to find the words common to the two arrays (their intersection).
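The three steps above can be sketched in Python 3 roughly as follows. This is a minimal sketch, not the asker's code. It assumes words are separated by whitespace and that the keyword appears as a standalone token. The helper name `neighbours_of` and the sample lines are hypothetical.

```python
def neighbours_of(keyword, lines):
    """Hypothetical helper: collect the words just left and right of `keyword`.

    Assumes whitespace-separated tokens in each line.
    """
    left, right = [], []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            if word != keyword:
                continue
            # Word to the left, unless the keyword is the first token
            if i > 0:
                left.append(words[i - 1])
            # Word to the right, unless the keyword is the last token
            if i + 1 < len(words):
                right.append(words[i + 1])
    return left, right


# Made-up sample lines standing in for the contents of test.txt
lines = [
    "एक साधु आया",
    "वह साधु बालक था",
    "आया साधु",
]
left, right = neighbours_of("साधु", lines)
common = set(left) & set(right)  # words seen on both sides of the keyword
print(left, right, common)
```

Splitting on whitespace avoids matching " साधु " as a substring inside longer words, which is why the sketch compares whole tokens.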

Intersection is `set(array1) & set(array2)`, even with Hindi word arrays, unless I'm missing something? – thebjorn, Aug 27 '15 at 12:46
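A quick check of that comment in Python 3, where `str` is already Unicode. The `&` operator and the `set.intersection()` method give the same result. Sets do not keep order, so if the order of `array1` matters, filtering a list against a set is one common alternative. The word lists here are made-up examples.

```python
array1 = ["साधु", "बालक", "राजा"]
array2 = ["बालक", "साधु", "रानी"]

# Operator form and method form are equivalent for two sets
common_op = set(array1) & set(array2)
common_method = set(array1).intersection(array2)  # the method accepts any iterable
assert common_op == common_method

# Order-preserving variant: keep array1's order, test membership against a set
lookup = set(array2)
common_ordered = [word for word in array1 if word in lookup]
print(common_ordered)  # ['साधु', 'बालक']
```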

score 2 · Accepted Answer · answered Aug 27 '15 at 12:48


You can decode your byte strings into unicode objects with the following code:

# Python 2: decode each UTF-8 byte string into a unicode object
mylist = [word.decode('utf-8') for word in mylist]
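The line above is Python 2 code, where `str` holds bytes and has a `.decode()` method. In Python 3 the corresponding type is `bytes`, and `map()` returns a lazy iterator rather than a list, so a list comprehension is the safer spelling there. A minimal Python 3 sketch with an illustrative list:

```python
# UTF-8 encoded byte strings, as they would come from a file opened in binary mode
mylist = ["साधु".encode("utf-8"), "बालक".encode("utf-8")]

# Decode each byte string into a str (Unicode) object
mylist = [word.decode("utf-8") for word in mylist]
print(mylist)  # ['साधु', 'बालक']
```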

For the intersection itself, though, you don't need to decode anything. You can just do:

# assuming you have two lists, list1 and list2

intersection = set(list1).intersection(set(list2))
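Decoding is not needed for the comparison because the same word always encodes to the same UTF-8 bytes, so two byte strings are equal exactly when their decoded text is equal. The sketch below (Python 3, with made-up lists of `bytes`) intersects the raw bytes and decodes only the result, which is also one way to turn a byte-string result into readable text.

```python
list1 = [w.encode("utf-8") for w in ["साधु", "बालक", "राजा"]]
list2 = [w.encode("utf-8") for w in ["बालक", "साधु", "रानी"]]

# Intersect the raw UTF-8 bytes: equal text gives equal bytes
common_bytes = set(list1).intersection(list2)

# Decode only the (usually small) result for display
common_text = {w.decode("utf-8") for w in common_bytes}
print(sorted(common_text))
```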

hspandher


Sorry, that doesn't work for me. Can you show me how to find the intersection of these two arrays? list1 = [1,2,3,4, ' साधु ', ' बालक '] list2 = [1,3,5,6, ' साधु ', ' बालक '] – Aug 27 '15 at 13:10
It works for the example you mention; you are probably just getting the result as byte strings instead of unicode objects. – hspandher Aug 27 '15 at 13:15
@vashi result = set(list1).intersection(set(list2)) – Dmitry.Samborskyi Aug 27 '15 at 13:16
@hspandher, how do I get the result as actual strings (unicode objects)? – Aug 27 '15 at 13:18
@Dmitry.Samborskyi, the intersection works, but it gives me the result as byte strings. How do I get the actual strings? – Aug 27 '15 at 13:22
@vashi In Python 2.x, string literals are byte strings by default, not Unicode. So, if it is not a problem, use Python 3, where the solution works as is. – hspandher Aug 27 '15 at 13:30
Otherwise, you need to decode the byte strings into unicode objects yourself. – hspandher Aug 27 '15 at 13:36
@hspandher, can you tell me how to do that? I have added my code above. – Aug 27 '15 at 13:46
Use the codecs module to open the file with UTF-8 encoding, and then apply the solution described above. – hspandher Aug 27 '15 at 14:56
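Following that last comment, here is a sketch of reading the file through `codecs.open()` with UTF-8 so that every line arrives as a Unicode string. In Python 2 this yields `unicode` objects; in Python 3 the built-in `open(path, encoding="utf-8")` is the usual way to do the same job. To keep the example self-contained, the code first writes a small sample `test.txt` into a temporary directory. The sample text, the comparison word set, and the helper name `read_words` are assumptions.

```python
import codecs
import os
import tempfile


def read_words(path):
    """Hypothetical helper: return every whitespace-separated word in a UTF-8 file."""
    words = []
    with codecs.open(path, "r", encoding="utf-8") as f:
        for line in f:
            words.extend(line.split())
    return words


# Create a sample test.txt so the example runs on its own
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "test.txt")
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write("एक साधु आया\nवह बालक था\n")

words = read_words(path)
common = set(words) & {"साधु", "बालक", "राजा"}
print(sorted(common))
```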
