
How my program works: 1) From the file test.txt, I search for lines containing the word " साधु ". 2) After finding such a line, I extract the words adjacent to it on the right and on the left. 3) After appending these words to two arrays, I try to find the words common to both arrays.

1 Answer


You can decode your byte strings to unicode objects with the following code:

mylist = map(lambda word: word.decode('utf-8'), mylist)

For the intersection itself, though, you don't need to decode anything. You can just do:

#considering you have two lists 'list1' and 'list2'

intersection = set(list1).intersection(set(list2))
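To make the two steps concrete, here is a minimal sketch using sample lists modelled on the ones from the comments. It assumes the words are UTF-8 byte strings (as they would be when read from a file without decoding); the helper name `to_text` is made up for illustration. The intersection is computed on the raw values, and decoding is applied only afterwards, skipping non-string items such as the integers.

```python
# -*- coding: utf-8 -*-
# Sample data modelled on the lists from the comments. The words are stored
# as UTF-8 byte strings, which is an assumption about how they were read.
list1 = [1, 2, 3, 4, u' साधु '.encode('utf-8'), u' बालक '.encode('utf-8')]
list2 = [1, 3, 5, 6, u' साधु '.encode('utf-8'), u' बालक '.encode('utf-8')]

# The intersection works on byte strings as-is; no decoding is needed here.
common = set(list1).intersection(list2)


def to_text(item):
    """Decode UTF-8 byte strings; return any other value unchanged."""
    # Hypothetical helper: the integers in the sample lists have no
    # .decode() method, so only byte strings are decoded.
    if isinstance(item, bytes):
        return item.decode('utf-8')
    return item


# Decode only the result, so the words are unicode objects for display.
readable = set(to_text(item) for item in common)
```

Decoding after the intersection rather than before keeps the comparison cheap and touches only the handful of items that survive it.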
hspandher
  • Sorry, that doesn't work for me. Can you show me how to find the intersection between these two arrays? - list1 = [1,2,3,4, ' साधु ', ' बालक '] list2 = [1,3,5,6, ' साधु ', ' बालक '] –  Aug 27 '15 at 13:10
  • It works for the example you mention; you may just be getting the result as byte strings instead of unicode objects – hspandher Aug 27 '15 at 13:15
  • @vashi result = set(list1).intersection(set(list2)) – Dmitry.Samborskyi Aug 27 '15 at 13:16
  • @hspandher, how do I get the result as an actual string (unicode object)? –  Aug 27 '15 at 13:18
  • @Dmitry.Samborskyi, the intersection works and gives me the result as byte strings. How do I get the actual string? –  Aug 27 '15 at 13:22
  • @vashi Python 2.x strings are byte strings by default, not unicode. So, if it is not a problem, use Python 3; the solution would work perfectly there – hspandher Aug 27 '15 at 13:30
  • Otherwise you need to decode the byte strings to unicode objects yourself – hspandher Aug 27 '15 at 13:36
  • @hspandher, can you tell me how to do that? I have added my code above. –  Aug 27 '15 at 13:46
  • Use the codecs library to open the file with UTF-8 encoding, and then use the solution described above – hspandher Aug 27 '15 at 14:56
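
The codecs suggestion in the last comment could look roughly like the sketch below. It is not the asker's code: the sample text, the temporary file standing in for test.txt, and the variable names are all assumptions made for illustration. `codecs.open` with `encoding='utf-8'` yields unicode objects for every line, so no manual decoding is needed before the intersection.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

TARGET = u'साधु'

# Hypothetical sample text standing in for the contents of test.txt.
sample = u'एक साधु बालक\nबालक साधु एक\n'

# Write the sample to a temporary file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with codecs.open(path, 'w', encoding='utf-8') as handle:
    handle.write(sample)

left_words, right_words = [], []

# Reading through codecs.open decodes each line from UTF-8 to unicode.
with codecs.open(path, 'r', encoding='utf-8') as handle:
    for line in handle:
        words = line.split()
        for i, word in enumerate(words):
            if word != TARGET:
                continue
            # Collect the neighbours, guarding against the line boundaries.
            if i > 0:
                left_words.append(words[i - 1])
            if i + 1 < len(words):
                right_words.append(words[i + 1])

# Words that occur both to the left and to the right of the target.
common = set(left_words).intersection(right_words)
```

Splitting each line on whitespace and comparing whole words avoids having to pad the search term with spaces, which would miss the word at the start or end of a line.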