list1 = ["happy new year", "game over", "a happy story", "hold on"]
list2 = ["happy", "new", "hold"]
Suppose I have two lists of strings. I want to build a new list that stores the matched pairs from those two lists, like this:
list3=[["happy new year","happy"],["happy new year","new"],["a happy story","happy"],["hold on","hold"]]
In other words, I need every pair made of a string from one list and a substring of it from the other list.
This is actually about data from ancient Chinese texts. The first list contains names of people from the 10th to the 13th century, and the second list contains the titles of all the poems from that period. Ancient Chinese writers often recorded their social relations in the titles of their works. For example, someone might write a poem titled "For my friend Wang Anshi". In this case, the person "Wang Anshi" in the first list should be matched with this title. There are also cases like "For my friend Wang Anshi and Su Shi", where the title contains more than one person. So this is a huge job involving 30,000 people and 160,000 poems.
Here is my code:
list3 = []
for i in list1:
    for j in list2:
        if str(i).count(str(j)) > 0:
            list3.append([i, j])
I use str(i) because Python keeps treating my Chinese strings as floats. The code works, but it is far too slow. I need to find another way to do this. Thanks!
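One direction I am considering, as a rough sketch rather than a tested solution: since the strings to search for are short, I could slice each long string into pieces of only the lengths that occur among the short strings, and look each piece up in a dictionary instead of comparing every string against every other. The name match_pairs is something I made up for illustration. The isinstance checks assume the float values are empty cells rather than real text, which I have not confirmed on the full data.

```python
def match_pairs(texts, keys):
    """Return [text, key] pairs where key occurs as a substring of text."""
    # Remember the first position of each key so results keep the order of `keys`.
    # Non-string entries are skipped (assumption: floats are missing values).
    position = {}
    for idx, key in enumerate(keys):
        if isinstance(key, str) and key and key not in position:
            position[key] = idx

    # Only substring lengths that actually occur among the keys need checking.
    lengths = sorted({len(key) for key in position})

    pairs = []
    for text in texts:
        if not isinstance(text, str):
            continue
        found = set()
        for n in lengths:
            for start in range(len(text) - n + 1):
                piece = text[start:start + n]
                if piece in position:  # dictionary lookup, not a scan over all keys
                    found.add(piece)
        for key in sorted(found, key=position.get):
            pairs.append([text, key])
    return pairs


list1 = ["happy new year", "game over", "a happy story", "hold on"]
list2 = ["happy", "new", "hold"]
list3 = match_pairs(list1, list2)
```

If this is right, the work per string depends on its length and on the number of distinct lengths in the other list, not on how many entries that list has. Unlike my loop, it reports each distinct substring only once per string, even if it appears twice in the list.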