Our system can collect data on user queries and domain clicks based on their behavior. We aim to enhance this system by predicting the domain corresponding to a user query. For instance, if a user enters a query not included in our list, such as "xoilac tv," we will predict the list of domains that the user is likely to click, such as [xoilac.tv, 'youtube']
Example dataset from log
data = [
("Lỗi Office 2013 không mở được file", "tainhanh.site"),
("yoko onsen", "www.klook.com"),
("bamboo", "www.bambooairways.com"),
("xoilac tv", "xoilac.tv"),
("xoilac tv", "youtube.tv"),
("hộp mực máy in 226dw", "toanphat.com"),
]
given q: 'xoi lac'
the output is the list of domain with highest probability :
[ {xoilac.tv, 0.99}, {'youtube.tv', 0.90},..]
Is it possible to approached as a sequence modeling problem, specifically a Seq2Seq ? where the input is the query and the output is the website domain users are likely to visit. Can anyone help?