
TL;DR: I want to build multi-language search on my website, à la Pinterest. How do I do that?

I am starting a website where people can publish content along with metadata that they type in themselves. People can then interact with the content by viewing it, liking it, commenting on it, or sharing it to social media. Content discovery is mostly done through search.

I do not wish to create geographic boundaries on my website. I would like people who speak any language to find content that is relevant to them, whatever language it was published in. This requirement makes sense because the content is highly visual, à la Pinterest. So even if the description uses the French word for "car" and I don't understand it, that's fine, because I'm mostly interested in seeing the car.

Pinterest is really good at searching across languages. For example, on uk.pinterest.com I typed "coupe carrée", which is French for "bob haircut", and all the results were visually relevant, even when the pin metadata is in English and the original website is entirely in English.

How is that possible? How was Pinterest able to match my French search query to content whose text is all in English? Is there a translation at some step: coupe carrée > bob haircut > content containing "bob haircut"?

I looked at their engineering blog and all I found is tech to detect the original country and language of a website. Nothing about managing language in search.

Please let me know if this is the wrong place to ask a how-it-works question.

Thanks in advance for any help/pointers you will be able to share!

Myna

1 Answer


The general strategy in this case is to index your content in every language you want to be searchable, storing a translation of each item for each of those languages.

This requires a language identification model, to detect the language each item was written in, and a language translation API called at index time. Here's a Solr example.
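To make the index-time idea concrete, here is a minimal sketch in plain Python (not Solr). Everything in it is an assumption for illustration: `detect_language` and `translate` are hypothetical stand-ins backed by a toy lookup table, where a real system would call a language identification model and a translation API, and `MultilingualIndex` is a toy inverted index rather than a real search engine.

```python
from collections import defaultdict

# Hypothetical stand-in for a translation service: a toy phrase table.
TOY_TRANSLATIONS = {
    ("en", "fr"): {"bob haircut": "coupe carrée"},
    ("fr", "en"): {"coupe carrée": "bob haircut"},
}


def detect_language(text):
    """Toy language identifier (assumption): look the text up in the phrase tables."""
    for (source, _target), table in TOY_TRANSLATIONS.items():
        if text.lower() in table:
            return source
    return "en"  # arbitrary fallback for unknown text


def translate(text, source, target):
    """Toy translator (assumption): returns the text unchanged if it is unknown."""
    if source == target:
        return text
    return TOY_TRANSLATIONS.get((source, target), {}).get(text.lower(), text)


class MultilingualIndex:
    """Toy inverted index holding one set of postings per supported language."""

    def __init__(self, languages):
        self.languages = list(languages)
        self.postings = defaultdict(set)  # (language, token) -> document ids

    def add(self, doc_id, description):
        # Index time: detect the source language once, then store a
        # translation of the description under every supported language.
        source = detect_language(description)
        for lang in self.languages:
            text = translate(description, source, lang)
            for token in text.lower().split():
                self.postings[(lang, token)].add(doc_id)

    def search(self, query):
        # Query time: no translation is needed. A document matches if all
        # query tokens appear in its text for at least one language.
        tokens = query.lower().split()
        if not tokens:
            return set()
        results = set()
        for lang in self.languages:
            per_token = [self.postings.get((lang, t), set()) for t in tokens]
            results |= set.intersection(*per_token)
        return results


index = MultilingualIndex(["en", "fr"])
index.add("pin-1", "Bob haircut")     # English-only metadata
hits = index.search("coupe carrée")   # French query still finds pin-1
```

The trade-off in this design is that translation cost and index size grow with the number of supported languages, but queries stay cheap because nothing has to be translated at search time.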

Peter Dixon-Moses