
I'm working on a part-of-speech and morphological analysis project for Japanese sentences. Each sentence will have its own webpage. To make the page more visual, I want to show one picture that is related to the sentence. For example, for the sentence "私は学生です" ("I'm a student"), relevant pictures would show a school, a Japanese textbook, students, etc. What I have: part-of-speech tags for every word. My current approach: take 2-3 nouns from each sentence and retrieve the first image from the search results using the Bing Images API. Note: all the sentence processing up to this point was done in Java.
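As a rough illustration of that approach (not the actual project code), the sketch below takes the first few nouns from a tagged sentence and turns them into a URL-encoded query string. The `Token` record, the tag names `NOUN`, `PRONOUN`, etc., and the class name `ImageQueryBuilder` are all assumptions made for the example; the real tag set depends on the tagger, and the call to the image search service itself is deliberately left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class ImageQueryBuilder {
    /** One tagged token: surface form plus a coarse POS label (hypothetical tag names). */
    record Token(String surface, String pos) {}

    /** Joins the first maxNouns nouns of the sentence with spaces. */
    static String buildQuery(List<Token> tokens, int maxNouns) {
        return tokens.stream()
                .filter(t -> t.pos().equals("NOUN"))
                .map(Token::surface)
                .limit(maxNouns)
                .collect(Collectors.joining(" "));
    }

    /** Percent-encodes the query as UTF-8 so Japanese text survives in a URL. */
    static String encode(String query) {
        return URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hand-written tagging of 私は学生です, for illustration only.
        List<Token> sentence = List.of(
                new Token("私", "PRONOUN"),
                new Token("は", "PARTICLE"),
                new Token("学生", "NOUN"),
                new Token("です", "AUX"));
        String query = buildQuery(sentence, 3);
        System.out.println(query + " -> " + encode(query));
    }
}
```

The encoded string would then be appended to whatever query parameter the chosen image search service expects.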


I have a couple of questions, though: 1) Which is better for searching Japanese nouns (richer corpus and more powerful search): the Google Images API, the Bing Images API, the Flickr API, or something else? 2) How do you select the most important noun in a sentence to use as the image search query, without resorting to complicated topic modeling or similar? Thanks!

Makoto
Arman

2 Answers


The Japanese WordNet has links to OpenClipart pictures, which could be another relevant source. The authors describe this in their paper "Enhancing the Japanese WordNet".

Nate Glenn

I would start by choosing any noun that comes before は, が or を and giving these priority, probably in that order.

But that assumes your part-of-speech tagging is good enough to identify when は actually marks the subject (as I guess you know, は is not always the subject marker).

I looked at a bunch of sample sentences here with this technique in mind and found it about as good as could be expected, except where none of those particles is used, which is fairly rare.

Then there are sentences like this one, where you might have to look for で and the noun before it when there is no を or は. Notice that the word 人 (people) really doesn't tell you anything about what's being said. Without parsing the context properly, you don't even know whether the noun means "person" or "people".

毎年 交通事故で 多くの人が 死にます (many people die in traffic accidents every year)

But basically, couldn't you implement a priority/fallback type system like this?
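A minimal sketch of such a priority/fallback system might look like the following. The `Token` record, the tag names `NOUN` and `PARTICLE`, and the class name `NounPicker` are assumptions for the example rather than any particular tagger's output, and the tokenization of the sample sentence is hand-written and simplified.

```java
import java.util.List;
import java.util.Optional;

public class NounPicker {
    /** One tagged token: surface form plus a coarse POS label (hypothetical tag names). */
    record Token(String surface, String pos) {}

    /**
     * Tries each particle in priority order and returns the first noun that is
     * immediately followed by it. Falls back to the first noun of any kind.
     */
    static Optional<String> pickNoun(List<Token> tokens, List<String> particlePriority) {
        for (String particle : particlePriority) {
            for (int i = 0; i + 1 < tokens.size(); i++) {
                Token current = tokens.get(i);
                Token next = tokens.get(i + 1);
                if (current.pos().equals("NOUN")
                        && next.pos().equals("PARTICLE")
                        && next.surface().equals(particle)) {
                    return Optional.of(current.surface());
                }
            }
        }
        // Last resort: no noun precedes a listed particle, so take any noun.
        return tokens.stream()
                .filter(t -> t.pos().equals("NOUN"))
                .map(Token::surface)
                .findFirst();
    }

    public static void main(String[] args) {
        // 毎年 交通事故で 多くの人が 死にます, tagged by hand for illustration.
        List<Token> sentence = List.of(
                new Token("毎年", "ADVERB"),
                new Token("交通事故", "NOUN"),
                new Token("で", "PARTICLE"),
                new Token("多く", "NOUN"),
                new Token("の", "PARTICLE"),
                new Token("人", "NOUN"),
                new Token("が", "PARTICLE"),
                new Token("死に", "VERB"),
                new Token("ます", "AUX"));

        // Plain は > が > を > で order picks the uninformative 人.
        System.out.println(pickNoun(sentence, List.of("は", "が", "を", "で")).orElse("(none)"));
        // Trying で before が picks 交通事故 instead.
        System.out.println(pickNoun(sentence, List.of("は", "を", "で", "が")).orElse("(none)"));
    }
}
```

Keeping the particle order as a parameter makes it easy to experiment: for the traffic accident sentence, the order は, を, で, が yields 交通事故 rather than 人.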

BTW, I hope your sentences all use kanji; otherwise, when you see はし (in one of the sentences linked to), you won't know whether to show a bridge or chopsticks, and showing the wrong one will probably not be good.

computingfreak
PandaWood