Data structure for indexing application?

Asked Apr 21 '18 at 18:21

Active Apr 21 '18 at 23:07

OK, so I have a geeky question that may be debatable. Suppose I have a web page and want to count how many times each word is repeated and how long it is. For example:

" java is a great.... bla bla bla" ... "java is ...bla bla bla"

Now, I have

 7      …    3
java    …   is
 2           2

"java" is repeated twice and has 7 indices. Likewise, "is" is repeated twice in the web page and has 3 indices. The output index consists of two integers separated by a colon, like: java is 7:2. The first number, before the colon, represents the word's ID, and the second number is the word's frequency (that is, how many times the word has occurred in the web page).

My question is: which data structure should I use here, and why? I was thinking of a hash code, since I can use it to count how many times a word is repeated, but I am not sure it would be sufficient.
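To make the hash-based idea concrete, here is a minimal sketch in Python using a hash map (a `dict`) from each word to its ID and frequency. Everything in it is an assumption for illustration: the function names `build_index` and `format_entry` are made up, the sample text is a stand-in for the web page, and IDs are simply handed out in order of first appearance starting from zero, since the post does not pin down how an ID such as 7 is derived.

```python
import re


def build_index(text):
    """Return a dict mapping each word to [word_id, frequency]."""
    index = {}
    # Lowercase the text and pull out runs of letters, digits, and apostrophes.
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word not in index:
            # Assumption: the ID is the order of first appearance, from 0.
            index[word] = [len(index), 0]
        index[word][1] += 1  # one more occurrence of this word
    return index


def format_entry(word, index):
    """Format one entry as 'word id:frequency'."""
    word_id, freq = index[word]
    return f"{word} {word_id}:{freq}"


# Stand-in for the page text (hypothetical sample).
sample = "java is a great bla bla bla java is bla bla bla"
index = build_index(sample)
print(format_entry("java", index))
print(format_entry("is", index))
```

Each lookup and update in the `dict` takes constant time on average, so the whole page is indexed in a single pass over its words. If a different ID scheme is intended, only the line that assigns `len(index)` would need to change.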

edited Apr 21 '18 at 23:07

asked Apr 21 '18 at 18:21

Afeer Yahya

What do you mean by '"java" has 7 indices'? Why does this word have the ID `7`? Why not `23`? – Apr 21 '18 at 18:24
"java" is repeated twice in the text, and it's 7 because we are using an index, so we should start counting from zero. java has 3 indices, but if we say java+jave = 7. I do not know how to explain my point... – Afeer Yahya Apr 22 '18 at 17:49
'jave'? Where does this come from? I still don't understand how you reach 7. – Apr 22 '18 at 18:34

0 Answers