
I am trying to write a program that customizes ads based on a user's search history.

Therefore, I need an algorithm or script that finds the best ad keyword for a specific person, based on how frequently a word appears in their searches and how much time has elapsed since each search.

For example,

if my search list is:

  1. how to find the main word of sentence - 2018-03-31 15:16:04.752350

  2. main word of sentence - python - 2018-03-28 15:16:04.752350

  3. food of dogs - 2016-03-28 15:16:04.752350

  4. dogs and their food - 2016-03-25 15:16:04.752350
  5. dog's food - 2016-03-23 15:16:04.752350

Even though "dog" and "food" appear 3 times and "main word of sentence" only 2, so much time has passed since the user searched for dog food that the chosen topic should be "main word of sentence".

So far I have written some algorithms that find the main topic of a sentence without considering how much time has passed. Unfortunately, as I said, I need an algorithm that takes time into account. I thought about simple ideas, like multiplying the score of recent searches by a constant, but I want a better algorithm.

Thanks a lot,

Omer

David Makogon
omersk1

1 Answer


You could count the frequency of each word, with some sort of penalty for older occurrences.

  • For example, if a word appeared within the last month, it counts for "1".

  • If it is older than a month but less than a year old, it counts for "0.5".

  • If it is older than a year, it counts for "0.1".

This is a simplification, but you can use this idea to place more emphasis on recent words.
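As a rough sketch of the stepped weights above, the snippet below scores the searches from the question. The 30-day and 365-day cutoffs, the small `STOPWORDS` set, and the function names are assumptions made for illustration, not part of the original answer.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical stop-word list so filler words like "of" do not dominate.
STOPWORDS = {"how", "to", "the", "of", "and", "their"}

def step_weight(age_days):
    """Weights from the bullets above; the day cutoffs are an assumption."""
    if age_days <= 30:
        return 1.0
    if age_days <= 365:
        return 0.5
    return 0.1

def score_words(searches, now):
    """Sum a time-dependent weight for every occurrence of every word."""
    scores = defaultdict(float)
    for text, when in searches:
        weight = step_weight((now - when).days)
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                scores[word] += weight
    return scores

searches = [
    ("how to find the main word of sentence", datetime(2018, 3, 31, 15, 16, 4)),
    ("main word of sentence - python", datetime(2018, 3, 28, 15, 16, 4)),
    ("food of dogs", datetime(2016, 3, 28, 15, 16, 4)),
    ("dogs and their food", datetime(2016, 3, 25, 15, 16, 4)),
    ("dog's food", datetime(2016, 3, 23, 15, 16, 4)),
]
scores = score_words(searches, now=datetime(2018, 3, 31, 16, 0, 0))

# Print words from highest to lowest score.
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {score:.1f}")
```

With these cutoffs, the two recent searches outweigh the three old ones, so "main", "word" and "sentence" rank above "food" and "dogs".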

A slight step up from this could use a "normal distribution". For a good example of how to draw one, see "python pylab plot normal distribution".

In your case, instead of plotting the curve on a graph, you want to multiply its y-axis value (the height of the curve at the age of the search) by the frequency.
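One way to read this is to replace the fixed weights with the height of a bell curve centred on "now". The sketch below is only an illustration: it drops the normalising constant so that a search made right now weighs exactly 1, and the default `sigma_days=90` and the function names are assumptions to be tuned, not values from the answer.

```python
import math

def gaussian_weight(age_days, sigma_days=90.0):
    """Height of an unnormalised bell curve centred on age 0.

    sigma_days is a tunable assumption: a larger value means
    old searches keep their influence for longer.
    """
    return math.exp(-(age_days ** 2) / (2.0 * sigma_days ** 2))

def weighted_count(ages_in_days, sigma_days=90.0):
    """Sum the curve height over every occurrence of a word."""
    return sum(gaussian_weight(age, sigma_days) for age in ages_in_days)

# Two recent occurrences versus three occurrences about two years old.
recent = weighted_count([0, 3])
old = weighted_count([733, 736, 738])
print(recent > old)
```

Dropping the constant factor does not change the ranking, because it would scale every word's score by the same amount.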

Mohamad Zeina
  • Thanks, but I want a more complex algorithm that will give better results, something that has been tested and developed by a large company or a group of developers. Anyway, I upvoted your answer :) – omersk1 Mar 31 '18 at 16:21
  • I'd be interested to see if you find anything; I don't imagine big companies are too eager to share these types of algorithms! You could make my example more sophisticated by modelling it on a normal distribution, where you can pick the standard deviation based on the average attention span you expect your users to have – Mohamad Zeina Mar 31 '18 at 16:24
  • Can you explain some more? (I have taken a statistics course, so I know what a normal distribution is, along with other statistical concepts) – omersk1 Mar 31 '18 at 16:42
  • Essentially, instead of picking arbitrary numbers like I have above (1, 0.5, 0.1), you can automatically generate these numbers based on a normal distribution. – Mohamad Zeina Mar 31 '18 at 16:44
  • Just one more thing: how do I choose the standard deviation? I really don't know which number should be adjusted – omersk1 Mar 31 '18 at 16:50
  • I just edited my answer to show an example. In your case, you want to adjust the "variance" to correspond with the attention span of your audience. You don't need to adjust anything else. Just use the height of the curve as the "frequency" each time a word appears. – Mohamad Zeina Mar 31 '18 at 16:56
  • Does this help answer your question? – Mohamad Zeina Mar 31 '18 at 16:56
  • Do I choose the variance arbitrarily, or can I derive it in a smart way from my search history? – omersk1 Mar 31 '18 at 17:16
  • (After I create the graph, would the numbers you chose above be replaced by the height of the curve at the specific time?) – omersk1 Mar 31 '18 at 17:24
  • Yes, the variance is somewhat arbitrary. You can tune it later. You want to look at the height of the graph at the word's distance in time. For example, if it was searched a month ago, take the height at X = 1. If it was 6 months ago, take it at X = 6 – Mohamad Zeina Mar 31 '18 at 17:26
  • No worries :) do you mind marking my answer correct if it answered your question? – Mohamad Zeina Mar 31 '18 at 17:34
  • There should be a tick next to my question, by the votes. I'm glad I could help :) – Mohamad Zeina Mar 31 '18 at 17:36