
I am running a task where it would be useful to have different versions of word embeddings for different time periods, e.g. separate embeddings for 2013, 2014, 2015, 2016 ... 2020. This is because I don't want to bias my algorithm by using embeddings trained on a later time period than my target task's data, e.g. using embeddings from 2019 when I am running a task on data from 2013 (words may have shifted meaning in between).

Is anybody aware of a project that has released pretrained embeddings across time periods? Ideally they would be trained on social media data (e.g. Twitter).

jim travis

2 Answers


There may be some useful recommendations/references in my previous answer to a similar question:

Word embeddings for the same word from two different texts

gojomo

One way you can try is to collect twitter datasets from different years and train yourself. The best point to start looking for those datasets would be this one: https://archive.org/search.php?query=collection%3Atwitterstream&sort=-publicdate