
I am running a task where it would be useful to have different versions of word embeddings for different time periods, e.g. separate embeddings for 2013, 2014, 2015, 2016 ... 2020. This is because I don't want to bias my algorithm by using embeddings trained on a later time period than my target task's data, e.g. using embeddings from 2019 when I am running a task on data from 2013 (words may have shifted meaning in between).

Is anybody aware of a project that has released pretrained embeddings across time periods? Ideally they would be trained on social media data (e.g. Twitter).

jim travis

2 Answers


There may be some useful recommendations/references in my previous answer to a similar question:

Word embeddings for the same word from two different texts

gojomo

One way you can try is to collect twitter datasets from different years and train yourself. The best point to start looking for those datasets would be this one: https://archive.org/search.php?query=collection%3Atwitterstream&sort=-publicdate