How do I get an annotated dataset with JobTitles? I need it to do some entity extraction.
Viewed 87 times
-2

Nathaniel Ford

Maha Benabbou
- Did you do any research or experimentation yourself first? This question is too broad for this forum. – Ageonix Feb 17 '16 at 20:24
- Yes, but I didn't find anything. – Maha Benabbou Feb 17 '16 at 20:27
- It is not clear what type of dataset you want. Do you need a list of job titles, or do you need raw text in which job titles are manually annotated? – Istvan Nagy Feb 17 '16 at 20:28
- OK... what is JobTitles? What entities are you trying to extract? Again, this question needs specifics for people to help you. – Ageonix Feb 17 '16 at 20:29
- I need raw text in which job titles are annotated. – Maha Benabbou Feb 17 '16 at 20:31
- Refer to http://www.stackoverflow.com/help/mcve – Nathaniel Ford Feb 17 '16 at 20:51
- You can find data in which occupations are also annotated: http://nlp.uned.es/weps/weps-2/weps2-papers – Istvan Nagy Feb 18 '16 at 08:21
1 Answer
1
If you haven't come across any existing datasets, here is what I suggest: grab the Wikipedia occupation lists (https://en.wikipedia.org/wiki/Lists_of_occupations), build a gazetteer of job titles from them, and write regular expressions that capture those titles and their variations in your text. The matches give you annotated data :).
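A minimal sketch of the gazetteer-plus-regex idea, in case it helps. The `GAZETTEER` list here is a tiny hypothetical sample (in practice you would fill it from the Wikipedia occupation lists), and the function names and the `JOB_TITLE` label are made up for illustration. The only "variations" it handles are letter case, extra whitespace between words, and a trailing plural "s".

```python
import re

# Hypothetical sample; in practice, load titles scraped from the
# Wikipedia occupation lists.
GAZETTEER = ["software engineer", "engineer", "data scientist", "nurse", "teacher"]


def build_pattern(titles):
    """Compile one regex that matches any gazetteer title."""
    # Longest titles first, so "software engineer" wins over "engineer".
    ordered = sorted(set(titles), key=len, reverse=True)
    # Allow any run of whitespace between the words of a title.
    alternatives = [
        r"\s+".join(re.escape(word) for word in title.split())
        for title in ordered
    ]
    # Optional trailing "s" covers simple plurals; \b keeps whole words only.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")s?\b", re.IGNORECASE)


def annotate(text, pattern, label="JOB_TITLE"):
    """Return (start, end, label) character spans for every match."""
    return [(m.start(), m.end(), label) for m in pattern.finditer(text)]


def to_inline(text, spans):
    """Render spans as inline tags, e.g. <JOB_TITLE>nurse</JOB_TITLE>."""
    pieces, last = [], 0
    for start, end, label in spans:
        pieces.append(text[last:start])
        pieces.append(f"<{label}>{text[start:end]}</{label}>")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


if __name__ == "__main__":
    pattern = build_pattern(GAZETTEER)
    sentence = "She worked as a software engineer before becoming a teacher."
    print(to_inline(sentence, annotate(sentence, pattern)))
```

Keep in mind that annotations produced this way are only as good as the list: titles missing from the gazetteer are never tagged, and ambiguous words get tagged even when they are not used as job titles, so a manual spot check of the output is probably worthwhile.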

user3639557
- Or she can build a corpus from Wikipedia: whenever a link on an arbitrary Wikipedia page points to one of the occupation pages, its anchor text is an occupation mention in raw text. – Istvan Nagy Feb 18 '16 at 08:16
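The link-based idea from this comment could be sketched roughly as follows. This assumes you already have the raw wikitext of a page and a set of occupation page titles; `OCCUPATION_PAGES`, `annotate_wikitext`, and the `JOB_TITLE` label are hypothetical names for illustration. It only understands plain `[[Target]]` and `[[Target|anchor]]` links and ignores templates, section links, and namespaces, which real wikitext also contains.

```python
import re

# Hypothetical set of occupation page titles (lowercased), e.g. collected
# from the Wikipedia occupation lists.
OCCUPATION_PAGES = {"physician", "software engineer", "carpenter"}

# Matches [[Target]] and [[Target|anchor text]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def annotate_wikitext(wikitext, occupation_pages, label="JOB_TITLE"):
    """Strip link markup and return (plain_text, spans).

    A span (start, end, label) is emitted for each link whose target
    is one of the occupation pages; offsets refer to plain_text.
    """
    pieces, spans = [], []
    last = 0      # position in wikitext after the previous link
    offset = 0    # current length of the plain text built so far
    for m in WIKILINK.finditer(wikitext):
        before = wikitext[last:m.start()]
        pieces.append(before)
        offset += len(before)

        target = m.group(1).strip().replace("_", " ").lower()
        anchor = m.group(2) or m.group(1)  # fall back to the target text
        if target in occupation_pages:
            spans.append((offset, offset + len(anchor), label))

        pieces.append(anchor)
        offset += len(anchor)
        last = m.end()
    pieces.append(wikitext[last:])
    return "".join(pieces), spans


if __name__ == "__main__":
    sample = "Ada trained as a [[Physician|doctor]] in [[Paris]]."
    plain, spans = annotate_wikitext(sample, OCCUPATION_PAGES)
    print(plain)
    print([(plain[s:e], lbl) for s, e, lbl in spans])
```

One nice property of this approach is that the anchor text ("doctor" in the sample) can differ from the page title, so you pick up surface variations that a fixed gazetteer would miss.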