What approach can I use to predict the nationality of a person from the surname?
I have a huge list of texts and surnames of authors. I would like to identify which texts have been written by latin-language speakers and which texts have been written by native english speakers, in order to study if certain writing style patterns are different in one group compared to the other.
I have looked in google and in pubmed for a database of surnames, but I could not find any accessible for free. Another approach is to use some regexs, for example ".*ez" to identify some hispanic surnames such as 'rodriguez', but it doesn't get me very far.
Do you have any suggestion? Since I will manually revise all the associations after making the prediction, I don't need a great accuracy, but any help or idea will be welcome.