Hi, I am developing an Encoder-Decoder model with Attention that predicts the WTO Panel Report for a given factual relation provided as text input (Text_Inputs).
A sample sentence for the factual relation is as follows:
sample_sentence = "On 23 January 1995, the United States received a request from Venezuela to hold consultations under Article XXII:1 of the General Agreement on Tariffs and Trade 1994 (\"General Agreement\"), Article 14.1 of the Agreement on Technical Barriers to Trade (\"TBT Agreement\") and Article 4 of the Understanding on Rules and Procedures Governing the Settlement of Disputes (\"DSU\"), on the rule issued by the Environmental Protection Agency on 15 December 1993, entitled \"Regulation of Fuels and Fuel Additives - Standards for Reformulated and Conventional Gasoline\" (WT/DS2/1). The consultations between Venezuela and the United States took place on 24 February 1995. As they did not result in a satisfactory solution of the matter, Venezuela, in a communication dated 25 March 1995, requested the Dispute Settlement Body (\"DSB\") to establish a panel to examine the matter under Article XXIII:2 of the General Agreement and Article 6 of the DSU (WT/DS2/2). On 10 April 1995, the DSB established a panel in accordance with the request made by Venezuela. On 28 April 1995, the parties to the dispute agreed that the Panel should have standard terms of reference (DSU, Art. 7) and agreed on the composition of the Panel as follows"
I am trying to use Google's pretrained Word2Vec model to encode each word into a 300-dimensional word vector; however, some tokens, such as the number 23, are not included in the Word2Vec vocabulary.
What would be a good solution to this problem? (A minimal sketch of what I am currently doing is below the list.)
1) Use another word embedding, for example GloVe?
2) Or is there any other advice?
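For reference, here is roughly how I am looking tokens up, with a possible fallback for out-of-vocabulary tokens. This assumes gensim's KeyedVectors and the GoogleNews-vectors-negative300.bin file; the digit-masking and unknown-vector fallbacks are just ideas I am considering, not an established recipe:

```python
import re
import numpy as np
from gensim.models import KeyedVectors

# Path to the pretrained 300-dim GoogleNews vectors (assumed to be local).
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

UNK = np.zeros(300)  # shared vector for anything still out of vocabulary

def embed(token):
    if token in w2v:                      # direct vocabulary hit
        return w2v[token]
    if re.fullmatch(r"\d+", token):       # bare numbers like "23", "1995"
        masked = "#" * len(token)         # GoogleNews reportedly masks digits as '#'
        if masked in w2v:
            return w2v[masked]
    lowered = token.lower()               # try a case-normalised lookup
    if lowered in w2v:
        return w2v[lowered]
    return UNK                            # give up: shared unknown vector

for tok in ["On", "23", "January", "1995", "Venezuela"]:
    print(tok, embed(tok).shape)
```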
Thanks in advance for your help.
Edit:
I think that, to successfully carry out this task, I first need to understand how current NMT systems handle the named-entity / rare-word problem before they actually train the model.
Any suggested literature?
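In case it helps clarify the kind of preprocessing I have in mind, here is a rough sketch of placeholder substitution (replacing numbers, dates, and document identifiers with special tokens before embedding, then copying the originals back into the output later). The patterns and placeholder names are only an illustration of the general idea, not taken from any specific paper:

```python
import re

# Hypothetical placeholder tokens; the model would learn embeddings for these.
PATTERNS = [
    (re.compile(r"\b\d{1,2} (January|February|March|April|May|June|July|"
                r"August|September|October|November|December) \d{4}\b"), "<DATE>"),
    (re.compile(r"\bWT/DS\d+/\d+\b"), "<DOC_ID>"),   # WTO dispute document numbers
    (re.compile(r"\b\d+\b"), "<NUM>"),               # any remaining bare number
]

def substitute_placeholders(text):
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(substitute_placeholders(
    "On 23 January 1995, the United States received a request (WT/DS2/1)."))
# -> "On <DATE>, the United States received a request (<DOC_ID>)."
```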