
I am using the great new R package "reticulate" to combine Python and R so that I can use an API from a data provider (Thomson Reuters Eikon) in R, even though the API is only available for Python. I want to do this because my R skills are better than my (almost non-existent) Python skills.

I use the function "get_news_headlines" from the Python module "eikon", which serves as the API for downloading data from Thomson Reuters Eikon. I automatically convert the resulting pandas dataframe to an R dataframe by setting the argument "convert" of the reticulate function "import" to TRUE.

The API sets the first column of the downloaded data, which contains the news publication dates, as the index. When the dataframe is automatically converted to an R object, this index is used as the row names. Because the dates contain duplicates, I receive the following error message:

Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘2018-05-31 08:21:56’ 
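
The error itself can be reproduced in base R without any Eikon access. A minimal sketch, with made-up headlines and a repeated timestamp standing in for the real data:

```r
# Base R only, no Eikon access needed; the headlines are made up.
df <- data.frame(headline = c("first story", "second story"))

# Two articles published in the same second share one timestamp.
stamps <- c("2018-05-31 08:21:56", "2018-05-31 08:21:56")

# Using the timestamps as row names fails because row names must be unique.
msg <- tryCatch(
  suppressWarnings({
    row.names(df) <- stamps
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the failure seems to come from R's uniqueness requirement on row names rather than from the download itself.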

Here is my code:

library(reticulate) #load reticulate package to combine Python with R

PYTHON_pandas <- import("pandas", convert = TRUE)
#import Python pandas via reticulate's function "import"
PYTHON_eikon <- import("eikon", convert = TRUE)
#import the Thomson Reuters API Python module for use of the API in R
#(I set convert to true to convert Python objects into their R equivalents)

#do not bother with the following line:
PYTHON_eikon$set_app_id('ADD EIKON APP ID HERE')
#set a Thomson Reuters application ID (step is necessary to download data from TR, any string works)

DF <- PYTHON_eikon$get_news_headlines(query = 'Topic:REAM AND Topic:US', count = 10L)
#save news data from the API in an R dataframe
#query is the Thomson Reuters code from their Eikon database
#count is the number of news articles to be downloaded, I arbitrarily set it to 10 articles here

So my problem is that I have to tell R to replace the duplicates in the pandas index before the conversion into an R dataframe happens, in order to avoid the error message above. When I set the argument count to a small number and happen not to get any duplicates, the code works perfectly fine as it is.

This is probably easy for people with some knowledge of both R and Python (so not for me, as my Python knowledge is very limited). Unfortunately the code is not reproducible without Thomson Reuters data access. Any help is highly appreciated!

EDIT: Would it be an option to set the argument convert = FALSE in the import function to receive a pandas dataframe in R first? Then I would need a way to manipulate the Python pandas dataframe within R so that the duplicates are removed, or alternatively so that the pandas dataframe index is removed, before I manually convert the pandas dataframe to an R dataframe. Is that possible with reticulate?

The documentation for the eikon Python package is not really good yet as it is a pretty new Python module.

@Moody_Mudskipper:

str(PYTHON_eikon) only returns Module(eikon) as I am only fetching the respective Python module with the import function.

names(PYTHON_eikon) returns: "data_grid" "eikonError" "EikonError" "get_app_id" "get_data" "get_news_headlines" "get_news_story" "get_port_number" "get_symbology" "get_timeout" "get_timeseries" "json_requests" "news_request" "Profile" "send_json_request" "set_app_id" "set_port_number" "set_timeout" "symbology" "time_series" "tools" "TR_Field"

None of the available eikon functions seems to help me with my issue.

sspade
  • `DF <- PYTHON_eikon$get_news_headlines(query = 'Topic:REAM AND Topic:US', count = 10L)` is the line which triggers the error ? – moodymudskipper May 31 '18 at 10:12
  • Exactly, I should have stated that in the question. Sorry. – sspade May 31 '18 at 10:14
  • I don't know this library but I'd guess if you explore the structure of `PYTHON_eikon` you will find a `data` item or something similar. You can then do : `PYTHON_eikon$data$dates <- row.names(PYTHON_eikon$data)` and `row.names(PYTHON_eikon$data)<-NULL` and it should run fine – moodymudskipper May 31 '18 at 10:16
  • What does `str(PYTHON_eikon)` or `names(PYTHON_eikon)` return ? – moodymudskipper May 31 '18 at 10:18
  • Thanks Moody_Mudskipper! I added some more information in an edited version of my question. – sspade Jun 01 '18 at 13:21

2 Answers


In case this rather special problem is ever interesting for someone else, I briefly want to share the solution I found in the meantime (not perfect, but working):

library(reticulate) #load reticulate package to combine Python with R
library(rlist) #provides list.cbind
library(data.table) #provides rbindlist

PYTHON_pandas <- import("pandas", convert = TRUE)
#import Python pandas via reticulate's function "import"
PYTHON_eikon <- import("eikon", convert = TRUE)
#import the Thomson Reuters API Python module for use of the API in R
#(I set convert to true to convert Python objects into their R equivalents)

#do not bother with the following line:
PYTHON_eikon$set_app_id('ADD EIKON APP ID HERE')
#set a Thomson Reuters application ID (step is necessary to download data from TR, any string works)

#**Solution starts HERE:**

DF <- PYTHON_eikon$get_news_headlines(query = 'Topic:REAM AND Topic:US', count = 10L, raw_output = TRUE)
#use argument "raw_output" to receive a list instead of a dataframe

DF[c(2, 3)] <- NULL
#delete unrequired list-elements

DF <- list.cbind(DF)
#use "rlist" function "list.cbind" to column-bind list object "DF"

DF <- rbindlist(DF, fill = FALSE)
#use "data.table" function "rbindlist" to row-bind list object "DF" 
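
An alternative along the lines of the EDIT in the question might be to import with convert = FALSE, reset the pandas index from R, and only then convert. The following is a rough sketch rather than a tested Eikon workflow: the small pandas DataFrame is made up and only stands in for the result of get_news_headlines, and it assumes pandas is installed in the Python environment that reticulate uses.

```r
library(reticulate)

# convert = FALSE: results stay Python objects until converted explicitly.
pd <- import("pandas", convert = FALSE)

# Made-up stand-in for the Eikon result: a pandas DataFrame whose index
# holds the same timestamp twice. With Eikon this would presumably be
#   eikon <- import("eikon", convert = FALSE)
#   py_df <- eikon$get_news_headlines(query = "...", count = 10L)
py_df <- pd$DataFrame(
  data = dict(text = c("first story", "second story")),
  index = c("2018-05-31 08:21:56", "2018-05-31 08:21:56")
)

# reset_index() moves the index into an ordinary column and replaces it
# with a default 0..n-1 index, so no duplicate row names can arise.
py_df <- py_df$reset_index()

# Explicit conversion to an R data.frame.
DF <- py_to_r(py_df)
str(DF)
```

Because the stand-in index is unnamed, the former index ends up in a column called "index"; with real Eikon data the column name may differ.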
sspade

Do you need to use the R package "reticulate" or could you look at other packages as well?

There is an open-source wrapper for R available on GitHub: eikonapir. Although it is not officially supported, you might find it useful because it can execute your command without any problems:

get_news_headlines(query = 'Topic:REAM AND Topic:US', count = 10L)

**Disclaimer:** I am currently employed by Thomson Reuters.

PythonSherpa
  • I could use other packages as well, but in the meantime Thomson Reuters told me that they limit the number of news articles per API call to 200. For my intended machine learning goals, the Eikon API thus unfortunately seems to be useless. – sspade Jul 04 '18 at 17:33