
I have over 8,000 CSV files in a folder, each containing two columns. Each file is named after its stock ticker, e.g. "AAPL.csv" holds the data for Apple stock.

This is the data it contains:

glimpse(AAPL)

Columns: 2
$ timestamp     <chr> "2018-05-02 04:53:46", "2018-05-02 06:38:58", "2018-05-03 00:35:25",
$ users_holding <int> 150785, 150785, 145510

Edit: this is the raw data in the CSV files. The first and last dates give the time frame that has to be matched with the stock prices.

timestamp,users_holding
2018-05-02 04:53:46,150785
2018-05-02 06:38:58,150785
2018-05-03 00:35:25,145510
2018-05-03 06:33:53,145510
2018-05-03 06:48:56,145510
2018-05-03 07:07:03,145510
2018-05-03 07:34:19,145510
2018-05-03 07:43:36,145510
2018-05-03 11:19:43,145511
2018-05-03 12:43:07,145511
2018-05-03 13:43:07,145512
2018-05-03 14:43:07,144974
2018-05-03 15:43:08,144543
2018-05-03 16:43:08,144389
2018-05-03 17:43:07,144264
2018-05-03 18:43:07,144060
2018-05-03 19:43:07,143941
2018-05-03 20:43:07,143789
2018-05-03 21:43:07,143754
2018-05-03 22:43:08,143747
2018-05-03 23:43:06,143747
2018-05-04 00:43:06,143747
2018-05-04 01:43:07,143747
2018-05-04 02:43:08,143747
2018-05-04 03:43:07,143747
2018-05-04 04:43:07,143747
2018-05-04 05:43:07,143747
2018-05-04 06:43:07,143747
2018-05-04 07:43:07,143747
2018-05-04 08:43:08,143747
2018-05-04 09:43:07,143749
2018-05-04 10:43:07,143749
.
.
.
2020-08-13 16:52:38,726024
2020-08-13 20:51:07,730106
2020-08-13 21:50:08,730448
2020-08-13 22:55:09,730774

This is the only information the files contain. To read all the files, I used the tidyverse:

x <- dir("popularity_export", full.names = TRUE) %>% map_df(read_csv)
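Note that the call above drops the file name, so the ticker is lost once the files are combined. One possible way to keep it, as a minimal sketch: name the vector of file paths by ticker and pass `.id` to `map_dfr()`. The demo folder and the MSFT row below are made up for illustration; in practice `demo_dir` would be "popularity_export".

```r
library(readr)
library(purrr)

# Demo folder standing in for "popularity_export" (assumption for this sketch)
demo_dir <- file.path(tempdir(), "popularity_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("timestamp,users_holding",
             "2018-05-02 04:53:46,150785",
             "2018-05-02 06:38:58,150785"),
           file.path(demo_dir, "AAPL.csv"))
writeLines(c("timestamp,users_holding",
             "2018-05-02 04:53:46,1000"),   # made-up row, not real data
           file.path(demo_dir, "MSFT.csv"))

# Name each path by its ticker: "AAPL.csv" -> "AAPL"
files <- dir(demo_dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- tools::file_path_sans_ext(basename(files))

# .id = "ticker" stores the name of each element in a new "ticker" column;
# col_types = "ci" reads timestamp as character and users_holding as integer
x <- map_dfr(files, read_csv, col_types = "ci", .id = "ticker")
print(x)
```

The resulting `ticker` column can then be used to filter or join by stock.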

To get the stock data, I use the tidyquant and quantmod packages. The problem is that I have to request the data for each ticker separately, and the ticker is only available from the CSV file name. Is there a way to do this for all files?

mdate <- "2018-05-02"  # earliest date from which the stock data should be retrieved
aaplPrices <- getSymbols('AAPL', from = mdate, auto.assign = FALSE)[, 4]  # column 4 is the closing price
print(aaplPrices)

2021-03-02   125.1200
2021-03-03   122.0600
2021-03-04   120.1300
2021-03-05   121.4200
2021-03-08   116.3600
2021-03-09   121.0900
2021-03-10   119.9800
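The single-ticker call could perhaps be generalised by deriving the tickers from the file names and looping over them. This is only a sketch: `get_close()` is a hypothetical helper, not part of quantmod, and it assumes that a failed download (for example a delisted symbol) should simply be skipped. With 8,000 tickers the downloads will take a while and may be rate limited by the data provider.

```r
library(quantmod)

# Hypothetical helper: daily closing prices for one ticker as a data frame,
# or NULL if the download fails (unknown or delisted symbol, network error)
get_close <- function(ticker, from) {
  tryCatch({
    px <- getSymbols(ticker, from = from, auto.assign = FALSE)
    data.frame(ticker = ticker,
               date   = as.Date(index(px)),
               close  = as.numeric(Cl(px)))
  }, error = function(e) NULL)
}

# Tickers come from the file names: "AAPL.csv" -> "AAPL"
files   <- dir("popularity_export", pattern = "\\.csv$")
tickers <- tools::file_path_sans_ext(files)

# One long data frame with ticker, date and close; NULL results are dropped by rbind
prices <- do.call(rbind, lapply(tickers, get_close, from = "2018-05-02"))
```

Keeping the result in long form (one row per ticker and day) makes it straightforward to join with the holdings data later.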

Then I have to merge the stock prices with the other data, but I only need one price per day. The CSV files contain roughly hourly data, and their timestamps include the time of day, whereas the stock data has only the date. Does anyone know how to do this?
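One way this could work, sketched with dplyr: cut the timestamp down to its date, keep one row per ticker and day (here the last observation of the day, which is an assumption; the first or the mean would work the same way), and join on ticker and date. The `ticker` column is assumed to exist already, and the prices in the example are placeholder values, not real quotes.

```r
library(dplyr)

# A few holdings rows from AAPL.csv, with an assumed "ticker" column
holdings <- tibble(
  ticker        = "AAPL",
  timestamp     = c("2018-05-02 04:53:46", "2018-05-02 06:38:58",
                    "2018-05-03 00:35:25", "2018-05-03 23:43:06"),
  users_holding = c(150785L, 150785L, 145510L, 143747L)
)

# Placeholder closing prices (NOT real quotes), one row per trading day
prices <- tibble(
  ticker = "AAPL",
  date   = as.Date(c("2018-05-02", "2018-05-03")),
  close  = c(100, 101)
)

# Reduce the hourly data to one row per ticker and day
daily <- holdings %>%
  mutate(date = as.Date(substr(timestamp, 1, 10))) %>%  # drop the time of day
  arrange(ticker, timestamp) %>%                        # ISO timestamps sort chronologically
  group_by(ticker, date) %>%
  slice_tail(n = 1) %>%                                 # last observation of each day
  ungroup() %>%
  select(ticker, date, users_holding)

# inner_join keeps only days that have a price (drops weekends and holidays)
merged <- inner_join(daily, prices, by = c("ticker", "date"))
print(merged)
```

Using `left_join()` instead would keep non-trading days with `NA` as the price.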

Thank you very much in advance.

Rifraf
  • Need to see enough data (as text, not pictures). Show the top six rows each from three different files that illustrate the "different date format". – IRTFM Mar 11 '21 at 20:50
  • Perhaps try `... %>% map_dfr(read_csv, .id = "filename")` and then use that (perhaps after stripping ".csv") to filter by stock? – Jon Spring Mar 11 '21 at 22:03
  • I added a sample to the question. The date format of the stock data is also shown there. By "different format" I mean that the timestamp in the csv files includes the time, while the stock data has only the date. Hope this helps – Rifraf Mar 11 '21 at 22:19

0 Answers