How / in which program can I process a large amount of data (observations)?

Question

I have a project where my data universe is loaded in Oracle SQL Developer and is around 86 million rows. I want to apply a methodology, such as clustering, to reduce the number of observations, and for that I use a server with 248 GB of RAM and RStudio. The problem is sending the data from SQL to RStudio: the process crashed after some time and I couldn't obtain the data.

I'm exploring other options (as alternatives to RStudio) to get the data from SQL. Does anyone have an idea of how I can get the data and maybe improve the query time? I understand that SQL only gives you a preview of the data, so it runs quickly; maybe there's an option like that in RStudio, Python, or other software, but I'm not really familiar with them. I'd really appreciate any ideas.

Regards,

When you say you want to *"reduce the number of observations"*, does this mean you want to aggregate data and use the aggregated results in R? If so, see the package [`sqldf`](https://CRAN.R-project.org/package=sqldf). See also [this R-bloggers post](https://www.r-bloggers.com/manipulating-data-frames-using-sqldf-a-brief-overview/) — Rui Barradas, Feb 07 '20 at 17:06
No, it's not an aggregation as such. The rows are not the same observation repeated over and over, so I can't just group them and sum some of the fields. The idea is to group the observations by their characteristics into clusters, so that manipulating the data gets simpler (in terms of number of rows). — Salma Guzmán, Feb 07 '20 at 17:18
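
For reference, one way the transfer problem described in the question is sometimes approached is to fetch the result set in batches instead of pulling all rows in a single call. The sketch below is only an illustration under stated assumptions: it uses Python's standard DB-API cursor method `fetchmany`, with an in-memory `sqlite3` database standing in for Oracle so that it runs anywhere. The table name `observations`, its columns, and the helper names `iter_batches` and `demo` are hypothetical and do not come from the question. An Oracle driver that follows the DB-API would be used the same way in principle, but that is not shown or tested here.

```python
import sqlite3


def iter_batches(conn, query, batch_size=10_000):
    """Yield lists of rows from `query`, at most `batch_size` rows per list.

    Only one batch is held in memory at a time, so each batch can be
    processed (or summarized) before the next one is fetched.
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:  # empty list means the result set is exhausted
                break
            yield rows
    finally:
        cur.close()


def demo():
    """Build a small stand-in table and read it back in batches."""
    # Assumption: sqlite3 in memory replaces the real Oracle connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observations (id INTEGER, feature REAL)")
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?)",
        ((i, i * 0.5) for i in range(25_000)),
    )

    total_rows = 0
    n_batches = 0
    for rows in iter_batches(conn, "SELECT id, feature FROM observations"):
        # Placeholder for per-batch work, e.g. feeding an incremental
        # clustering step or writing the batch to disk.
        total_rows += len(rows)
        n_batches += 1

    conn.close()
    return total_rows, n_batches


if __name__ == "__main__":
    print(demo())
```

The batch size trades memory per fetch against the number of round trips to the database; a suitable value for 86 million rows would have to be found by experiment.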


0 Answers