
I have a workflow where I use Python for signal processing and R for analysis and plotting. The Python part produces a large data frame (100 observations of 11,165 variables) that I save and then import into R using feather. However, working with such a large data frame in RStudio is very slow when it comes to plotting (ggplot2) or data wrangling.

Should I be working with a different structure (e.g. a list of lists?) to get more efficient memory management?
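For context, here is a minimal Python sketch of the kind of frame involved (the column names, sizes, and values are made up for illustration, and the frame is much smaller than the real one). One option worth considering before export is reshaping the wide frame to long format with `melt`, since ggplot2 generally expects long data:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real 100 x 11165 frame (much smaller here)
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.standard_normal((100, 5)),
                    columns=[f"ch{i}" for i in range(5)])
wide["obs"] = range(len(wide))

# Reshape wide -> long: one row per (observation, variable) pair,
# which is the shape ggplot2 usually wants
long_df = wide.melt(id_vars="obs", var_name="channel", value_name="value")
```

The long frame could then be written out (e.g. with `to_feather`, which requires pyarrow) and read into R the same way as before.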

AJMA
  • Plot rendering speed with ggplot2 is known to be a bit slow, and there's nothing you can do about that (though I do know the maintainers have recently dedicated someone to attacking it). Slowness in general "data wrangling" is sort of impossible to help with without a specific example to work from. – joran Mar 01 '19 at 16:42
  • The package `data.table` is generally faster than base R or the `tidyverse` for data wrangling, but I don't think there's any way to get around the fact that it will take a while to plot a big dataset in ggplot – Jan Boyer Mar 01 '19 at 16:42
  • On the Python side, there are some [ways to improve performance](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html) of a dataframe, if it makes your 'wrangling' piece faster to do before you port to R... – G. Anderson Mar 01 '19 at 16:47
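To illustrate the kind of pandas-side speedup the linked guide describes, here is a hedged sketch (the frame and the reduction are invented for illustration) contrasting a row-wise Python-level `apply` with a vectorised equivalent:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((100, 1000)))

# Slow pattern: apply runs a Python-level function once per row
slow = df.apply(lambda row: row.max() - row.min(), axis=1)

# Fast pattern: the same peak-to-peak range as two vectorised reductions
fast = df.max(axis=1) - df.min(axis=1)

# Both give identical results; the vectorised form avoids the Python loop
assert np.allclose(slow, fast)
```

Doing this kind of heavy lifting in pandas before exporting to feather may shrink the wrangling work left on the R side.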

0 Answers