So I'm a newbie when it comes to working with big data. I'm dealing with a 60GB CSV file so I decided to give Dask a try since it produces pandas dataframes. This may be a silly question but bear with me, I just need a little push in the right direction...
So I understand why the following query using the "compute" method would be slow(lazy computation):
df.loc[1:5 ,'enaging_user_following_count'].compute()
btw it took 10 minutes to compute.
But what I don't understand is why the following code using the "head" method prints the output in less than two seconds (i.e., In this case, I'm asking for 250 rows while the previous code snippet was just for 5 rows):
df.loc[50:300 , 'enaging_user_following_count'].head(250)
Why doesn't the "head" method take a long time? I feel like I'm missing something here because I'm able to pull a huge number of rows in a way shorter time than when using the "compute" method. Or is the compute method used in other situations?
Note: I tried to read the documentation but there was no explanation to why head() is fast.