I have a very large pandas DataFrame similar to the following:
╔════════╦═════════════╦════════╗
║ index ║ Users ║ Income ║
╠════════╬═════════════╬════════╣
║ 0 ║ user_1 ║ 304 ║
║ 1 ║ user_2 ║ 299 ║
║ ... ║ ║ ║
║ 399999 ║ user_400000 ║ 542 ║
╚════════╩═════════════╩════════╝
(There are a few more columns, which are needed for some of the calculations.)
For every user I have to apply a large number of operations (shifts, sums, subtractions, conditions, etc.), so I believe it is impossible to do everything with boolean masking; I have already tried. My question is whether it is possible to split the pandas DataFrame into chunks as follows, e.g.:
# chunk 1
╔════════╦═════════════╦════════╗
║ index ║ Users ║ Income ║
╠════════╬═════════════╬════════╣
║ 0 ║ user_1 ║ 304 ║
║ 1 ║ user_2 ║ 299 ║
║ ... ║ ║ ║
║ 19999 ║ user_20000 ║ 432 ║
╚════════╩═════════════╩════════╝
# chunk 2
╔════════╦═════════════╦════════╗
║ index ║ Users ║ Income ║
╠════════╬═════════════╬════════╣
║ 20000 ║ user_20001 ║ 199 ║
║ 20001 ║ user_20002 ║ 412 ║
║ ... ║ ║ ║
║ 39999 ║ user_40000 ║ 725 ║
╚════════╩═════════════╩════════╝
# chunk K
╔════════╦═════════════╦════════╗
║ index ║ Users ║ Income ║
╠════════╬═════════════╬════════╣
║ ... ║ user_... ║ ... ║
║ ... ║ user_... ║ ... ║
║ ... ║ ║ ║
║ ... ║ user_... ║ ... ║
╚════════╩═════════════╩════════╝
And then apply all the operations to all of those chunks in parallel?
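To make the idea concrete, here is a minimal sketch of the chunk-and-parallelize approach described above. It is only an illustration under some assumptions: `split_into_chunks`, `process_chunk`, and `process_in_parallel` are hypothetical names, the `Income_doubled` column stands in for the real calculations, and it assumes each chunk can be processed independently (no operation, such as a shift, needs rows from a neighbouring chunk). It uses `concurrent.futures.ProcessPoolExecutor` from the standard library; other tools could do the same job.

```python
import numpy as np
import pandas as pd
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(df, chunk_size):
    """Return consecutive row slices of df, each with at most chunk_size rows."""
    return [df.iloc[start:start + chunk_size]
            for start in range(0, len(df), chunk_size)]


def process_chunk(chunk):
    """Hypothetical placeholder for the real per-user calculations."""
    out = chunk.copy()  # work on a copy so the original slice is not modified
    out["Income_doubled"] = out["Income"] * 2
    return out


def process_in_parallel(df, chunk_size, workers=4):
    """Process each chunk in a separate worker process and reassemble the result."""
    chunks = split_into_chunks(df, chunk_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input chunks
        results = list(pool.map(process_chunk, chunks))
    return pd.concat(results)


if __name__ == "__main__":
    # Synthetic data shaped like the example table (values are random)
    df = pd.DataFrame({
        "Users": [f"user_{i + 1}" for i in range(400_000)],
        "Income": np.random.randint(100, 1000, size=400_000),
    })
    result = process_in_parallel(df, chunk_size=20_000)
    print(result.head())
```

With 400,000 rows and `chunk_size=20_000` this produces 20 chunks. Whether it is actually faster depends on how expensive the per-chunk work is compared with the cost of sending each chunk to a worker process and back.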