I was trying to solve the recent Analytics Vidhya hackathon LTFS (bank data), and I ran into a problem that seemed unusual to me, though it is probably not that unique. Let me explain.
Problem
There are a few columns in the Bureau dataset, named REPORTED DATE - HIST, CUR BAL - HIST, AMT OVERDUE - HIST and AMT PAID - HIST, which contain either blank values (,,) or more than one value per row, and the number of values is not the same in each row.
Here is part of the dataset (not the original data, because the rows are very long):
**Reported Date - Hist**
20180430,20180331,
20191231,20191130,20191031,20190930,20190831,20190731,20190630,20190531,20190430,20190331
,
20121031,20120930,20120831,20120731,20120630,20120531,20120430,
----------------x-----------2nd column------------x-----------------------------------
**AMT OVERDUE**
37873,,
,,,,,,,,,,,,,,,,,,,,1452,,
0,0,0,
,,
0,,0,0,0,0,3064,3064,3064,2972,0,2802,0,0,0,0,0,2350,2278,2216,2151,2087,2028,1968,1914,1663,1128,1097,1064,1034,1001,976,947,918,893,866
-----x--other columns are similar---x---------------------
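To make the shape of the data concrete, here is a minimal sketch of how such a column could be split into per-row lists with pandas. The DataFrame name `df`, the helper `parse_history`, and the four sample rows are assumptions for illustration only; the real column would come from the Bureau file. It only parses the strings and shows that the rows have different lengths; it is not a proposed solution.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the kind of rows shown above (not the real data).
df = pd.DataFrame({"AMT OVERDUE - HIST": ["37873,,", "0,0,0,", ",,", "0,,0,3064"]})

def parse_history(cell):
    """Split a comma-separated string into floats; blank entries become NaN."""
    if pd.isna(cell):
        return []
    return [float(tok) if tok.strip() else np.nan for tok in str(cell).split(",")]

# Each row becomes a list whose length depends on the row.
parsed = df["AMT OVERDUE - HIST"].apply(parse_history)
lengths = parsed.apply(len)
print(lengths.tolist())
```

Note that a trailing comma yields one extra blank entry here; whether that trailing comma marks a missing value or is just a terminator is not clear from the data, so that interpretation is an assumption.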
I am looking for a better option, if possible.
The last time I solved this kind of problem, it was the genres column of the MovieLens project, and there I used the dummy column concept. It worked there because the genres column did not have many distinct values, and the same values repeated across many rows, so it was quite easy. But here it seems much harder, for two reasons.
1st reason
Some rows contain a lot of values, while other rows may contain no value at all.
2nd reason
I do not see how to create a column for each unique value of a row, as I did in the MovieLens genre case.
**genre**
action|adventure|comedy
carton|scifi|action
biopic|adventure|comedy
Thrill|action
# so here I extracted all the unique values and created a column for each
**genre** | **action** | **adventure** | **comedy** | **carton** | **scifi** | and so on...
action|adventure|comedy | 1 | 1 | 1 | 0 | 0 |
carton|scifi|action | 1 | 0 | 0 | 1 | 1 |
biopic|adventure|comedy | 0 | 1 | 1 | 0 | 0 |
Thrill|action | 1 | 0 | 0 | 0 | 0 |
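For reference, the dummy-column step shown in the table above can be sketched with pandas' `Series.str.get_dummies`. This is a minimal sketch, not necessarily the exact code used in the MovieLens project: the DataFrame name `movies` and the variable names are assumptions, and the rows are the four sample rows from above.

```python
import pandas as pd

# The four sample genre rows from above, in a hypothetical DataFrame.
movies = pd.DataFrame({"genre": [
    "action|adventure|comedy",
    "carton|scifi|action",
    "biopic|adventure|comedy",
    "Thrill|action",
]})

# One 0/1 column per unique value found after splitting on "|".
dummies = movies["genre"].str.get_dummies(sep="|")

# Keep the original column next to the indicator columns.
result = pd.concat([movies, dummies], axis=1)
print(result)
```

The indicator columns come out in sorted order rather than in the order shown in the table, and every distinct value gets its own column, which is why this only stays manageable when the number of distinct values is small.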
# but here it is different; how can I deal with this? I have no clue
**AMT OVERDUE**
37873,,
,,,,,,,,,,,,,,,,,,,,1452,,
0,0,0,
,,
0,,0,0,0,0,3064,3064,3064,2972,0,2802,0,0,0,0,0,2350,2278,2216,2151,2087,2028,1968,1914,1663,1128,1097,1064,1034,1001,976,947,918,893,866