I'm quite new to vaex ;)
Problem:
I'm importing a large number of log files into vaex, each as a single string, and lowercasing them.
After that, I calculate the length of each string and store it in the column size.
For every string, I also calculate the most frequent digram and store it in topdigram.
Now I would like to replace the most frequent digram in each string with another letter, row by row, using that row's own digram.
Is there any way to do this with str.replace? Or do I need a completely new implementation, using multiprocessing to at least get some parallelisation?
#  InputFile      Contents                                  lower                                     size  topdigram  compressed  compressedsize
0  HDFS_1_46.log  123456789                                 123456789                                 10    12         error       error
1  HDFS_2_42.log  File 2: 222222222222222222222222          file 2: 222222222222222222222222          33    22         error       error
2  HDFS_3_10.log  File 3: 33333333333333333333333333333333  file 3: 33333333333333333333333333333333  41    33         error       error
3  HDFS_4_25.log  File 4: 444444444444444444444444444444    file 4: 444444444444444444444444444444    39    44         error       error
4  HDFS_5_6.log   File 5: 5555555555555555555555555555      file 5: 5555555555555555555555555555      37    55         error       error
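
To make the per-row logic concrete, here is a minimal plain-Python sketch of what I am trying to do for a single row. The function names top_digram and compress_row and the placeholder letter "z" are just made up for illustration, and nothing in it is vaex-specific. My assumption is that a function like this could be passed to something like df.apply with the lower and topdigram columns as arguments, but I have not verified that this is the right or fastest way in vaex.

```python
from collections import Counter
from typing import Optional


def top_digram(text: str) -> Optional[str]:
    """Return the most frequent pair of adjacent characters, or None if text has fewer than 2 characters."""
    # Count every overlapping two-character window in the string.
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    if not counts:
        return None
    # On ties, the digram that was seen first is returned.
    return counts.most_common(1)[0][0]


def compress_row(text: str, digram: Optional[str], placeholder: str = "z") -> str:
    """Replace every non-overlapping occurrence of this row's digram with a single placeholder character."""
    if not digram:
        return text
    return text.replace(digram, placeholder)


# Hypothetical sample rows, shortened versions of the data shown above.
rows = ["123456789", "file 2: 2222"]
digrams = [top_digram(r) for r in rows]
compressed = [compress_row(r, d) for r, d in zip(rows, digrams)]
compressed_sizes = [len(c) for c in compressed]
```

The point is that the search pattern differs per row, which is why a single str.replace call with one fixed pattern for the whole column does not seem to fit.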