I have a set of large files that have to be split into 100MB parts. The problem I am running into is that the lines are terminated by the ^B control character (ASCII 0x02, i.e. \u0002) rather than by newlines.
Thus, I need to be able to get 100MB parts (plus or minus a few bytes, obviously) that also account for these line endings.
Example file:
000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B000111222333...nnn^B
A "line" can vary in length.
I know of split and csplit, but couldn't wrap my head around combining the two.
#!/bin/bash
split -b 100m filename                           # splitting by size (ignores the ^B record boundaries)
csplit filename "/$(echo -e '\u0002')/+1" "{*}"  # splitting by context (one piece per record, no size limit)
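One direction I've been wondering about (untested sketch; assumes a GNU coreutils split new enough to have the -t/--separator option, which I believe arrived in 8.24) is telling split itself what the record separator is, so that -C can pack whole records into each part up to the size limit:

#!/bin/bash
# Untested sketch: -t sets the record separator to ^B, and -C packs as many
# whole records as fit into each ~100MB part (the "part_" prefix is arbitrary).
split -t $'\x02' -C 100m filename part_

If that option isn't available on the target machines, though, I'd need another way to get the same effect.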
Any suggestions on how I can make 100MB chunks while keeping the lines intact? As a side note, I am not able to change the line endings to \n, because that would corrupt the file: the data between the ^B delimiters may itself contain newline characters, and those have to be preserved.
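To make the intent concrete, here is a rough gawk sketch of the behavior I'm after (untested; assumes GNU awk for the RT variable, and the part_ prefix and 100MB limit are placeholders). Because RS is set to ^B, any \n inside a record is ordinary data and passes through untouched:

#!/bin/bash
# Untested sketch: accumulate whole ^B-terminated records into ~100MB parts.
# LC_ALL=C makes gawk's length() count bytes rather than characters.
LC_ALL=C gawk 'BEGIN { RS = "\002"; max = 100 * 1024 * 1024; n = 1; size = 0 }
{
    rec = $0 RT                                  # RT is the matched ^B ("" on a final unterminated record)
    if (size > 0 && size + length(rec) > max) {  # record would overflow the current part
        close(out); n++; size = 0                # so start a new part
    }
    out = sprintf("part_%04d", n)
    printf "%s", rec > out
    size += length(rec)
}' filename

Something like the above would give parts slightly under 100MB (each part ends on a record boundary), which is what I mean by "plus or minus a few bytes".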