Given common bash tools, it is easy to split a big file (in my case a MySQL dump, and thus a TSV file) into smaller parts using the split command. This command also supports splitting a file after every n newlines (the -l argument). However, it does not distinguish between escaped and unescaped newline characters, so it might break a single table row into two incomplete parts.
Example (TSV with 2 columns)
cool 2014-12-15 17:31:00
do not censor it ...^M\\n 2016-01-24 22:33:00
watch out ari, you've got compeition! hahah 2001-12-05 19:11:01
Oh God, the poor guy! xD\\nCan't wait to watch this! 2011-07-11 22:01:20
wish i could do that.\\n 2001-02-07 00:24:11
Funny! I will use this reason when I drink something in other houses 2015-06-10 12:20:00
As you can see, there are two columns (the first contains the comment and the second the date), which are separated by a tab. I have only visualised the escaped newlines; tabs and unescaped newlines are not shown. If you put these lines into a file and split it (e.g., split example.tsv -l 1), you will get 9 files, but there are only 6 comments (3 of them contain escaped newlines)! This is because an escaped newline is treated like a regular newline that just happens to be preceded by a backslash. This is a huge problem for me, because splitting the file might lead to incomplete table rows in the output files.
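To make the problem reproducible, here is a minimal sketch that rebuilds the example and counts what split produces. It assumes that an escaped newline is stored as a backslash immediately followed by a real newline character; the comment texts are shortened, and the temporary directory and the part_ prefix are names I made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory (hypothetical location) so nothing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Rebuild the example: \t separates the two columns, and an escaped newline
# is written as a backslash followed by a real newline (assumption).
{
  printf 'cool\t2014-12-15 17:31:00\n'
  printf 'do not censor it ...\r\\\n\t2016-01-24 22:33:00\n'
  printf 'watch out ari\t2001-12-05 19:11:01\n'
  printf 'Oh God, the poor guy! xD\\\nCannot wait to watch this!\t2011-07-11 22:01:20\n'
  printf 'wish i could do that.\\\n\t2001-02-07 00:24:11\n'
  printf 'Funny!\t2015-06-10 12:20:00\n'
} > example.tsv

# One output file per physical line, exactly as in the question.
split -l 1 example.tsv part_

# Physical lines in the file, files written by split, and table rows.
# A table row ends at a newline that is NOT preceded by a backslash.
physical=$(wc -l < example.tsv)
pieces=$(ls part_* | wc -l)
logical=$(grep -c -v '\\$' example.tsv)

echo "physical lines: $physical, split pieces: $pieces, table rows: $logical"
```

On this input the script should report 9 physical lines and 9 pieces, but only 6 table rows. Note that the row count is only a rough check: it would be wrong for a row whose last field ends with an escaped backslash.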
Is it somehow possible to make split ignore escaped newlines, or does someone know another command that can do this?