I am trying to use csplit in Bash to split a file, using years in the 1500s and 1600s as delimiters.
When I run the command
csplit Shakespeare.txt '/1[56]../' '{36}'
it almost works, except for at least two issues:
- This outputs 38 files, not 36, numbered xx00 through xx37. (Also, xx00 is completely blank.) I don't understand how this is possible.
- One of the files doesn't begin with 15XX or 16XX; it begins with "ACT 4 SCENE 15\n" (where \n denotes a newline). This seems to be why csplit returns 37 non-empty files instead of the 36 non-empty files I expected. I don't understand how csplit can match a newline as part of the number.
When I run the command I actually want,
csplit Shakespeare.txt '/1[56][0-9][0-9]/' '{36}'
the terminal returns the error
csplit: 1[56][0-9][0-9]: no match
after printing the same list of numbers that the first command prints.
This especially doesn't make sense to me, since grep says otherwise:
grep -c "1[56][0-9][0-9]" Shakespeare.txt
36
grep -c "1[56].." Shakespeare.txt
36
Note: man csplit indicates that I have the BSD version from January 26, 2005. man grep indicates that I have the BSD version from July 28, 2010.
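
For anyone trying to reproduce the file count, here is a minimal sketch on a tiny stand-in file. It assumes that '{N}' means "repeat the previous pattern N more times", which is how the csplit manual describes it. The file sample.txt and its contents are made up for illustration and are not taken from Shakespeare.txt.

```shell
# Work in a scratch directory so the xx* files do not clutter anything.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample: one title line, then three "year" lines with text.
printf '%s\n' 'title' '1599' 'text a' '1601' 'text b' '1605' 'text c' > sample.txt

# One pattern plus {2}: the pattern is applied 1 + 2 = 3 times.
# -s suppresses the per-file size listing.
csplit -s sample.txt '/1[56][0-9][0-9]/' '{2}'

# Count the pieces, including the leading xx00 (everything before the first match).
set -- xx*
file_count=$#
echo "$file_count"
```

If that reading is right, one pattern followed by '{36}' asks for 37 matches and can leave up to 38 pieces, which would at least be consistent with the xx00 through xx37 numbering above.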