Questions tagged [csplit]

csplit is a Unix command that splits a file into two or more smaller files determined by context lines.
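As a rough illustration of what "determined by context lines" means, here is a minimal sketch that assumes GNU csplit (the `{*}` repeat count and the `-z` option are GNU extensions). The file name `sample.txt`, the `part_` prefix, and the `## ` heading pattern are hypothetical and chosen only for this example.

```shell
# Work in a scratch directory so the generated pieces do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: three sections, each starting with a "## " line.
printf '%s\n' '## one' 'a' '## two' 'b' '## three' 'c' > sample.txt

# Split before every line matching /^## /.
#   '{*}'  repeat the pattern as many times as possible (GNU extension)
#   -z     drop empty output files, such as the empty piece before line 1
#   -s     do not print the byte count of each piece
#   -f     set the prefix of the output file names
csplit -s -z -f part_ sample.txt '/^## /' '{*}'

# List the pieces that were created.
ls part_*
```

Each output file starts at a line that matches the pattern, so every piece should hold one section of the input.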

62 questions
0
votes
1 answer

Using csplit to divide a large file into smaller files with a predetermined name

We have a very large plain text file of about 40 million lines, each with the same length and format, and we want to split it line by line into N files using csplit. For example, if N is 80, the name of the generated files should…
JLLMNCHR
  • 1,551
  • 5
  • 24
  • 50
0
votes
1 answer

cSplit_e not returning a binary data frame

I have a data frame with a Genre column that has rows like Action,Romance. I want to split those values and create a binary vector. If Action,Romance,Drama are all the possible genres, then the above mentioned row would be 1,1,0 in the output data…
James L.
  • 12,893
  • 4
  • 49
  • 60
0
votes
1 answer

Split file by context and size in bash

I have a set of large files that have to be split into 100MB parts. The problem I am running into is that lines are terminated by the ^B ASCII character (\u0002). Thus, I need to be able to get 100MB parts (plus or minus a few bytes…
Vlad
  • 23
  • 1
  • 6
0
votes
1 answer

Why don't `csplit` and `grep` agree on whether there are matches?

I am trying to use csplit in Bash to split a file, using years in the 1500s and 1600s as delimiters. When I run the command csplit Shakespeare.txt '/1[56]../' '{36}' it almost works, except for at least two issues: This outputs 38 files, not 36,…
Chill2Macht
  • 1,182
  • 3
  • 12
  • 22
0
votes
1 answer

Trouble passing static string as REGEXP with csplit

I'm on a Linux terminal and struggling to split a large text file into several smaller files. I'm trying with csplit, but csplit demands that the delimiter pattern is passed as a REGEXP. The static delimiter pattern is , lorum ipsum. How do I write…
Magnus
  • 589
  • 8
  • 26
0
votes
1 answer

How to split Character Columns into multiple columns and then into binary in R?

I have a data set with around 4000 observations. It looks like this: View(transaction) CustomerID Description 12346 MEDIUM CERAMIC TOP STORAGE JAR 12347 c("BLACK CANDLEABRA HOLDER","AIRLINE BAG VINTAGE JET…
Marre
  • 93
  • 1
  • 8
0
votes
0 answers

CSplit regexp not working

I have the following file content: testing with ---- ---- get footer. I want to split it on the ---- ---- . There may be some other content between the '----'. I am using the following, but it keeps reporting that no match was found. csplit -f…
coffeeak
  • 2,980
  • 7
  • 44
  • 87
0
votes
3 answers

Using csplit in Bash script with Form Feed Regex

I have a print output file (uncomp.txt) that has form feeds in it. I'm trying to split the single document into multiple documents based on the \f regex match, and outputting files with the epoch time. I've tried this: $ csplit --prefix=$(date +%s)…
AMPSYS
  • 23
  • 5
0
votes
1 answer

Split a long file (on stdout) according to a pattern and input that into a loop

I have a very long file (yes, this is DNA in fasta format) that is actually a batch of several files patched together, output on the stdout. E.g.: >id1 ACGT >id2 GTAC = >id3 ACGT = >id4 ACCGT >id6 AACCGT I want to split this stream according to a…
Lionel Guy
  • 13
  • 4
0
votes
1 answer

Split fasta file using csplit

I need to split a big fasta file into smaller ones. I am trying the following command: csplit -z input.fasta '/>/' '{*}' but it is generating lots of files (one for each ">"). Is there a way to create only two smaller files? Thank you
0
votes
1 answer

splitting a huge text file based on line content

Help me guys, I'm really lost here. I have a big text file, full of links, and I'm trying to separate them based on which website each link belongs to. I was trying to do it with the csplit command, but I'm not really sure how I would do it, as it would…
0
votes
1 answer

Split a batch of text files using pattern

I have a directory of almost a thousand HTML files. Each file needs to be split up into multiple text files, based on a recurring pattern (a heading). I am on a Windows machine, using GnuWin32 tools. I've found a way to do this, for a single…
aquadhere
  • 1
  • 2
0
votes
1 answer

Splitting text files on two consecutive lines containing only one integer number

I have a single long text file that contains a list of 3D coordinates. The beginning of the file consists of a header like this: 10112 2455 121.417670 172.321300 1.704072 0.997697 0.067831 -0.000222 -0.067831 0.997697 0.000207 0.000236 -0.000191…
-1
votes
0 answers

linux pv and csplit

I'm having difficulty combining the Linux command "pv" (Pipe Viewer) and csplit in a single command line. I'm trying to make csplit split a very big file (>10GB) with a running progress bar; the main command goes something like this: csplit…
-1
votes
2 answers

Unix awk command to execute a specific logic

I am not so good with Unix commands and am struggling to achieve this. I have a file like…
user1637487
  • 241
  • 1
  • 9
  • 17