5

I have a file that contains sequence data, where each new paragraph (separated by two blank lines) contains a new sequence:

#example

ASDHJDJJDMFFMF
AKAKJSJSJSL---
SMSM-....SKSKK
....SK


SKJHDDSNLDJSCC
AK..SJSJSL--HG
AHSM---..SKSKK
-.-GHH

and I want to end up with a file looking like:

ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH

each sequence is the same length (if that helps).

I would also like to do this over multiple files stored in different directories.

I have just tried

sed -e '/./{H;$!d;}' -e 'x;/regex/!d' ./text.txt

however this just deleted the entire file :S

Any help would be appreciated. It doesn't have to be in sed; if you know how to do it in perl or something else, that's also great.

Thanks.

brucezepplin

4 Answers

3

All you're asking to do is convert a file of blank-lines-separated records (RS) where each field is separated by newlines into a file of newline-separated records where each field is separated by nothing (OFS). Just set the appropriate awk variables and recompile the record:

$ awk '{$1=$1}1' RS= OFS= file
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
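
One caveat worth a sketch: with the default FS, spaces and tabs inside a line also count as field separators in paragraph mode, so `$1=$1` would remove them too. The sample data has no whitespace, so the command above is fine for it. If your real files might contain spaces that should survive, a variant that splits on newlines only could look like this (`join_paragraphs` is just a hypothetical wrapper name, not anything from awk itself):

```shell
# Hypothetical helper: joins the lines of each blank-line-separated
# paragraph; reads the named files, or stdin if none are given.
join_paragraphs() {
    # RS=      paragraph mode: records are separated by blank lines
    # FS='\n'  split fields on newlines only, so spaces in a line are kept
    # OFS=     rejoin the fields with nothing between them
    awk -v RS= -v FS='\n' -v OFS= '{$1=$1}1' "$@"
}
```

Call it as `join_paragraphs file` or `some_command | join_paragraphs`.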
Ed Morton
2
awk '
    /^[[:space:]]*$/ {if (line) print line; line=""; next}
    {line=line $0}
    END {if (line) print line}
'

perl -00 -pe 's/\n//g; $_.="\n"'

For multiple files:

# adjust your glob pattern to suit, 
# don't be shy to ask for assistance
for file in */*.txt; do
    newfile="/some/directory/$(basename "$file")"
    perl -00 -pe 's/\n//g; $_.="\n"' "$file" > "$newfile"
done
glenn jackman
  • glenn, how do i do this for multiple files in multiple directories? and then output all files into one single directory... – brucezepplin Dec 20 '12 at 15:48
  • thanks glenn, and i was about to run the function, but I think on second thoughts it is easier to just alter the existing files. currently the one liner you have provided displays the changes in the terminal. I know that using the sed -i flag i can make the changes to the existing file. is there an equivalent for perl? – brucezepplin Dec 20 '12 at 16:11
  • ok, it's exactly the same. no worries. Glenn you've been a great help! – brucezepplin Dec 20 '12 at 16:14
1

A Perl one-liner, if you prefer:

perl -nle 'BEGIN{$/=""};s/\n//g;print $_' file

The $/ variable is the equivalent of awk's RS variable. When set to the empty string ("") it causes two or more consecutive empty lines to be treated as one empty line. This is the so-called "paragraph mode" of reading. For each record read, all newline characters are removed. The -l switch adds a newline to the end of each output string, thus giving the desired result.

JRFerguson
0

Just try to find those double line breaks (runs of \n or \r characters) and first replace them with a special marker like :$: that does not occur in your data. After that, replace every remaining line break with an empty string to get the whole file on one line. Finally, replace the marker with a single line break :)

EnrageDev