I'd like to print the first 255 characters of each line, followed by the entire line, with the two separated by a '^' character.

(Right now I'm testing with 10 characters at a time instead of 255.) For example, this works fine with one huge problem:

cat myfile.txt | sed -e 's:^\(.\{10\}\)\(.*\)$:\1^\1\2:'

The problem is that some lines are shorter than that, in which case I want to print the entire line twice, separated by a '^' character.

For example:

1234567890987654321
123

should print as:

1234567890^1234567890987654321
123^123

I don't strictly need sed, although it seems that sed should be able to do this. Any one-line command would be fine.

anubhava
Alex

2 Answers

2

It only requires an almost trivial tweak to your sed script — you want to print up to the first N characters of a line, followed by the caret and the whole line, so you specify a range with a number and a comma before the upper bound:

sed -e 's:^\(.\{1,10\}\)\(.*\)$:\1^\1\2:'

or (cutting down on the number of backslashes and remembered strings):

sed -e 's:^.\{0,10\}:&^&:'

This adds the caret to what were empty lines; the version with 1,10 leaves empty lines empty.
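The difference between the two variants only shows up on empty lines. A minimal sketch, assuming a three-line sample input (long line, short line, empty line) produced by a hypothetical `sample` helper that is not part of the question:

```shell
# Hypothetical helper: emits a long line, a short line, and an empty line.
sample() { printf '1234567890987654321\n123\n\n'; }

# \{1,10\}: the empty line does not match, so it is left untouched.
sample | sed -e 's:^\(.\{1,10\}\)\(.*\)$:\1^\1\2:'

# \{0,10\}: the empty line matches zero characters and becomes a lone caret.
sample | sed -e 's:^.\{0,10\}:&^&:'
```

Both commands produce the same output for the first two lines; only the last line differs (empty versus `^`).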

Incidentally, Mac OS X sed requires a number before the comma; GNU sed follows tradition and does not require it, and treats the leading missing number as 0. Portable code, therefore, will not write:

sed -e 's:^.\{,10\}:&^&:'

It will work with some, maybe most, but not all versions of sed.
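To move from the test value of 10 to the real value of 255, the portable form can be wrapped in a small function. This is only a sketch: `prefix_caret` is a hypothetical name, and it assumes the argument is a non-negative integer. POSIX requires implementations to accept interval bounds of at least 255 (`RE_DUP_MAX`), so `\{0,255\}` should be within the portable range.

```shell
# Hypothetical helper: prefix each input line with up to N of its own
# characters and a caret. Always writes the explicit lower bound 0.
prefix_caret() {
    # $1 = maximum prefix length (assumed to be a non-negative integer)
    sed -e "s:^.\{0,$1\}:&^&:"
}

# Test run with 10 characters, as in the question.
printf '1234567890987654321\n123\n' | prefix_caret 10

# Real use would look like: prefix_caret 255 < myfile.txt
```

Reading from standard input also avoids the extra `cat` process used in the question.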

Jonathan Leffler
  • The line without '0' before comma doesn't work in MKS either. – Alex Jun 30 '14 at 15:11
  • @Alex: that's interesting to know, thanks for the information. Checking POSIX, the section on [BREs](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_06) and 'multiple matching characters' does not allow the value before the comma to be omitted, so implementations have grounds for not supporting that form. – Jonathan Leffler Jun 30 '14 at 15:12
1

You can do so using awk easily:

awk '{print substr($0, 1, 10) "^" $0}' file
1234567890^1234567890987654321
123^123
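The prefix length can also be passed in as an awk variable, so the same command covers both the 10-character test and the 255-character case. A minimal sketch, where `prefix_caret_awk` is a hypothetical wrapper name and the argument is assumed to be a positive integer:

```shell
# Hypothetical helper: same transformation in awk, prefix length as a variable.
prefix_caret_awk() {
    # $1 = maximum prefix length; substr() stops at the end of a short line,
    # so no special handling of short lines is needed.
    awk -v n="$1" '{ print substr($0, 1, n) "^" $0 }'
}

printf '1234567890987654321\n123\n' | prefix_caret_awk 10
```

Unlike the sed variant with `\{1,10\}`, this prints a lone `^` for an empty line.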
anubhava
  • You could just leave out the -v, couldn't you, and have `n=10` inside the awk statement? –  Jun 30 '14 at 14:41
  • Also, you could just have `awk '{print substr($0, 1, 10) "^" $0}'`, as substr will stop when it reaches the end of a line anyway :) –  Jun 30 '14 at 14:43