sed printing first 255 characters followed by entire line

Question

I'd like to print the first 255 characters of each line, followed by the entire line, separated by a '^' character.

(Right now I'm testing with 10 characters at a time instead of 255.) For example, this works fine with one huge problem:

cat myfile.txt | sed -e 's:^\(.\{10\}\)\(.*\)$:\1^\1\2:'

The problem is that some lines are very short (fewer than 10 characters), so the pattern does not match them at all. In that case I want to print the entire line twice, separated by a '^' character.

For example:

1234567890987654321
123

should print as:

1234567890^1234567890987654321
123^123

I don't really need to use sed, though it seems that sed should be able to do this; any one-line command would be nice.

Jonathan Leffler · Accepted Answer · 2014-06-30T14:59:47.747

2

It only requires an almost trivial tweak to your sed script: you want to print up to the first N characters of a line, followed by the caret and the whole line, so you specify a range by putting a lower bound and a comma before the upper bound:

sed -e 's:^\(.\{1,10\}\)\(.*\)$:\1^\1\2:'

or (cutting down on the number of backslashes and remembered strings):

sed -e 's:^.\{0,10\}:&^&:'

This adds the caret to what were empty lines; the version with 1,10 leaves empty lines empty.
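
To see the difference between the two lower bounds, you can feed both commands the sample lines from the question plus an empty line. This is only a sketch: the function names `prefix_min1` and `prefix_min0` are made up for illustration, and the input is generated with printf rather than read from your file.

```shell
# Hypothetical wrappers around the two sed commands above (reading stdin).
prefix_min1() { sed -e 's:^\(.\{1,10\}\)\(.*\)$:\1^\1\2:'; }
prefix_min0() { sed -e 's:^.\{0,10\}:&^&:'; }

# Lower bound 1: an empty line does not match, so it passes through unchanged.
printf '%s\n' '1234567890987654321' '123' '' | prefix_min1

# Lower bound 0: the empty match is substituted too, so an empty line becomes "^".
printf '%s\n' '1234567890987654321' '123' '' | prefix_min0
```

Both versions should produce the same output for the two non-empty lines; only the treatment of the empty line differs.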

Incidentally, Mac OS X sed requires a number before the comma; GNU sed follows tradition and does not require it, and treats the leading missing number as 0. Portable code, therefore, will not write:

sed -e 's:^.\{,10\}:&^&:'

It will work with some, maybe most, but not all versions of sed.
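
Sticking with the explicit 0, the real 255-character case might look like the sketch below. As far as I know, POSIX only guarantees repetition counts up to RE_DUP_MAX, whose minimum value is 255, so this bound sits right at the portable limit and a larger one may not work everywhere. `first255` is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: up to the first 255 characters, a caret, then the whole line.
# With no arguments it reads standard input; otherwise it reads the named files.
first255() {
    sed -e 's:^.\{0,255\}:&^&:' "$@"
}

# Short lines are simply printed twice around the caret.
printf '%s\n' '123' | first255
```

In real use you would call it as `first255 myfile.txt`.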

edited Jun 30 '14 at 14:59

answered Jun 30 '14 at 14:53

Jonathan Leffler


The line without '0' before comma doesn't work in MKS either. – Alex Jun 30 '14 at 15:11
@Alex: that's interesting to know, thanks for the information. Checking POSIX, the section on [BRE](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_06)s and 'multiple matching characters' does not allow the value before the comma to be omitted, so an implementation has grounds for not supporting that form. – Jonathan Leffler Jun 30 '14 at 15:12

anubhava · Answer 2 · 2014-06-30T14:45:06.853

1

You can do so using awk easily:

awk '{print substr($0, 1, 10) "^" $0}' file
1234567890^1234567890987654321
123^123
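
If you want the width to be adjustable (say 255 for the real data), one option is to pass it in as an awk variable. This is a sketch under that assumption: `first_n` and `n` are illustrative names, not part of the command above.

```shell
# Hypothetical helper: the first argument is the prefix width.
# substr() stops at the end of the line, so short lines are printed whole.
first_n() {
    awk -v n="$1" '{ print substr($0, 1, n) "^" $0 }'
}

printf '%s\n' '1234567890987654321' '123' | first_n 10
```

With a width of 10 this should give the same two lines of output as the command above; `first_n 255 < file` would handle the full-size case.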

edited Jun 30 '14 at 14:45

answered Jun 30 '14 at 14:26

anubhava


You could just leave out the -v, couldn't you, and just have `n=10` inside the awk statement? – Jun 30 '14 at 14:41
Also you could just have `awk '{print substr($0, 1, 10) "^" $0}'` as substring will stop when it reaches the end of a line anyway :) – Jun 30 '14 at 14:43
