
I have a chain of SSL certificates like this:

-----BEGIN CERTIFICATE-----
MIICPjCCAeSgAwIBAgIRALMMpKnhRM2C7mnKI/rl8ggwCgYIKoZIzj0EAwIwgY4x
CERT1
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDIjCCAsegAwIBAgIOAMjnPM1wShDmOWUELuIwCgYIKoZIzj0EAwIwgagxCzAJ
CERT2
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDIDCCAsWgAwIBAgIOAMjnPL8JUbVSmpMadWUwCgYIKoZIzj0EAwIwbDELMAkG
CERT3
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDBjCCAqygAwIBAgIFFRCCEwYwCgYIKoZIzj0EAwIwgZQxFDASBgNVBAoMC0Ft
CERT4
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDNjCCAtugAwIBAgIJAKpBxYNyH8biMAoGCCqGSM49BAMCMIGUMRQwEgYDVQQK
CERT5
-----END CERTIFICATE-----

and I need to strip the last certificate from it.

On macOS/BSD, the split command has a -p flag to split by pattern, and I used it:

cat cert | split -p "-----BEGIN CERTIFICATE-----" 
cat xa{a,b,c,d}

I believe there is a way to do this in one line on Linux too, but on Ubuntu the split command cannot split by pattern.

I need to do the job using standard Linux commands, such as those I tagged.

Enlico
kyb

3 Answers


This GNU Sed solution should be enough:

sed -zE 's/(.*\n)-----BEGIN CERTIFICATE-----.*/\1/' your_input
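  • -E enables extended regular expressions, so groups are written (…) instead of \(…\);
  • -z (a GNU Sed extension) makes sed split records on NUL bytes instead of newlines; since the input contains no NUL, the whole file is processed as a single long string with embedded \ns.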

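Therefore, the first .* is greedy: it matches as much as it can, as long as it is still followed by \n-----BEGIN CERTIFICATE----- and anything else after it (the second .*). In practice it stops just before the last certificate. The group captures that text together with the \n right after it, so the substitution can put it back with \1 while everything from the last -----BEGIN CERTIFICATE----- onward is dropped.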
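To see that behaviour on a small input, here is a minimal sketch. It assumes GNU Sed, and the three-certificate file it builds is a made-up placeholder (the CERT1..CERT3 bodies are not real certificate data); the variable names are only for illustration.

```shell
# Build a hypothetical chain of three placeholder "certificates".
chain=$(mktemp)
for i in 1 2 3; do
  printf '%s\n' '-----BEGIN CERTIFICATE-----' "CERT$i" '-----END CERTIFICATE-----'
done > "$chain"

# GNU sed only: with -z the NUL-free file is one record, and the greedy
# first .* extends up to the newline before the last BEGIN line.
stripped=$(sed -zE 's/(.*\n)-----BEGIN CERTIFICATE-----.*/\1/' "$chain")

# Show the result and count the certificates that are left.
printf '%s\n' "$stripped"
remaining=$(printf '%s\n' "$stripped" | grep -c -e '-----BEGIN CERTIFICATE-----')
echo "certificates left: $remaining"

rm -f "$chain"
```

With three certificates in, two should come out, and the last line of the output should still be an END line.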

Enlico
  • it removes last `-----END CERTIFICATE-----` but sure it works. Thank you. – kyb Oct 15 '20 at 17:19
  • @kyb, don't you want to remove it? Please, include the desired output in your question, then. – Enlico Oct 15 '20 at 17:20
  • This command outputs 4 certs, stripping the 5th. That's what I am looking for. But the 4th cert looks broken because it does not have *-----END CERTIFICATE-----*. I'd like to see 4 valid certs in the output. (Of course it is easy to add it manually with `echo`) – kyb Oct 16 '20 at 06:54
  • @kyb I have fixed it (upon Ed's suggestion, honestly). Basically I had `(.*)\n` which was not capturing, thus deleting, the last newline at the EOF, which is required by some VS editors (VS = very stupid); changing that to `(.*\n)` keeps the last newline in the captured group and therefore in the output, thus making those editors happy. – Enlico Oct 16 '20 at 07:04
  • yes. now it works as expected. thank you. I did not know about `-z` before. – kyb Oct 16 '20 at 07:27
  • I do not use VS editor. In my input certs chain there is no last empty line. And therefore it did not work. – kyb Oct 16 '20 at 07:29

With any awk alone:

$ awk '/-----BEGIN CERTIFICATE-----/{printf "%s", rec; rec=""} {rec=rec $0 ORS}' file
-----BEGIN CERTIFICATE-----
MIICPjCCAeSgAwIBAgIRALMMpKnhRM2C7mnKI/rl8ggwCgYIKoZIzj0EAwIwgY4x
CERT1
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDIjCCAsegAwIBAgIOAMjnPM1wShDmOWUELuIwCgYIKoZIzj0EAwIwgagxCzAJ
CERT2
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDIDCCAsWgAwIBAgIOAMjnPL8JUbVSmpMadWUwCgYIKoZIzj0EAwIwbDELMAkG
CERT3
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDBjCCAqygAwIBAgIFFRCCEwYwCgYIKoZIzj0EAwIwgZQxFDASBgNVBAoMC0Ft
CERT4
-----END CERTIFICATE-----

or if you have tac:

$ tac file | awk 'f; /-----BEGIN CERTIFICATE-----/{f=1}' | tac
-----BEGIN CERTIFICATE-----
MIICPjCCAeSgAwIBAgIRALMMpKnhRM2C7mnKI/rl8ggwCgYIKoZIzj0EAwIwgY4x
CERT1
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDIjCCAsegAwIBAgIOAMjnPM1wShDmOWUELuIwCgYIKoZIzj0EAwIwgagxCzAJ
CERT2
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDIDCCAsWgAwIBAgIOAMjnPL8JUbVSmpMadWUwCgYIKoZIzj0EAwIwbDELMAkG
CERT3
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDBjCCAqygAwIBAgIFFRCCEwYwCgYIKoZIzj0EAwIwgZQxFDASBgNVBAoMC0Ft
CERT4
-----END CERTIFICATE-----
Ed Morton
  • Have you any idea what the OP complains about with this output? (See their comment under my question.) – Enlico Oct 15 '20 at 17:29
  • @Enrico I suspect they're using an editor or some other tool to verify the output that was discarding/hiding the last line of your output because it didn't end in a newline (I know you fixed that now) so to them that final END line was missing. – Ed Morton Oct 15 '20 at 17:32
  • Oooooow, I thought that they were referring to the original last line! They were obviously referring to the error you corrected in my code. Thanks again, then. – Enlico Oct 15 '20 at 17:34
  • You're welcome. I'm just guessing that that's what it was about of course but it seems likely and I can't think of anything else! – Ed Morton Oct 15 '20 at 17:35

With GNU awk, you could try the following, which uses gensub. It was written and tested with the shown samples only.

awk -v RS="" -v regex="(.*)\n(-----BEGIN CERTIFICATE-----.*)" '
{
  print gensub(regex,"\\1","1",$0)
}' Input_file
RavinderSingh13
  • As far as I understand, RS="" is for "slurp mode", so $0 contains the full input file. Right? – kyb Oct 16 '20 at 07:32
  • @kyb no, `RS=""` (or `RS=''`) is for "paragraph mode", where records are separated by blank lines. `RS='^$'` (if your awk version supports multi-char RS) is for "slurp mode", where $0 contains the full input file. They behave similarly when the input doesn't contain any blank lines (which is probably your situation), except that with `RS=""` $0 will not contain the last newline in the file, while with `RS='^$'` it will. To see the difference, try `echo 7 | awk -v RS='' '{print "<" $0 ">"}'` and `echo 7 | awk -v RS='^$' '{print "<" $0 ">"}'` and note the newline before `>` in the second output. – Ed Morton Oct 16 '20 at 17:29
  • @EdMorton, thank you, sir, for clarifying here. Sorry, my understanding was wrong. – RavinderSingh13 Oct 16 '20 at 17:53
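
The paragraph-mode behaviour discussed in these comments can be checked with a small sketch. The input is made up (two lines, a blank line, one more line), and it uses only `RS=''`, which should work in any POSIX awk. It suggests why the gensub answer relies on the chain file containing no blank lines: a blank line would split the chain into several records.

```shell
# Made-up input: "a" and "b" form one paragraph, "c" forms another.
input=$(printf 'a\nb\n\nc\n')

# Paragraph mode: records are separated by blank lines, not by newlines.
printf '%s\n' "$input" | awk -v RS='' '{ print NR ": <" $0 ">" }'

# Count the records awk saw in paragraph mode.
count=$(printf '%s\n' "$input" | awk -v RS='' 'END { print NR }')
echo "records: $count"
```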