Extracting beginning alphabets of a string in bash/ash

Question

How can I extract the leading alphabetic characters from a string? I want to extract the letters at the beginning of the string, up to the first non-alphabetic character.

For example, if the input string is abcd045tj56, the output should be abcd.

Similarly, if the input is jkl657890, the output should be jkl.

Can it be done in shell script using awk/sed/cut?

I tried

echo "XYZ123" | awk 'sub(/[[:alpha:]]*/, "")'

But it gives 123 instead of XYZ.

Then I tried

echo "XYZ123" | awk '{print (/[[:alpha:]]*/)}'

but it gives 1

I want the answer to be XYZ

Try something like `awk 'match($0,/^[a-zA-Z]+/){print substr($0,RSTART,RLENGTH)}' Input_file`; it should work in any version of `awk`. – RavinderSingh13, Jul 27 '23 at 06:55

RavinderSingh13 · Answer 1 · 2023-07-27T07:09:52.047

5

Converting my comment to an answer. This works with any version of awk.

awk '
match($0,/^[a-zA-Z]+/){
  print substr($0,RSTART,RLENGTH)
}
' Input_file

OR:

awk '
match($0, /[^[:alpha:]]/){
  print substr($0, 1, RSTART-1)
}
' Input_file
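The two programs above are not interchangeable at the edges. Here is a small comparison sketch with made-up sample lines; the function names `leading_v1` and `leading_v2` are hypothetical, and `[^a-zA-Z]` is used in place of `[^[:alpha:]]` (assuming ASCII-only letters) so the same text also runs on awks without POSIX character-class support.

```shell
# Variant 1: print the leading run of letters; lines that do not start
# with a letter produce no output.
leading_v1() { awk 'match($0, /^[a-zA-Z]+/) { print substr($0, RSTART, RLENGTH) }'; }

# Variant 2: print everything before the first non-letter; all-letter lines
# produce no output, and digit-first lines produce an empty line.
leading_v2() { awk 'match($0, /[^a-zA-Z]/) { print substr($0, 1, RSTART - 1) }'; }

printf '%s\n' abcd045tj56 jkl657890 XYZ 123abc | leading_v1   # abcd, jkl, XYZ
printf '%s\n' abcd045tj56 jkl657890 XYZ 123abc | leading_v2   # abcd, jkl, (empty line)
```

Which one fits depends on whether an all-letter string should count as "letters before the first non-letter".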

edited Jul 27 '23 at 07:09

answered Jul 27 '23 at 06:56

RavinderSingh13



Another awk: `awk 'match($0, /[^[:alpha:]]/) {print substr($0, 1, RSTART-1)}'` – anubhava Jul 27 '23 at 07:01

anubhava · Answer 2 · 2023-07-27T07:00:17.340

4

You may use this sed:

sed 's/[^[:alpha:]].*$//'

This sed command matches the first non-alphabetic character and everything after it, and replaces that with an empty string.

Examples:

sed 's/[^[:alpha:]].*$//' <<< 'abcd045tj56'
abcd

sed 's/[^[:alpha:]].*$//' <<< 'XYZ123'
XYZ

sed 's/[^[:alpha:]].*$//' <<< 'jkl657890'
jkl
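Because sed works line by line, one invocation can handle many strings. A sketch with a hypothetical wrapper `leading_alpha`; note that an all-letter line passes through unchanged and a digit-first line becomes empty.

```shell
# Delete the first non-letter and everything after it, on every input line.
leading_alpha() { sed 's/[^[:alpha:]].*$//'; }

# abcd, XYZ, jkl, abc, then an empty line for 9xyz
printf '%s\n' abcd045tj56 XYZ123 jkl657890 abc 9xyz | leading_alpha
```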

If you want to do this in pure bash:

s='abcd045tj56'
echo "${s/[^[:alpha:]]*}"

abcd

edited Jul 27 '23 at 07:00

answered Jul 27 '23 at 06:54

anubhava


N.B. This will print the leading alphabetic characters even if no substitution occurs, i.e. a string consisting only of alphabetic characters is printed in full. However, the OP stipulated "I want to extract the alphabets occurring in the beginning before I hit first non-alphabetic character". That can be achieved with either `sed 's/[^[:alpha:]].*$//p;d' file` or `sed -n 's/[^[:alpha:]].*$//p' file` – potong Jul 28 '23 at 07:26
I think OP needs to clarify this case when `string only consists of alpha characters only` – anubhava Jul 28 '23 at 08:40
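A sketch of the `-n ... p` idea from the comment, written with the pattern unanchored so it can match the first non-letter anywhere in the line. The wrapper name `letters_before_nonalpha` is hypothetical.

```shell
# -n turns off automatic printing; the p flag prints the line only when
# the substitution actually happened, i.e. the line contained a non-letter.
letters_before_nonalpha() { sed -n 's/[^[:alpha:]].*$//p'; }

# prints abcd and jkl; the all-letter line abc is dropped
printf '%s\n' abcd045tj56 abc jkl657890 | letters_before_nonalpha
```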

score 2 · Answer 3 · answered Jul 27 '23 at 06:54


Use grep:

$ grep -Eo '^[A-Za-z]+' <<<"XYZ123"

to only match alphabetic letters at the beginning of the string.
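The same pattern applied to several lines at once, as a sketch. `-o` (print only the matched part) is not part of POSIX grep, though GNU grep supports it; `[A-Za-z]` assumes ASCII letters, and the wrapper name is hypothetical. Lines that do not start with a letter produce no output, and grep exits with status 1 if no line matches at all.

```shell
leading_alpha() { grep -Eo '^[A-Za-z]+'; }

# abcd, jkl, XYZ; 123abc produces nothing because of the ^ anchor
printf '%s\n' abcd045tj56 jkl657890 XYZ123 123abc | leading_alpha
```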


Paolo


blhsing · Answer 4 · 2023-07-27T07:17:43.410

1

You can use awk with a non-alphabetic character as the field separator, so the leading letters are simply the first field:

awk -F'[^[:alpha:]]' '{print $1}'

Demo: https://awk.js.org/?snippet=g7eajb
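A sketch of how the field-separator approach behaves on edge cases; `[^A-Za-z]` (ASCII letters) is an assumed substitute for `[^[:alpha:]]`, and the wrapper name is hypothetical. Unlike the match()-based answers, this prints a line for every input: the whole line when it contains no non-letter, and an empty line when it starts with one.

```shell
# Any single non-letter is a field separator, so $1 is the text before the first one.
leading_alpha() { awk -F'[^A-Za-z]' '{ print $1 }'; }

# abcd, XYZ, then an empty line for 9xyz
printf '%s\n' abcd045tj56 XYZ 9xyz | leading_alpha
```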

edited Jul 27 '23 at 07:17

answered Jul 27 '23 at 07:12

blhsing


blhsing · Answer 5 · 2023-07-28T01:23:40.143

1

You can use bash's parameter expansion to remove the first non-alphabetic character and all characters after it (`%%` removes the longest suffix that matches the pattern):

s=XYZ123
echo "${s%%[^[:alpha:]]*}"

Demo: https://onlinegdb.com/OzjGf53T-

Note that this approach has the performance benefit of avoiding the overhead of spawning a separate process.
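Since the question also mentions ash: POSIX pattern matching spells a negated bracket expression with `!` rather than `^`, so a sketch along these lines should also work in POSIX shells such as dash or BusyBox ash (not verified on every implementation). The function name is hypothetical.

```shell
# ${1%%pattern} strips the longest suffix matching pattern: here, the first
# non-letter and everything after it.
leading_alpha() {
  printf '%s\n' "${1%%[![:alpha:]]*}"
}

leading_alpha abcd045tj56   # abcd
leading_alpha jkl657890     # jkl
leading_alpha XYZ123        # XYZ
```

An all-letter argument is returned unchanged, and one that starts with a non-letter yields an empty line.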

edited Jul 28 '23 at 01:23

answered Jul 27 '23 at 08:29

blhsing


score 1 · Answer 6 · answered Jul 27 '23 at 15:19

I tried

echo "XYZ123" | awk 'sub(/[[:alpha:]]*/, "")'

But it gives 123 instead of XYZ

You instructed GNU AWK to replace zero or more alphabetic characters with an empty string; because sub() then returns 1, the modified line (now 123) is printed. If you wish to do this task using sub, match a non-alphabetic character followed by zero or more characters of any kind instead, namely

echo "XYZ123" | awk '{sub(/[^[:alpha:]].*/, "");print}'

gives output

XYZ

(tested in GNU Awk 5.1.0)
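To see why the question's attempts printed 123 and 1, it helps to print the return values. A sketch, assuming ASCII letters via `[A-Za-z]`:

```shell
# sub() returns the number of substitutions made (0 or 1). In the question it was
# used as a pattern, so it only decided whether to print the already-modified line.
printf '%s\n' XYZ123 | awk '{ n = sub(/[A-Za-z]*/, ""); print n, $0 }'    # 1 123

# A bare /regex/ in an expression evaluates to 1 on a match and 0 otherwise,
# which is what the second attempt printed.
printf '%s\n' XYZ123 | awk '{ print (/[A-Za-z]*/) }'                      # 1

# Removing the first non-letter and the rest leaves the leading letters.
printf '%s\n' XYZ123 | awk '{ n = sub(/[^A-Za-z].*/, ""); print n, $0 }'  # 1 XYZ
```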

The fourth bird · Answer 7 · 2023-07-27T20:31:24.083

1

Using GNU awk, you can capture the leading run of one or more alphabetic characters (the `^` anchor keeps it from matching letters later in the string):

echo "XYZ123" | awk 'match($0, /^[[:alpha:]]+/, a) {print a[0]}'

Output

XYZ

If there must be at least one non-alphabetic character following the letters, you can use a capture group and print its value:

echo "XYZ123" | awk 'match($0, /^([[:alpha:]]+)[^[:alpha:]]/, a) {print a[1]}'
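If gawk is not available, the same "must be followed by a non-letter" condition can be sketched with the two-argument match() that POSIX awk provides: match the letters plus that one following character, then drop it with substr(). `[A-Za-z]` assumes ASCII letters, and the function name is hypothetical.

```shell
# RLENGTH covers the letters plus one trailing non-letter, so keep RLENGTH - 1 chars.
leading_alpha_strict() {
  awk 'match($0, /^[A-Za-z]+[^A-Za-z]/) { print substr($0, 1, RLENGTH - 1) }'
}

# prints only XYZ: abc has no following non-letter, 123abc has no leading letters
printf '%s\n' XYZ123 abc 123abc | leading_alpha_strict
```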

edited Jul 27 '23 at 20:31

answered Jul 27 '23 at 20:18

The fourth bird



gnu awk is awesome – anubhava Jul 27 '23 at 20:21

@anubhava If it wasn't for your answers all these years, I would have never got the idea :-) – The fourth bird Jul 27 '23 at 20:21

7 Answers