How do I read into a .txt and extract a certain string corresponding to a found string?

Question

A folder contains a README.txt and several DICOM files named emr_000x.sx (where each x is a number). README.txt has several lines, one of which contains the characters "xyz" along with the corresponding emr_000x.sx file name.

I would like to read the .txt file, identify which line contains "xyz", and extract the emr_000x.sx from that line only. For reference, the line in the .txt is formatted in this way:

A:emr_000x.sx,  B:00001, C:number, D(characters)string_string_number_**xyz**_number_number

I think grep might be helpful, but I am not familiar enough with bash to write this myself. Does anyone know how to solve this? Many thanks!

Does `grep -F '_xyz_' README.txt | grep -o 'emr_000[0-9]\.s[0-9]'` work? — Discussian, Dec 09 '22 at 18:38
are there any other commands that need to follow this? this on its own didn't work unfortunately — sk23, Dec 09 '22 at 19:24

score 0 · Answer 1 · answered Dec 09 '22 at 18:46

You can use awk to match fields in your comma-separated file:

awk -F, '$4 ~ "xyz" {sub(/^A:/, "", $1); print $1}' README.txt
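To try the awk approach, here is a small sketch run against a made-up README.txt. The sample lines, the file names emr_0001.s1 and emr_0002.s1, and the variable names are all assumptions for illustration; only the line layout follows the question.

```shell
# Hypothetical sample data in a throwaway directory (not the asker's real file).
awk_dir=$(mktemp -d)
cat > "$awk_dir/README.txt" <<'EOF'
A:emr_0001.s1,  B:00001, C:12, D(abc)foo_bar_3_qrs_4_5
A:emr_0002.s1,  B:00002, C:34, D(abc)foo_bar_3_xyz_4_5
EOF

# -F, splits each line on commas, so field 1 is "A:<name>" and field 4 is the D(...) part.
# When field 4 contains "xyz", strip the leading "A:" from field 1 and print what is left.
awk_result=$(awk -F, '$4 ~ "xyz" {sub(/^A:/, "", $1); print $1}' "$awk_dir/README.txt")
echo "$awk_result"

rm -r "$awk_dir"
```

With this sample input the command prints emr_0002.s1. If the real file has a different number of comma-separated fields, "xyz" may not be in field 4, and the field number would need adjusting.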

Diego Torres Milano

score 0 · Answer 2 · answered Dec 09 '22 at 19:47

I like sed for this sort of thing.

 sed -nE '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }' README.txt

This says, "On lines where you see xyz, replace the whole line with the run of non-comma characters between A: and the next comma, then print the line."

-n suppresses automatic printing, so nothing is printed unless I say so. (p means print.) -E just means to use extended regexes.

/xyz/{...} means "on lines where you see xyz do the stuff between the curlies."
s/^.*A:([^,]+),.*/\1/ will substitute the matched part (which should be the whole line) with just the part between the parens.
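As a quick check of the steps above, this sketch runs the same sed command on a made-up README.txt and stores the result in a variable. The sample lines, the file names, and the variable names are assumptions for illustration only.

```shell
# Hypothetical sample data in a throwaway directory (not the asker's real file).
sed_dir=$(mktemp -d)
cat > "$sed_dir/README.txt" <<'EOF'
A:emr_0001.s1,  B:00001, C:12, D(abc)foo_bar_3_qrs_4_5
A:emr_0002.s1,  B:00002, C:34, D(abc)foo_bar_3_xyz_4_5
EOF

# -n: print only on request; -E: extended regexes.
# On lines matching xyz, replace the whole line with the text captured
# between "A:" and the first comma after it, then print the result.
sed_result=$(sed -nE '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }' "$sed_dir/README.txt")
echo "$sed_result"

rm -r "$sed_dir"
```

With this sample input the command prints emr_0002.s1. If more than one line contains xyz, every matching line produces one line of output, so the variable would hold several names.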
