
A folder contains a README.txt and several DICOM files named emr_000x.sx (where x is a number). README.txt contains several lines. One of them contains the characters "xyz", and that same line also names the corresponding emr_000x.sx file.

I would like to read the .txt file, find the line that contains "xyz", and extract the emr_000x.sx filename from that line only. For reference, the line in the .txt file is formatted like this:

A:emr_000x.sx,  B:00001, C:number, D(characters)string_string_number_**xyz**_number_number

I think grep might help, but I am not familiar enough with bash scripting to do this myself. Does anyone know how to solve this? Many thanks!
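Since grep was mentioned, here is a minimal grep-only sketch. It assumes the filenames always match the pattern emr_ followed by digits and .sx, and it uses a hypothetical two-line sample in a temporary file instead of the real README.txt. The first grep keeps lines containing "xyz"; the second uses -o (supported by GNU and BSD grep, though not required by POSIX) to print only the part of each line that matches the filename pattern.

```shell
#!/bin/sh
# Hypothetical sample data standing in for README.txt (the real lines are unknown).
readme=$(mktemp)
cat > "$readme" <<'EOF'
A:emr_0001.sx,  B:00001, C:12, D(abc)foo_bar_7_abc_1_2
A:emr_0003.sx,  B:00001, C:34, D(def)foo_bar_9_xyz_3_4
EOF

# Step 1: keep only the line(s) containing "xyz".
# Step 2: print just the text matching emr_<digits>.sx from those lines.
# Against the real file, replace "$readme" with README.txt.
result=$(grep 'xyz' "$readme" | grep -oE 'emr_[0-9]+\.sx')
echo "$result"   # prints emr_0003.sx

rm -f "$readme"
```

If more than one line contains "xyz", each matching filename is printed on its own line.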

Amit Joshi
sk23

2 Answers


You can use awk to match fields in your comma-separated file:

awk -F, '$4 ~ "xyz" {sub(/^A:/, "", $1); print $1}' README.txt
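To see what the awk one-liner does field by field, here is a small sketch run against a hypothetical sample file (a temporary file standing in for README.txt). It assumes the fields always appear in the order A, B, C, D and that none of them contains an embedded comma, because -F, splits purely on commas.

```shell
#!/bin/sh
# Hypothetical sample data standing in for README.txt.
readme=$(mktemp)
cat > "$readme" <<'EOF'
A:emr_0001.sx,  B:00001, C:12, D(abc)foo_bar_7_abc_1_2
A:emr_0003.sx,  B:00001, C:34, D(def)foo_bar_9_xyz_3_4
EOF

# -F, splits each line on commas, so $1 is "A:emr_0003.sx" and $4 is the D(...) field.
# On lines whose 4th field contains "xyz", strip the leading "A:" from $1 and print it.
result=$(awk -F, '$4 ~ "xyz" { sub(/^A:/, "", $1); print $1 }' "$readme")
echo "$result"   # prints emr_0003.sx

rm -f "$readme"
```

Matching on $4 rather than the whole line means an "xyz" that happens to appear in another field is ignored.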
Diego Torres Milano

I like sed for this sort of thing.

 sed -nE '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }' README.txt

This says: "On lines where you see xyz, replace the whole line with the run of non-comma characters between A: and the next comma, then print the line."

-n turns off automatic printing, so nothing is printed unless I say so (p means print). -E enables extended regular expressions.

/xyz/{...} means "on lines where you see xyz do the stuff between the curlies."
s/^.*A:([^,]+),.*/\1/ replaces the matched text (which should be the whole line) with just the part captured between the parentheses.
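A quick way to check the sed command, and to keep its result for later use in a script, is to capture it in a variable. This is a minimal sketch against a hypothetical sample file (a temporary file standing in for README.txt); the variable name file is only illustrative.

```shell
#!/bin/sh
# Hypothetical sample data standing in for README.txt.
readme=$(mktemp)
cat > "$readme" <<'EOF'
A:emr_0001.sx,  B:00001, C:12, D(abc)foo_bar_7_abc_1_2
A:emr_0003.sx,  B:00001, C:34, D(def)foo_bar_9_xyz_3_4
EOF

# -n: print nothing by default; -E: extended regular expressions.
# On lines matching /xyz/, replace the line with the text captured between
# "A:" and the following comma, then print it with p.
file=$(sed -nE '/xyz/{ s/^.*A:([^,]+),.*/\1/; p; }' "$readme")
echo "Matched file: $file"   # prints Matched file: emr_0003.sx

rm -f "$readme"
```

Like the grep and awk versions, this prints one name per matching line, so $file holds several lines if "xyz" occurs on more than one line.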

Paul Hodges