
My question can be split into two parts. First, I have a data file (file.dat) that looks like:

Parameter stuff number 1 (1029847) word index 2 (01293487), bla bla
Parameter stuff number 3 (134123) word index 4 (02983457), bla bla
Parameter stuff number 2 (109847) word index 3 (1029473), bla bla
etc...

I want to extract the numbers in brackets and save each one to a variable: for example, the first number on line 1 as 'x1' and the second on the same line as 'y1', then 'x2' and 'y2' for line 2, and so on. The numbers change randomly from line to line, but their position (in columns, if you like) stays the same. The number of lines is variable (0 to 'n'). How can I do this, please?

I have searched for answers and I get lost among the many different commands one can use. The answers I found address particular cases, such as the word being at the end of the line, or in brackets but with only one per line. Anyhow, here is what I have done so far (I am a newbie):

1) I get rid of the characters that are not part of the number in the string

sed -i 's/(//g' file.dat
sed -i 's/),//g' file.dat

2) Out of frustration I decided to output the whole lines to variables (getting closer?)

2.1) Get the number of lines to iterate over:

numlines=$(wc -l < file.dat)

2.2) Loop up to numlines (I haven't tested this bit yet!)

for i in {1..$numlines}
do
line${!i}=$(sed -n "${numlines}p" file.dat)
done

2.3) I gave up here, any help appreciated.

The second question is similar and merely out of curiosity: imagine a database whose fields are separated by spaces, tabs, commas, or any other separator. This database has a variable number of lines ('n'), and the number of strings per line ('k') may vary too. How do I extract the 'j'th string on the 'i'th line and save it to a variable 'x'?

Leo

3 Answers

3

Here is a quick way to store the values in bash array variables.

x=("" $(awk -F"[()]" '{printf "%s ",$2}' file))
y=("" $(awk -F"[()]" '{printf "%s ",$4}' file))

echo ${x[2]}
134123

If you are going to use this data for further processing, I would do it all in awk. Then you can use awk's internal arrays:

awk -F"[()]" '{x[NR]=$2;y[NR]=$4}' file
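As a rough sketch of what staying inside awk could look like: the arrays filled above only exist inside that awk process, so any further work on them would go in an END block. The temporary file and its contents below are assumptions copied from the question's sample, and the output format is only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, copied from the question's file.dat.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Parameter stuff number 1 (1029847) word index 2 (01293487), bla bla
Parameter stuff number 3 (134123) word index 4 (02983457), bla bla
Parameter stuff number 2 (109847) word index 3 (1029473), bla bla
EOF

# Splitting on "(" or ")" puts the bracketed numbers in fields 2 and 4.
# Store them by line number, then work with the arrays in the END block.
result=$(awk -F'[()]' '
    { x[NR] = $2; y[NR] = $4 }
    END { for (i = 1; i <= NR; i++) printf "x%d=%s y%d=%s\n", i, x[i], i, y[i] }
' "$tmp")

echo "$result"
rm -f "$tmp"
```

Because awk numbers records from 1, x[1] and y[1] hold the values from the first line without any padding.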
Jotne
  • That's a nice way to do it. Why do you need the `""` at the beginning of the arrays? – pfnuesel Nov 08 '13 at 17:40
  • So that `x[1]=line1`. Without `""` `x[0]=line1` and `x[1]=line2` etc – Jotne Nov 08 '13 at 17:51
  • I did not think of x1 as a part of an array i.e. x[1] thanks a lot this solves the first part of the question! – Leo Nov 08 '13 at 18:06
  • Give an example of what the data should look like for the second question, and we may be able to help. – Jotne Nov 08 '13 at 18:26
  • the example I was thinking of was random number of lines with random number of strings separated by a common separator (i.e. space). – Leo Nov 11 '13 at 21:13
2
#!/usr/bin/env bash

x=()
y=()

while IFS= read -r line; do
    # first bracketed number on the line
    x+=("$(sed 's/[^(]*(\([0-9]*\)).*/\1/' <<< "$line")")
    # second bracketed number on the line
    y+=("$(sed 's/[^(]*([^(]*(\([0-9]*\)).*/\1/' <<< "$line")")
done < "data"

echo "${x[@]}"
echo "${y[@]}"

x and y are declared as arrays. We then loop over the input file and run a sed command on every line.

x+=(data) appends the value data to the array x. Instead of writing the value we want to store directly, we use command substitution, written $(command): rather than appending the literal text $(command) to the array, bash runs the command and stores its output in the array.
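To illustrate the append-with-command-substitution pattern on its own, here is a small sketch. The array name words and the printf commands are placeholders chosen for the example, not part of the script above.

```shell
#!/usr/bin/env bash
words=()                        # start with an empty array
words+=("literal")              # appends the literal string
words+=("$(printf 'a b')")      # appends the command's output as ONE element
words+=("$(printf '%s' 42)")    # same pattern with a different command

echo "${#words[@]}"             # number of elements in the array
echo "${words[1]}"              # second element (bash arrays are 0-based)
```

The double quotes around $(...) matter here: without them the output "a b" would be split into two separate elements.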

Let's look at the sed commands:

's' is the substitute command. With [^(]* we match everything that is not a (, and then we match the ( itself. The characters that follow are the ones we want to store in the array; to capture them we wrap them in \( and \), so that we can refer to them later as \1. The number itself is matched with [0-9]*. Finally, we match the closing bracket ) and everything after it with .*. We then replace everything we matched (the whole line) with \1, which is just what was between \( and \).

If you are new to sed, this might be highly confusing, since it takes some time to read the sed syntax.

The second sed command is very similar.
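To see both substitutions at work on a single line, here is a small sketch using the first sample line from the question. The variable names first and second are only for illustration.

```shell
#!/usr/bin/env bash
line='Parameter stuff number 1 (1029847) word index 2 (01293487), bla bla'

# [^(]*(       skips everything up to and including the first "("
# \([0-9]*\)   captures the digits; ).* consumes the rest of the line
first=$(sed 's/[^(]*(\([0-9]*\)).*/\1/' <<< "$line")

# The extra [^(]*( skips past the first bracket pair to the second "("
second=$(sed 's/[^(]*([^(]*(\([0-9]*\)).*/\1/' <<< "$line")

echo "$first"
echo "$second"
```

One thing to keep in mind: if a line contains no brackets, the pattern does not match and sed prints the line unchanged, so the whole line would end up in the array.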

pfnuesel
1

How do I extract the value of the 'i'th line on the 'j'th string, and save it to a variable 'x'?

Try using awk

x=$(awk -v i="$i" -v j="$j" 'NR==i {print $j; exit}' file.dat)
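For a separator other than whitespace, awk's -F option sets the field separator. The sketch below uses made-up comma-separated data with a different number of fields per line; the file contents and the values of i and j are assumptions for the example.

```shell
#!/usr/bin/env bash
# Hypothetical comma-separated input with a varying number of fields per line.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
a,b,c
d,e
f,g,h,i
EOF

i=3   # line number to read
j=2   # field number on that line

# -F',' makes awk split on commas; NR == i selects the line, $j the field.
# exit stops awk from reading the rest of the file once the line is found.
x=$(awk -F',' -v i="$i" -v j="$j" 'NR == i { print $j; exit }' "$tmp")

echo "$x"
rm -f "$tmp"
```

If line i has fewer than j fields, awk prints an empty string, so x ends up empty rather than causing an error.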

I want to extract the number in brackets and save it to a variable for example the first one in line one to be 'x1', the second on the same line to be 'y1', for line 2 'x2' and 'y2', and so on...

Using awk

x=($(awk -F'[()]' '{print $2}' file.dat))
y=($(awk -F'[()]' '{print $4}' file.dat))

x1 can be accessed as ${x[0]} and y1 as ${y[0]}, and likewise for the remaining values (x2 is ${x[1]}, and so on).
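A sketch of how the two arrays might be walked together, followed by the padding trick mentioned in the comments for anyone who prefers 1-based indices. The temporary file and its contents are assumptions copied from the question's sample, and x1based is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, copied from the question's file.dat.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Parameter stuff number 1 (1029847) word index 2 (01293487), bla bla
Parameter stuff number 3 (134123) word index 4 (02983457), bla bla
Parameter stuff number 2 (109847) word index 3 (1029473), bla bla
EOF

x=($(awk -F'[()]' '{print $2}' "$tmp"))
y=($(awk -F'[()]' '{print $4}' "$tmp"))

# 0-based arrays: element k holds the value from line k+1.
for k in "${!x[@]}"; do
    echo "x$((k + 1))=${x[k]} y$((k + 1))=${y[k]}"
done

# An empty first element shifts everything to 1-based indices.
x1based=("" "${x[@]}")
echo "${x1based[1]}"
rm -f "$tmp"
```

The unquoted $(...) relies on word splitting to fill the arrays, which is fine for plain numbers like these but would break up values containing spaces.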

jkshah
  • Your awk addition is just a copy of my posted `awk` solution. And he wants x[1]=line1, not x[1]=line2 – Jotne Nov 08 '13 at 17:44
  • @Jotne Sorry to say but I didn't see your post. I had posted general solution a min ago and was working on getting an array. It turned out to be similar but not the same. If you observe, you have intelligently indexed array while I could not achieve this. Can't two `awk` solution be similar? If you insist, I don't mind deleting my ans. In that case can you please post general ans, which seems more important to me. – jkshah Nov 08 '13 at 17:47
  • It can of course be similar, but yours was changed from a different solution after you posted it. Also see my comment about x1=line1. You can add a "" in front to skip the 0 record. – Jotne Nov 08 '13 at 17:50
  • @Jotne If you observe I have two partition in my solution. As I mentioned, I had posted one earlier and had edited ans to post another one after working out on it. I agree with you, but then if I do the same, it will be exact copy. Please note that you use `printf`, I use `print` only. I would like to retain my original ans to get away from any plagiarism. – jkshah Nov 08 '13 at 17:52
  • @Jotne I appreciate your answers and by no means try to copy ans from others. I'm here to stress and test my scripting in addition to helping others. Many a times I have seen your ans posted and left answering. You can click on *edited .. mins ago* and figure out what I changed at what time, if it clarifies the concern – jkshah Nov 08 '13 at 17:58
  • I have no problem with this, it can of course be similar, no hard feelings. But you should change it to `y=("" $(awk -F'[()]' '{print $4}' t))` to conform to the OP's request. – Jotne Nov 08 '13 at 18:02