
I have a variable containing a list of the names of several other variables. Each of these variables contains a table. I want to join all of these tables.

The tables could look something like this:

Name Average      Name Average
A 1               A 1.1
B 2               B 2.2
C 3               C 3.3                  etc.
D 4               D 4.4
E 5               E 5.5

My list of variable names is called $all_variables and here is what its content looks like (a lot more variables in the real situation):

echo "$all_variables"

$table1
$table2
$table3
$table4
$table5

To build the argument list for join, I created $all_variables_join:

echo "$all_variables_join"

<(echo "$table1") <(echo "$table2") <(echo "$table3") <(echo "$table4") <(echo "$table5")

I then want to run join (joining on the first column, so I am using the default options) with something like this:

join "$all_variables_join" > file.txt

which I expected to expand to

join <(echo "$table1") <(echo "$table2") <(echo "$table3") <(echo "$table4") <(echo "$table5") > file.txt

And file.txt would contain something like this:

Name Average      
A 1 1.1
B 2 2.2
C 3 3.3         etc...         
D 4 4.4
E 5 5.5

However, when I try to run this I get this error:

join "$all_variables_join" > file.txt

join: missing operand after `<(echo "$table1") <(echo "$table2") <(echo "table3") <(echo "$table4") <(echo "$table5")'
Try `join --help' for more information.

Any idea how I could fix this?

Any help is very appreciated!

thanks

EDIT: Fixed the error message, I had copied the wrong one

arielle
  • It's unclear what you're trying to do. Sounds like an XY problem: there might be a totally different solution to your problem. – choroba Sep 27 '16 at 15:30
  • Basically, it comes down to this: I want the join arguments to be read from a variable, so that join "$all_variables_join" > file.txt expands to join <(echo "$table1") <(echo "$table2"), etc. – arielle Sep 27 '16 at 16:19
  • The expansion seems to work well; I just don't get why I am getting this error message. – arielle Sep 27 '16 at 16:20
  • That expansion into `"$all_variables_join"` is confusing, and I expect unnecessary, unless I misunderstand what you're doing. If your list of tables is already in `"$all_variables"`, then you simply need to pass that as a list to the `join`, as in: `join $all_variables > file.txt` (no quotation marks around the var). Your method is a complicated way of achieving the same thing, via indirection and temporary file descriptors... – gilez Sep 27 '16 at 16:27
  • If I just pass $all_variables to join, I get this error: join: extra operand `$table1' – arielle Sep 27 '16 at 16:42
  • Bash doesn't interpret expanded strings as code. The join command sees that entire string as a single literal argument. The individual tables won't be expanded and the <(...) sub-shells won't work. Expanded strings can be used as command names or command arguments and virtually nothing else. You need to use `eval` to make bash interpret the string as code in the same way it would on a command line. Also `join` only accepts two files. – ccarton Sep 27 '16 at 17:48
  • do all the tables contain the same number of rows and the same keys: A, B, C, D...? if so, use paste instead of join. – webb Sep 27 '16 at 19:41
  • They do use the same keys, but they might not have the same number of rows (some tables may be missing a few rows)... – arielle Sep 27 '16 at 20:07
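ccarton's point about eval can be checked with a minimal sketch. It assumes bash and two small made-up tables without header lines; the quoted form hands join a single argument, while eval re-parses the string as a command line so the <(...) substitutions actually run. It only works here because the string names exactly two files, and eval should only ever be given strings you trust.

```shell
#!/usr/bin/env bash
# Made-up sample data (no header lines, already sorted on the key column).
table1=$'A 1\nB 2\nC 3'
table2=$'A 1.1\nB 2.2\nC 3.3'

# The argument string, single-quoted so nothing inside it expands yet.
all_variables_join='<(echo "$table1") <(echo "$table2")'

# Quoted: join receives ONE argument (the whole string) and fails with
# "missing operand".
join "$all_variables_join" 2>/dev/null || echo "quoted form failed as expected"

# eval: the string is parsed like a typed command line, so the process
# substitutions run and join sees two file arguments.
eval_out=$(eval "join $all_variables_join")
echo "$eval_out"
```

With five tables in the string, the same eval would still fail, since join rejects a third file operand.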

1 Answer


@gilez and @ccarton point out that you need to take off the double quotes around $all_variables: with the quotes, the whole string reaches the command as one single argument. Here's an example showing why:

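touch 1
touch 2
x='1 2'
ls $x      # unquoted: split into two arguments, "1" and "2"; lists both files
ls "$x"    # quoted: one argument, "1 2"; fails because no such file exists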
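The same effect can be shown without creating files by counting the arguments a function receives. This is a small bash sketch; count_args is a hypothetical helper. The last part also shows why passing $all_variables unquoted gave "extra operand `$table1'": word splitting only splits the text, it does not expand the $table1 written inside it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments it was called with.
count_args() { echo "$#"; }

x='1 2'
unquoted=$(count_args $x)    # split on whitespace -> 2 arguments
quoted=$(count_args "$x")    # no splitting -> 1 argument, "1 2"
echo "unquoted: $unquoted, quoted: $quoted"

# Splitting does not re-expand variables written inside the string:
table1='A 1'
y='$table1 $table2'
set -- $y
first=$1                     # the literal text $table1, not 'A 1'
echo "first argument: $first"
```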

However, this won't solve your problem: as @ccarton says, the $table1 and <(...) text inside the string is not evaluated when the variable expands, and join only accepts two files at a time anyway.
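Because join takes exactly two inputs, one way to combine more tables is to chain joins, feeding each result back in as standard input via the "-" file name. A minimal bash sketch, assuming made-up header-less tables that are already sorted and share every key:

```shell
#!/usr/bin/env bash
# Made-up sample tables: no header line, sorted on the first column.
table1=$'A 1\nB 2\nC 3'
table2=$'A 1.1\nB 2.2\nC 3.3'
table3=$'A 1.11\nB 2.22\nC 3.33'

# "-" tells join to read that input from stdin, so the output of the
# first join becomes the left-hand input of the second.
chained=$(join <(echo "$table1") <(echo "$table2") | join - <(echo "$table3"))
echo "$chained"
```

This keeps only keys present in every table; the approach described next also keeps keys that some tables lack.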

One strategy that would work is to first create a column containing all the possible names (A, B, C...):

# drop the "Name" header of every table (not just the first) and keep the keys
table=$(printf '%s\n' "$table1" "$table2" "$table3" "$table4" "$table5" |
  awk '$1 != "Name" {print $1}' |
  sort -u)
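The key-column step can be checked on two made-up tables in which table2 lacks B and adds D, so the union of keys actually matters. This bash sketch filters out the "Name" header of every table rather than only the first line of the combined text, assuming the header's first field is always the literal word Name.

```shell
#!/usr/bin/env bash
# Made-up sample tables with the question's "Name Average" header.
table1=$'Name Average\nA 1\nB 2\nC 3'
table2=$'Name Average\nA 1.1\nC 3.3\nD 4.4'

# Union of first-column values across all tables, minus every header line.
keys=$(printf '%s\n' "$table1" "$table2" | awk '$1 != "Name" {print $1}' | sort -u)
echo "$keys"
```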

Then join in each table one by one:

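# tail -n +2 drops each table's "Name Average" header line; join assumes
# sorted input, and that header line would stop it from pairing any keys
table=$(join -a1 <(echo "$table") <(echo "$table1" | tail -n +2))
table=$(join -a1 <(echo "$table") <(echo "$table2" | tail -n +2))
table=$(join -a1 <(echo "$table") <(echo "$table3" | tail -n +2))
table=$(join -a1 <(echo "$table") <(echo "$table4" | tail -n +2))
table=$(join -a1 <(echo "$table") <(echo "$table5" | tail -n +2))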
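Since some tables may be missing rows, plain -a1 leaves those rows shorter and the columns no longer line up. A minimal sketch, assuming GNU coreutils join (the -o auto keyword is a GNU option) and made-up data where table2 has no B row: -e NA together with -o auto pads the missing value with the placeholder NA.

```shell
#!/usr/bin/env bash
# Made-up sample tables; table2 is missing the B row.
table1=$'Name Average\nA 1\nB 2\nC 3'
table2=$'Name Average\nA 1.1\nC 3.3'

# Key column: every name that appears in any table, headers removed.
table=$(printf '%s\n' "$table1" "$table2" | awk '$1 != "Name" {print $1}' | sort -u)

for t in "$table1" "$table2"; do
  # -a1: keep every key; -e NA -o auto (GNU join): fill missing fields with NA
  table=$(join -a1 -e NA -o auto <(echo "$table") <(echo "$t" | tail -n +2))
done
echo "$table"
```

-o auto takes its column layout from the first line of each input, so it assumes those first lines are complete rows.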

It is possible to use a loop instead of explicitly naming table1...table5, but that is most natural when the data is in files instead of variables, e.g.,

mkdir -p /tmp/tables
echo "$table1" > /tmp/tables/table1
...
echo "$table5" > /tmp/tables/table5
for t in /tmp/tables/*; do
  # skip each file's header line before joining
  table=$(join -a1 <(echo "$table") <(tail -n +2 "$t"))
done
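The loop can also stay with variables by reading the names straight out of $all_variables, which in the question holds one literal "$tableN" per line. This bash sketch uses indirect expansion, ${!name}, to fetch the value of the variable whose name is stored in name; the three tables are made up and the key column is hard-coded for brevity.

```shell
#!/usr/bin/env bash
# Made-up sample tables with the question's header line.
table1=$'Name Average\nA 1\nB 2'
table2=$'Name Average\nA 1.1\nB 2.2'
table3=$'Name Average\nA 1.11\nB 2.22'

# As in the question: one "$name" per line, with literal dollar signs.
all_variables=$'$table1\n$table2\n$table3'

result=$(printf '%s\n' A B)   # key column, hard-coded for this sketch
while IFS= read -r ref; do
  name=${ref#\$}              # strip the leading "$": "$table1" -> "table1"
  # ${!name}: bash indirect expansion, the value of the variable named $name
  result=$(join -a1 <(echo "$result") <(echo "${!name}" | tail -n +2))
done <<< "$all_variables"     # here-string keeps the loop in this shell
echo "$result"
```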

Two notes about join: 1. -a1 also prints lines from the first (left) input that have no match in the second. 2. Both inputs must be sorted on the join field, so sort them first if they aren't already:

table=$(join -a1 <(echo "$table") <(tail -n +2 "$t" | sort -k1,1))
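A small bash check of the sorting note, using a made-up table whose rows are out of order. Note that sort -k1 means "from field 1 to the end of the line", while -k1,1 restricts the sort key to the join field alone.

```shell
#!/usr/bin/env bash
table=$'A\nB\nC'                                 # key column, already sorted
unsorted=$'Name Average\nC 3.3\nA 1.1\nB 2.2'   # made-up, rows out of order

# Drop the header, then sort on the first field only so join can pair rows.
table=$(join -a1 <(echo "$table") <(echo "$unsorted" | tail -n +2 | sort -k1,1))
echo "$table"
```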
webb