using awk to lookup and insert data

Question

In my continuing crusade not to use MS Excel, I'd like to process some data, send it to a file, and then insert some records from a separate file into a third file using field $1 as the index. Is this possible?

I have data like this:

2600,foo,stack,1,04/02/2015,ACH Payment,ACH Settled,1500
2600,foo,stack,2,04/06/2015,Credit Card Sale,Settled,100
2600,foo,stack,3,04/07/2015,Credit Card Sale,Settled,157.13
2600,foo,stack,4,04/07/2015,ACH Credit,ACH Settled,.03

I have this to group it:

cat group.awk
#!/usr/bin/awk -f

BEGIN {
    OFS = FS = ","
}
NR > 1 {
    arr[$1 OFS $2 OFS $3]++
}
END {
    for (key in arr)
        print key, arr[key]
}

The group makes it like this:

2600,foo,stack,4

Simple multiplication is applied to fields 5, 6 and 7 where applicable--depends on fields 3.

In this example we can say the finished record looks like this:

2600,foo,stack,4,.2,19.8

Now in a separate file, I have this data:

2600,registered user,5hPASLJlHlgJR4AQc9sZQ==

basic flow is:

awk -f group.awk data.csv | awk -f math.awk > finished.csv

Then use awk (if it can do this) to look up field $1 in finished.csv and find corresponding record above in the separate file(bill.csv) and print to a third file or insert into bill.csv.

Expected output in third file(bill.csv):

x,y,,1111111,2600,,,,,,,19.8,,,registered user,,,,,,,,,,RS,,,N5hPASLJlHlgJR4AQc9sZQ==,z,a

x,y,,1111111,RS,z,a will be pre-populated to I only need to insert three new records.

Is this something awk can accomplish?

Edit

Field $3 is the accountID that sets the multiplication on 5, 6 and 7.

Here's the idea:

bill.awk:

NR>1{if($3=="stack" && $4>199) $5=$4*0.03; 
if($3=="stack" && $4<200) $5=$4*0.05
if($3=="user") $5=$4*.01
}1

total.awk:

    awk -F, -v OFS="," 'NR>1{if($3=="stack" && $5<20) $6=20-$5;
if($3=='stack && $5>20) $6=0;}1'

This part is working and final output is like above:

2600,foo,stack,4,.2,19.8

4*.05 = .2 & 20 - .2 = 19.8

But the minimium charge is $20

So we'll correct it:

4*.05 = .2 & 20 - .2 = 20

Extra populated fields came from a separate file (bill.csv) and I need to fill in 20 to the correct record on bill.csv

bill.csv contains everything needed except the 20

before:

x,y,,1111111,2600,,,,,,,,,,,registered user,,,,,,,,,,RS,,,N5hPASLJlHlgJR4AQc9sZQ==,z,a

after:

x,y,,1111111,2600,,,,,,,20,,,registered user,,,,,,,,,,RS,,,N5hPASLJlHlgJR4AQc9sZQ==,z,a

Is this a better explanaiton? Go on the assumption that group.awk, bill.awk and total.awk are working correctly. I just need to extract the correct total for field $1 and put it in bill.csv in the correct spot.

Your input file has **5** lines that start with `2600,foo,stack` so why does your first intermediate output end with 4 `2600,foo,stack,4`? You say `Simple multiplication is applied to fields 5, 6 and 7 where applicable--depends on fields 3.` but field 3 is a text string (`stack`) and unless you tell us what the multiplication is we can't write a script to do it. Where did all those extra populated and empty fields in the final output (`x,y,,1111111,2600,,,,,,,10.4,,,registered user,,,,,,,,,,RS,,,N5hPASLJlHlgJR4AQc9sZQ==,z,a`) come from and where did `foo,stack,4` go? — Ed Morton, May 16 '15 at 14:25
The 4th record of your input (`2600,foo,stack,404/07/2015,ACH Credit,ACH Settled,.03`) has 1 less field than the rest, is that correct? wrt `x,y,,1111111,RS,z,a will be pre-populated` - in which file? `I only need to insert three new records` - what new records (or do you mean new fields)? Please clean up your question to show PRECISE, TESTABLE input files and the actual associated output and algorithms. — Ed Morton, May 16 '15 at 14:36
A good finished record looks like this: 2601,FOS - Las Vegas,stack,468,23.4,0,23.4 and I'd just need to get that 23.4 out and insert it into bill.csv based on the 2601. — Registered User, May 16 '15 at 19:52

score 0 · Answer 1 · answered May 16 '15 at 07:48

0

Is maybe this last awk what you need. I´ve tried to understand what you want and I think is just that merging awk way:

For explaininng: We first save the fileA in an array with the first key as the index. Then we search for each line o file B if the field1 is between the indexes of our array, and if it´s, we print all data from two files together

awk -F"," 'BEGIN {while (getline < "record.dat"){ a[$1]=$0; }} {if($1 in a){ print a[$1]","$0}}' file.dat
2600,foo,stack,4,10,10.4,2600,registered user,5hPASLJlHlgJR4AQc9sZQ==

answered May 16 '15 at 07:48

Alejandro Teixeira Muñoz

2,758
1
22
31

1

That will go into an infinite loop if/when `record.dat` can't be read for any reason. Don't use getline unless it's a specific, appropriate application and you have read and fully understand http://awk.info/?tip/getline. The script has other non-awkishness issues too. – Ed Morton May 16 '15 at 14:19

Ed Morton · Answer 2 · 2015-05-16T15:05:41.467

0

This is the kind of solution you need:

$ cat fileA
2600,foo,stack,1,04/02/2015,ACH Payment,ACH Settled,1500
2600,foo,stack,2,04/06/2015,Credit Card Sale,Settled,100
2600,foo,stack,3,04/07/2015,Credit Card Sale,Settled,157.13
2600,foo,stack,4,04/07/2015,ACH Credit,ACH Settled,.03
2600,foo,stack,5,04/09/2015,ACH Payment,ACH Settled,147.10

$ cat fileB
2600,registered user,5hPASLJlHlgJR4AQc9sZQ==

$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR{
    cnts[$1][$2FS$3]++
    next
}
{
    for (val in cnts[$1]) {
        cnt = cnts[$1][val]
        print $1, val, cnt, cnt*2.5, $2, $3
    }
}

$ awk -f tst.awk fileA fileB
2600,foo,stack,5,12.5,registered user,5hPASLJlHlgJR4AQc9sZQ==

but until you update your question we can't provide any more concrete help than that.

The above uses GNU awk 4.* for true 2D arrays.

edited May 16 '15 at 15:05

answered May 16 '15 at 14:47

Ed Morton

188,023
17
78
185

Baiscally I have a completed file (file.csv) and need to get out one piece of information from that I need to put it in bill.csv that has a matching field $1 in file.csv. Is this possible? – Registered User May 18 '15 at 01:05
Yes, anything is possible. You started out with 2 awk scripts named `group.awk` and `math.awk` and it wasn't clear why you had 2 separate scripts. blus files named `data.csv` and `finished.csv`. Your updated question now additionally has `bill.awk`, `total.awk`, and `bill.csv`. It's confusing enough that I can't be bothered to try to figure out what's relevant and what isn't. Good luck! – Ed Morton May 18 '15 at 01:23

using awk to lookup and insert data

2 Answers2