Replace every "2 nth" occurrence of a string in separate files using a line range from another file

Question

I have three files:

0.txt and 0-1.txt, both with the same content below:

"#sun\t",
"car_snif = house.group_tree(home_cool)\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree(home_cool)\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree(home_cool)\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree(home_cool)\t",
"machine(shoes_shirt.shop)\t",
"#sun\t",
"car_snif = house.group_tree(home_cool)\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree(home_cool)\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree(home_cool)\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree(home_cool)\t",

and source file 1.txt below:

(food, apple,)(bag, tortoise,)
(sky, cat,)(sun, sea,)
(car, shape)(milk, market,)
(man, shirt)(hair, life)
(dog, big)(bal, pink)

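For 0.txt, I would like to replace every "2 nth" occurrence of home_cool (that is, each pair of consecutive occurrences) with the "1 nth" line of 1.txt (the next single line), using only the first two lines of 1.txt (hence sed -n '1,2p') and starting again from the first of those lines once they run out, so that my 2.txt output is as below: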

"#sun\t",
"car_snif = house.group_tree((food, apple,)(bag, tortoise,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((food, apple,)(bag, tortoise,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((sky, cat,)(sun, sea,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((sky, cat,)(sun, sea,))\t",
"machine(shoes_shirt.shop)\t",
"#sun\t",
"car_snif = house.group_tree((food, apple,)(bag, tortoise,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((food, apple,)(bag, tortoise,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((sky, cat,)(sun, sea,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((sky, cat,)(sun, sea,))\t",

Once 2.txt is done, I would like to do the same for 0-1.txt: replace every "2 nth" occurrence of home_cool with the "1 nth" line of 1.txt, this time using the third line of 1.txt onwards (hence sed -n '3,5p'), so that my 3.txt output is as below:

"#sun\t",
"car_snif = house.group_tree((car, shape)(milk, market,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((car, shape)(milk, market,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((man, shirt)(hair, life))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((man, shirt)(hair, life))\t",
"machine(shoes_shirt.shop)\t",
"#sun\t",
"car_snif = house.group_tree((dog, big)(bal, pink))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((dog, big)(bal, pink))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((car, shape)(milk, market,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((car, shape)(milk, market,))\t",

With the line below I can split the replacement of home_cool in 0.txt into two steps (first step sed -n '1,2p', second step sed -n '3,5p'). However, I would like to save the result of the first step in 2.txt and the result of the second step in 3.txt:

awk 'NR==FNR {a[NR]=$0; n=NR; next}/home_cool/ { gsub("home_cool", a[int((++i-1)%(n*2)/2)+1])}1' <(cat 1.txt | tee >(sed -n '1,2p') >(sed -n '3,5p')) 0.txt >> 2.txt

So what I really want is something like this (pseudocode below):

awk 'NR==FNR {a[NR]=$0; n=NR; next}/home_cool/ { gsub("home_cool", a[int((++i-1)%(n*2)/2)+1])}1' <(cat 1.txt | tee >(sed -n '1,2p') >(sed -n '3,5p')) | "to sed -n '1,2p' make" 0.txt >> 2.txt | "to sed -n '3,5p' make" 0-1.txt >> 3.txt

How could I do this in a single command line, without breaking it into several isolated awk fragments?

Note: perhaps the title of the question should be "multiple inputs, same process, different outputs"

Related, I assume: https://stackoverflow.com/q/69867423/4162356 Are _2 nth_ and _1 nth_ 2nd and 1st? — James Brown, Nov 07 '21 at 11:50
@JamesBrown Yes, related, but not enough I believe for this new question. — 7beggars_nnnnm, Nov 07 '21 at 12:33
@Cyrus I think that may be unfair. Did they actually delete a question with answers? I only saw that they deleted one question, a previous version of this one, because it was hard to articulate the problem clearly, let alone for someone who (apparently) has English as a second language. The problem is described more clearly in this version. OP should have edited their previous question instead of deleting and reposting, and maybe had it reviewed by a coworker or friend first. — dan, Nov 07 '21 at 13:15
@dan I agree with Cyrus. The thing is that I have already deleted the old question, so the best course is to keep this new question and edit it, completely if necessary. — 7beggars_nnnnm, Nov 07 '21 at 14:19
@dan: Someone deleted my comment. An [example](https://stackoverflow.com/questions/69788918/repeat-nth-number-until-the-nth-match-while-appending-in-different-contexts-us) for a question with answer. — Cyrus, Nov 07 '21 at 14:19
It's going to be hard to program this: you have 2 separate algorithms to generate 2.txt and 3.txt. — glenn jackman, Nov 07 '21 at 14:24
@Cyrus I see no answer on that deleted question, and as far as I know I cannot delete questions that already have answers. What happened in this case is that someone answered but then deleted their own answer, and after that I thought it better to delete the question. — 7beggars_nnnnm, Nov 07 '21 at 14:25
@glennjackman I believe it will be necessary to store the outputs in variables, and maybe use `tee`, but so far I have not made much progress. — 7beggars_nnnnm, Nov 07 '21 at 14:29
@7beggars_nnnnm: In this case, the answer was deleted by the author after a comment from you. — Cyrus, Nov 07 '21 at 14:32

markp-fuso · Answer 1 · 2021-11-07T16:41:00.743

Referring to OP's previous Q&A ...

While it's certainly possible we could modify the (accepted answer to the) previous Q&A to perform these split operations across the various inputs, I'd vote for simplicity over complexity by breaking this new question into two separate operations, eg:

awk '... from previous Q&A ...' <(head -n 2 1.txt) 0.txt   > 2.txt
awk '... from previous Q&A ...' <(tail -n +3 1.txt) 0-1.txt > 3.txt

Stripping out the unnecessary /house_cool01/{...} line of code this becomes:

awk 'NR==FNR {a[NR]=$0; n=NR; next} /home_cool/ {gsub("home_cool", a[int((i++)%(n*2)/2)+1] )} 1' <(head -n 2 1.txt) 0.txt   > 2.txt
awk 'NR==FNR {a[NR]=$0; n=NR; next} /home_cool/ {gsub("home_cool", a[int((i++)%(n*2)/2)+1] )} 1' <(tail -n +3 1.txt) 0-1.txt > 3.txt

These generate:

$ cat 2.txt
"#sun\t",
"car_snif = house.group_tree((food, apple,)(bag, tortoise,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((food, apple,)(bag, tortoise,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((sky, cat,)(sun, sea,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((sky, cat,)(sun, sea,))\t",
"machine(shoes_shirt.shop)\t",
"#sun\t",
"car_snif = house.group_tree((food, apple,)(bag, tortoise,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((food, apple,)(bag, tortoise,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((sky, cat,)(sun, sea,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((sky, cat,)(sun, sea,))\t",

$ cat 3.txt
"#sun\t",
"car_snif = house.group_tree((car, shape)(milk, market,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((car, shape)(milk, market,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((man, shirt)(hair, life))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((man, shirt)(hair, life))\t",
"machine(shoes_shirt.shop)\t",
"#sun\t",
"car_snif = house.group_tree((dog, big)(bal, pink))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((dog, big)(bal, pink))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((car, shape)(milk, market,))\t",
"machine(shoes_shirt.shop)\t",
"car_snif = house.group_tree((car, shape)(milk, market,))\t",

But this splits the work into several awk blocks, which is exactly what I want to avoid. — 7beggars_nnnnm, Nov 07 '21 at 15:28
relatively clean/simple implementation; easier to understand/maintain than an alternative/likely-convoluted solution; [KISS principle](https://en.wikipedia.org/wiki/KISS_principle); could you update the question to explain why you need a single `awk` call? — markp-fuso, Nov 07 '21 at 15:36
I'm trying to work out a solution as well; meanwhile I will try to make the question clearer. Thanks. — 7beggars_nnnnm, Nov 07 '21 at 15:40
Thanks! I am outlining everything I need to ask in the future so that I can keep things as simple as possible, instead of presenting very fragmented pieces of my real context. — 7beggars_nnnnm, Nov 07 '21 at 17:58

score 1 · Accepted Answer · answered Nov 07 '21 at 17:00

This works:

awk \
'FNR==1 {++f}
f==1 {a[i++]=$0}
f==2 {if ($0~/home_cool/) {gsub(/home_cool/, a[int(j++/2)%2]) }; print > "2.txt"}
f==3 {if ($0~/home_cool/) {gsub(/home_cool/, a[int(k++/2)%3 + 2]) }; print > "3.txt"}' \
    1.txt 0.txt 0-1.txt
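
The constants in the two index expressions follow from how the array is filled: a[] starts at index 0, so lines 1-2 of 1.txt sit in a[0] and a[1], and lines 3-5 in a[2] through a[4]. Dividing the counter by 2 makes each line serve two occurrences, %2 and %3 are the number of lines in each range, and +2 is the 0-based position of line 3. The trace below is only an illustration; the loop over 8 occurrences and the variable names trace and c are assumptions chosen to match the sample files.

```shell
# Illustration only: show which array element each of the 8 sample
# occurrences of home_cool would receive in 2.txt and in 3.txt.
trace=$(awk 'BEGIN {
  for (c = 0; c < 8; c++)
    printf "occurrence %d: 2.txt gets a[%d], 3.txt gets a[%d]\n", c + 1, int(c/2) % 2, int(c/2) % 3 + 2
}')
printf '%s\n' "$trace"
```

For a different line range, the modulus would become the number of lines in that range and the added offset the 0-based index of its first line.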

Alternatives to hardcoding "2.txt" and "3.txt" include:

Use variables assigned with -v outfile1=2.txt -v outfile2=3.txt
Replace them with outfile, and use this arg list: 1.txt outfile=2.txt 0.txt outfile=3.txt 0-1.txt
Replace them with ARGV[4] and ARGV[5], add the line f==4 {exit}, and use this arg list: 1.txt 0.txt 0-1.txt 2.txt 3.txt

Caveats:

If a given file is empty, it will not cause f to increment, so the files after it are handled by the wrong block. In gawk, ENDFILE can be used instead. See this answer: How to get the filenumber that is being processing by an awk script?

dan


It worked exactly as expected; I will be reading the documentation alongside your answer. Thank you very much to the user markp-fuso as well. I really will avoid deleting questions. I often think they lack what is needed to be answered, and I end up deleting them to take some time and renew my thoughts. Thanks also to the user Cyrus, and to the other users for their advice. – 7beggars_nnnnm Nov 07 '21 at 17:56
Maybe this is a conversation for a chat, but I'm trying to put a shell command via `system()` inside `awk` before `f==3` prints `3.txt`. The only issue is that the shell command runs in a loop and keeps being repeated. Should I open a new question? – 7beggars_nnnnm Nov 08 '21 at 00:57
@7beggars_nnnnm You can put `if (FNR==1) {system("#")}` as the first command inside the `f==3` block. – dan Nov 08 '21 at 08:59
I tested it and it worked, even when I add a second shell command that has to run before a third replacement step with the new output `4.txt`. Look here: https://onlinegdb.com/brgArNpwF. If you wish, you can answer here: https://stackoverflow.com/questions/69877886/run-shell-command-inside-awk-only-once-using-system . – 7beggars_nnnnm Nov 08 '21 at 12:03
see `3.sh` https://onlinegdb.com/brgArNpwF – 7beggars_nnnnm Nov 08 '21 at 12:20
Could you explain in more detail how the total number of rows in the `1.txt` file is used, and how the array indices work in this procedure? – 7beggars_nnnnm Nov 09 '21 at 23:16
