How to extract a value by searching for two words in different lines and getting the value of second one

Question

How can I search for a word and, once it is found, save a specific value from the next line in a variable?

The JSON below is only a small part of the file.

Because the JSON structure of this specific file is inconsistent and subject to change over time, this needs to be done with a text search using tools like grep, sed, or awk.

However, the parameters below will always be the same:

search for the word next
get the line below it
extract everything after page_token= up to, but not including, the closing "
store it in a variable to be used later

test.txt:

"link": [
    {
      "relation": "search",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
      "relation": "next",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    },
]

So the desired output in this case is:

PAGE_TOKEN="121_%_@212absa23bababa121212121212121"

My attempt:

PAGE_TOKEN=$(cat test.txt| grep "next" | sed 's/^.*: *//;q')

No luck.

I agree that you need `jq`, but given your comment below, `sed -n '/next/{n;s/.*page_token=//p;q}' testDat` may help, but is likely to blow up sooner than later. Good luck. — shellter, Jan 12 '23 at 00:19
`sed -n '/"next",$/{N;s/^.*page_token=\([^"]*\)"/\1/;p}' file.txt` Enjoy... — Jetchisel, Jan 12 '23 at 00:21
If you cannot use a proper JSON parser, you might try something like: `p=$(grep 'page_token' test.txt); p="${p##*=}"; echo "PAGE_TOKEN=\"$p"` or ` p=$(grep -A1 'next' test.txt); echo "$p"; p="${p##*=}"; echo "PAGE_TOKEN=\"$p"` — j_b, Jan 12 '23 at 00:21
Thanks a lot @Jetchisel, shellter and j_b. All your answers worked. Would you be able to post one as an answer explaining how the command works? I will accept it as the answer. — Peter, Jan 12 '23 at 00:29
@Peter, I'll take a pass on posting a `sed` answer since there is already a `jq` one posted. — Jetchisel, Jan 12 '23 at 00:40

score 2 · Answer 1 · answered Jan 12 '23 at 00:07

Presuming your input is valid json, one option is to use:

cat test.json
[{
        "relation": "search",
        "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
        "relation": "next",
        "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    }
]

PAGE_TOKEN=$(cat test.json | jq -r '.[] | select(.relation=="next") | .url | gsub(".*=";"")')
echo "$PAGE_TOKEN"
121_%_@212absa23bababa121212121212121

answered Jan 12 '23 at 00:07

jared_mamrot

Unfortunately, it's a huge JSON file, of which I only posted the part needed. One of the reasons I tried doing it with `grep` is that the JSON is deeply nested and its structure can change – Peter Jan 12 '23 at 00:15
`jq -r ... test.json` without the `cat` maybe? – Jetchisel Jan 12 '23 at 00:33
Definitely not insurmountable problems; here are some suggestions for when you're unsure of the structure and your file is large: https://stackoverflow.com/a/56892419/12957340 – jared_mamrot Jan 12 '23 at 02:08

score 2 · Accepted Answer · answered Jan 12 '23 at 09:22

This might work for you (GNU sed):

sed -En '/next/{n;s/.*(page_token=)([^"]*).*/\U\1\E"\2"/p}' file

This is essentially a filtering operation, hence the use of the -n option.

Find a line containing next, fetch the next line, format as required and print the result.
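
Since the question asks for the value in a variable, here is a minimal sketch of the same find-then-read-the-next-line idea wrapped in a command substitution. It uses only POSIX sed syntax (no \U or -E) and shows an awk equivalent for comparison. The file name sample.txt and the variable name PAGE_TOKEN_AWK are made up for the illustration, and the sketch assumes the "url" line always directly follows the "relation": "next" line, as in the sample from the question.

```shell
#!/bin/sh
# Recreate the sample from the question (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
"link": [
    {
      "relation": "search",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
      "relation": "next",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    },
]
EOF

# -n         : print nothing unless asked to with p
# /"next"/   : select the line holding "relation": "next"
# n          : replace it with the following line (the "url" line)
# s/.../\1/p : keep only the text between page_token= and the closing quote
# q          : stop after the first match
PAGE_TOKEN=$(sed -n '/"next"/{n;s/.*page_token=\([^"]*\)".*/\1/p;q;}' sample.txt)

# Same idea in awk: set a flag on the "next" line, act on the line after it.
PAGE_TOKEN_AWK=$(awk 'found { sub(/.*page_token=/, ""); sub(/".*/, ""); print; exit }
                      /"next"/ { found = 1 }' sample.txt)

# Passing the value as a printf argument keeps the % in the token literal.
printf 'PAGE_TOKEN="%s"\n' "$PAGE_TOKEN"
```

The variable holds the bare token without quotes; the quotes are added only when printing. Like any line-based approach, this would break if the file were reformatted so that "url" no longer sits on the line right after "next".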
