
How to search for a word and, once it's found, save a specific value from the next line in a variable

The JSON below is only a small part of the file.

Because this file's JSON structure is inconsistent and subject to change over time, it needs to be done via a text search with tools like grep, sed or awk.

However, the parameters below will always be the same:

  1. search for the word next
  2. get the line below it
  3. extract everything after page_token=, excluding the closing quote "
  4. store the result in a variable for later use
test.txt:
"link": [
    {
      "relation": "search",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
      "relation": "next",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    },
]

So the desired output in this case is:

PAGE_TOKEN="121_%_@212absa23bababa121212121212121"
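The four steps listed above map fairly directly onto `awk`, which the question mentions but none of the replies use. This is a minimal sketch, assuming `test.txt` holds the fragment shown above and that only the first line containing `next` matters; like any text-based approach, it will break if the token ever moves off the line directly below `"next"`.

```shell
# Recreate the fragment from the question as test.txt.
cat > test.txt <<'EOF'
"link": [
    {
      "relation": "search",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
      "relation": "next",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    },
]
EOF

# 1. /next/ selects the line containing "next".
# 2. getline replaces $0 with the line below it.
# 3. the first sub() deletes everything up to and including page_token=,
#    the second sub() deletes the closing quote and anything after it.
# 4. print the token and exit; command substitution stores it in PAGE_TOKEN.
PAGE_TOKEN=$(awk '/next/ { getline; sub(/.*page_token=/, ""); sub(/".*/, ""); print; exit }' test.txt)

echo "PAGE_TOKEN=\"$PAGE_TOKEN\""
```

The final `echo` prints the line in the format the question asks for.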

My attempt:

PAGE_TOKEN=$(cat test.txt| grep "next" | sed 's/^.*: *//;q')

No luck.
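The attempt above fails because `grep "next"` returns the `"relation": "next",` line itself rather than the line below it, so `s/^.*: *//` strips everything up to the last `: ` and prints `"next",`. This is a sketch that keeps the same grep-then-sed shape. It assumes a `grep` that supports `-A` (GNU, BSD and BusyBox grep do; it is not POSIX):

```shell
# Recreate the fragment from the question as test.txt.
cat > test.txt <<'EOF'
"link": [
    {
      "relation": "search",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
      "relation": "next",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    },
]
EOF

# grep -A1 prints the matching line plus the one line after it.
# sed -n ... /p prints only lines where the substitution succeeded, keeping
# the characters between page_token= and the next double quote.
PAGE_TOKEN=$(grep -A1 '"next"' test.txt | sed -n 's/.*page_token=\([^"]*\)".*/\1/p')

echo "PAGE_TOKEN=\"$PAGE_TOKEN\""
```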

Peter
  • Use proper tools to parse JSON: `jq` – Diego Torres Milano Jan 11 '23 at 23:48
  • I agree that you need `jq`, but given your comment below, `sed -n '/next/{n;s/.*page_token=//p;q}' testDat` may help, but is likely to blow up sooner rather than later. Good luck. – shellter Jan 12 '23 at 00:19
  • `sed -n '/"next",$/{N;s/^.*page_token=\([^"]*\)"/\1/;p}' file.txt` Enjoy... – Jetchisel Jan 12 '23 at 00:21
  • If you cannot use a proper JSON parser, you might try something like: `p=$(grep 'page_token' test.txt); p="${p##*=}"; echo "PAGE_TOKEN=\"$p"` or ` p=$(grep -A1 'next' test.txt); echo "$p"; p="${p##*=}"; echo "PAGE_TOKEN=\"$p"` – j_b Jan 12 '23 at 00:21
  • Thanks a lot @Jetchisel, shellter and j_b. All your answers worked. Would you be able to post it as an answer explaining how the command works? I will accept it as the answer. – Peter Jan 12 '23 at 00:29
  • @Peter, I'll take a pass on posting a `sed` answer since there is already a `jq` one posted. – Jetchisel Jan 12 '23 at 00:40
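Since the `sed` commands in the comments were not written up as an answer, this sketch annotates how the `n`-based approach works. It is a variant of shellter's command: the substitution also drops the closing quote, which `s/.*page_token=//` on its own would leave at the end of the value. It still assumes the token sits on the line directly after the first line containing `"next"`.

```shell
# Recreate the fragment from the question as test.txt.
cat > test.txt <<'EOF'
"link": [
    {
      "relation": "search",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
      "relation": "next",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    },
]
EOF

# -n          : no automatic printing; only an explicit p prints
# /"next"/    : select the line containing "next" (quoted, to be a bit stricter)
# n           : replace the pattern space with the line below it
# s/.../\1/p  : keep only the text between page_token= and the next ", then print it
# q           : quit after the first match
PAGE_TOKEN=$(sed -n '/"next"/{n;s/.*page_token=\([^"]*\)".*/\1/p;q;}' test.txt)

echo "PAGE_TOKEN=\"$PAGE_TOKEN\""
```

Jetchisel's version uses `N` instead of `n`: it appends the next line to the current one, so a single substitution spans both lines.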

2 Answers


Presuming your input is valid JSON, one option is to use:

cat test.json
[{
        "relation": "search",
        "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
        "relation": "next",
        "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    }
]

PAGE_TOKEN=$(cat test.json | jq -r '.[] | select(.relation=="next") | .url | gsub(".*=";"")')
echo "$PAGE_TOKEN"
121_%_@212absa23bababa121212121212121
jared_mamrot
  • Unfortunately, it's a huge JSON file, and I only posted the part needed. One of the reasons I tried doing it with `grep` is that the JSON is deeply nested and its structure can change. – Peter Jan 12 '23 at 00:15
  • `jq -r ... test.json` without the `cat` maybe? – Jetchisel Jan 12 '23 at 00:33
  • Definitely not insurmountable problems; here are some suggestions for when you're unsure of the structure and your file is large: https://stackoverflow.com/a/56892419/12957340 – jared_mamrot Jan 12 '23 at 02:08
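For the concern that the real file is large, deeply nested and changes shape, `jq` can search recursively instead of following a fixed path. This is a minimal sketch, assuming the real file is valid JSON (the fragment in the question, with its trailing comma, is not) and that the wanted object is the one whose `relation` is `next`. The `response` and `paging` wrapper keys below are hypothetical, only there to simulate nesting:

```shell
# Hypothetical nested document; only the "link" array comes from the question.
cat > test.json <<'EOF'
{
  "response": {
    "paging": {
      "link": [
        {
          "relation": "search",
          "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
        },
        {
          "relation": "next",
          "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
        }
      ]
    }
  }
}
EOF

# ..           : visit every value in the document, at any depth
# objects      : keep only JSON objects
# select(...)  : keep objects whose "relation" is "next"
# sub(...)     : delete everything up to and including page_token=
PAGE_TOKEN=$(jq -r '.. | objects | select(.relation == "next") | .url | sub(".*page_token="; "")' test.json)

echo "PAGE_TOKEN=\"$PAGE_TOKEN\""
```

Because `..` does not care where the object lives, the filter keeps working if the nesting changes. If several objects have `"relation": "next"`, it prints one token per line.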

This might work for you (GNU sed):

sed -En '/next/{n;s/.*(page_token=)([^"]*).*/\U\1\E"\2"/p}' file

This is essentially a filtering operation, hence the use of the -n option.

Find a line containing `next`, fetch the next line with `n`, reformat it (`\U\1\E` upper-cases the captured `page_token=`) and print the result.
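The one-liner can be spread over several lines with comments, which GNU `sed` accepts inside a script. This is a sketch of the same command applied to the question's `test.txt` (GNU sed assumed, since `\U` and `\E` are GNU extensions). Its output is a line of text of the form `PAGE_TOKEN="..."`, not a variable assignment, so the sketch captures it in a variable named `RESULT`, a hypothetical name chosen for illustration:

```shell
# Recreate the fragment from the question as test.txt.
cat > test.txt <<'EOF'
"link": [
    {
      "relation": "search",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
    },
    {
      "relation": "next",
      "url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_@212absa23bababa121212121212121"
    },
]
EOF

RESULT=$(sed -En '
  # Only act on a line containing "next".
  /next/ {
    # Replace the pattern space with the following line.
    n
    # \1 captures "page_token=", \2 the token up to the closing quote.
    # \U...\E upper-cases \1; the token is re-wrapped in double quotes.
    s/.*(page_token=)([^"]*).*/\U\1\E"\2"/p
  }
' test.txt)

echo "$RESULT"
```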

potong