When I add the body to the output list, authors from the wrong subreddits get output too. I expected both examples to output only entries from the nfl subreddit. Is this a feature or a bug? How can I output only the tuples for subreddit nfl?

The file:

{"author":"403and780","author_flair_css_class":"NHL-EDM4-sheet1-col01-row17","author_flair_text":"EDM - NHL","body":"Don't get why we do this but can't have a Grey Cup GDT.","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsn6","is_submitter":false,"link_id":"t3_7v9yqa","parent_id":"t3_7v9yqa","permalink":"/r/hockey/comments/7v9yqa/game_thread_super_bowl_lii_philadelphia_eagles_vs/dtqrsn6/","retrieved_on":1518931297,"score":2,"stickied":false,"subreddit":"hockey","subreddit_id":"t5_2qiel","subreddit_type":"public"}    
{"author":"kygiacomo","author_flair_css_class":null,"author_flair_text":null,"body":"lol missed the extra wtf","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsn7","is_submitter":false,"link_id":"t3_7vad8n","parent_id":"t3_7vad8n","permalink":"/r/nfl/comments/7vad8n/super_bowl_lii_game_thread_philadelphia_eagles/dtqrsn7/","retrieved_on":1518931297,"score":2,"stickied":false,"subreddit":"nfl","subreddit_id":"t5_2qmg3","subreddit_type":"public"}    
{"author":"shitpostlord4321","author_flair_css_class":null,"author_flair_text":null,"body":"I really hope we get Bleeding Edge before we get the all new all different armor. ","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsn8","is_submitter":false,"link_id":"t3_7v7whz","parent_id":"t3_7v7whz","permalink":"/r/marvelstudios/comments/7v7whz/a_great_new_look_at_iron_mans_avengers_infinity/dtqrsn8/","retrieved_on":1518931297,"score":1,"stickied":false,"subreddit":"marvelstudios","subreddit_id":"t5_2uii8","subreddit_type":"public"}
{"author":"namohysip","author_flair_css_class":null,"author_flair_text":null,"body":"Maybe. I mostly am just doing this to get a story out, and it\u2019s a huge one, so I\u2019m not sure that I\u2019ll be making another fic for many more months. I guess Pokemon Mystery Dungeon just isn\u2019t as popular with the older demographics of AO3.","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsn9","is_submitter":true,"link_id":"t3_7v9psr","parent_id":"t1_dtqrm3t","permalink":"/r/FanFiction/comments/7v9psr/how_do_you_deal_with_bad_reviews/dtqrsn9/","retrieved_on":1518931297,"score":1,"stickied":false,"subreddit":"FanFiction","subreddit_id":"t5_2r5kb","subreddit_type":"public"}
{"author":"SDsc0rch","author_flair_css_class":null,"author_flair_text":null,"body":"if it rates an upvote, I'll click it - I'm not gonna click on low quality      \nnot gonna apologize for it either ","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsna","is_submitter":false,"link_id":"t3_7vaam4","parent_id":"t3_7vaam4","permalink":"/r/The_Donald/comments/7vaam4/daily_reminderif_you_see_any_gray_arrows_on_the/dtqrsna/","retrieved_on":1518931297,"score":4,"stickied":false,"subreddit":"The_Donald","subreddit_id":"t5_38unr","subreddit_type":"public"}
{"author":"scarletcrawford","author_flair_css_class":null,"author_flair_text":null,"body":"Honestly, I wanted Takeshi to stay with Poe, but to each their own ship, I guess.","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsnb","is_submitter":false,"link_id":"t3_7upyc0","parent_id":"t1_dtppyry","permalink":"/r/alteredcarbon/comments/7upyc0/season_1_series_discussion/dtqrsnb/","retrieved_on":1518931297,"score":2,"stickied":false,"subreddit":"alteredcarbon","subreddit_id":"t5_3bzvp","subreddit_type":"public"}
{"author":"immortalis","author_flair_css_class":"vikings","author_flair_text":"Vikings","body":"The ghost of MN kickers will haunt this game.","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsnc","is_submitter":false,"link_id":"t3_7vad8n","parent_id":"t3_7vad8n","permalink":"/r/nfl/comments/7vad8n/super_bowl_lii_game_thread_philadelphia_eagles/dtqrsnc/","retrieved_on":1518931297,"score":2,"stickied":false,"subreddit":"nfl","subreddit_id":"t5_2qmg3","subreddit_type":"public"}
{"author":"KryptoFreak405","author_flair_css_class":"48","author_flair_text":"","body":"His original backstory had him training to be an Imperial officer until a commanding officer ordered him to transport a shipment of slaves. He refused, freed the slaves, one of which was Chewie, and defected to become a smuggler","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsnd","is_submitter":false,"link_id":"t3_7vanzc","parent_id":"t1_dtqr5q5","permalink":"/r/StarWars/comments/7vanzc/solo_a_star_wars_story_big_game_tv_spot/dtqrsnd/","retrieved_on":1518931297,"score":1102,"stickied":false,"subreddit":"StarWars","subreddit_id":"t5_2qi4s","subreddit_type":"public"}
{"author":"thwinks","author_flair_css_class":null,"author_flair_text":null,"body":"Oh. TIL","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsne","is_submitter":false,"link_id":"t3_7v8o0z","parent_id":"t1_dtqg97a","permalink":"/r/gifs/comments/7v8o0z/silly_walk_champion/dtqrsne/","retrieved_on":1518931297,"score":2,"stickied":false,"subreddit":"gifs","subreddit_id":"t5_2qt55","subreddit_type":"public"}
{"author":"Mimi108","author_flair_css_class":"lions","author_flair_text":"Lions","body":"The Big. The Dick. The Nick. ","can_gild":true,"controversiality":0,"created_utc":1517788800,"distinguished":null,"edited":false,"gilded":0,"id":"dtqrsnf","is_submitter":false,"link_id":"t3_7vad8n","parent_id":"t3_7vad8n","permalink":"/r/nfl/comments/7vad8n/super_bowl_lii_game_thread_philadelphia_eagles/dtqrsnf/","retrieved_on":1518931297,"score":2,"stickied":false,"subreddit":"nfl","subreddit_id":"t5_2qmg3","subreddit_type":"public"}

Code example 1, which works OK:

$ cat head_rc.txt | jq -r 'select(.subreddit=="nfl") .author'
kygiacomo
immortalis
Mimi108

Code example 2, which is wrong or unexpected to me:

$ cat head_rc.txt | jq -r 'select(.subreddit=="nfl") .body, .author'
403and780
lol missed the extra wtf
kygiacomo
shitpostlord4321
namohysip
SDsc0rch
scarletcrawford
The ghost of MN kickers will haunt this game.
immortalis
KryptoFreak405
thwinks
The Big. The Dick. The Nick. 
Mimi108

Note that author 403and780 commented in the hockey subreddit, not nfl, yet still appears in the output.

peak
Geoffrey Anderson
  • Also need to add streaming once I get this syntax correct. The file is too large to load into RAM all at once, and I don't understand the docs well enough to apply streaming to my code. – Geoffrey Anderson Mar 02 '18 at 17:38
  • `jq -r 'select(.subreddit=="nfl") | (.body, .author)'` will work too. Though why you'd want to alternate lines instead of generating, say, tab-separated output escapes me. – Charles Duffy Mar 02 '18 at 18:01
  • BTW, are there really a bunch of blank lines in your input? – Charles Duffy Mar 02 '18 at 18:02
  • @CharlesDuffy, it is not true that I want to alternate lines instead of generating, say, tab-separated output. Do you know what happens when you assume? – Geoffrey Anderson Mar 02 '18 at 18:12
  • There are not really a bunch of blank lines in the input. – Geoffrey Anderson Mar 02 '18 at 18:15
  • If you didn't want to alternate lines, then why would you show us code that alternates lines in the context of showing us what you want? The "assumption" was based on the question you chose to ask, and the manner in which you chose to ask it -- that this didn't appear likely or sensible is the whole reason I called it out. – Charles Duffy Mar 02 '18 at 18:25

2 Answers

jq solution:

jq -r 'select(.subreddit == "nfl") as $o | $o.body, $o.author' head_rc.txt
  • `... as $o` assigns the filtered object to the variable `$o`, so both `$o.body` and `$o.author` are read from the selected object only

The output:

lol missed the extra wtf
kygiacomo
The ghost of MN kickers will haunt this game.
immortalis
The Big. The Dick. The Nick. 
Mimi108
RomanPerekhrest
  • Code looks nice but I cannot find my problem: $ jq -r 'select(.subreddit=="nfl") as o$ | $o.body, $p.author' head_rc.txt jq: error: syntax error, unexpected IDENT, expecting '$' or '[' or '{' (Unix shell quoting issues?) at , line 1: select(.subreddit=="nfl") as o$ | $o.body, $p.author jq: 1 compile error – Geoffrey Anderson Mar 02 '18 at 17:50
  • Fixed code but still get error: $ jq -r 'select(.subreddit=="nfl") as x$ | $x.body, $x.author' head_rc.txt jq: error: syntax error, unexpected IDENT, expecting '$' or '[' or '{' (Unix shell quoting issues?) at , line 1: select(.subreddit=="nfl") as x$ | $x.body, $x.author jq: 1 compile error – Geoffrey Anderson Mar 02 '18 at 17:52
  • @GeoffreyAnderson, it works fine on the file you've posted. I can share a screenshot if needed – RomanPerekhrest Mar 02 '18 at 17:53
  • I can show screen it does not work on file I posted, interestingly. – Geoffrey Anderson Mar 02 '18 at 17:54
  • jq --version jq-1.5-1-a5b5cbe – Geoffrey Anderson Mar 02 '18 at 17:55
  • @GeoffreyAnderson, that's not the issue - I have the same version – RomanPerekhrest Mar 02 '18 at 17:56
  • @GeoffreyAnderson, it's `as $o`, not `as o$`. The answer has it right; try copy-and-pasting the code exactly before saying it doesn't work. – Charles Duffy Mar 02 '18 at 17:59
  • @CharlesDuffy, I already stated Code looks nice but I cannot find my problem. Please be patient. – Geoffrey Anderson Mar 02 '18 at 18:07
  • @GeoffreyAnderson, ...as you were already told, this already *does* stream input as-written; you don't need to change anything at all. It's only when processing a large JSON object (not a bunch of small individual JSON objects) that code needs to be written differently to permit streaming. – Charles Duffy Mar 02 '18 at 18:27

Also need to add streaming once I get this syntax correct.

Some good news - you won't need to use the so-called "streaming parser", because your input has already been chopped up. The "streaming parser" is only needed when the input has one or more individually ginormous JSON entities, whereas you have a (long) stream of small JSON objects.
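As a rough illustration of that point, here is a minimal sketch, assuming jq is installed and using a made-up three-object sample (the `a1`..`a3` authors are hypothetical). By default the filter runs once per top-level object; `--slurp` is the option that would collect the whole input into one array first, and it is shown only for contrast.

```shell
# Hypothetical sample: three small JSON objects, one per line, no enclosing array.
sample='{"author":"a1","subreddit":"hockey"}
{"author":"a2","subreddit":"nfl"}
{"author":"a3","subreddit":"nfl"}'

# Default mode: the filter is applied to each object in turn.
per_object=$(printf '%s\n' "$sample" | jq -r 'select(.subreddit=="nfl") | .author')

# --slurp gathers every object into a single array before running the filter,
# which is the behaviour to avoid on a very large file.
slurped_count=$(printf '%s\n' "$sample" | jq --slurp 'length')

printf '%s\n' "$per_object"
printf 'objects seen by --slurp: %s\n' "$slurped_count"
```

So for a file shaped like the one in the question, the plain invocation should be enough; nothing extra has to be switched on.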

p.s.

As Charles Duffy suggested, the simplest solution to your selection problem is to pipe the selected object into a parenthesized group:

jq -r 'select(.subreddit=="nfl") | (.body, .author)' input.json
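To see the difference side by side, here is a small sketch, assuming jq is installed; the sample is three objects cut down from the question's data to the fields that matter. Judging by the output in the question, the original filter appears to be parsed as `(select(...) .body), .author`, so `.author` is evaluated for every input object, while the piped form reads both fields from the selected object only.

```shell
# Sample cut down from the question's data (only the relevant fields kept).
sample='{"author":"403and780","body":"Grey Cup GDT","subreddit":"hockey"}
{"author":"kygiacomo","body":"lol missed the extra wtf","subreddit":"nfl"}
{"author":"thwinks","body":"Oh. TIL","subreddit":"gifs"}'

# Filter from the question: the comma splits it into two independent branches,
# so .author is not restricted by select().
leaky=$(printf '%s\n' "$sample" | jq -r 'select(.subreddit=="nfl") .body, .author')

# Piped form: both .body and .author come from the object that passed select().
scoped=$(printf '%s\n' "$sample" | jq -r 'select(.subreddit=="nfl") | (.body, .author)')

printf 'question filter:\n%s\n\n' "$leaky"
printf 'piped filter:\n%s\n' "$scoped"
```

The first command prints all three authors plus the one nfl body; the second prints only the nfl body and its author.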

If CSV or TSV makes sense, then change the parentheses to brackets and tack on @csv or @tsv, e.g.:

select(.subreddit=="nfl") | [.body, .author] | @tsv
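A quick sketch of the TSV variant, assuming jq 1.5 or later (where `@tsv` is available); the two-object sample and the `user_a`/`user_b` names are made up. One reason it may suit this data: `@tsv` escapes an embedded newline as a literal `\n`, so a multi-line comment body still ends up on a single output row.

```shell
# Hypothetical sample; the second body contains a JSON-escaped newline.
sample='{"author":"user_a","body":"not relevant","subreddit":"hockey"}
{"author":"user_b","body":"line one\nline two","subreddit":"nfl"}'

# -r prints the @tsv string raw; fields are separated by a single tab.
tsv=$(printf '%s\n' "$sample" | jq -r 'select(.subreddit=="nfl") | [.body, .author] | @tsv')

printf '%s\n' "$tsv"
```

With `@csv` instead, the fields would be quoted and comma-separated, and an embedded newline would stay inside the quoted field, so which one fits depends on what consumes the output.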
peak