
Is there some way to write the data received inside an infinite loop to a file? I have a script that displays web content in my terminal as it appears on the web page, but all my attempts to tee the data have resulted in an empty file. I suppose this is because the loop never exits, so there is no opportunity to write anything to the file. But I have read about infinite loops filling a hard drive with unwanted data, so it seems like writing the output of a command pipeline should be possible as well.

get_page() {
    # Dump the visible text of the active tab in the front Chrome window
    osascript -e \
    'tell application "Google Chrome" to tell window 1 to tell active tab to execute javascript "document.body.innerText"'
}

while get_page | grep -E '[[:alnum:]]'
do
    sleep 1 &
done < <(get_page) | awk '!x[$0]++'

Note that the only reason this works at all is the awk !x[$0]++ command, which (correct me if my explanation is inaccurate) removes duplicate lines from its input while preserving the order of the lines. Without that in place, this script would be insane.
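
For example, here is a quick way to see what that filter does (a minimal sketch with made-up input):

    $ printf 'foo\nbar\nfoo\nbaz\nbar\n' | awk '!x[$0]++'
    foo
    bar
    baz

Each line is printed the first time it is seen and suppressed on every later occurrence, so the order of first appearances is preserved without sorting.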

I0_ol

1 Answer


A few things:

  1. The loop isn't infinite. It iterates only for as long as the get_page | grep pipeline exits with zero, i.e. until grep finds no line containing an alphanumeric character.

  2. You want the loop to execute once a second? In that case, remove the & after the sleep 1, or the loop will iterate much faster than that! The & puts the sleep process in the background and the loop continues immediately.

  3. You're calling get_page twice: once in the loop condition and once in the < <(get_page) redirection. This is probably unintended. I'm not sure what it outputs, but you probably want something like the following instead:

    while true; do
      get_page
      sleep 1
    done | awk '!seen[$0]++' | tee output.log
    

If that still doesn't solve it, the cause is probably, as pointed out in the comments below, the output buffering done by awk. To force awk to flush its output buffer after each line, you can use

awk '!seen[$0]++ { print; fflush() }'

A slight issue with this is that the awk process keeps a copy of each unique line of input in memory, so its memory use will grow as more unique lines are read from the output of get_page.
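
Putting the pieces together, the whole pipeline might then look like this (a sketch; output.log is just an example filename):

    while true; do
        get_page
        sleep 1
    done | awk '!seen[$0]++ { print; fflush() }' | tee output.log

Here tee writes the deduplicated, line-flushed output to output.log while still passing it through to the terminal.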

Kusalananda
  • From the `uniq` man page: *Repeated lines in the input will not be detected if they are not adjacent, so it may be necessary to sort the files first.* The `awk` command in the question detects all duplicates and preserves the order; it is something I use a lot. Your solution does output data and allows the `tee` command as well, but `awk` is necessary to output only truly unique lines. What I don't understand is why, when I use your solution and replace `uniq` with `awk`, no data is output to my terminal nor to the output log. – I0_ol Jul 21 '16 at 06:51
  • @user556068 You're right about `uniq`. It was too early in the morning here. Sorry. Does `getpage` by itself produce data on standard output? – Kusalananda Jul 21 '16 at 06:55
  • Yes, it produces whatever is currently on screen in the browser. And further testing shows that `awk` only outputs data if it is the last command in the pipeline. So I can `tee` to the output log and then use `awk` after that, but doing this produced an output log of 3 MB in 60 seconds. – I0_ol Jul 21 '16 at 07:06
  • @user556068 Any command that filters its input for _all_ duplicates will need to read the _complete_ input. The `awk` script can't output anything until it has consumed all input. With an infinite loop, you will never run out of input, so you get no output. – Kusalananda Jul 21 '16 at 07:19
  • But it does output to the terminal – I0_ol Jul 21 '16 at 14:15
  • @Kusalananda You are wrong about the awk command: it will handle each line as it is output from the loop and print it immediately (buffering aside) if it hasn't been printed previously. That is different from how, say, `sort -u` would behave; THAT would behave as you describe. I don't know what `uniq` does with unsorted input. – Ed Morton Jul 21 '16 at 15:15
  • 1
    @user556068 your vanishing output is probably due to buffering. Many (all?) UNIX commands buffer their output differently when writing to a file vs to a terminal (google interactive vs non-interactive buffering). Some commands have an option to change that behavior, e.g. --line-buffered for grep. With awk, you can change the command to `!seen[$0]++{print; fflush()}` to force line buffering even when piping to another command (`seen` is the more common name for the array you're currently calling `x`) – Ed Morton Jul 21 '16 at 15:24
  • 1
    @EdMorton This shows why I shouldn't start answering questions here literally five minutes after getting out of bed. You are correct and I will amend the answer yet again. – Kusalananda Jul 21 '16 at 15:27
  • @Kusalananda There is an error in your solution. You have a `,` where there should be a `;` – I0_ol Jul 21 '16 at 16:05
  • @user556068 Fixed it as you started typing that comment! But thanks for having sharp eyes :-) – Kusalananda Jul 21 '16 at 16:06