I am using a small bash program, written by someone else, that runs from cron on my Synology NAS. It searches for subtitles for my movie collection and converts their encoding to UTF-8 if needed.

The main bash script calls several subscripts and unfortunately it doesn't work 100% as it should. During my investigation I have narrowed the problem down to this specific function in one of the subscripts:

subs_getCharset_SO() {
    local file="$1"
    local charset=
    local et=

    tools_isDetected "file" || return $G_RETFAIL

    et=$(file \
        --brief \
        --mime-encoding \
        --exclude apptype \
        --exclude tokens \
        --exclude cdf \
        --exclude compress \
        --exclude elf \
        --exclude soft \
        --exclude tar \
        "$file" | wrappers_lcase_SO) || {
        return $G_RETFAIL
    }

    case "$et" in
        *utf*) charset="UTF8";;
        *iso*) charset="ISO-8859-2";;
        us-ascii) charset="US-ASCII";;
        csascii) charset="CSASCII";;
        *ascii*) charset="ASCII";;
        *) charset="WINDOWS-1250";;
    esac

    echo "$charset"
}

It turns out that running the file command on every one of these files always causes a Segmentation fault. I have reproduced it by running this command manually in a terminal:

admin@Synek:/volume1/video/Filmy/Ghostland.2018$ file --brief --mime-encoding Ghostland.2018.txt

The output is:

utf-8
Segmentation fault

So, as I see it, my main problem is that the output of the file command is not assigned to the et variable. Ideally I would like to capture the first line of the output and assign it to et, or at least redirect the output to a file. So far I have tried some solutions that I found on the web:

admin@Synek:/volume1/video/Filmy/Ghostland.2018$ { file --brief --mime-encoding ./Ghostland.2018.txt; } 2> log

which prints just the line that I need in the terminal and omits the Segmentation fault message:

utf-8

Running:

admin@Synek:/volume1/video/Filmy/Ghostland.2018$ cat log

Gives:

Segmentation fault

But I just can't find a way to get that first line, the one before Segmentation fault, written to the log file.

Any help appreciated!

wozaq
  • I think I would be investigating why `file` crashes... maybe reinstall it. – Mark Setchell Sep 23 '18 at 21:07
  • *nod* -- `file` often being used to scan unknown/untrusted content, null pointer dereferences being triggered by unexpected content is the kind of thing that smells potentially-exploitable. – Charles Duffy Sep 24 '18 at 17:48

1 Answer

When stdout is to a TTY, GNU libc (like most implementations) configures line-buffering by default, so output written with the standard C library is printed whenever a full line is complete (since it's assumed that a human is watching and wants to see results as soon as they're available, even if that makes overall execution take longer). By contrast, when stdout is to a FIFO or a file, a larger output buffer is used for better efficiency.

Because a SIGSEGV doesn't give a program a chance to flush its buffers, any data still in the buffer at the time of the failure is lost.
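
You can see the difference with the same command, no debugger needed; the only thing that changes between the two invocations below is where stdout points:

# stdout is a TTY: line-buffered, so the encoding line is flushed as
# soon as it is complete, before the process dies
file --brief --mime-encoding ./Ghostland.2018.txt

# stdout is a pipe: block-buffered, so the line is still sitting in the
# buffer when SIGSEGV kills the process; only the shell's
# "Segmentation fault" message (written to stderr) makes it out
file --brief --mime-encoding ./Ghostland.2018.txt | cat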

On a system with GNU coreutils, you can make stdout unbuffered or line-buffered by default (programs can still override it) using the tool stdbuf:

result=$(stdbuf -o0 file --brief --mime-encoding ./Ghostland.2018.txt)
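
Line-buffering (stdbuf -oL) rather than fully unbuffered output would also be enough here, since the encoding is emitted as one complete line before the crash:

result=$(stdbuf -oL file --brief --mime-encoding ./Ghostland.2018.txt)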

...or, on systems without GNU coreutils but with expect installed, you can use the tool unbuffer:

result=$(unbuffer file --brief --mime-encoding ./Ghostland.2018.txt)

See BashFAQ #9 for more background on buffering and its control from the shell.
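
Applied to your subs_getCharset_SO function, the fix might look like this (a minimal sketch, assuming stdbuf is available on the NAS; the --exclude flags are left out for brevity, and head -n 1 is only a guard to keep exactly one line):

# capture the encoding line before file(1) crashes; 2>/dev/null hides
# any diagnostics the dying process itself writes to stderr
et=$(stdbuf -o0 file \
    --brief \
    --mime-encoding \
    "$file" 2>/dev/null | head -n 1 | wrappers_lcase_SO)

# the crash is invisible to `|| return` anyway: bash reports the status
# of the last pipeline stage (wrappers_lcase_SO) unless pipefail is set,
# so test the captured value instead
[ -n "$et" ] || return $G_RETFAIL

Note that even without the pipeline the exit status would be misleading: a process killed by SIGSEGV exits with status 128 + 11 = 139, so a plain result=$(stdbuf -o0 file ...) can capture the line you need while still looking like a failure to $?.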

Charles Duffy
  • Holy crap! That might have fixed it! Now the first line - the text encoding - is written to the log file and echoed as well. I just need some more testing to check that everything else now works as it should. I used the command with stdbuf; the second one isn't available. – wozaq Sep 24 '18 at 17:37