How to subtitute the cat command in a bash file?

Question

I'm working in testing my smt model using different solvers. In this moment I have the file .smt2 which contains the instruction of a model converted in smtlib.
I have created a file .sh which implements the linear search in order to test my model using different solvers.
This is the code of my bash file:

#!/bin/bash

# Usage:
#   ./smt-opt <SMT-LIB file> <obj.var name> <initial bound> <solver> <min/max>
# No (get-model) query should in the SMT-LIB input file.

TIMEOUT=300 # Timeout in seconds. TODO: Set timeout as a parameter.
in_file=$1
obj_var=$2
obj_val=$3

if
  [ "$4" = 'z3' ];
then
  solver_cmd="z3 -in "
elif
  [ "$4" = 'cvc4' ];
then
  solver_cmd="cvc4 --lang smt --produce-models --incremental "
elif
  [ "$4" = 'cvc5' ];
then
  solver_cmd="cvc5 --produce-models --incremental "
else
  echo "Unknown solver!"
  exit 1
fi
if [ "$5" = 'max' ]; then
  # Maximization problem.
  rel='>='
  next=1
else
  # Minimization problem.
  rel='<='
  next=-1
fi

# Creating input/output pipes.
in_pipe=pipe.in
out_pipe=pipe.out
rm $in_pipe $out_pipe pkill sleep 2>/dev/null
mkfifo $in_pipe
mkfifo $out_pipe

# Block input pipe for $TIMEOUT seconds.
sleep $TIMEOUT > $in_pipe &
# Send piped input data to solver, and the output of solver to output pipe.
$solver_cmd < $in_pipe > $out_pipe &
# Feed input pipe with the queries SMT-LIB specified in the input file.
cat $in_file > $in_pipe
# SOL = 0 iff solver hasn't find any solution yet.
SOL=0

while read line < $out_pipe
do
  if [ ! "$line" = 'sat' ]; then
    break;
  fi
  SOL=1
  echo -e "$obj_var = $obj_val\n----------"
  # Updating obj. value (linear search). TODO: Implement binary search.
  obj_val=`awk -v ov=$obj_val -v n=$next 'BEGIN {ov = int(ov) + n; print ov}'`
  if
    [ $obj_val -lt 0 ]
  then
    # Adjusting for unary minus.
    echo "(assert ($rel $obj_var (-"`echo $obj_val | tr '-' ' '`")))" > $in_pipe
  else
    echo "(assert ($rel $obj_var $obj_val))" > $in_pipe
  fi
  echo "(check-sat)" > $in_pipe
#  echo "%%% Solving with $obj_var $rel $obj_val"
done

if
  [[ $SOL -eq 0 ]]
then
  if [ "$line" = 'unsat' ]; then
    echo '=====UNSATISFIABLE====='
  else
    echo '=====UNKNOWN====='
  fi
elif [ "$line" = 'unsat' ]; then
  echo '=========='
fi

rm $in_pipe $out_pipe pkill sleep 2>/dev/null

In particular my problem is in this line:

cat $in_file > $in_pipe

The problem of this instruction is that it takes a lot of times (the dimension of my smtlib file is 80 megabytes). How can I substitute this instruction in order to have a more efficient instruction ?

I am expecting a more efficient instruction which works for this project.

`cat` itself is very fast; if it's being slow here, that's because you're trying to do I/O that's slow, not because `cat` incurs unreasonable overhead. There are certainly alternatives, but none of them will be faster; either reading the input file is slow, or writing to the pipe is slow (and if the latter is the case, probably that's because the thing on the _read_ side of the pipe is being slow). — Charles Duffy, Jun 26 '23 at 16:23
BTW, `$solver_cmd` is inherently bug-prone; see [BashFAQ #50](https://mywiki.wooledge.org/BashFAQ/050). — Charles Duffy, Jun 26 '23 at 16:24
Another note: `while read line <$out_pipe` re-opens `out_pipe` every time it's read. That's almost certainly not what you want to do. Consider something like `while IFS= read -r line <&3; do ...; done 3<"$out_pipe"` — Charles Duffy, Jun 26 '23 at 16:26
_But_, importantly: bash's `read` operation is much, **much** slower than `cat` is. — Charles Duffy, Jun 26 '23 at 16:26
(also, I'd strongly suggest rewriting this to take advantage of the shell's built-in process substitution features rather than creating FIFOs yourself) — Charles Duffy, Jun 26 '23 at 16:27
...that is to say, f/e: `while IFS= read -r line; do ...; done < <(solver_cmd <"$in_file")`. No `cat`s needed, no `mkfifo`s needed. — Charles Duffy, Jun 26 '23 at 16:28
You might also try pasting your script into https://shellcheck.net (per the SO bash tag recommendation) as a first pass for debugging. — j_b, Jun 26 '23 at 16:35
Also, in high-performance code, don't use command substitutions. No backticks, no `$()` -- keep everything in-process so no `fork()`s are needed. — Charles Duffy, Jun 26 '23 at 16:43
Running a separate copy of `awk` for each line of output is another thing that's going to be very, _very_ expensive from a performance perspective. `awk` is fast, but you want to keep one copy running through your whole stream, not start a new copy per line, or the cost of starting up new copies will kill you no matter how fast it is after it's up and running. — Charles Duffy, Jun 26 '23 at 16:51
(Interspersing two different input sources into one file is also, in general, a Bad Idea; UNIX FIFOs aren't built to be used that way -- you'd almost be better off having a program like `tail -f +0` assembling a single stream from multiple input sources, though a great deal of care is needed to ensure ordering -- really, I'd be writing a specialized tool to assemble your stdin stream here if the requirements are at all nontrivial) — Charles Duffy, Jun 26 '23 at 16:52
Anyhow -- I could conceive of you just having a deadlock here, if `echo >$in_pipe` can't run until `cat >$in_pipe` is finished, but `cat >$in_pipe` won't ever complete until the solver has gotten through the input `cat` is writing, and the `read` loop handling the solver's output is blocked waiting for the `echo` to run. That's not a component being _slow_, that's just... well, as I said, it's a deadlock. — Charles Duffy, Jun 26 '23 at 16:54
...if what you intend to be doing is to concatenate the queries you generate into a stream that's in entirety written to the solver's stdin after `in_file` as a whole has finished being written, btw, that would be a reasonable thing to do (and a good reason to use `cat`). Something like `< <(solver_cmd < <(cat "$in_file" "$generated_query_file") )` might be a cheap and only subtly wrong approach to doing that ("subtly wrong" because there are some potential race conditions that could possibly happen if new content is not yet done being written to `generated_query_file` when `in_file` is done). — Charles Duffy, Jun 26 '23 at 17:03

Charles Duffy · Answer 1 · 2023-06-26T17:11:46.190

`cat` is not your performance problem here.

Really, it's not. The things on the output side of the pipe are much slower than the things on the input side; assuming reasonable I/O performance, cat is getting hung waiting for solver_cmd to run, and solver_cmd could well be getting hung waiting for read to run. (Remember, all parts of a pipeline run in parallel, and pipe buffer size is limited -- once the pipe is full, nothing more can be written into it on the input side until something is pulled off the output side).

...but you can replace it with a direct handle on your input file.

Instead of trying to replace cat with something that does a faster copy from a file to a FIFO, we can just attech your solver process direct to the file and not try to use any FIFO at all.

#!/usr/bin/env bash
#              ^^^^- bash only, not sh

solve_type=$4
case $solve_type in
  z3)   solver_cmd=( z3 -in ) ;;
  cvc4) solver_cmd=( cvc4 --lang smt --produce-models --incremental ) ;;
  cvc5) solver_cmd=( cvc5 --produce-models --incremental ) ;;
  *)    echo "ERROR: Unknown solver type" >&2; exit 1;;
esac
  
  
while IFS= read -r line; do
  echo "Processing $line" >&2
  # ...put your logic here as appropriate...
done < <("${solver_cmd[@]}" <"$in_file").

Note the <"$in_file" -- no mkfifo or cat needed. This will be much faster, in particular, when your solver can take advantage of a seekable file descriptor on input to read multiple parts of the input stream in parallel and hand them off to different threads or subprocesses; it'll be a considerably more marginal improvement otherwise.

None of this will help if you have a deadlock

...and it looks like your code does have one.

Remember that only one process can write into a FIFO at a time: if cat is writing into it, then echo can't. And if the cat can't finish until after the echo does (because the output side of your pipeline blocks on that echo's completion and there's enough data to fill the pipe buffer so the solver can't write more content until the echo is complete), then your program will never complete.

If I use this code, it gives me an error related to the file: "No such file or directory". I think if I use this solution I have to handle the file in different way. How I have to handle the file? Precisely the bash file, it is not able to run the the file with the solver. — Lorenzo Cassano, Jun 27 '23 at 08:41
You get "no such file or directory" from _which_ command, _exactly_? — Charles Duffy, Jun 27 '23 at 12:07

How to subtitute the cat command in a bash file?

1 Answers1

cat is not your performance problem here.

...but you can replace it with a direct handle on your input file.

None of this will help if you have a deadlock

`cat` is not your performance problem here.