1

So, I've been tasked with converting a bunch of *.doc files to *.pdf using lowriter.

I would like to do this in place, but lowriter has no option for that. So I figured I would capture the originating file and path, capture the conversion output, move the converted file to the originating path, and then delete the original *.doc.

The problem is that my sed and/or awk is weak at best ;) so I cannot figure out how to "capture" the converted file name from the output.

My Code:

#!/bin/bash

FILES=/my/path/**/*.doc

shopt -s globstar

for f in $FILES; do

    the_file=$f;
    the_orig_dir=$(dirname "$the_file") ;

    converted=$(lowriter --headless --convert-to pdf "$the_file");
    
    echo $converted;
done;

and the output is:

convert /my/path/Archives/Ally/Heavenly Shop.doc -> /my/Heavenly Shop.pdf using filter : writer_pdf_Export
convert /my/path/Archives/Ally2/Solutions Shop.doc -> /my/Solutions Shop.pdf using filter : writer_pdf_Export
convert /my/path/Archives/Ally3/Xpress Shop.doc -> /my/Xpress Shop.pdf using filter : writer_pdf_Export

What I need to do is capture the path/filename of the converted file after the -> and before the :. I just don't know how I can do this. Can someone tell me?

Kevin
  • I've never heard of `lowriter` before, but does it really not have an option to just output the new file path, let you specify the output file, or send its output to stdout (so you can redirect to an output file name)? – Ed Morton Aug 05 '20 at 13:24
  • it does not have a method to convert inplace... it outputs what I posted in the question... – Kevin Aug 05 '20 at 13:32
  • I'm not talking about converting "inplace", I'm talking about options to control where the output goes. Also, don't you already know the output file name before you call `lowriter`? It looks like it's always `/my/$(basename "$the_file")`. – Ed Morton Aug 05 '20 at 13:32
  • That would be the originating file. Not the converted file. Yes, you can specify the output directory, but not the output filename. – Kevin Aug 05 '20 at 13:34
  • Sorry, I meant `/my/$(basename "$the_file" '.doc').pdf`. – Ed Morton Aug 05 '20 at 13:40
  • Your globbing pattern will recurse into sub-directories, but the output seems to always be in your `/my/` directory. What happens when 2 files with the same name exist in different directories, e.g. what would the output file name(s) be given input of `/my/path/foo/abc.doc` and `/my/path/bar/abc.doc`? – Ed Morton Aug 05 '20 at 13:45
  • look at the output I posted in the question... specifically the .doc right after the word "convert". the converted file stays in the directory you run the conversion from. – Kevin Aug 05 '20 at 13:54
  • Again, we're talking about your output files, not your input files. Your posted output shows all output files generated from all input files in all directories ending up in one common directory - `/my/`. – Ed Morton Aug 05 '20 at 13:56
  • so no matter what, if you run `lowriter` from /home the converted pdf will stay in /home, unless you specify the output directory. my answer does what I need... i'll update it with the full code for the script for archival purposes. – Kevin Aug 05 '20 at 13:57
  • p.s. `lowriter` is the cli for LibreOffice Writer – Kevin Aug 05 '20 at 13:59
  • As an aside, all the trailing semicolons are useless. And why do you call your loop variable `f`, then copy it to a different variable? Just use `for the_file in` if that's what you want. – tripleee Aug 05 '20 at 13:59

3 Answers

2

The quick answer to the question you asked is that this will work using any sed:

sed 's/.*-> \(.*\) using filter :.*/\1/'

but I'm not sure you actually need to do that. Based on what you posted and your comments under the question I think all you really need is:

#!/usr/bin/env bash

shopt -s globstar

docPaths=( /my/path/**/*.doc )

for docPath in "${docPaths[@]}"; do

    pdfPath=$(basename "$docPath" '.doc')'.pdf'

    lowriter --headless --convert-to pdf "$docPath"
    
    printf '%s\n' "$pdfPath"

done
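
As a quick check, the sed expression above can be run against one of the output lines from the question. This is only a minimal sketch: the sample line is hard-coded rather than produced by lowriter, and the `extract_converted` helper name is hypothetical.

```shell
#!/usr/bin/env bash

# One line of lowriter output, copied from the question.
line='convert /my/path/Archives/Ally/Heavenly Shop.doc -> /my/Heavenly Shop.pdf using filter : writer_pdf_Export'

# Hypothetical helper: keep only the text between "-> " and " using filter :".
extract_converted() {
    sed 's/.*-> \(.*\) using filter :.*/\1/'
}

converted=$(printf '%s\n' "$line" | extract_converted)
printf '%s\n' "$converted"   # prints: /my/Heavenly Shop.pdf
```

In the loop from the question, the same helper could presumably be applied to the captured output of `lowriter` instead of the hard-coded line.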
Ed Morton
1
#!/bin/bash

FILES=/my/specific/input/folder/**/*.doc

shopt -s globstar

for f in $FILES; do

    the_file=$f;
    the_orig_dir=$(dirname "$the_file") ;

    converted=$(lowriter --headless --convert-to pdf "$the_file");
    
    new_file=$(echo "$converted" | grep -o -P '(?<= -> ).*(?= using filter : )');
    
    new_file_name=$(basename "$new_file");
    
    
    echo "$the_orig_dir/$new_file_name";
    
    
    set -x;
    
    rm -f "$the_file";
    
    mv "$new_file" "$the_orig_dir/";
    
    set +x;
    
done;

This does what I need it to do.
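
If `grep -P` is not available, the same extraction can likely be done with bash parameter expansion alone. The sketch below assumes the output line has the exact shape shown in the question; the `converted_path_from` function name is hypothetical, and the sample line is hard-coded instead of coming from lowriter.

```shell
#!/usr/bin/env bash

# Hypothetical helper: pull the converted path out of one line of
# lowriter output using only bash parameter expansion (no grep or sed).
converted_path_from() {
    local line=$1
    line=${line#*' -> '}               # drop everything up to and including " -> "
    line=${line%' using filter : '*}   # drop " using filter : ..." from the end
    printf '%s\n' "$line"
}

# Sample line copied from the question's output.
sample='convert /my/path/Archives/Ally2/Solutions Shop.doc -> /my/Solutions Shop.pdf using filter : writer_pdf_Export'

new_file=$(converted_path_from "$sample")
printf '%s\n' "$new_file"   # prints: /my/Solutions Shop.pdf
```

This keeps everything inside the shell, so it does not depend on which grep implementation is installed.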

Kevin
  • That's a pretty convoluted, non-portable way to do such a simple thing. – Ed Morton Aug 05 '20 at 14:00
  • i don't need it to be portable. i need it to do what I asked in the question, and it does specifically that – Kevin Aug 05 '20 at 14:01
  • `grep -P` is nice if you have it; but the `sed` solution by Ed Morton is more portable and arguably simpler. – tripleee Aug 05 '20 at 14:02
  • It'll work **IF** you have one specific version of `grep`, GNU grep, it won't work anywhere else. You should at least mention that so that others reading this in future who aren't using GNU grep won't waste their time trying it. – Ed Morton Aug 05 '20 at 14:02
  • That's fine with me mate... by all means, if you would like to continue to argue your opinion with me, let's move it to chat and not the comments – Kevin Aug 05 '20 at 14:05
  • You posted a question asking for help, I'm trying to help you, not argue with you. – Ed Morton Aug 05 '20 at 14:11
1

Following on from Ed Morton's comment, it is worth mentioning that LibreOffice Writer gives the output file a predictable name, based on the --outdir (or the current working directory) and the requested conversion format (pdf). These rules can be used to construct the name of the output file.

The above script can be written simply as:

FILES=/my/path/**/*.doc

shopt -s globstar

for f in $FILES; do

    lowriter --headless --convert-to pdf "$f"
    converted=$(basename "$f" .doc).pdf
    # Do something with converted ...    
    echo "Output: $converted"
done;
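
Building on the same naming rule, the conversion could probably be done next to each source file by pointing --outdir at the .doc's own directory. This is a sketch under assumptions: it presumes the installed `lowriter` accepts `--outdir` as mentioned above, and the function names `pdf_path_for` and `convert_in_place` are hypothetical. The block only defines the helpers and prints one derived path; it does not run a conversion by itself.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the PDF path expected for a given .doc when
# --outdir is set to the .doc's own directory.
pdf_path_for() {
    local doc=$1
    printf '%s/%s.pdf\n' "$(dirname "$doc")" "$(basename "$doc" .doc)"
}

# Hypothetical helper: convert one file next to its source, and remove
# the original only if the conversion succeeded and the PDF exists.
convert_in_place() {
    local doc=$1 pdf
    pdf=$(pdf_path_for "$doc")
    lowriter --headless --convert-to pdf --outdir "$(dirname "$doc")" "$doc" &&
        [[ -f $pdf ]] && rm -- "$doc"
}

# Show the derived name for a path from the question (no conversion runs here).
pdf_path_for '/my/path/Archives/Ally/Heavenly Shop.doc'
# prints: /my/path/Archives/Ally/Heavenly Shop.pdf
```

To process a whole tree, one option is to enable `shopt -s globstar nullglob` and call `convert_in_place "$doc"` inside `for doc in /my/path/**/*.doc; do ... done`. Because each PDF lands beside its source, two files with the same name in different directories should not overwrite each other.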
dash-o