12

I know how to run Tesseract on multiple files in the same directory using Terminal on OS X:

for i in *.tif ; do tesseract $i outtext;  done;

Does anyone have suggestions for how to do this in the Command Prompt on Windows?

Karol S
Thomas Padilla
  • A note for anyone who lands here: this loop writes every result to the same output file, overwriting it each time (Joe W. also notes this in an answer below). You might prefer `for i in *.tif ; do tesseract $i "txtfolder/$i"; done;`, which places the output files in a folder named txtfolder. – R.S. Feb 20 '22 at 06:02
  • Or `for i in *.tif ; do tesseract $i - >> output.txt; done` if you want all the output in a single text file. – Paul Chris Jones Jan 11 '23 at 16:45

4 Answers

6

What is the Windows equivalent of the Unix for i command?

Without knowing exactly how the tesseract command behaves on Unix compared to Windows, it is difficult to give a comprehensive answer.

On Windows you can use the for command to run a command on each of several files.

From a command line:

for %i in (*.tif) do tesseract %i outtext

In a batch file:

for %%i in (*.tif) do tesseract %%i outtext


DavidPostill
5

In the above example:

for %%i in (*.tif) do tesseract %%i outtext

Tesseract overwrites the same output file, outtext.txt, on each iteration, so you end up with a single file containing only the text from the last image. Each output file needs a unique name. One option is to replace the string outtext with %%i, as shown below.

for %%i in (*.tif) do tesseract %%i %%i
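
Because Tesseract appends .txt to whatever output base it is given, using %%i as the base produces names such as scan.tif.txt. One possible variation, offered as a sketch rather than as part of the original answer, uses the standard cmd `~n` modifier, which expands the loop variable to the file name without its extension. It assumes tesseract is on the PATH and that the .tif files are in the current directory; the quotes are an addition so that file names containing spaces are passed as one argument.

```bat
@echo off
rem Sketch only: assumes tesseract is on the PATH and the .tif files
rem are in the current directory.
rem The input keeps its full name and the output base drops the extension,
rem so scan.tif is written to scan.txt.
for %%i in (*.tif) do (
    tesseract "%%i" "%%~ni"
)
```

Typed directly at the prompt instead of in a batch file, the same loop would use single percent signs: `for %i in (*.tif) do tesseract "%i" "%~ni"`.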

If you would rather use a different output file name, you can define a counter variable with the set command and increment it on each iteration.

set /a j=1
for %%i in (*.tif) do (
tesseract %%i output_file%j%
set /a j+=1
)

However, %j% expands to 1 on every iteration, so you end up with a single file named output_file1.txt. The %j% reference is expanded only once, when the whole parenthesized loop body is parsed, and that same value is used for each iteration. Issuing setlocal enabledelayedexpansion and replacing %j% with !j! makes Windows expand !j! on each iteration instead. A matching endlocal command should be issued afterwards to restore the previous environment settings.

setlocal enabledelayedexpansion
set /a j=1
for %%i in (*.tif) do (
tesseract %%i output_file!j!
set /a j+=1
)
endlocal

I tested this successfully on Microsoft Windows 7 Home Premium edition. I hope it helps you.
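
The difference between %j% and !j! can also be observed without running Tesseract at all. The following is a minimal sketch, assuming it is saved as a batch file (the name demo.bat is hypothetical) and using the placeholder items a, b and c in place of real image files.

```bat
@echo off
setlocal enabledelayedexpansion
set /a j=1
rem The percent form is filled in once, when the whole loop body is parsed.
rem The exclamation form is looked up again on every pass through the loop.
for %%i in (a b c) do (
    echo item %%i fixed=%j% delayed=!j!
    set /a j+=1
)
endlocal
```

Running it prints `item a fixed=1 delayed=1`, then `item b fixed=1 delayed=2`, then `item c fixed=1 delayed=3`: the percent form stays at 1 while the delayed form follows the counter.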

Joe W.
  • I get the error "syntax error near unexpected token `('" when I run the first and second examples. What causes it? – LearnToGrow Aug 11 '17 at 19:20
3
for %i in (*.tif) do (tesseract %i stdout 1>> out.txt)

This finds all the .tif files and appends the stdout output for each one to out.txt.

Aleks
2
dir "folder_path\*.tif" /s /b > "folder_path\input.txt"
"tesseract_path\tesseract" "folder_path\input.txt" "folder_path\output"
knayan