
I have a 22-page PDF. I am using Ghostscript to convert it to TIFF images for Tesseract. This is what I have so far:

    string filename = openFileDialog1.FileName;

    using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
    {
        rasterizer.Open(filename, _lastInstalledVersion, false);

        // Pages are numbered starting at 1.
        for (int pageNumber = 1; pageNumber <= rasterizer.PageCount; pageNumber++)
        {
            // Render the page at 400 x 400 DPI.
            Image img = rasterizer.GetPage(400, 400, pageNumber);
        }
    }

I want to set the image size before I pass each page to Tesseract, but I can't find a way to do it. Is there any way?

There is also this example, but I don't know whether I can pass each image from the PDF to Tesseract with it:

  GhostscriptVersionInfo gv = GhostscriptVersionInfo.GetLastInstalledVersion();

        using (GhostscriptProcessor processor = new GhostscriptProcessor(gv, true))
        {
            processor.Processing += new GhostscriptProcessorProcessingEventHandler(processor_Processing);

            List<string> switches = new List<string>();
            switches.Add("-empty");
            switches.Add("-dSAFER");
            switches.Add("-dBATCH");
            switches.Add("-dNOPAUSE");
            switches.Add("-dNOPROMPT");
            switches.Add(@"-sFONTPATH=" + System.Environment.GetFolderPath(System.Environment.SpecialFolder.Fonts));
            switches.Add("-dFirstPage=" + pageFrom.ToString());
            switches.Add("-dLastPage=" + pageTo.ToString());
            switches.Add("-sDEVICE=png16m");
            switches.Add("-r96");
            switches.Add("-dTextAlphaBits=4");
            switches.Add("-dGraphicsAlphaBits=4");

            //switches.Add("-sDEVICE=pdfwrite");

            switches.Add(@"-sOutputFile=" + outputFile);
            switches.Add(@"-f");
            switches.Add(inputFile);

            processor.StartProcessing(switches.ToArray(), null);
        }
HABJAN
Derek Toh

1 Answer


-dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS, together with -dFIXEDMEDIA, set a specific media size, measured in points (1/72 inch). You'll probably want to set -dPDFFitPage as well, to scale the page content to fit the new media.

You can't be using Ghostscript directly; are you using jhabjan's Ghostscript.NET?
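If it is Ghostscript.NET, the switches above could be assembled along these lines. This is only a minimal sketch: the `TiffSwitches` class, its method names, the `tiff24nc` device choice and the output file pattern are assumptions made for illustration, not part of the library. Because the media size is given in points, a pixel target has to be converted using the resolution (points = pixels * 72 / dpi).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (not part of Ghostscript.NET): builds the argument
// list for rendering a PDF to fixed-size TIFF pages.
public static class TiffSwitches
{
    // Ghostscript measures media in points (1/72 inch), so a pixel target
    // at a given resolution converts as: points = pixels * 72 / dpi.
    public static int PixelsToPoints(int pixels, int dpi)
    {
        return (int)Math.Round(pixels * 72.0 / dpi);
    }

    public static string[] Build(string inputFile, string outputPattern,
                                 int widthPx, int heightPx, int dpi)
    {
        CultureInfo inv = CultureInfo.InvariantCulture;
        var switches = new List<string>
        {
            "-empty",             // placeholder first argument, as in the question's example
            "-dSAFER",
            "-dBATCH",
            "-dNOPAUSE",
            "-sDEVICE=tiff24nc",  // assumption: 24-bit RGB TIFF output
            "-r" + dpi.ToString(inv),
            "-dDEVICEWIDTHPOINTS=" + PixelsToPoints(widthPx, dpi).ToString(inv),
            "-dDEVICEHEIGHTPOINTS=" + PixelsToPoints(heightPx, dpi).ToString(inv),
            "-dFIXEDMEDIA",       // keep the media size set above
            "-dPDFFitPage",       // scale each PDF page to fit that media
            "-sOutputFile=" + outputPattern,
            "-f",
            inputFile
        };
        return switches.ToArray();
    }
}
```

The resulting array would then be passed to `processor.StartProcessing(switches, null)` in the same way as in the example from the question. An output pattern such as `page-%03d.tif` makes Ghostscript write one file per page, so each file can be handed to Tesseract separately.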

KenS
  • Yes, I am. There is also this other example I just edited in; is that what you mean? – Derek Toh Feb 06 '15 at 08:12
  • @DerekToh, I suggest upgrading your Ghostscript.NET to v.1.2.0 (released yesterday); then you can use the switches KenS suggested in this way: http://pastebin.com/NNXEVRR4 – HABJAN Feb 06 '15 at 09:20
  • I can't seem to find Ghostscript.NET.dll in the new version I downloaded – Derek Toh Feb 06 '15 at 09:40
  • Where did you download it from? – HABJAN Feb 06 '15 at 09:41
  • It seems that you downloaded the source code only. You should download the binaries from here: https://github.com/jhabjan/Ghostscript.NET/releases or use NuGet. – HABJAN Feb 06 '15 at 09:43