I am using Tesseract for OCR on Ubuntu 18.04.

I have a program that extracts the text from an image and prints it. I want the program to create a new text file and put the extracted content into it, but so far I am only able to do the following:

  • copy the content to the clipboard
  • open a new file in a text editor (gedit), but I don't know how to paste the copied content into it

Here is my program, which extracts the text from an image:

from pytesseract import image_to_string
from PIL import Image
print(image_to_string(Image.open('sample.jpg')))

Here is the program that copies the text to the clipboard:

import os
def addToClipBoard(text):
    command = 'echo ' + text.strip() + '| clip'
    os.system(command)

This program opens gedit and creates a new text file:

import subprocess
proc = subprocess.Popen(['gedit', 'file.txt'])

Any help would be appreciated.

Mohit Motwani
Gaurav Bahadur
2 Answers

If you just want the text, then open a text file and write to it:

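from pytesseract import image_to_string
from PIL import Image

# Run OCR on the image and keep the result as a string
text = image_to_string(Image.open('sample.jpg'))

# Write the extracted text to a new file (created if it does not exist)
with open('file.txt', mode='w') as f:
    f.write(text)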
Mohit Motwani

Just as I proposed in the comment, create a new file and write the extracted text into it:

from pytesseract import image_to_string
from PIL import Image
with open('file.txt', 'w') as outfile:
    outfile.write(image_to_string(Image.open('sample.jpg')))
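If you also want the saved file to open in gedit afterwards, the write step and the `subprocess.Popen` call from the question can be combined. The following is only a rough sketch, assuming Python 3; the helper names `save_text`, `open_in_editor`, and `ocr_to_file` are made up for illustration, and the explicit UTF-8 encoding is an assumption about what the OCR output needs.

```python
import shutil
import subprocess


def save_text(text, path="file.txt"):
    """Write text to path as UTF-8 and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


def open_in_editor(path, editor="gedit"):
    """Launch the editor on path; return None if the editor is not installed."""
    if shutil.which(editor) is None:
        return None
    return subprocess.Popen([editor, path])


def ocr_to_file(image_path, out_path="file.txt"):
    """OCR an image and save the result (needs pytesseract and Pillow)."""
    # Imported here so the helpers above work without the OCR stack installed
    from pytesseract import image_to_string
    from PIL import Image
    return save_text(image_to_string(Image.open(image_path)), out_path)
```

With these helpers, `open_in_editor(ocr_to_file('sample.jpg'))` would write the text to `file.txt` and then open it, with no clipboard step needed.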
DYZ