The documentation of the excellent tika-python library at https://github.com/chrismattmann/tika-python says it is possible to point the library at a local tika_server.jar file so that it is not downloaded when the script runs. Has anyone done this, and could you post the configuration?
The first time the script runs, tika_server.jar is downloaded so the library can use it. I want to avoid this download by pointing the library at a copy of the file I already have locally.
# Extract text and metadata from a PDF
import tika
tika.TikaClientOnly = True  # client-only mode; set before importing parser, as in the README
from tika import parser

def extraiPDF(f):
    raw = parser.from_file(f)
    metadados = raw["metadata"]
    # content can be None when Tika finds no text in the file
    conteudo = raw["content"] or ""
    # strip line breaks and backslashes; turn tabs into spaces
    conteudo = conteudo.replace('\r', '').replace('\n', '').replace('\\', '').replace('\t', ' ')
    return [conteudo, metadados]
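
From what I can tell in the README, the library reads a few environment variables, including TIKA_SERVER_JAR (a URL to the jar, which I believe can be a file:// URL) and TIKA_PATH (the directory where it looks for the jar). Below is an untested sketch of what I think the configuration might look like. The helper name configure_local_tika and the path /opt/tika/tika-server.jar are my own placeholders, not something from the docs, and I am assuming the variables have to be set before tika is imported. The README's air-gap notes also seem to expect an .md5 file next to the jar, which this sketch does not create.

```python
import os
from pathlib import Path


def configure_local_tika(jar_path):
    """Set the environment variables that tika-python appears to read.

    Assumption: these must be in place before `import tika`, because the
    library seems to read them when its module is first loaded.
    """
    jar = Path(jar_path).resolve()
    # file:// URL pointing at the local jar instead of the remote download
    os.environ["TIKA_SERVER_JAR"] = jar.as_uri()
    # directory where the library looks for (and caches) the jar
    os.environ["TIKA_PATH"] = str(jar.parent)
    return jar


# Hypothetical location of a manually downloaded jar
configure_local_tika("/opt/tika/tika-server.jar")

# Only import the library after the environment is configured:
# from tika import parser
```

If this is right, TikaClientOnly would presumably have to stay off, since as far as I understand client-only mode expects a server that is already running rather than starting one from the jar.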