I am trying to parse PDF tables with the pdftables Python library, but it combines columns and ignores spaces.
Here is my code:
```python
pdf_page = get_pdf_page(fileobj, page)
tables = page_to_tables(pdf_page)
```
You can avoid some PDF frustration if you notice that these values are percentages, so each one is between 0 and 100. That makes the digit string easy to split: read digits until you have a two-digit number (11 to 99), a one-digit number (0 to 9), or 10. If you have 10, the next digit can extend it only if it is 0 (giving 100); any other digit starts a new number.
I express myself better in Python than in English, so I hope this code helps:
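```python
def split(digits):
    """Split a string of concatenated percentages, e.g. '25106387100'."""
    number = '0'  # digits collected so far for the current value
    numbers = []
    for char in digits:
        if char == '0' and int(number) == 10:
            # '10' followed by '0' can only be 100, the largest percentage
            numbers.append(int(number + char))
            number = '0'
        elif 9 < int(number) < 100 and char != '0':
            # already two digits: emit the value and start a new one
            numbers.append(int(number))
            number = char
        elif 0 <= int(number) < 10:
            # fewer than two significant digits: keep reading
            number += char
        # any other case (a '0' after a value from 11 to 99) is ignored
    if int(number) > 0:
        numbers.append(int(number))  # flush the last value
    return numbers
```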
For example, if I call it with:
split('25106387100')
it returns
[25, 10, 63, 87, 100]
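The same greedy rule (take 100 when the digits allow it, otherwise a two-digit number, otherwise a single digit) can also be written as a regular expression, because Python's re module tries the alternatives left to right at each position. This is a swapped-in technique rather than part of the loop above, and the name split_percentages is hypothetical:

```python
import re
from typing import List

# Greedy alternatives, tried in order at each position:
# 100 first, then 10-99 (no leading zero), then a single digit 0-9.
PERCENT = re.compile(r"100|[1-9]\d|\d")


def split_percentages(digits: str) -> List[int]:
    """Split concatenated percentages such as '25106387100' into ints."""
    return [int(match) for match in PERCENT.findall(digits)]


print(split_percentages("25106387100"))  # [25, 10, 63, 87, 100]
```

On the example string it returns the same list as split(). One difference: it keeps zeros, so '200' becomes [20, 0], whereas the loop drops the last '0' and returns [20].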
With this code you can split any string of numbers from 10 to 100. The remaining problem is single-digit numbers: '59' could be 59, or 5 followed by 9. For that case, you can add a condition inside the 0-9 branch that uses the digit's position in the PDF (checking with isdigit() which characters at that position are digits), so you only go back to the PDF layout for the ambiguous cases and keep the PDF processing to a minimum.
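To illustrate the position-based idea, here is a minimal sketch that assumes you can already get each character together with the left and right edges of its bounding box from the PDF layout. The (text, x0, x1) tuple format, the 2.0 gap threshold, and the helper names group_by_gap and parse_percentages are all assumptions, not pdftables API. It starts a new token wherever the gap between neighbouring glyphs is wider than the threshold, then keeps the tokens for which isdigit() is true:

```python
from typing import List, Optional, Tuple

# Hypothetical input: one (text, x0, x1) tuple per character in reading
# order, where x0/x1 are the left/right edges of the glyph's bounding box.
Char = Tuple[str, float, float]


def group_by_gap(chars: List[Char], min_gap: float = 2.0) -> List[str]:
    """Join characters into tokens, starting a new token wherever the
    horizontal gap to the previous glyph exceeds min_gap (an assumed,
    document-dependent threshold)."""
    tokens: List[str] = []
    current = ""
    prev_x1: Optional[float] = None
    for text, x0, x1 in chars:
        if prev_x1 is not None and x0 - prev_x1 > min_gap:
            tokens.append(current)  # wide gap: close the current token
            current = ""
        current += text
        prev_x1 = x1
    if current:
        tokens.append(current)  # flush the last token
    return tokens


def parse_percentages(chars: List[Char], min_gap: float = 2.0) -> List[int]:
    """Convert every all-digit token to an int; other tokens are skipped."""
    return [int(tok) for tok in group_by_gap(chars, min_gap) if tok.isdigit()]


sample = [("5", 0, 4), ("9", 10, 14), ("6", 20, 24), ("3", 24, 28)]
print(parse_percentages(sample))  # [5, 9, 63]
```

With this, a 5, a 9 and a 63 printed in separate cells come out as [5, 9, 63] instead of being merged into '5963'. The threshold depends on the font size and column spacing of your PDF, so you would have to tune it.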