Questions tagged [invoice2data]

invoice2data is a command line tool and Python library to support accounting processes. It extracts text from PDF files using different techniques, searches the result with regular expressions defined in a YAML-based template system, and either saves the results as CSV, JSON or XML or renames the PDF files to match their content. See https://github.com/invoice-x/invoice2data.

6 questions
1 vote, 1 answer

Python regex not capturing groups properly

I have the following regex (?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+)). Input text examples: RE:11567 Miss Jane Doe 12345678 Reference: Miss Jane Doe 12345678 RE:J123 Miss Jane Doe 12345678 RE:J123 Miss Jane Doe 12345678…
West (reputation 2,350; badges: 5 gold, 31 silver, 67 bronze)
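
As a rough sketch of how the pattern quoted in this question behaves, the following runs it with Python's `re` module against the sample inputs from the excerpt. Treating each sample as a separate string, and the names `PATTERN`, `SAMPLES` and `capture`, are assumptions made for illustration; the excerpt is truncated, so this does not necessarily reproduce the asker's actual problem.

```python
import re

# Pattern copied verbatim from the question excerpt.
PATTERN = re.compile(
    r"(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))"
)

# Assumption: each sample in the excerpt is a separate input string.
SAMPLES = [
    "RE:11567 Miss Jane Doe 12345678",
    "Reference: Miss Jane Doe 12345678",
    "RE:J123 Miss Jane Doe 12345678",
]


def capture(text):
    """Return the captured groups of the first match, or None if no match."""
    match = PATTERN.search(text)
    return match.groups() if match else None


for sample in SAMPLES:
    print(sample, "->", capture(sample))
```

For these inputs, group 1 holds the full name, group 2 the optional title, and groups 3 and 4 the two name parts.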
0 votes, 0 answers

I want to extract information from a PDF

I have a PDF in which two order numbers are mentioned on different pages, and I have to check whether the order numbers are the same. I know a little about document layout analysis. Can anyone help me with how I can do this? I have to match different things…
0 votes, 1 answer

TypeError: decoding to str: need a bytes-like object, list found

I'm currently working with the invoice2data library and getting an error. My template is ready, but it gives me an error when I pass an invoice to it. Please help me out. Here is my code: import re from invoice2data import extract_data from…
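
For reference, Python raises this exact message when `str()` is called with an encoding argument on a list instead of a bytes-like object. The snippet below is a minimal reproduction of the message only. It is not the asker's code, and the helper names `decode` and `error_message` are hypothetical; whether the same call is involved in their case is an assumption.

```python
def decode(value, encoding="utf-8"):
    """Decode a bytes-like object to str using str(value, encoding)."""
    return str(value, encoding)


def error_message(value):
    """Return the TypeError message raised by decode(), or None on success."""
    try:
        decode(value)
    except TypeError as exc:
        return str(exc)
    return None


# Bytes decode normally.
print(decode(b"Invoice 42"))

# A list of bytes is not bytes-like, so str() refuses to decode it.
print(error_message([b"Invoice 42"]))
```

If a list reaches such a call, decoding each element separately or joining the items first avoids the error.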
0 votes, 1 answer

Regex to match first occurrence of a string starting and ending with predefined characters

I'm trying to match the first occurrence of the company name: EuroPayment Services S.R.L. I tried to make it non-greedy by adding ? but without success. What am I doing wrong? Name: EuroPayment Services S.R.L. Address: Str. Ion…
shAkur (reputation 944; badges: 2 gold, 21 silver, 45 bronze)
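
One way to approach a "first occurrence between two markers" match is a lazy quantifier combined with `re.search`, which returns the leftmost match. The sketch below assumes the name sits between `Name:` and `Address:`; the sample text after the first address and the names `TEXT`, `LAZY` and `GREEDY` are hypothetical, since the original input is truncated.

```python
import re

# Hypothetical input: only the first "Name: ... Address:" pair comes from
# the question; the rest is invented to show the greedy/lazy difference.
TEXT = (
    "Name: EuroPayment Services S.R.L. Address: Str. Example 1 "
    "Name: Other Company Address: Str. Example 2"
)

# Lazy: (.+?) stops at the first following "Address:".
LAZY = re.compile(r"Name:\s*(.+?)\s*Address:")
# Greedy: (.+) runs on to the last "Address:" in the text.
GREEDY = re.compile(r"Name:\s*(.+)\s*Address:")

first_name = LAZY.search(TEXT).group(1)
greedy_name = GREEDY.search(TEXT).group(1)

print(first_name)
print(greedy_name)
```

The lazy version captures only the first company name, while the greedy version also swallows the text up to the second `Address:`.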
0 votes, 0 answers

Regex to match a name starting with a specific set of strings

Is it possible to match only the first string that occurs when specifying a specific subset of strings? For example: SRL SC FAN COURIER EXPRESS SRL DISTRIBUTION SERVICES J40/4014/2001 - RO13838336 MANAGEMENT SRL …
shAkur (reputation 944; badges: 2 gold, 21 silver, 45 bronze)
0 votes, 1 answer

invoice2data format matching number

I'm using invoice2data to match specific values from an invoice PDF. However, it seems that some values are extracted without decimals and I don't know why. When I run invoice2data I get 'amount': 118989.0 instead of 'amount': 1 189,89 Regex: Total…
shAkur (reputation 944; badges: 2 gold, 21 silver, 45 bronze)
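
A plausible cause of the symptom in this question is that the comma is treated as a thousands separator rather than a decimal separator. The sketch below illustrates that effect in plain Python. `parse_amount` is a hypothetical helper, not invoice2data's own parser. As far as the project documentation goes, templates accept a `decimal_separator` option for this case, but check the README for the exact syntax.

```python
def parse_amount(text, decimal_separator="."):
    """Parse an amount string, keeping only digits, a minus sign and the
    chosen decimal separator; every other character is dropped."""
    kept = "".join(
        ch for ch in text
        if ch.isdigit() or ch == "-" or ch == decimal_separator
    )
    # Normalise the chosen separator to "." so float() can read it.
    return float(kept.replace(decimal_separator, "."))


raw = "1 189,89"

# With "." as the decimal separator the comma is discarded.
print(parse_amount(raw, decimal_separator="."))

# With "," as the decimal separator the decimals survive.
print(parse_amount(raw, decimal_separator=","))
```

The first call yields 118989.0, matching the value reported in the question, and the second yields 1189.89.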