0

I am looking to extract important keywords from a set of text pieces which are actually text messages received after any transaction. Below is a sample dataset:

{"message": "*boi star sandesh* rs 20 has been debited to your account xx2136 from pos-paytm.com on 08-11-2014.available balance 275.00.", "number": "boiind"}
{"message": "your a/c xxxxx388847 debited inr 7,500.00 on 12/08/16 -transferred to mr. rajendra kurmi . a/c balance inr 1,314.45", "number": "amcbssbi"}
{"message": "an amount of rs.10,000.00 has been debited from your account  number xxxx1152 for an online payment txn done using hdfc bank netbanking.", "number": "dmhdfcbk"}
{"message": "your a/c no. xxxxxxxx1152 is debited for rs. 10,000.00 on 11-08-16 and a/c xxxxxxx847 credited (imps ref no 622421331357)", "number": "vkhdfcmp"}
{"message": "one time password for netbanking transaction is 785516. please use the password to complete the transaction. pls do not share this with anyone. ref no- xxxx4763", "number": "imhdfcbk"}
{"message": "your a/c no. xxxxxxxx3962 is debited for rs.20000.00 on 11-08-16 and a/c of unregistered has been credited (imps ref no 622421342625).", "number": "dmaxisbk"}

And I need to extract information from the messages about the transaction amount, the remaining balance, the date, and the type of transaction.

What approach should I take and what module will be the best?

FYI The messages from the same number have the same message format but I have to deal with a large number of formats so writing code for each number will be repetitive and time consuming.

Darrick Herwehe
  • 3,553
  • 1
  • 21
  • 30
Rahul
  • 115
  • 10

1 Answers1

1

Use regular expressions from the re module.

For example to find the date for each string we could use the regex pattern

r" on (\d\d[-\/]\d\d[-\/]\d{2,4})"

FHTMitchell
  • 11,793
  • 2
  • 35
  • 47
  • Yeah, I have done the date part and the account number using regex but how to get the transaction amount and other details? – Rahul Jan 26 '18 at 12:55
  • Because message from each number won't have the same format. – Rahul Jan 26 '18 at 12:56
  • 1
    I think you should have a go and maybe ask a new question with your attempted code. SO is not a free coding service. Edit: I would point out you're not going to find a simple answer. If there are lots of variations, there isn't going to be a way of avoiding making multiple searches. Thats just how it is. – FHTMitchell Jan 26 '18 at 13:00