I have a large data set of Insurance Claims data with 2 columns. One column is a claim identifier. The other is a large string of notes that go with the claim.
My goal is to text mine the Claims Notes for a specific VIN number. Typically a VIN# is in a 17 digit format. See Below: https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics
However, with my data, some issues arise. Sometimes only the last 6 digits were input for a VIN#. I basically need a way to process my data and grab anything that looks like a 17 digit VIN Number and return it to that row of data. I am using Python 3 and am a rookie text miner but have some basic experience using regular expressions.
I am trying to create a function in python in which I can lambda apply it to the column of notes.
Attempt so far:
C_Notes['VIN#s'] = C_Notes['ClaimsNotes'].str.findall(r'[0-9]{1}[0-9a-zA-Z]{16}')
I am trying to mimic the format of the VIN in the link I provided.
So something that looks for a string with following qualities:
EDIT: Changed code snippet. This code example works if I make some toy examples of VINs with made up text but I am not having any success iterating through my pandas column. Each row entry has a large paragraph of text I want the function to go through each row at a time.
Thank you.