How do I detect text patterns in a list of text values so that I can test against that pattern to validate a new value?
For example, given a list of text values like this:
SKU-1242
SKU-5450
SKU-6532
SKU-2395
SKU-2393
SKU-9310
234321
I would like to be given this regex: [A-Z]{3}\-[0-9]{4,5}. Ideally, I would also like to know what percentage of the existing values match this pattern.
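To make the goal concrete, here is a minimal sketch of the kind of inference I have in mind. It is not how Data Wrangler or any particular library does it. The function names (`char_class`, `shape`, `infer_pattern`, `match_percentage`) are hypothetical. It assumes ASCII input, and it assumes that the most common sequence of character classes is the pattern of interest. Because it only uses the run lengths it actually sees, it produces `{4}` for the sample above rather than the looser `{4,5}`.

```python
import re
from collections import Counter
from itertools import groupby


def char_class(ch):
    """Map a character to a coarse regex class (ASCII only); anything else stays a literal."""
    if "0" <= ch <= "9":
        return "[0-9]"
    if "A" <= ch <= "Z":
        return "[A-Z]"
    if "a" <= ch <= "z":
        return "[a-z]"
    return re.escape(ch)


def shape(value):
    """Collapse runs of the same class into (class, run_length) pairs."""
    return [(cls, len(list(run))) for cls, run in groupby(value, key=char_class)]


def infer_pattern(values):
    """Build a regex from the most common sequence of character classes."""
    if not values:
        raise ValueError("need at least one value")
    shapes = [shape(v) for v in values]
    # The "skeleton" is the class sequence with run lengths ignored.
    skeletons = Counter(tuple(cls for cls, _ in s) for s in shapes)
    dominant, _count = skeletons.most_common(1)[0]
    matching = [s for s in shapes if tuple(cls for cls, _ in s) == dominant]

    parts = []
    for i, cls in enumerate(dominant):
        lengths = [s[i][1] for s in matching]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quantifier = ""
        elif lo == hi:
            quantifier = "{%d}" % lo
        else:
            quantifier = "{%d,%d}" % (lo, hi)
        parts.append(cls + quantifier)
    return "".join(parts)


def match_percentage(pattern, values):
    """Percentage of values that match the pattern in full."""
    regex = re.compile(pattern)
    hits = sum(1 for v in values if regex.fullmatch(v))
    return 100.0 * hits / len(values)


values = ["SKU-1242", "SKU-5450", "SKU-6532", "SKU-2395",
          "SKU-2393", "SKU-9310", "234321"]
pattern = infer_pattern(values)
print(pattern, round(match_percentage(pattern, values), 1))
```

A new value could then be validated with `re.fullmatch(pattern, new_value)`. What I am really after is something more robust than this, for example a tool that handles several competing patterns or generalizes quantifiers sensibly.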
This example is very similar to the one the AWS documentation uses to demonstrate how Amazon SageMaker Data Wrangler provides this as part of its Data Quality and Insights Report (seen here: https://aws.amazon.com/blogs/machine-learning/detect-patterns-in-text-data-with-amazon-sagemaker-data-wrangler/).
Is there a library or tool that can detect these sorts of patterns in lists of text values? Any language will work.