I have a large text file and have been asked to search it for possible credit card numbers. So far I've been using Notepad++'s 'Find in Files' in regular expression search mode with this simple expression: 4\d{15} (it matches a 16-digit string starting with 4, which is usually a VISA debit/credit card). I then copy and paste each match into a credit card validation script.

Is there any way to create an expression that searches for 16-digit strings starting with 4 and also checks that each one satisfies the Luhn algorithm (i.e. that it is valid)?

This is the Luhn Algorithm:

1) Starting with the second to last digit and moving left, double the value of all the alternating digits.

2) Starting from the left, take all the unaffected digits and add them to the results of all the individual digits from step 1. If the results from any of the numbers from step 1 are double digits, make sure to add the two numbers first (i.e. 18 would yield 1+8). Basically, your equation will look like a regular addition problem that adds every single digit.

3) The total from step 2 must end in zero for the credit-card number to be valid.

Source: http://www.webopedia.com/TERM/L/Luhn_formula.html
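
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original post: the names `CANDIDATE`, `luhn_valid` and `find_valid_numbers` are made up for the example, and it assumes a candidate is a standalone run of exactly 16 ASCII digits starting with 4. The lookarounds in the pattern keep it from matching a 16-digit slice of a longer number, which the bare `4\d{15}` would do.

```python
import re

# Assumption: a candidate is a standalone run of exactly 16 ASCII digits
# starting with 4. The lookarounds reject matches inside longer digit runs.
CANDIDATE = re.compile(r"(?<!\d)4\d{15}(?!\d)", re.ASCII)


def luhn_valid(number):
    """Return True if the digit string passes the Luhn check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk right to left; every second digit, starting with the
    # second-to-last one, is doubled (step 1).
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                # Same as adding the two digits of the product,
                # e.g. 18 -> 1 + 8 = 9 (step 2).
                digit -= 9
        total += digit
    # Step 3: the total must end in zero.
    return total % 10 == 0


def find_valid_numbers(text):
    """Return the candidates found in text that also pass the Luhn check."""
    return [m for m in CANDIDATE.findall(text) if luhn_valid(m)]
```

For example, `luhn_valid("4111111111111111")` returns `True` (the digits sum to 30 after doubling), while changing the last digit to 2 makes the check fail. Reading a whole file would be something like `find_valid_numbers(open("input.txt").read())`, where the file name is a placeholder.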

user2786228
  • You can't do this using Notepad++ alone. You will need to write some code to do this. May I ask why you have an unstructured text file with things that *look like* credit card numbers? That sounds highly suspicious. – Lasse V. Karlsen Dec 09 '13 at 11:22
  • I'm not entirely sure why I've been asked to do this. I work for a company that sells products online, and I believe it has something to do with rebuilding our database. "You will need to write some code to do this" isn't very helpful; could you give me a better explanation? – user2786228 Dec 09 '13 at 12:56
  • As Lasse told you, you CAN'T validate the Luhn checksum simply with regular expressions. You will need to write something more complex, with the use of a programming language. If you search the internet you will find some implementations of the Luhn algorithm in different languages. The only way to do it in Notepad++ is by using the PythonScript plugin (so your implementation should be in Python). – psxls Dec 09 '13 at 13:46
  • You asked if it was possible to do with Notepad++. The answer is no, it can't. "You will need to write some code" means that you'll need to write a program that parses the file, finds the potential credit card numbers, and then checks them with the Luhn algorithm. If you need help with that, post a new question. – Jim Mischel Dec 09 '13 at 14:05

1 Answer

Here's a simple LINQPad program that extracts all 16-digit numbers that start with a 4 from the file:

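void Main()
{
    const string inputFileName = @"d:\temp\input.txt";
    const string outputFileName = @"d:\temp\output.txt";

    string input = File.ReadAllText(inputFileName);

    // Match whole runs of ASCII digits so that a 16-digit slice of a
    // longer number is not mistaken for a card number.
    var matches =
        from Match ma in Regex.Matches(input, @"[0-9]+")
        let number = ma.Value
        where number.Length == 16 && number.StartsWith("4")
        select number;

    var creditCardNumbers =
        from match in matches
        where IsCreditCardNumber(match)
        select match;

    File.WriteAllLines(outputFileName, creditCardNumbers);
}

// Returns true if the digit string passes the Luhn checksum.
public static bool IsCreditCardNumber(string number)
{
    int sum = 0;
    bool doubleIt = false; // the rightmost digit is not doubled

    for (int i = number.Length - 1; i >= 0; i--)
    {
        int digit = number[i] - '0';
        if (doubleIt)
        {
            digit *= 2;
            if (digit > 9)
                digit -= 9; // same as adding the two digits of the product
        }
        sum += digit;
        doubleIt = !doubleIt;
    }

    return sum % 10 == 0;
}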
Lasse V. Karlsen