I'm building a manager class with PHP to manage credit card payment authorizations. With credit cards, we're allowed to keep First6, last4, expiration_Month and expiration_Year.

I'm really interested in knowing how unique the combination of these 4 variables is, and how likely it is that two different cards share the same combination.

How likely that is affects when we should test whether we've already got a valid authorization for a new card. If we've already got an authorization for a particular card, there's no need to run the numbers again. Instead, we can find the already authorized card and do a re-authorization. However, I wouldn't want to run the wrong card because it has the same First6, last4, expiration_Month and expiration_Year.

My goal is to limit data redundancy of credit card data, hits to the CC processor API and unnecessary authorizations on customer cards.

JustinP

2 Answers

5

The First 6 tell you what kind of card you are dealing with. For a list of issuers see:

http://en.wikipedia.org/wiki/List_of_Issuer_Identification_Numbers

The last four are essentially random. The month will be essentially random, and the year will be in a small range from the current year to perhaps 6 years out. The year will exhibit some bias between possible values.

You will almost certainly have collisions if you combine those items to attempt to uniquely identify a card. That is not a reliable thing to do.
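As a minimal sketch of why this key is not unique: the `CardFingerprint` type and `fingerprint` helper below are hypothetical names, and the two card numbers are made-up test values (the first is the widely used Visa test number). Both pass the Luhn check, yet they reduce to the same four stored fields.

```python
from typing import NamedTuple


class CardFingerprint(NamedTuple):
    """The four fields the question says may be stored (hypothetical type)."""
    first6: str
    last4: str
    exp_month: int
    exp_year: int


def fingerprint(pan: str, exp_month: int, exp_year: int) -> CardFingerprint:
    # Keep only the first six and last four digits of the card number.
    return CardFingerprint(pan[:6], pan[-4:], exp_month, exp_year)


# Two different card numbers that differ only in the middle six digits.
pan_a = "4111111111111111"
pan_b = "4111119000001111"

card_a = fingerprint(pan_a, 12, 2015)
card_b = fingerprint(pan_b, 12, 2015)

print(pan_a == pan_b)    # False: the cards are different
print(card_a == card_b)  # True: the stored fields are identical
```

A lookup keyed on these fields alone would treat the second card as the first one.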

EDIT

Here are examples of recent security breaches similar to this scenario:

http://blogs.cisco.com/security/6-5-million-password-hashes-suggest-a-possible-breach-at-linkedin/

http://www.infoworld.com/d/security/nvidia-investigating-breach-of-hashed-passwords-197796

https://www.infoworld.com/d/security/passwords-leaked-yahoo-boozy-preachy-angry-and-easy-197696

If a hacker can download data from the database of a large web company (typically the most-firewalled-away part of the architecture), chances are pretty good they can also access the application tier and grab the source code or compiled application that accesses the data layer.

Eric J.
  • Would you consider it safe to use in the session scope, to prevent the same card from being authorized multiple times in the same user session? For example, over the course of submitting an order, deciding they want to use a different card, then going back to the first card, and so on. – JustinP Dec 10 '12 at 21:12
  • Someone mentioned hashing the full PAN. For me, the salt seems to complicate it. Because ultimately, I'm trying to do a lookup on the hash to detect duplication but a record level salt would make indexing and looking up difficult. I'm not against hashing the full PAN.. I just want to do it correctly. – JustinP Dec 10 '12 at 21:19
  • @JustinPfister: Session scope should be fine. However, if you do not control your hardware (e.g. shared hosting) that does introduce a security risk since session state will linger in memory. If you use session scope and do not have exclusive access to the hardware, consider encrypting any PII including the card number before storing it in session. – Eric J. Dec 11 '12 at 01:16
  • @JustinPfister: Hashing the full card number probably violates your terms of service with your payment gateway provider, since loss of the hashing algorithm and salt allows the card numbers to be re-created. There was a similar high-profile data breach just this year. – Eric J. Dec 11 '12 at 01:17
  • Using a one-way hash (like SHA1) should be OK. Given a hash and the algorithm, it is hard to get back to the original PAN. Remember to iterate the hash a couple of thousand times to slow it down. – brian beuning Dec 11 '12 at 01:33
  • A salt would protect you from rainbow table based attacks. But a salt is usually used with user passwords where you have the user name to look up the salt for that user. You don't have that. – brian beuning Dec 11 '12 at 01:39
  • @brianbeuning: It is *simple* to get back the original card number given the source code used to hash, because the attacker knows how many times to hash a given card number to create a rainbow table. Solid hash implementations help in the case where only the database is stolen. However, often if a hacker can steal the database, he can also steal the PHP source code. Added a few recent real-world, high profile hacks to my answer. – Eric J. Dec 11 '12 at 16:38
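To illustrate the point made in the comments above, here is a rough sketch of how small the search space is once the first 6 and last 4 digits are known. It assumes a 16-digit PAN, an unsalted iterated SHA-1 as suggested in the thread, and the well-known test number 4111111111111111. The helper names and the `ROUNDS` value are assumptions for illustration, with the round count kept low so the example runs quickly.

```python
import hashlib


def luhn_valid(pan: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(pan)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iterated_sha1(pan: str, rounds: int) -> bytes:
    """Hash the PAN, then re-hash the digest until `rounds` hashes are done."""
    digest = pan.encode("ascii")
    for _ in range(rounds):
        digest = hashlib.sha1(digest).digest()
    return digest


def candidates(first6: str, last4: str):
    """Yield every Luhn-valid 16-digit PAN with the given first 6 and last 4."""
    for middle in range(10**6):
        pan = f"{first6}{middle:06d}{last4}"
        if luhn_valid(pan):
            yield pan


ROUNDS = 10  # assumption: kept small for the demo; cost grows linearly with it
target = iterated_sha1("4111111111111111", ROUNDS)

pool = list(candidates("411111", "1111"))
recovered = [p for p in pool if iterated_sha1(p, ROUNDS) == target]

print(len(pool))   # number of PANs an attacker would need to try
print(recovered)   # PANs whose hash matches the stored one
```

Only the six middle digits are unknown, and the Luhn check removes nine out of every ten guesses, so iterating the hash multiplies a very small amount of work rather than a large one.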
3

To expand on the previous answer: the left 6 are the BIN, which identifies the issuer, so many of your cards will share them and they are little help in matching cards. Given the right 4 are random, the month is random, and the year has 1 of 6 values, that means you have 10,000 * 12 * 6 = 720,000 unique combinations per BIN.

If you already have 100,000 cards stored on one BIN, then the odds that a new card matches one of them are roughly 1 in 7 (100,000 / 720,000 is about 14% as an upper bound; the exact figure is closer to 13%). If you have over 1,500,000 cards, then a match is very likely on every transaction (close to 90%).
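The arithmetic can be checked with a short sketch. It assumes, as this answer does, that all stored cards share one BIN and that the last 4, month and year are uniformly distributed. The function name `p_new_card_matches` is hypothetical.

```python
# Number of distinct (last4, month, year) combinations for a single BIN.
LAST4_VALUES = 10_000
MONTHS = 12
YEARS = 6
BUCKETS = LAST4_VALUES * MONTHS * YEARS  # 720,000


def p_new_card_matches(existing: int, buckets: int = BUCKETS) -> float:
    """Probability that one new card shares its fields with at least one
    of `existing` stored cards, assuming uniformly random fields."""
    p_no_match = (1.0 - 1.0 / buckets) ** existing
    return 1.0 - p_no_match


for n in (100_000, 1_500_000):
    print(f"{n:>9} stored cards: {p_new_card_matches(n):.1%}")
```

Real expiry years are biased toward a few values, so the true collision rate would be somewhat higher than this uniform estimate.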

brian beuning
  • This is really helpful Brian. Thank you. These were the kind of figures I was looking for. – JustinP Dec 11 '12 at 12:28
  • To understand why a collision is so likely, it's worth having a look at the Birthday Problem: http://en.wikipedia.org/wiki/Birthday_problem – Eric J. Dec 11 '12 at 16:41
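As a sketch of the Birthday Problem applied to this case: the code below estimates the chance that any two stored cards share the same last 4, month and year, again assuming a single BIN and 720,000 equally likely combinations. The function names are hypothetical.

```python
import math

BUCKETS = 10_000 * 12 * 6  # 720,000 combinations, as computed in the answer


def p_any_collision(cards: int, buckets: int = BUCKETS) -> float:
    """Probability that at least two of `cards` cards share a combination."""
    if cards > buckets:
        return 1.0  # pigeonhole: more cards than combinations
    # Sum logs instead of multiplying many factors close to 1.
    log_no_collision = sum(math.log1p(-k / buckets) for k in range(cards))
    return 1.0 - math.exp(log_no_collision)


def cards_for_probability(target: float, buckets: int = BUCKETS) -> int:
    """Smallest number of cards whose collision probability reaches
    `target` (which must be below 1.0)."""
    log_no_collision = 0.0
    cards = 0
    while 1.0 - math.exp(log_no_collision) < target:
        log_no_collision += math.log1p(-cards / buckets)
        cards += 1
    return cards


print(cards_for_probability(0.5))  # cards needed for a 50% chance
print(p_any_collision(100_000))    # chance of a duplicate among 100,000 cards
```

Under these assumptions, a 50% chance of at least one duplicate is reached at around 1,000 stored cards, far fewer than the 720,000 possible combinations might suggest.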