11

How would you go about keeping sensitive information out of log files? Yes, you can consciously choose not to log sensitive bits of information in the first place, but there are general cases where you blindly log error messages upon failures, or trace messages while investigating a problem, and end up with sensitive information landing in your log files.

For example, you could be trying to insert an order record that contains the credit card number of a customer into the database. Upon a database failure, you may want to log the SQL statement that was just executed. You would then end up with the credit card number of the customer in a log file.

Is there a design paradigm that can be employed to "tag" certain bits of information as sensitive so that a generic logging pipeline can filter them out?

Ates Goral
  • On the recommendation to move this question to serverfault.com: No. This question is from the software perspective. You want to ensure that you're software is smart enough to help users mask out sensitive information. This is in fact a real-life requirement that I've seen from real-life customers. – Ates Goral Sep 23 '09 at 16:03
  • *you're -> your (after 7 years) – Ates Goral Apr 20 '16 at 13:50

7 Answers

7

My current practice for the case in question is to log a hash of such sensitive information. This enables us to identify log records that belong to a specific claim (for example a specific credit-card number) but does not give anybody the power to just grab the logs and use the sensitive information for their evil purposes.

Of course, doing this consistently involves good coding practices. I usually choose to log all objects using their toString overrides (in Java or .NET), which serialize the hash of the value for any field marked with a Sensitive attribute.
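A minimal Java sketch of that idea, under stated assumptions: the `@Sensitive` annotation, the `LogFormat` helper, and the `Order` class are hypothetical names invented for illustration, and SHA-256 is only one possible choice of hash. A plain hash of a low-entropy value such as a card number can be brute-forced, so treat this as an illustration of the tagging mechanism rather than a recommended digest.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical marker annotation; not part of any standard library.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Sensitive {}

final class LogFormat {
    private LogFormat() {}

    // Hex-encoded SHA-256 digest of the given string (UTF-8 bytes).
    static String sha256Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Renders instance fields as name=value, hashing those marked @Sensitive.
    static String describe(Object obj) {
        StringBuilder sb = new StringBuilder(obj.getClass().getSimpleName()).append('{');
        String sep = "";
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            f.setAccessible(true);
            Object value;
            try {
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            boolean hide = f.isAnnotationPresent(Sensitive.class) && value != null;
            String shown = hide ? "sha256:" + sha256Hex(value.toString()) : String.valueOf(value);
            sb.append(sep).append(f.getName()).append('=').append(shown);
            sep = ", ";
        }
        return sb.append('}').toString();
    }
}

// Example domain object; the logging code only ever calls toString().
final class Order {
    private final String orderId;
    @Sensitive private final String cardNumber;

    Order(String orderId, String cardNumber) {
        this.orderId = orderId;
        this.cardNumber = cardNumber;
    }

    @Override
    public String toString() {
        return LogFormat.describe(this);
    }
}
```

With this in place, logging `new Order("A-1", "4111111111111111")` emits the order id in clear text and only a digest for the card number, so the logging call site does not need to know which fields are sensitive.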

Of course, SQL strings are more problematic, but we rely on our ORM for data persistence and log the state of the system at various stages rather than logging SQL queries, so it becomes a non-issue.

paracycle
  • I like the idea of overriding toString; that way, the logging code doesn't have to care about what's being logged. Although it doesn't address the issue with memory dumps, I'll accept this as the best answer since it's directly answering the original question. Thanks! – Ates Goral Jan 05 '10 at 19:07
  • yes but hash of a credit card number is trivial to brute-force (or build a rainbow table for) – Andrey Fedorov May 04 '18 at 07:56
  • @AndreyFedorov good point, that's certainly the case. It would be better to log the HMAC of these types of constant-length strings, with an application known key. Thus, when/if a claim arises, it would be possible to compute the HMAC again for lookup, but an attacker who obtained the logs wouldn't be able to reverse the HMAC by brute force or by building rainbow tables. – paracycle Jun 19 '18 at 21:42
  • yup, unless of course an attacker also gets the "application known key". seems that might be unavoidable, however. – Andrey Fedorov Jun 19 '18 at 22:00
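
The keyed-hash variant discussed in the comments above might look like the following sketch. It uses the standard `javax.crypto.Mac` API with HMAC-SHA256; the `LogHmac` class name is hypothetical, and how the application stores and protects the key is assumed to be handled elsewhere.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: computes a keyed digest that is safe to write to a log.
final class LogHmac {
    private LogHmac() {}

    // Hex-encoded HMAC-SHA256 of the value under the given application key.
    static String hmacSha256Hex(byte[] key, String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] tag = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : tag) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the same key and value always produce the same tag, a support engineer holding the key can recompute the tag for a disputed card number and search the logs for it, while someone holding only the logs cannot enumerate card numbers to find a match.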
5

I would personally regard the log files themselves as sensitive information and make sure to restrict access to them.

Fredrik Mörk
  • True! I'm thinking of cases where you're a software vendor and asking your clients to send you the log files from their system in order to diagnose a system crash etc. Would the onus be on the client to first clean up their log files from sensitive information? Wouldn't it be nice if your system had a way to let clients get that for free? – Ates Goral Sep 23 '09 at 16:01
  • "Restricting access" isn't specific enough to provide sufficient protection for credit card information. The logs need to be encrypted, and access to the decryption keys needs to be spelled out in the security policy. – erickson Sep 23 '09 at 16:04
2

Logging a credit card number could be a PCI violation. And if you aren't PCI compliant, you will be charged higher card-processing fees. Either don't log sensitive information, or encrypt your entire log files.

Your idea of "tagging" sensitive information is intriguing. You could have a special data type for Sensitive information, that wrapped the real, underlying data type. Whenever this object is rendered as a character string, it just returns "***" or whatever.
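
As a rough sketch of such a wrapper in Java (the name `SensitiveValue` and the `reveal()` accessor are assumptions made up for this example, not an existing library type):

```java
import java.util.Objects;

// Hypothetical wrapper: holds the real value but never prints it.
final class SensitiveValue<T> {
    private final T value;

    SensitiveValue(T value) {
        this.value = Objects.requireNonNull(value);
    }

    // Explicit accessor; the only way to get at the underlying data.
    T reveal() {
        return value;
    }

    // Any string rendering (logging, concatenation, String.valueOf) is masked.
    @Override
    public String toString() {
        return "***";
    }
}
```

For example, `"card=" + new SensitiveValue<>("4111111111111111")` evaluates to `card=***`, while code that genuinely needs the number has to call `reveal()` explicitly, which makes such uses easy to find in review.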

However, this could require widespread coding changes, and requires a level of conscious vigilance similar to that needed to avoid logging sensitive information in the first place.

erickson
1

In your example, you should be encrypting the credit card number or, better yet, not even storing it in the first place.

If, say, you were logging something else, like a login, you might want to explicitly replace a password with *****.

However, this manages to neatly avoid answering the question you've posed in the first place. In general, when dealing with sensitive information, it should be encrypted on its way to any form of permanent storage, be it a database file or a log file. Assume that a Bad Guy is going to be able to get their hands on either, and protect the information accordingly.
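
To make "encrypted on its way to permanent storage" concrete, here is a hedged sketch of field-level encryption using the JDK's AES-GCM support. The `FieldCrypto` class and its method names are hypothetical, and key management (where the key lives, who may read it, rotation) is deliberately left out even though it is the hard part in practice.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: encrypts a single field before it is stored or logged.
final class FieldCrypto {
    private static final int IV_BYTES = 12;   // nonce size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private FieldCrypto() {}

    // Generates a fresh AES key; a real system would load one from a key store.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns Base64(iv || ciphertext || tag), using a fresh random IV per call.
    static String encrypt(SecretKey key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + sealed.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(); fails if the data was tampered with or the key is wrong.
    static String decrypt(SecretKey key, String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A value encrypted this way can pass through low-level, semantics-agnostic logging without exposing the plaintext, provided the key itself never reaches the logs.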

Bob Kaufman
  • I think encryption can be the answer: as soon as sensitive info enters your system, it gets encrypted and lives as encrypted. So, if you're doing low level logging (semantics-agnostic) or even getting a memory dump, the information will be reasonably secure. I think I like the idea of encrypting the info instead of the entire log file as suggested in other answers. – Ates Goral Sep 25 '09 at 05:52
1

If you know what you're trying to filter, you can run your log output through a regex-based cleaning step before you write it to the log.
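
A small sketch of such a cleaning step in Java, assuming the sensitive strings are card-like numbers. The `LogScrubber` name and the pattern are illustrative assumptions: the pattern simply matches 13 to 16 digits with optional single spaces or hyphens between them, so it can produce false positives on other long digit runs and will miss formats it does not anticipate.

```java
import java.util.regex.Pattern;

// Hypothetical scrubber applied to every message just before it is written.
final class LogScrubber {
    // Assumed pattern: 13 to 16 digits, optionally separated by a space or hyphen.
    private static final Pattern CARD_LIKE =
        Pattern.compile("\\b\\d(?:[ -]?\\d){12,15}\\b");

    private LogScrubber() {}

    // Replaces every card-like digit run with a fixed placeholder.
    static String scrub(String message) {
        return CARD_LIKE.matcher(message).replaceAll("[REDACTED]");
    }
}
```

For example, `LogScrubber.scrub("card 4111-1111-1111-1111 declined")` returns `card [REDACTED] declined`, while a short number such as an order id is left alone.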

Esteban Araya
  • Yes, I thought of that. In fact, this may be a viable solution since there will always be a discrete number of different types of "sensitive" strings which you can identify with regexes. – Ates Goral Sep 23 '09 at 15:59
1

Regarding SQL statements specifically, if your language supports it, you should be using parameters instead of putting values in the statement itself. In other words:

select * from customers where credit_card = ?

Then set the parameter to the credit card number.

Of course, if you plan to log SQL statements with parameters filled in, you'd need some other way to filter out sensitive data.
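
One possible way to handle that case is to record, at bind time, which parameters are sensitive, and to mask them only in the log rendering. The sketch below is a plain in-memory helper with hypothetical names (`LoggableQuery`, `bindSensitive`, `toLogString`); it does not talk to a database, and the real values would still be passed to the prepared statement as usual.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for a parameterized statement and its bound values.
final class LoggableQuery {
    private final String sql;
    private final List<Object> params = new ArrayList<>();
    private final List<Boolean> sensitive = new ArrayList<>();

    LoggableQuery(String sql) {
        this.sql = sql;
    }

    // Binds a value that may appear in logs.
    LoggableQuery bind(Object value) {
        return add(value, false);
    }

    // Binds a value that must be masked in logs.
    LoggableQuery bindSensitive(Object value) {
        return add(value, true);
    }

    private LoggableQuery add(Object value, boolean isSensitive) {
        params.add(value);
        sensitive.add(isSensitive);
        return this;
    }

    // Real values, in order, for setting on the actual prepared statement.
    List<Object> params() {
        return Collections.unmodifiableList(params);
    }

    // Statement text followed by the parameter list, with sensitive values masked.
    String toLogString() {
        StringBuilder sb = new StringBuilder(sql).append(" [");
        for (int i = 0; i < params.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(sensitive.get(i) ? "***" : String.valueOf(params.get(i)));
        }
        return sb.append(']').toString();
    }
}
```

Logging `toLogString()` for a query built with `bindSensitive("4111111111111111")` and `bind("EU")` yields the statement text followed by `[***, EU]`, so a failure can still be diagnosed without the card number reaching the log.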

Adam Crume
0

Refer to this tool, which was created for exactly this use case.

If you want to mask only selected fields during logging and keep the other field values as they are, you can try it.

https://github.com/senthilaru/sp-util

<dependency>
    <groupId>com.immibytes</groupId>
    <artifactId>sp-utils</artifactId>
    <version>1.0.0-RELEASE</version>
</dependency>

Senthil Arumugam SP