3

I am interested in building a data mining website. The data in the DB is really sensitive.

I would like to find a way to encrypt the data in the DB and to prove to my clients that even I can't read it.

The problem is that I need to be able to run batch reports overnight on the server side, so my software must be able to read the data in the clear.

Do you have any ideas?

Pier-Alexandre Bouchard

4 Answers

1

You haven't described what you need done in terms of the reports. There are lots of approaches for doing computation on encrypted data. I suggest you start with these two approaches.

  1. Check out the book Translucent Databases, 2nd Edition, by Peter Wayner. To quote Wayner: "The book is still designed to help the world build databases that answer useful questions without keeping any useful information around. The examples show how most databases don't need to be filled with the world's secrets and personal information. If the client uses the right amount of encryption, the databases don't need to be dangerous one-stop shopping for the identity thieves and others who with malice aforethought."

  2. If you have a PhD in cryptography and you have a few billion cycles to burn, you should read up on Homomorphic Encryption.

vy32
0

As mentioned by @vy32, homomorphic encryption provides a theoretical way to do this, but it is not practical today.

How about requesting anonymized rather than encrypted data?

For example, you don't need customer names or national IDs to tell customers apart; anonymous IDs would do. Another example: some data values can be hashed, so that you can tell different entities apart but not what they are. Numeric values could be replaced by their rank order, so that for every pair you know which is greater but not the precise amounts. Fields that don't matter to you, like personal names in most applications, can simply be omitted.

There is an entire body of work devoted to anonymization, and another body of work devoted to de-anonymization of anonymized data sets, but you can get a long way with some simple transformations.
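
As a rough illustration of the simple transformations described above, here is a minimal Python sketch that the client could run before handing data over. The field names (`name`, `national_id`, `amount`), the helper names, and the choice of HMAC-SHA256 as the keyed hash are all assumptions made for this example. The secret stays with the client, so the service only ever sees opaque IDs and ranks.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same input always maps to the same opaque ID, so entities can
    still be told apart, but the ID cannot be recomputed without the secret.
    """
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_ranks(values):
    """Replace numbers by their rank (0 = smallest); equal values share a rank."""
    order = {v: i for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]


def anonymize(records, secret):
    """Drop names, pseudonymize IDs, and keep only the order of the amounts."""
    ranks = to_ranks([r["amount"] for r in records])
    return [
        {"customer": pseudonymize(r["national_id"], secret), "amount_rank": rank}
        for r, rank in zip(records, ranks)
    ]


# Hypothetical sample data; the secret is held by the client, not the service.
records = [
    {"name": "Alice", "national_id": "111", "amount": 250.0},
    {"name": "Bob", "national_id": "222", "amount": 90.0},
    {"name": "Alice", "national_id": "111", "amount": 400.0},
]
anonymized = anonymize(records, secret=b"client-held secret")
```

Both of Alice's rows end up with the same opaque `customer` value, so per-customer reports still work, while names and exact amounts never leave the client. As noted above, this does not protect against de-anonymization by itself.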

Joshua Fox
0

You should consider the most basic data encryption: RSA. Look it up; it's straightforward. There are two keys: a public key, used to encrypt, and a private key, used to decrypt. Let us know how that works out for you.

BuZz
  • 1
    Yes, I thought about public/private keys. But I would like to reassure my clients by saying that I don't have the private key. The server-side software could decrypt the data, but I would not be able to see the server-side data myself. Excuse my English, I'm really trying to be clear! – Pier-Alexandre Bouchard Nov 04 '11 at 17:27
  • Well, there are some workarounds I can think of on the server side that make the server hold the private key rather than you... but in the end, as long as you are the one coding the server, realistically you can always undo those things (random range for keys, etc.) and get the key back. But you can try to fool your clients by saying that the private key is generated once and you don't know it, then encrypt your binaries (if you publish a jar, for example) so you can say that you can't get it back. – BuZz Nov 04 '11 at 17:47
  • I like the idea of generating the private key. So my database is encrypted with RSA using a generated private key. My client is a web interface. Both the client and the server side have the private key. I have never done this kind of key-based encryption. Where can I store the private key? Not in the DB! – Pier-Alexandre Bouchard Nov 04 '11 at 18:00
  • Well, logically, if you want the key to be generated only at runtime and kept in the program's RAM, then as soon as you switch off your service you lose your data, because you can't recover the key. – BuZz Nov 04 '11 at 18:10
  • Yeah, that's true. What is the best way to store keys securely? – Pier-Alexandre Bouchard Nov 04 '11 at 18:14
  • It's not that nobody knows, it is that key management is probably the hardest part of PKI (public key infrastructure). There are many ways to implement it and many ways to do it wrong. Basically, you should get some expert help on this - explaining it would *certainly* go beyond this question on Stack Overflow (and possibly Stack Overflow in general). – Maarten Bodewes Nov 05 '11 at 15:40
0

There is no way to arrange that you can't decrypt the data while your software can, as long as you have control over your software.

There needs to be a key somewhere so that the software can decrypt the data, and if the software runs on a computer you have access to, you can get to the key. There is no way around this.

Your clients either have to trust you to not do anything malicious with the data, or they have to do the processing themselves (or with another service).

There might be some ways to use homomorphic encryption (i.e. where you have enc(f1(a,b)) = f2(enc(a), enc(b)) for a pair of functions f1, f2), but this only works for some very limited operations, with encryption schemes specially made to support it, and quite likely not for the kind of processing your "data mining" requires.
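
To make the property enc(f1(a,b)) = f2(enc(a), enc(b)) concrete, here is a small Python sketch using unpadded "textbook" RSA, where f1 is ordinary multiplication and f2 is multiplication modulo n. The tiny primes and the names `enc`, `dec`, and `product_cipher` are assumptions chosen for illustration only; unpadded RSA with small keys is insecure and should not be used on real data.

```python
# Toy key pair from two tiny primes (illustration only, trivially breakable).
p, q = 61, 53
n = p * q                # modulus: 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)


def enc(m: int) -> int:
    """Textbook RSA encryption: m^e mod n (no padding)."""
    return pow(m, e, n)


def dec(c: int) -> int:
    """Textbook RSA decryption: c^d mod n."""
    return pow(c, d, n)


a, b = 7, 6

# The server multiplies two ciphertexts without ever seeing a or b.
product_cipher = (enc(a) * enc(b)) % n

# Only the private-key holder can decrypt, and the result is a * b
# (valid as long as a * b is smaller than n).
assert product_cipher == enc(a * b)
assert dec(product_cipher) == a * b
```

Multiplication is the only operation this scheme supports on ciphertexts, which illustrates the "very limited operations" caveat: sums, comparisons, and grouping would each need a different, purpose-built scheme.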

Paŭlo Ebermann