My team needs to develop a solution to encrypt binary data (stored as a byte array) in the context of an Android application written in Java. The encrypted data will be transmitted and stored in various ways, during which data corruption cannot be ruled out. Eventually another Android application (again written in Java) will have to decrypt the data.
It has already been decided that the encryption algorithm has to be AES, with a key of 256 bits.
However, I would like to make an informed decision about which AES implementation and/or “mode” we should use. I have read about something called GCM mode, and we have done some tests with it (using BouncyCastle/SpongyCastle), but it is not entirely clear to me what exactly AES-GCM is for and what it “buys” us in comparison to plain AES, and whether there are any trade-offs to be taken into account.
Here’s a list of concerns/requirements/questions we have:
Padding: the data we need to encrypt will not always be a multiple of 128 bits (the AES block size), so the AES implementation/mode should add padding, yet only when necessary.
I was under the impression that a plain AES implementation, such as the one provided by javax.crypto.Cipher, would not do that, but initial tests indicated that it does. So I’m guessing the padding requirement in itself is no reason to resort to something like GCM instead of “plain” AES. Is that correct?
Authentication: We need a foolproof way of detecting if data corruption has occurred. However, ideally we also want to detect when decryption is attempted with an incorrect key. Hence, we want to be able to differentiate between both of these cases. The reason I ended up considering GCM in the first place was due to this Stackoverflow question, where one of the responders seems to imply that making this distinction is possible using AES-GCM, although he does not provide a detailed explanation (let alone code).
Minimise overhead: We need to limit overhead on storage and transmission of the encrypted data. Therefore we wish to know whether, and to what extent, the choice for a specific AES implementation/mode influences the amount of overhead.
Encryption/decryption performance: Although it is not a primary concern we are wondering to what extent the choice of a specific AES implementation/mode influences encryption and decryption performance, both in terms of CPU time and memory footprint.
Thanks in advance for any advice, clarification and/or code examples.
EDIT: delnan helpfully pointed out there is no such thing as “plain AES”. So to clarify, what I meant by that is using Java’s built-in AES support.
Cipher localCipher = Cipher.getInstance("AES");
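To illustrate the padding behaviour mentioned above, here is a minimal check (the class name is just for illustration) of what that default transformation does with various input lengths. On most providers the plain "AES" transformation resolves to AES/ECB/PKCS5Padding:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class PaddingCheck {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey key = kg.generateKey();
        // On most providers "AES" alone resolves to "AES/ECB/PKCS5Padding"
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        for (int len : new int[] {5, 16, 20}) {
            int ctLen = cipher.doFinal(new byte[len]).length;
            System.out.println(len + " plaintext bytes -> " + ctLen + " ciphertext bytes");
        }
        // Prints 5 -> 16, 16 -> 32, 20 -> 32: PKCS5 padding is always added,
        // including a full extra block when the input is already block-aligned.
    }
}
```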
In 2012 the answer is to go for GCM, unless you have serious compatibility issues.
GCM is an Authenticated Encryption mode. It provides you with confidentiality (encryption), integrity, and authentication (MAC) in one go.
So far, the normal modes of operation have been ECB (which is the default for Java), CBC, CTR, OFB, and a few others. They all provide encryption only. Confidentiality by itself is seldom useful without integrity, though; one had to combine such classic modes with integrity checks (e.g. an HMAC) in an ad hoc way. Since cryptography is hard to get right, such combinations were often insecure, slower than necessary, or even both.
Authenticated Encryption modes were (fairly recently) created by cryptographers to solve that problem. GCM is one of the most successful: it has been selected by NIST, it is efficient, it is patent-free, and it can carry Additional Authenticated Data (that is, data which stays in the clear, but for which you can verify authenticity). For a description of other modes, see this excellent article by Matthew Green.
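To make this concrete, here is a minimal sketch of AES-GCM with the standard JCE API (GCMParameterSpec requires Java 7+ / Android API 19+; on older devices SpongyCastle provides the same transformation; the class and method names are mine). Note how a single flipped ciphertext bit is detected at decryption time:

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmDemo {
    static final int IV_LEN = 12;    // 96-bit IV, the recommended size for GCM
    static final int TAG_BITS = 128; // authentication tag length in bits

    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv); // fresh IV per message, never reused
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext);
        // Prepend the IV so the receiver can decrypt (the IV is not secret)
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    static byte[] decrypt(SecretKey key, byte[] message) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(message, 0, IV_LEN)));
        // Throws AEADBadTagException if the key is wrong or the data was modified
        return cipher.doFinal(Arrays.copyOfRange(message, IV_LEN, message.length));
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey key = kg.generateKey();

        byte[] msg = "hello gcm".getBytes("UTF-8");
        byte[] ct = encrypt(key, msg);
        System.out.println(new String(decrypt(key, ct), "UTF-8")); // prints "hello gcm"

        ct[ct.length - 1] ^= 1; // flip one ciphertext bit
        try {
            decrypt(key, ct);
            System.out.println("tamper NOT detected");
        } catch (AEADBadTagException e) {
            System.out.println("tamper detected"); // this branch is taken
        }
    }
}
```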
Coming to your concerns:
Authentication: AES-GCM takes only an AES key as input, not a password. It will tell you if the AES key is wrong or the payload has been tampered with, but those two conditions are treated as one. Instead, you should consider using an appropriate key derivation function, like PBKDF2 or bcrypt, to derive the AES key from the password. I don’t think it is always possible to tell whether the password is incorrect or the payload has been modified, because the data needed to verify the former can itself be corrupted. You can, however, encrypt a small known string (with ECB AES), send it along, and use it to check whether the password is correct.
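A sketch of the derivation step with PBKDF2 (the iteration count and the PBKDF2WithHmacSHA1 algorithm name are illustrative choices; SHA-1 is the variant most widely available on Android, and the class name is mine). The same password and salt always yield the same 256-bit key:

```java
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

public class KeyDerivation {
    // Derive a 256-bit AES key from a password and a per-user random salt.
    static SecretKeySpec deriveKey(char[] password, byte[] salt) throws Exception {
        SecretKeyFactory skf = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        // 100,000 iterations is an illustrative work factor; tune for your devices
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100000, 256);
        byte[] keyBytes = skf.generateSecret(spec).getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt); // store the salt alongside the ciphertext
        SecretKeySpec k1 = deriveKey("correct horse".toCharArray(), salt);
        SecretKeySpec k2 = deriveKey("correct horse".toCharArray(), salt);
        SecretKeySpec k3 = deriveKey("wrong password".toCharArray(), salt);
        // Same password + salt -> same key; different password -> different key
        System.out.println(java.util.Arrays.equals(k1.getEncoded(), k2.getEncoded())); // true
        System.out.println(java.util.Arrays.equals(k1.getEncoded(), k3.getEncoded())); // false
    }
}
```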
Minimise overhead: At the end of the day, all modes lead to roughly the same overhead (around 10-20 bytes) if you want authentication. Unless you are working with very small payloads, this point can be ignored.
Performance: GCM is pretty good in that it is an online mode (no need to buffer the whole payload, so less memory), it is parallelizable, and it requires one AES operation and one Galois multiplication per plaintext block. Classic modes like ECB are faster (one AES operation per block only), but – again – you must also factor in the integrity logic, which may end up being slower than GMAC.
Having said that, one must be aware that GCM’s security relies on the IV never being reused with the same key; in practice that means generating each IV with a good random number generator (or a message counter).