I have a question relating to the use of an Initialization Vector in AES encryption. I am referencing the following articles / posts to build encryption into my program: Java 256-bit AES Password-Based Encryption
I was originally following erickson’s solution from the first link but, from what I can tell, PBKDF2WithHmacSHA1 is not supported on my implementation. So, I turned to the second link to get an idea for my own iterative SHA-256 hash creation.
My question is about how the IV is created. One implementation uses methods from the Cipher class to derive the IV, whereas the other uses the second 16 bytes of the hash as the IV. Quite simply, why the difference, and which is better from a security standpoint? I am also kind of confused about the derivation and use of IVs (I understand what they are used for, just not the subtler differences), so any clarification is very welcome.
I noticed that the second link uses AES-128 rather than AES-256, which suggests to me that I would have to go up to SHA-512 if I wanted to use this method. This seems like an unfortunate requirement, as the user’s password would have to be 16 characters longer to ensure a remotely secure hash, and this app is destined for a cell phone.
Source is available on request, though it is still incomplete.
Thank you in advance.
The IV should not be generated from the password alone.
The point of the IV is that even when the same key and plaintext are reused, a different ciphertext is produced. If the IV is deterministically produced from the password only, you’d get the same ciphertext every time. In the cited example, a salt is randomly chosen, so a new key is generated even with the same password.
Just use a random number generator to choose an IV. That’s what the cipher is doing internally.
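As a rough illustration of the point above, here is a minimal sketch of choosing the IV with a random number generator. The class and method names (RandomIvSketch, newIv, encrypt) are made up for this example, and it assumes AES in CBC mode with PKCS5 padding; the thread does not say which mode or padding the original code used.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper; names are illustrative, not taken from the linked posts.
final class RandomIvSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // AES has a 16-byte block, so a CBC IV is 16 bytes for any AES key size.
    static byte[] newIv() {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        return iv;
    }

    // Assumed transformation: AES/CBC/PKCS5Padding.
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return cipher.doFinal(plaintext);
    }
}
```

Calling encrypt twice with the same key and plaintext but two values from newIv() should give two different ciphertexts, while reusing one IV gives the same ciphertext both times.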
I want to stress that you have to store either the IV (if you use the first method) or a salt (if you use the second method) together with the ciphertext. You won’t have good security if everything is derived from the password; you need some randomness in every message.
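One common way to keep the IV together with the ciphertext is simply to prepend it. The sketch below assumes that layout (IV first, then ciphertext) and AES/CBC/PKCS5Padding; the class name IvPrefixedMessage and the methods seal and open are hypothetical, not something from the linked code.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper; the IV-then-ciphertext layout is an assumption of this sketch.
final class IvPrefixedMessage {
    private static final int IV_LENGTH = 16; // AES block size in bytes
    private static final SecureRandom RANDOM = new SecureRandom();

    // Encrypts with a fresh random IV and returns IV || ciphertext.
    static byte[] seal(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);

        // Copy the IV into a larger array, then append the ciphertext after it.
        byte[] message = Arrays.copyOf(iv, IV_LENGTH + ciphertext.length);
        System.arraycopy(ciphertext, 0, message, IV_LENGTH, ciphertext.length);
        return message;
    }

    // Reads the IV back from the first 16 bytes and decrypts the rest.
    static byte[] open(SecretKey key, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(message, 0, IV_LENGTH));
        return cipher.doFinal(message, IV_LENGTH, message.length - IV_LENGTH);
    }
}
```

The IV travels in the clear here, which is fine since it is not a secret. Note that this sketch only covers confidentiality; it does not detect tampering with the stored message.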
Cryptographers should generate IVs using a secure pseudo-random number generator.
Application developers should use existing, off the shelf cryptography. I suggest that you use SSL with certificates to secure your network traffic and GPG to secure file data.
There are so many details that can make an implementation insecure, such as timing attacks. When an application developer is making decisions between AES 128 and AES 256 it is nearly always pointless since you’ve likely left a timing attack that renders the extra key bits useless.
My understanding is that the Initialization Vector is just random input to the encryption algorithm; otherwise you would always get the same result for the same input. The Initialization Vector is stored together with the cipher text, and it’s not secret in any way. Just use a secure random function to generate the initialization vector. PBKDF* algorithms are used to derive secret keys of the desired length for encryption algorithms from user-entered passwords.
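To make the key-derivation part concrete, here is a minimal sketch using the PBKDF2WithHmacSHA1 algorithm named in the question. It assumes the platform's SecretKeyFactory supports that algorithm (the asker reports theirs does not), and the class name, method names, salt length and any iteration count you pass in are illustrative choices rather than recommendations from the thread.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; names and parameter choices are assumptions of this sketch.
final class PasswordKeySketch {
    // A random salt is stored next to the ciphertext, just like the IV.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives an AES key of keyBits bits (128 or 256) from the password and salt.
    static SecretKey deriveAesKey(char[] password, byte[] salt, int iterations, int keyBits)
            throws GeneralSecurityException {
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, keyBits);
        try {
            byte[] raw = factory.generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }
}
```

The requested key length is a parameter of the derivation, so asking for 256 bits does not require a longer password or a different hash. The IV is then chosen separately, at random, for each message.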
The first implementation that you link to simply lets the Cipher object generate the Initialization Vector. Then it fetches this generated IV to store it together with the cipher text.
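A minimal sketch of that pattern, as I understand it, follows. It assumes AES/CBC/PKCS5Padding and a provider that picks a random IV when none is supplied for encryption (the JCA documentation leaves this to the provider); the class CipherGeneratedIv is a hypothetical holder, not code from the linked answer.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

// Hypothetical holder for the two values that must be stored together.
final class CipherGeneratedIv {
    final byte[] iv;
    final byte[] ciphertext;

    private CipherGeneratedIv(byte[] iv, byte[] ciphertext) {
        this.iv = iv;
        this.ciphertext = ciphertext;
    }

    static CipherGeneratedIv encrypt(SecretKey key, byte[] plaintext)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        // No IvParameterSpec is passed, so the provider supplies the IV itself.
        cipher.init(Cipher.ENCRYPT_MODE, key);
        // Fetch the IV the cipher chose so it can be saved for decryption.
        byte[] iv = cipher.getIV();
        return new CipherGeneratedIv(iv, cipher.doFinal(plaintext));
    }
}
```

For decryption the stored IV has to be passed back explicitly through an IvParameterSpec, because the cipher cannot guess it.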
The second one uses part of the hash bytes. Any approach that generates non-repeating IVs is good enough for many modes, though for CBC the IV should also be unpredictable to an attacker.
The most important property of an IV is that it doesn’t repeat (very often).
The IV is just a consequence of the use of block chaining. I presume that this is more than a simple API design question. I assume that you know that the reasoning for using it is so that the same plaintext will not show up as the same ciphertext in multiple blocks.
Think about recursion from the last block where the Nth ciphertext block depends in some way on the (N-1)th block, etc. When you get to the first block, 0th block, you need some data to get started. It doesn’t matter what that data is as long as you know it before you attempt to decrypt. Using non-secret random data as an initialization vector will cause identical messages encrypted under the same key to come out as completely different ciphertext.
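The chaining described above can be written out by hand. The sketch below assumes CBC mode, where each ciphertext block is the AES encryption of the plaintext block XORed with the previous ciphertext block, and the IV stands in for the "previous block" of block 0. The class name is hypothetical, padding is left out to keep it short, and it is meant for understanding only, not as a replacement for the library's CBC mode.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

// Hypothetical illustration of CBC chaining built on the raw AES block operation.
final class CbcChainingSketch {
    private static final int BLOCK = 16; // AES block size in bytes

    // plaintext length must be a multiple of 16 because this sketch does no padding.
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext)
            throws GeneralSecurityException {
        if (iv.length != BLOCK || plaintext.length % BLOCK != 0) {
            throw new IllegalArgumentException("IV must be 16 bytes and input a multiple of 16");
        }
        // ECB with no padding is used here only to encrypt one block at a time.
        Cipher blockCipher = Cipher.getInstance("AES/ECB/NoPadding");
        blockCipher.init(Cipher.ENCRYPT_MODE, key);

        byte[] out = new byte[plaintext.length];
        byte[] previous = iv; // block 0 has no predecessor, so the IV starts the chain
        for (int offset = 0; offset < plaintext.length; offset += BLOCK) {
            byte[] mixed = new byte[BLOCK];
            for (int i = 0; i < BLOCK; i++) {
                mixed[i] = (byte) (plaintext[offset + i] ^ previous[i]);
            }
            byte[] encrypted = blockCipher.doFinal(mixed);
            System.arraycopy(encrypted, 0, out, offset, BLOCK);
            previous = encrypted; // the next block chains on this ciphertext block
        }
        return out;
    }
}
```

With the same key and IV, the output should match what Cipher.getInstance("AES/CBC/NoPadding") produces, and identical plaintext blocks come out as different ciphertext blocks because each one is mixed with a different predecessor.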
It’s similar in concept to salting a hash. And that source code looks a little fishy to me. An IV should simply be fresh-at-encryption-time random bits dependent upon nothing, like a nonce. The IV is basically part of the encrypted message. If you re-encrypt identical data with identical key, you should not be able to correlate the messages. (Hey, think about the consequences of correlating by ciphertext length as well.)
As with everyone else here, I’ve always known IVs to be just chosen randomly using the standard algorithms for doing so.
The second reference you provided, though, doesn’t seem to be doing that. It looks like he salts a password and hashes it, then takes that hash and splits it into halves. One half is the encryption key, the other is the IV. So the IV is derived from the password.
I don’t have any strong breaks for such a method, but it’s bad design. The IV should be independent and random all on its own. The scheme could become exploitable if there’s a weakness in the hashing algorithm or if you choose a weak password. You don’t want to be able to derive the IV from anything else, or it’s conceivable to launch pre-computation attacks.