I’m exploring voice recognition and DSP, and so I would like to implement a simple sound frequency analyzer on my smartphone (I have both an iPhone and a Samsung Nexus S running Android). I have done basic DSP in Matlab previously.
From my understanding, I need to perform an FFT to get the fundamental frequencies of a signal.
So now, I would like to sample the microphone at 44100 Hz. If I use a sliding window of sample size 512 with 50% overlap, that means I need to do an FFT every 256 samples, or 0.00580 seconds.
That rate seems really high, particularly if I program in Java for Android. Will my smartphone be able to handle that speed? I am aware that you can program in C/C++ on Android, but I would like to keep it with Java for the time being.
Performing a real-to-complex FFT requires ~5/2 n lg n floating-point operations (additions and multiplications). In your case, n=512, so:
flops per fft ~= (5/2) * 512 * 9 = 11520
So 172 ffts per second requires about 2 million floating-point operations per second. That sounds like a lot, but it really isn’t that many. The hardware of a typical armv7-class smartphone is capable of hundreds of millions or billions of floating-point operations per second.
Note however that you will want to have a carefully-written high-performance FFT; poorly written FFTs are notoriously inefficient. On the iPhone, you can use the Accelerate framework (built right into the OS, and available in the SDK), which provides a nice set of FFT functions; I’m not sure what’s available on Android.
For the iPhone, the Accelerate framework for iOS can do all the FFTs you specify using on the order of 1% of CPU time (exact percentage depending on device model and FFT data types).
For Android, you might strongly want to consider using an NDK native library for processor intensive numerical calculations.
Also note that an FFT will give you the peak frequencies, which will not necessarily include the fundamental or voice pitch frequency.
ADDED: This Java benchmark web page suggests that Android phones are capable of in the range of 5 to over 50 MFlops using Java for well written matrix math. A well written FFT should fall around roughly the same performance range in MFlops. @Stephan Cannon posted that on the order of 2 MFlops might be required for your spec.
Your Android device will be able to handle this fine. I’ve written realtime, FFT-based frequency analyzers that ran on Windows Mobile devices from a few years ago (using pure C#), and these devices had much worse processors than current Android devices. The most computationally expensive aspect of FFT is the trig functions, and since you’re using a fixed-size window you can easily replace the trig function calls with a pre-calculated lookup table.
As an aside you can probably cut down your computation time by reducing your sampling rate. Speech doesn’t have much energy above 8 kHz, so you could likely downsample your audio to 16 KHz before doing any FFTs, without losing much accuracy. At 16 kHz your FFTs would be smaller, and so faster.
Wikipedia claims that 16 kHz is a standard sampling rate for speech recognition in desktop applications.
(I realize that this doesn’t answer the OP’s question, but I think it might be helpful to him nonetheless, given his application.)