Pitch detection (estimating the fundamental frequency, F0) has been a hard problem for many years now. One of the solutions, proposed way back in 1993, is the Two-Way Mismatch algorithm. It's an excellent idea, and works much better and faster than the standard or even the modified autocorrelation technique. The method works particularly well for pitch tracking in music signals.

The idea is extremely simple. Small segments of the time-domain signal are analysed and their spectrum is computed. A peak detection algorithm is then run on each spectrum to obtain the amplitudes and frequencies (bins) of the strong partials. The important parameters here are the FFT size and the window length, because the frequency resolution and the time resolution both depend on them, as described in this earlier post on Formants and Harmonics.
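To make the peak-picking step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the helper names `magnitude_spectrum` and `pick_peaks`, the Hann window, the relative threshold, and the synthetic 250 Hz test tone. The naive DFT is only there to keep the example dependency-free; a real implementation would use an FFT routine.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Hann-window a frame and return DFT magnitudes for bins 0..N/2.

    Naive O(N^2) DFT, used only so the sketch needs no external library.
    """
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [abs(sum(w * cmath.exp(-2j * math.pi * k * i / n)
                    for i, w in enumerate(windowed)))
            for k in range(n // 2 + 1)]

def pick_peaks(mags, sample_rate, n_fft, rel_threshold=0.1):
    """Return (frequency_hz, magnitude) for local maxima above a relative floor.

    rel_threshold is a hypothetical tuning parameter, not a value from the post.
    """
    floor = rel_threshold * max(mags)
    return [(k * sample_rate / n_fft, mags[k])
            for k in range(1, len(mags) - 1)
            if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]

# Frequency resolution is sample_rate / n_fft: here 8000 / 256 = 31.25 Hz per bin.
sample_rate, n_fft = 8000, 256

# Synthetic test tone (an assumption): 250 Hz fundamental plus two weaker
# harmonics, which fall exactly on bins 8, 16 and 24.
frame = [sum(a * math.sin(2 * math.pi * 250 * h * i / sample_rate)
             for h, a in ((1, 1.0), (2, 0.6), (3, 0.3)))
         for i in range(n_fft)]

peaks = pick_peaks(magnitude_spectrum(frame), sample_rate, n_fft)
print([f for f, _ in peaks])  # peak frequencies in Hz: 250.0, 500.0, 750.0
```

Doubling `n_fft` halves the bin spacing but also doubles the length of signal each frame covers, which is the frequency versus time resolution trade-off mentioned above.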

Now, the range of pitch variation for a given music or speech signal is generally known. Hence, a range of candidate pitch values is chosen, and the harmonics of each candidate are computed. These predicted harmonics are then matched against the partials measured in the signal. The closer the match, the better the pitch estimate. This matching ensures that if a few expected peaks are missing from the signal, the candidate is penalised accordingly. Extra peaks are penalised as well.

The matching is performed both ways: the predicted harmonics are matched to the measured partials, and the measured partials are matched to the predicted harmonics. The two errors are combined with a weighting to give the total error for each candidate pitch. The candidate with the global minimum error over the chosen range is taken as the pitch frequency of that segment.
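The two-way matching can be sketched roughly as follows. This is not the paper's reference implementation. The function names (`twm_error`, `estimate_f0`), the example peak list, and the rule for how many harmonics to predict (up to the highest measured peak) are assumptions. The constants `p`, `q`, `r` and `rho` are the values usually quoted for this procedure, so treat them as assumptions too and check them against the paper.

```python
import math

def twm_error(f0, peaks, p=0.5, q=1.4, r=0.5, rho=0.33):
    """Total two-way mismatch error of candidate f0 against measured peaks.

    peaks is a list of (frequency_hz, amplitude). p, q, r, rho are assumed
    default weights, not values confirmed by the post.
    """
    freqs = [f for f, _ in peaks]
    amps = [a for _, a in peaks]
    a_max = max(amps)

    # Assumption: predict harmonics up to the highest measured peak.
    n_harm = max(1, math.ceil(max(freqs) / f0))
    harmonics = [f0 * h for h in range(1, n_harm + 1)]

    def mismatch(f_ref, f_other, amp):
        # Frequency difference scaled by f_ref^-p, plus an amplitude-weighted term.
        scaled = abs(f_ref - f_other) * f_ref ** (-p)
        return scaled + (amp / a_max) * (q * scaled - r)

    # Predicted -> measured: each harmonic looks for its nearest measured peak,
    # so a missing peak produces a large frequency difference.
    err_pm = 0.0
    for fh in harmonics:
        j = min(range(len(freqs)), key=lambda i: abs(freqs[i] - fh))
        err_pm += mismatch(fh, freqs[j], amps[j])

    # Measured -> predicted: each peak looks for its nearest harmonic,
    # so an extra (unexplained) peak is penalised.
    err_mp = 0.0
    for fk, ak in peaks:
        fh = min(harmonics, key=lambda h: abs(h - fk))
        err_mp += mismatch(fk, fh, ak)

    return err_pm / len(harmonics) + rho * err_mp / len(peaks)

def estimate_f0(peaks, candidates):
    """Return the candidate pitch with the smallest total error."""
    return min(candidates, key=lambda f0: twm_error(f0, peaks))

# Hypothetical measured peaks: a 250 Hz tone with two weaker harmonics.
peaks = [(250.0, 64.0), (500.0, 38.4), (750.0, 19.2)]
best = estimate_f0(peaks, range(100, 401))   # search 100..400 Hz in 1 Hz steps
print(best)  # 250
```

Note that the subharmonic candidate at 125 Hz is rejected here because half of its predicted harmonics have no nearby measured peak, which is exactly the pitch-halving error that the predicted-to-measured direction guards against.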

A smoothing step (median filtering) may be applied afterwards in the case of speech signals, which generally tend to have a nearly constant pitch over short segments of time.
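A minimal sketch of such a median filter over a per-frame pitch track, assuming a window of 3 frames and edge values repeated at the boundaries. The function name `median_smooth` and the example track are made up for illustration.

```python
from statistics import median

def median_smooth(track, width=3):
    """Median-filter a pitch track; width should be odd.

    Indices are clamped at the ends, so edge values are repeated
    rather than shrinking the window.
    """
    half = width // 2
    last = len(track) - 1
    return [median(track[min(max(j, 0), last)]
                   for j in range(i - half, i + half + 1))
            for i in range(len(track))]

# Hypothetical pitch track in Hz with one doubling (440) and one halving (110) error.
track = [220, 221, 440, 222, 223, 110, 224, 225]
smoothed = median_smooth(track)
print(smoothed)  # [220, 221, 222, 223, 222, 223, 224, 225]
```

A short window like this removes isolated octave jumps while leaving the slow pitch contour intact. A longer window smooths more but starts to flatten genuine pitch changes.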


Savannah: Awesome blog!

I thought about starting my own blog too but I’m just too lazy so, I guess I’ll just have to keep checking yours out

LOL,

...Can this work with non-harmonic sounds, like bells, metallophones, gamelan, timpani, etc? I suspect it would work even better, because the spectrum is not repetitive and so there are fewer possibilities for false positives?

Makarand Tapaswi (Post author): I guess it should. However, I would assume that autocorrelation-based techniques might be fine too (although you might still have pitch halving and pitch doubling issues).

...The algorithm in the paper says “For n corresponding to the closest frequency, set a_k=A_n”, but where do you get A_n from?? Presumably, since it’s normalized by A_max of the measured partials, A_n for the predicted spectrum has to be similar in amplitude to the measured partial amplitudes. I kind of wonder if this is a bug in the paper.

Any ideas? There’s an implementation in Matlab here, but I don’t understand it. It just uses the same amplitude for all A_n? http://www.dtic.upf.edu/~xserra/cursos/CCRMA-workshop/labs/lab7/f0detection.m