Over the summer I picked up an old project I had started a year earlier: a real-time beat-tracking app. The analysis is written from scratch in Rust, apart from the audio driver and FFT libraries. It listens to live audio and estimates the current tempo and beat phase, which can then serve as a transport source for lighting and visuals.
The pipeline has a few stages. First, the audio is run through FFTs of several sizes: long windows resolve bass frequencies precisely, while short windows preserve the sharp transients in the highs. The bin magnitudes are then logarithmically compressed and summed, with a few extra steps, to produce an overall energy and onset signal. This signal is autocorrelated over a range of lags to find its repetition period, which gives the tempo. Finally, the estimated tempo is used to locate the strongest transient in the recent history, which gives the beat phase.
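The onset, tempo and phase stages above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code. It assumes the FFT magnitudes are already available per frame, and it uses half-wave-rectified spectral flux over log-compressed bins as the onset function, which is a common choice but an assumption here. The names `onset_signal`, `estimate_period` and `beat_phase`, the compression factor `gamma`, the 60 to 180 BPM search range and the synthetic click track are all hypothetical.

```rust
/// Log-compress one frame of FFT magnitudes: ln(1 + gamma * |X|).
fn log_compress(frame: &[f32], gamma: f32) -> Vec<f32> {
    frame.iter().map(|&m| (1.0 + gamma * m).ln()).collect()
}

/// Onset strength per frame (assumed: half-wave-rectified spectral flux),
/// i.e. the sum over bins of positive increases in log magnitude.
fn onset_signal(frames: &[Vec<f32>], gamma: f32) -> Vec<f32> {
    let compressed: Vec<Vec<f32>> = frames.iter().map(|f| log_compress(f, gamma)).collect();
    let mut onset = vec![0.0; compressed.len()];
    for t in 1..compressed.len() {
        onset[t] = compressed[t]
            .iter()
            .zip(&compressed[t - 1])
            .map(|(cur, prev)| (cur - prev).max(0.0))
            .sum::<f32>();
    }
    onset
}

/// Return the lag (in frames) with the largest autocorrelation of the onset
/// signal, searching only lags that correspond to min_bpm..=max_bpm.
fn estimate_period(onset: &[f32], frame_rate: f32, min_bpm: f32, max_bpm: f32) -> Option<usize> {
    let min_lag = ((60.0 * frame_rate / max_bpm).round() as usize).max(1);
    let max_lag = ((60.0 * frame_rate / min_bpm).round() as usize).min(onset.len().saturating_sub(1));
    let mut best: Option<(usize, f32)> = None;
    for lag in min_lag..=max_lag {
        // Raw (unnormalised) sum of onset[t] * onset[t + lag].
        let score: f32 = onset[lag..].iter().zip(onset).map(|(a, b)| a * b).sum();
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((lag, score));
        }
    }
    best.map(|(lag, _)| lag)
}

/// Beat phase in [0, 1): fold the last `periods` beat periods of the onset
/// signal and pick the offset (frames since the latest beat) with the most energy.
fn beat_phase(onset: &[f32], period: usize, periods: usize) -> Option<f32> {
    if period == 0 || periods == 0 || onset.len() < period * periods {
        return None;
    }
    let last = onset.len() - 1;
    let mut best = (0, f32::MIN);
    for offset in 0..period {
        let score: f32 = (0..periods).map(|k| onset[last - offset - k * period]).sum();
        if score > best.1 {
            best = (offset, score);
        }
    }
    Some(best.0 as f32 / period as f32)
}

fn main() {
    // Hypothetical test signal: 100 frames/s, 4 bins, a click every 50 frames (120 BPM).
    let frame_rate = 100.0;
    let frames: Vec<Vec<f32>> = (0..400)
        .map(|t| if t >= 10 && (t - 10) % 50 == 0 { vec![1.0; 4] } else { vec![0.0; 4] })
        .collect();
    let onset = onset_signal(&frames, 10.0);
    if let Some(period) = estimate_period(&onset, frame_rate, 60.0, 180.0) {
        let bpm = 60.0 * frame_rate / period as f32;
        let phase = beat_phase(&onset, period, 4).unwrap_or(0.0);
        // prints: period = 50 frames, tempo = 120 BPM, phase = 0.78
        println!("period = {period} frames, tempo = {bpm} BPM, phase = {phase:.2}");
    }
}
```

Summing raw products without normalising by overlap length slightly favours shorter lags, which in this toy signal keeps the estimate at 120 BPM rather than the double period at 60 BPM. A real tracker working on a live ring buffer would likely weight lags more deliberately.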
It isn't production-ready yet, but I have used it a few times when running lights for other people. I plan to eventually publish it as a product, but first I would like to add an optimisation process that tunes the pile of magic numbers in the analysis chain. My guess is that different genres of music and input sources (mics, line-in, etc.) have fairly different optimal settings.
There's also some work to do to make it as intuitive and non-intrusive as possible. Since it's meant to replace manual tempo tapping, it only offers an advantage if it is just as reliable while demanding less attention from the operator.
The app, showing from top to bottom: the spectrogram, the tempo probability distribution, a zoomed view of the tempo probability, and beat phase tracking.