We present a highly efficient implementation of the coherent $\mathcal{F}$-statistic for continuous gravitational waves, which uses the FFT to achieve substantial speed improvements over the ``Demodulation'' algorithm previously used by Einstein@Home. The basic idea goes back to Jaranowski, Królak, and Schutz (1998). Our implementation offers a number of performance and usability improvements over previous implementations, including the ability to choose an arbitrary frequency resolution, internal enforcement of efficient power-of-two FFTs, and the use of optimal time-domain (windowed-)sinc interpolation for barycentering. The code has been ported to GPUs (using OpenCL), resulting in further efficiency gains of one to two orders of magnitude compared to the CPU version.
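Two of the ingredients named above can be illustrated with a minimal Python sketch. This is not the implementation described here; it only shows the general techniques under stated assumptions. The function names `fft_length_for_resolution` and `windowed_sinc_interp` are hypothetical, the Hamming window and the kernel half-width of 8 samples are illustrative choices (the abstract does not specify them), and reconciling a requested frequency resolution with a power-of-two FFT by zero-padding is one plausible approach, assumed here.

```python
import math


def fft_length_for_resolution(n_samples, dt, df_max):
    """Pick a power-of-two FFT length whose bin spacing is at most df_max.

    Assumption: the time series is zero-padded up to the returned length,
    so the bin spacing becomes 1 / (n_fft * dt).
    """
    n_min = max(n_samples, math.ceil(1.0 / (df_max * dt)))
    n_fft = 1 << (n_min - 1).bit_length()  # round up to a power of two
    return n_fft, 1.0 / (n_fft * dt)


def windowed_sinc_interp(samples, dt, t, half_width=8):
    """Estimate x(t) from uniform samples x[k] = x(k * dt).

    Uses a sinc kernel truncated to 2 * half_width taps and tapered by a
    Hamming window (an illustrative choice of window).
    """
    pos = t / dt                 # target time in units of the sample spacing
    k0 = math.floor(pos)
    total = 0.0
    for k in range(k0 - half_width + 1, k0 + half_width + 1):
        if 0 <= k < len(samples):
            d = pos - k          # distance to sample k, with |d| < half_width
            sinc = 1.0 if d == 0.0 else math.sin(math.pi * d) / (math.pi * d)
            window = 0.54 + 0.46 * math.cos(math.pi * d / half_width)
            total += samples[k] * sinc * window
    return total


if __name__ == "__main__":
    # 2000 samples at dt = 0.5 s with a requested resolution of 1 mHz or finer.
    n_fft, df = fft_length_for_resolution(2000, 0.5, 1e-3)
    print(n_fft, df)

    # Interpolate a slowly varying sinusoid at an off-grid time.
    x = [math.sin(2.0 * math.pi * 0.05 * k) for k in range(200)]
    print(windowed_sinc_interp(x, 0.5, 50.3 * 0.5))
```

In this sketch the requested resolution acts as an upper bound on the bin spacing: the FFT length is rounded up to the next power of two, so the spacing actually obtained is equal to or finer than the one requested. The interpolation step stands in for resampling the data at barycentered times, where the target times generally fall between the original samples.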