With advances in deep generative models, facial forgery by deepfake generation techniques poses a severe societal and political threat. More recently, the synthesis of a person's voice has emerged as a related problem. Given the growing threat of impersonation attacks using deepfake audio and video, a new generation of deepfake detectors that jointly analyze audio and video data is being investigated. For the commercial adoption and large-scale roll-out of deepfake detection technology, it is vital that it does not discriminate across demographic variations such as gender and race. The aim of this chapter is to thoroughly examine the bias of audio-based, facial-video-based, and bi-modal deepfake detectors across gender and race. Thorough experimental investigations are conducted on the bi-modal audio-video FakeAVCeleb and KoDF deepfake datasets, annotated with gender and race labels. Our experimental results suggest that both audio-based and video-based deepfake detectors exhibit a performance differential across gender and race. However, the bi-modal audio-visual deepfake detectors obtain a lower performance differential, of about 2.05% and 7.3% across gender and race, respectively, in terms of EER, compared to the audio-based and facial-video-based deepfake detectors individually.
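
For concreteness, the performance differential reported above can be read as the gap in Equal Error Rate (EER) between demographic subgroups. The following is a minimal sketch of that computation, not the chapter's actual evaluation code; the helper names and the synthetic scores are purely illustrative, and detector scores are assumed to be higher for samples predicted as fake.

```python
import numpy as np
from sklearn.metrics import roc_curve

def eer(labels, scores):
    """Equal Error Rate: the operating point where FPR == FNR (= 1 - TPR)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

def eer_differential(labels, scores, groups):
    """Largest absolute EER gap across demographic subgroups (e.g., gender)."""
    eers = {g: eer(labels[groups == g], scores[groups == g])
            for g in np.unique(groups)}
    return max(eers.values()) - min(eers.values()), eers

# Illustrative usage with synthetic scores for two subgroups:
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)            # 1 = fake, 0 = real
scores = labels + rng.normal(0, 0.8, 1000)   # noisy detector scores
groups = rng.choice(["male", "female"], 1000)
gap, per_group = eer_differential(labels, scores, groups)
print(f"EER differential across gender: {100 * gap:.2f}%  ({per_group})")
```

Under this reading, a detector is less biased when the per-subgroup EERs are closer together, which is the sense in which the bi-modal detectors above narrow the gap relative to the unimodal ones.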