Heart Disease Detection from Audio Signals
Machine Learning

Advanced Machine Learning for Early Detection of Heart Diseases through Audio Analysis

Phone-grade heart sound recordings fuel explainable ensembles for proactive cardiac screening.

Project Information

Course
Advanced Biomedical Machine Learning
Authors
Andrea Alberti, Davide Ligari
Date
July 2024
Pages
17

Technologies

Scikit-learn, Torchaudio, Librosa, Imblearn, XGBoost, CatBoost, LightGBM, PyTorch, TensorFlow, Keras, NumPy, Pandas, Matplotlib, Seaborn, SHAP

Abstract

This project designs prevention and clinical-support ensembles for early cardiac screening on the Dangerous Heartbeat Dataset (CHSC2011). Heart sounds were resampled to 4 kHz, segmented into 1-second windows, described with MFCC, chroma, spectral and temporal descriptors, and reduced from 338 to 41 features via Spearman-based filters. The prevention ensemble keeps false normals under control (ROC-AUC 0.96, TPR 43.4% at 1% FPR), while the five-class support ensemble delivers a macro F1 of 81.6 with per-class risk analysis and SHAP explanations.
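The "TPR at 1% FPR" figure reads as the best true-positive rate a classifier reaches while its false-positive rate stays at or below 1%. Below is a minimal pure-Python sketch of that metric, assuming binary labels (1 = positive, 0 = negative) and scores where higher means "more likely positive". The function name `tpr_at_fpr` is hypothetical, and which class the project treats as positive is not stated on this page.

```python
def tpr_at_fpr(y_true, scores, max_fpr):
    """Highest TPR reachable while keeping FPR <= max_fpr.

    y_true: 1 for the positive class, 0 for the negative class.
    scores: higher means "more likely positive".
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sweep candidate thresholds from the highest score downwards.
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every sample tied at this score in a single step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # FPR only grows from here, so stop sweeping
        best_tpr = tp / n_pos
    return best_tpr
```

Calling the function with `max_fpr` set to 0.01, 0.05, 0.10 and 0.20 on held-out scores would give a curve of operating points comparable in form to the ones reported here.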

About

The study processes the Dangerous Heartbeat Dataset (CHSC2011) by resampling heterogeneous recordings to 4 kHz, slicing them into 1-second windows and extracting 338 temporal, spectral and cepstral descriptors (MFCC, chroma STFT, RMS, ZCR, CQT, spectral centroid/bandwidth/roll-off). A two-step Spearman filter removes weakly correlated and redundant attributes, reducing the feature set to 41. Two complementary pipelines are trained: a prevention ensemble (Random Forest + MLP Ultra + MLP Rollercoaster) that evaluates normal-vs-rest performance across strict FPR thresholds, and a five-class diagnostic ensemble (Random Forest + MLP Ultra) that balances macro F1, MCC and per-class risk. SHAP values and permutation importance explain decisions and highlight waveform regions that drive predictions.
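One plausible reading of the two-step Spearman filter is sketched below in pure Python: step 1 drops features whose rank correlation with the target is weak, and step 2 drops features that are nearly rank-identical to one already kept. This is an illustration under stated assumptions, not the project's implementation. The thresholds `min_relevance` and `max_redundancy`, the keep-the-most-relevant tie-breaking rule, and all function names are hypothetical.

```python
from math import sqrt

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no rank information
    return cov / sqrt(vx * vy)

def spearman_filter(features, target, min_relevance=0.1, max_redundancy=0.9):
    """features: dict of name -> list of values. Returns the kept names."""
    # Step 1 (assumed): keep features with enough |rho| against the target.
    relevance = {n: abs(spearman(col, target)) for n, col in features.items()}
    relevant = [n for n in features if relevance[n] >= min_relevance]
    # Step 2 (assumed): visit the most relevant first and drop any feature
    # that is too strongly rank-correlated with one already kept.
    kept = []
    for name in sorted(relevant, key=lambda n: relevance[n], reverse=True):
        redundant = any(
            abs(spearman(features[name], features[k])) > max_redundancy
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept
```

In practice a library routine such as `scipy.stats.spearmanr` would replace the hand-written correlation; the pure-Python version is used here only to keep the sketch self-contained.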

Key Results

F1-Score: 0.82
ROC-AUC: 0.96
TPR @ 1% FPR: 43.4%
TPR @ 5% FPR: 74.3%
TPR @ 10% FPR: 86.6%
TPR @ 20% FPR: 95.8%
Features Retained: 41 of 338
Support MCC: 81.53
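
For reference, macro F1 and the multiclass Matthews correlation coefficient (MCC) can both be derived from a confusion matrix. The sketch below uses the standard definitions and returns values on a 0 to 1 (F1) or -1 to 1 (MCC) scale, so a figure such as 81.53 would correspond to a result multiplied by 100. The function names and the list-of-lists matrix layout are assumptions made for illustration.

```python
from math import sqrt

def macro_f1(cm):
    """Unweighted mean of per-class F1.

    cm[i][j] = number of samples of true class i predicted as class j.
    """
    k = len(cm)
    f1_scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / k

def multiclass_mcc(cm):
    """Multiclass MCC computed from the same confusion matrix."""
    k = len(cm)
    true_counts = [sum(cm[c]) for c in range(k)]
    pred_counts = [sum(cm[r][c] for r in range(k)) for c in range(k)]
    total = sum(true_counts)
    correct = sum(cm[c][c] for c in range(k))
    num = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    den = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return num / den if den else 0.0
```

Macro F1 weights all five classes equally regardless of size, while MCC also accounts for how predictions are distributed across classes, which is why the two are usually reported together on imbalanced data.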

Key Findings

  • The Dangerous Heartbeat Dataset (CHSC2011) was resampled to a uniform 4 kHz, windowed at 1 s and enriched with MFCC, chroma, spectral and temporal descriptors before Spearman filtering removed 87.9% of the features (297 of 338) as weakly correlated or redundant.
  • Complementary prevention (normal vs rest) and clinician-support (5-class) ensembles were trained from Random Forest and diverse MLP backbones to balance low false normals with detailed disease classification.
  • Explainability with permutation importance and SHAP highlighted waveform regions influencing predictions, surfacing where murmurs and extra systoles overlap normal beats and guiding future data collection.
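
The permutation importance mentioned above can be illustrated with a short model-agnostic sketch: shuffle one feature column at a time and measure how much accuracy drops. This is a pure-Python illustration, assuming accuracy as the score and a generic `predict` callable; the function name, `n_repeats` and `seed` are hypothetical choices, not the project's settings.

```python
import random

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled.

    predict: callable mapping a list of rows to a list of labels.
    X: list of rows (each a list of feature values); y: list of labels.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        preds = predict(rows)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column; every other column stays intact.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            permuted = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of exactly zero, while a feature the model relies on shows a positive drop. With Scikit-learn estimators, `sklearn.inspection.permutation_importance` offers the same idea with more scoring options.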

Methodology

Heart sound clips from CHSC2011 were resampled at 4 kHz, segmented into 1-second windows, described with 338 spectral/temporal features, pruned to 41 via two-stage Spearman filtering, and used to train prevention and five-class support ensembles whose behaviour was analysed with risk metrics, permutation importance and SHAP.
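
The segmentation step can be sketched as follows, assuming the signal has already been resampled to 4 kHz (for example with Librosa or Torchaudio, not shown). Whether the project overlaps windows or pads the final partial window is not stated, so this sketch defaults to non-overlapping windows and drops the ragged tail; the function name and the `hop_seconds` parameter are hypothetical.

```python
def segment_windows(signal, sample_rate=4000, window_seconds=1.0, hop_seconds=None):
    """Split a 1-D sequence of samples into fixed-length windows.

    With hop_seconds=None the windows do not overlap. Any trailing
    samples that do not fill a whole window are discarded (an assumption).
    """
    win = int(round(sample_rate * window_seconds))
    hop = win if hop_seconds is None else int(round(sample_rate * hop_seconds))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must span at least one sample")
    last_start = len(signal) - win
    return [signal[s:s + win] for s in range(0, last_start + 1, hop)]


# A 2.625 s clip at 4 kHz yields two full 1-second windows of 4000 samples.
clip = [0.0] * 10_500
windows = segment_windows(clip)
```

Each window would then be passed to the feature extractor, so a single recording contributes several training examples.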

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code as per the terms of the license.
