ISSN print edition: 0366-6352
ISSN electronic edition: 1336-9075
Registr. No.: MK SR 9/7

Published monthly

A concentration-invariant FTIR chemometric workflow for closed-library compound identification with peak-sparse representation and machine-learning classification

Otabek Atabaev, Moulay Rachid Babaa, Shakhzodbek Samandarov, and Asadbek Tajimuratov

Chemical & Materials Engineering Department, New Uzbekistan University, Tashkent, Uzbekistan

Received: 8 December 2025 Accepted: 9 March 2026

Abstract:

Fourier-transform infrared (FTIR) spectroscopy is widely used for compound identification; however, reported high performance often reflects replicate-level consistency rather than true methodological robustness. We present a concentration-invariant FTIR workflow integrating Savitzky–Golay smoothing, asymmetric least-squares baseline correction, area normalization, percentile-based peak sparsification, PCA compression, and supervised classification. Using a library of 89 pure organic compounds measured at four concentration levels (356 spectra), tree-based ensembles achieved 100% top-1 accuracy and macro-F1 = 1.00 under replicate-stratified evaluation, while SVM achieved 0.989 accuracy (macro-F1 = 0.985) and PLS-DA achieved 0.854 accuracy (macro-F1 = 0.807). Classical cosine- and correlation-based library matching performed comparably to the machine-learning models, indicating that performance is governed primarily by preprocessing and spectral alignment rather than by classifier complexity. Additive Gaussian noise applied to raw test spectra did not degrade identification accuracy, whereas controlled global wavenumber shifts of ±1–3 cm⁻¹ reduced accuracy to chance level (~1–2%), identifying spectral misalignment as the dominant failure mode. The peak-sparse representation reduced input dimensionality from 20,742 to ~600 features (≈97% reduction) while preserving classification performance and enabling sub-400 ms inference in a Python/PyQt5 implementation. These results establish a reproducible benchmark for closed-library FTIR identification under aligned acquisition conditions and delineate the operational limits of such workflows.
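The preprocessing chain and cosine library matching summarized above can be illustrated with a minimal NumPy/SciPy sketch. Everything in it is an assumption for illustration, not the authors' implementation: the ALS parameters (`lam`, `p`), the Savitzky–Golay window, the 97th-percentile cut-off, the reading of "percentile-based peak sparsification" as zeroing every point below a per-spectrum intensity percentile, the synthetic Gaussian-band library, and the 8-point shift are all hypothetical. PCA and the supervised classifiers are omitted.

```python
import numpy as np
from scipy import sparse
from scipy.signal import savgol_filter
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least-squares baseline: solve (W + lam * D'D) z = W y."""
    n = y.size
    # Second-difference operator D, shape (n - 2, n)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    P = lam * (D.T @ D)  # smoothness penalty
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        A = (sparse.diags(w, 0) + P).tocsc()
        z = spsolve(A, w * y)
        # Points above the current baseline (bands) get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z


def preprocess(y, window=11, polyorder=3, q=97.0):
    """Smooth, subtract baseline, normalize to unit area, keep top (100 - q)% points."""
    y = savgol_filter(y, window_length=window, polyorder=polyorder)
    y = y - als_baseline(y)
    y = y / np.sum(np.abs(y))  # unit area: divides out the overall intensity scale
    # Hypothetical sparsification rule: zero everything below the q-th percentile
    return np.where(y >= np.percentile(y, q), y, 0.0)


def cosine_scores(x, library):
    """Cosine similarity of one processed spectrum against each library row."""
    return library @ x / (np.linalg.norm(library, axis=1) * np.linalg.norm(x))


# Synthetic demo: Gaussian-band "compounds" on a uniform grid (hypothetical data)
rng = np.random.default_rng(0)
grid = np.arange(2000.0)


def synthetic_compound(n_bands=6):
    centers = rng.uniform(100, 1900, n_bands)
    widths = rng.uniform(2.0, 5.0, n_bands)
    amps = rng.uniform(0.2, 1.0, n_bands)
    return sum(a * np.exp(-0.5 * ((grid - c) / s) ** 2)
               for a, c, s in zip(amps, centers, widths))


def measure(spectrum, shift=0):
    """Dilute 4x, optionally shift by whole grid points, add sloped baseline and noise."""
    s = np.roll(spectrum, shift)
    return 0.25 * s + 0.05 + 2e-5 * grid + rng.normal(0.0, 0.002, grid.size)


raw_library = [synthetic_compound() for _ in range(10)]
library = np.array([preprocess(s) for s in raw_library])

target = 4
aligned = cosine_scores(preprocess(measure(raw_library[target])), library)
shifted = cosine_scores(preprocess(measure(raw_library[target], shift=8)), library)

print("best match (aligned):", int(np.argmax(aligned)))
print("score vs. own entry, aligned:", round(float(aligned[target]), 3))
print("score vs. own entry, 8-point shift:", round(float(shifted[target]), 3))
```

Because area normalization divides out the overall scale and the ALS baseline absorbs the linear offset, the 4x dilution leaves the aligned score close to 1, while a global shift moves the retained peak points off the library's retained points and lowers the score. The shift here is in grid points rather than cm⁻¹; whether a given shift drives accuracy down to chance depends on band widths and grid spacing.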

Keywords: FTIR spectroscopy; Chemometric preprocessing; Peak-sparse features; PCA; Machine learning classification; Concentration invariance

Full paper is available at www.springerlink.com.

DOI: 10.1007/s11696-026-04796-4

Chemical Papers 80 (6) 6745–6761 (2026)

Saturday, July 04, 2026