Recently, a research team led by Bai Xiangzhi and Meng Cai from the School of Astronautics published their latest findings in IEEE Transactions on Pattern Analysis and Machine Intelligence under the title "TherNet: Thermal Segmentation Network Harnessing Physical Properties."
Grounded in the physics of thermal radiation imaging, the study builds an encoder-decoder deep network guided by the thermal imaging mechanism and achieves high-precision segmentation of thermal infrared images. It offers a new route for moving deep learning models from a purely "data-driven" approach to a "physics-informed + data-driven" paradigm.
Thermal infrared imaging works around the clock and in all weather, unconstrained by lighting conditions. However, its physical characteristics make segmentation difficult. Unlike visible-light images, which show prominent color differences, a thermal image represents each pixel with a single intensity value. Because that value depends jointly on emissivity and temperature, different objects can produce similar thermal values and become ambiguous. Heat transfer between objects blurs material boundaries and introduces artifacts, while atmospheric scattering degrades overall image quality. In addition, the thermal inertia of imaging devices causes object thermal values to accumulate incorrectly.
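The emissivity-temperature ambiguity can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes ideal gray bodies and uses the Stefan-Boltzmann law for total radiant exitance, whereas a real thermal camera integrates Planck radiation over a limited band. The emissivity and temperature values are hypothetical.

```python
# Gray-body sketch: radiant exitance M = emissivity * sigma * T^4 (Stefan-Boltzmann law).
# Assumption: total (all-wavelength) exitance stands in for what a camera measures.
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def radiant_exitance(emissivity: float, temperature_k: float) -> float:
    """Total power radiated per unit area by a gray body, in W/m^2."""
    return emissivity * SIGMA * temperature_k ** 4


def matching_temperature(emissivity_a: float, temperature_a_k: float,
                         emissivity_b: float) -> float:
    """Temperature at which surface B radiates exactly as much as surface A."""
    return temperature_a_k * (emissivity_a / emissivity_b) ** 0.25


# Hypothetical surfaces: a high-emissivity one at 300 K and a lower-emissivity one.
surface_a = (0.95, 300.0)
emissivity_b = 0.60
temperature_b = matching_temperature(surface_a[0], surface_a[1], emissivity_b)

m_a = radiant_exitance(*surface_a)
m_b = radiant_exitance(emissivity_b, temperature_b)
print(f"A: emissivity {surface_a[0]}, {surface_a[1]:.1f} K -> {m_a:.1f} W/m^2")
print(f"B: emissivity {emissivity_b}, {temperature_b:.1f} K -> {m_b:.1f} W/m^2")
```

Under these assumptions the two surfaces differ in both material and temperature yet radiate the same power, so intensity alone cannot tell them apart.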


Figure 1. Physical characteristics modeling of thermal radiation
To address these issues, the research team worked along two lines: physical modeling and network design. For physical modeling, they established an analytical mathematical model of thermal infrared imaging based on thermal radiation imaging theory and the operating principles of the sensors. A theoretical analysis using operator theory then verified that neural networks approximating the inverse operator can stably reconstruct the inverse process of thermal infrared imaging. This model provides the theoretical foundation for a deep network guided by the principles of thermal radiation imaging.
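To make the idea of a forward imaging model and its inverse concrete, here is a toy sketch. It is not the paper's analytical model. It assumes a common simplified radiometric form (attenuated scene radiance plus atmospheric path radiance) followed by a first-order lag standing in for sensor thermal inertia. All names and parameter values (`tau`, `path_radiance`, `alpha`) are hypothetical.

```python
def forward(scene, tau, path_radiance, alpha):
    """Simulate sensor readings over time from true scene radiance.

    tau: atmospheric transmittance in (0, 1]
    path_radiance: radiance contributed by the atmosphere itself
    alpha: sensor responsiveness in (0, 1]; smaller means more thermal inertia
    """
    readings = []
    state = None
    for radiance in scene:
        at_sensor = tau * radiance + (1.0 - tau) * path_radiance
        # First-order lag: the detector only partly follows each new value.
        state = at_sensor if state is None else state + alpha * (at_sensor - state)
        readings.append(state)
    return readings


def inverse(readings, tau, path_radiance, alpha):
    """Undo the lag, then the atmospheric term, to recover scene radiance."""
    recovered = []
    previous = None
    for reading in readings:
        if previous is None:
            at_sensor = reading
        else:
            at_sensor = previous + (reading - previous) / alpha
        recovered.append((at_sensor - (1.0 - tau) * path_radiance) / tau)
        previous = reading
    return recovered


# Hypothetical radiance sequence for one pixel (arbitrary units).
scene = [10.0, 10.0, 40.0, 40.0, 40.0, 15.0]
params = dict(tau=0.8, path_radiance=5.0, alpha=0.5)

readings = forward(scene, **params)
recovered = inverse(readings, **params)
print("readings: ", [round(r, 3) for r in readings])
print("recovered:", [round(r, 3) for r in recovered])
```

In this toy setting the inverse is exact because the parameters are known and the readings are noise-free. Dividing by `alpha` and `tau` amplifies any measurement noise, which is one reason the stability of a learned inverse is worth establishing theoretically.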

Figure 2. Framework of the proposed method
In terms of network design, four distinct modules were developed (Figure 1). The Atmospheric Transmission Module (ATM) models the atmospheric transmission effect that thermal radiation undergoes as it propagates. The Thermal Inertia Module (TIM) recovers the true radiation distorted by thermal inertia. The Material Boundary Module (MBM) reduces the impact of radiation interaction between objects by enhancing boundary features. Lastly, the Material Radiation Module (MRM) refines the mapping of emissivity across the different materials within semantic objects.
The proposed framework builds upon the transformer-based encoder-decoder architecture SegFormer. As shown in Figure 2, a multi-level cascade of transformer blocks in the encoder extracts multi-scale representations, while a lightweight MLP decoder ensures efficient semantic decoding.
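The multi-scale encoder and lightweight MLP decoder can be pictured through the tensor shapes involved. The sketch below follows the published SegFormer design: four encoder stages at strides 4, 8, 16 and 32, with the decoder projecting every stage to a common width, upsampling to 1/4 resolution, concatenating, and classifying. The channel widths are those of the smallest SegFormer variant (MiT-B0). Whether TherNet uses the same sizes is an assumption, and the input size and class count are hypothetical.

```python
# Shape bookkeeping for a SegFormer-style encoder-decoder (no actual layers).
STRIDES = (4, 8, 16, 32)        # downsampling factor of each encoder stage
CHANNELS = (32, 64, 160, 256)   # MiT-B0 stage widths; assumed, not from the paper
DECODER_DIM = 256               # common width after per-stage linear projection


def encoder_shapes(height, width):
    """(channels, height, width) of the feature map produced by each stage."""
    return [(c, height // s, width // s) for c, s in zip(CHANNELS, STRIDES)]


def decoder_shapes(height, width, num_classes):
    """Shapes after projection + upsampling, after concatenation, and of the logits."""
    target_h, target_w = height // STRIDES[0], width // STRIDES[0]
    unified = [(DECODER_DIM, target_h, target_w) for _ in STRIDES]
    fused_input = (DECODER_DIM * len(STRIDES), target_h, target_w)
    logits = (num_classes, target_h, target_w)
    return unified, fused_input, logits


# Hypothetical 512 x 640 thermal frame with 9 semantic classes.
features = encoder_shapes(512, 640)
unified, fused_input, logits = decoder_shapes(512, 640, num_classes=9)
for stage, shape in enumerate(features, start=1):
    print(f"stage {stage}: {shape}")
print("concatenated decoder input:", fused_input)
print("logits before final upsampling:", logits)
```

The logits come out at 1/4 of the input resolution and are upsampled to full size for the final segmentation map, which keeps the decoder inexpensive.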
Experimental results demonstrate that the proposed method achieves significant improvements in multi-class target segmentation of thermal infrared images. Moreover, the weight distributions of the physical modules align with the corresponding physical distributions, such as thermal inertia variables and atmospheric transmission factors. The method has been validated in scenarios such as nighttime perception for autonomous driving and navigation assistance for the visually impaired, further demonstrating its strengths in physical interpretability and practical application. This research provides a novel solution for intelligent analysis of thermal infrared images.
The co-first authors of the paper are Chen Junzhang, a doctoral student who enrolled in 2017 (now a faculty member at the School of Artificial Intelligence and Data Science, University of International Business and Economics), and Shu Shihao, a master's student who enrolled in 2023. Meng Cai and Bai Xiangzhi serve as co-corresponding authors. Beihang University is the primary affiliated institution.
This research was supported by the National Natural Science Foundation of China (Grant No. 62271016), the Beijing Natural Science Foundation (Grant No. 4222007), and the Fundamental Research Funds for the Central Universities.
Link to the article: https://ieeexplore.ieee.org/document/11124592
Editor: Liu Tingting