Abstract: Spaceborne infrared hyperspectral data have emerged as a cornerstone of modern Numerical
Weather Prediction (NWP) systems, enabling high-resolution atmospheric profiling and improved forecast
accuracy. However, the utility of these data is significantly constrained by cloud interference, because infrared
radiation is strongly absorbed and scattered by cloud particles. Consequently, clear-sky identification
(specifically, the discrimination of cloud-free pixels and channels) has become an indispensable preprocessing
step in data assimilation, ensuring that only reliable observations are integrated into NWP models. This review
provides a systematic overview of the evolutionary landscape of clear-sky identification methods for spaceborne
infrared atmospheric sounding data, spanning both foreign and domestic (Chinese) sensor systems. It critically evaluates
techniques applied to datasets from well-known foreign instruments, such as the High-resolution Infrared Radiation Sounder
(HIRS), Atmospheric Infrared Sounder (AIRS), Infrared Atmospheric Sounding Interferometer (IASI), and Cross-track
Infrared Sounder (CrIS), alongside domestic advancements using the Hyperspectral Infrared Atmospheric
Sounder (HIRAS) and Geostationary Interferometric Infrared Sounder (GIIRS) aboard China’s Fengyun
satellites. The methodologies are categorized into three distinct technological frameworks. Spectral Feature-Based
Approaches: Early techniques rely on single-spectral thresholding, where channels are flagged as clear-sky based
on predefined radiance thresholds sensitive to cloud absorption or emission. Advanced variants employ cross-spectral
consistency checks, leveraging the spectral dependence of cloud properties across multiple wavelength
bands to enhance discrimination accuracy. For example, IASI’s cloud-clearing algorithm uses a combination of shortwave and longwave infrared channels to identify consistent clear-sky signatures. Data-Driven and Machine
Learning Techniques: Principal component analysis (PCA) has been widely used to reduce the dimensionality of
hyperspectral datasets, enabling the extraction of latent variables that distinguish clear-sky from cloudy
conditions. More recently, machine learning models—including random forests, support vector machines, and
deep neural networks—have demonstrated superior performance in pixel-level clear-sky classification. These
models learn complex nonlinear relationships between spectral features and cloud states, achieving higher
precision in heterogeneous cloud environments. For instance, neural networks have been applied to AIRS data to improve clear-sky
identification in regions with thin cirrus clouds. Domestic Innovations in Assimilation Systems: China’s
Fengyun satellite program has developed bespoke clear-sky identification schemes tailored to the HIRAS and
GIIRS instruments. These methods integrate physical constraints from radiative transfer models with statistical
learning, optimizing clear-sky channel selection for regional NWP models over the Tibetan Plateau and monsoon-affected
areas. Such innovations have significantly enhanced the utilization of domestic hyperspectral data in
operational assimilation systems. The review highlights two transformative technologies. Three-Dimensional
(3D) Clear-Sky Identification: By incorporating vertical atmospheric structure from NWP model forecasts, 3D
methods enable the assimilation of clear-sky data above cloud tops, extending the utility of hyperspectral
observations in partially cloudy conditions. This approach has been shown to improve upper-tropospheric
humidity analysis in Arctic NWP systems. Cross-Spectral Matching with Cloud Parameter Inversion: At the pixel
scale, matching hyperspectral observations with cloud properties derived from complementary sensors (e.g.,
microwave radiometers or visible imagers) has proven particularly effective in multi-phase cloud environments.
Compared to standalone 3D methods, this hybrid approach achieves a 15% to 20% improvement in clear-sky
identification accuracy over mixed ice-water clouds, as demonstrated in CrIS data applications. Looking forward,
the review identifies key challenges in all-sky and full-spectrum assimilation, including the handling of sub-pixel
cloud heterogeneity, spectral bias in multi-sensor datasets, and computational scalability for real-time operations.
To address these, a novel framework is proposed: a machine learning-enhanced 3D clear-sky identification model
trained on cross-spectral matching datasets. By fusing physical radiative transfer principles with data-driven
learning, this approach promises to unlock the full potential of spaceborne infrared hyperspectral data, offering
robust technical support for next-generation NWP systems and advancing global weather forecasting capabilities.
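
As a minimal illustration of two ideas summarized above, single-channel thresholding at the pixel level and the retention of channels that sense the atmosphere above the cloud top, the following Python sketch may help. It is a simplification under stated assumptions, not the algorithm of any operational system: the names `Channel`, `pixel_is_clear`, and `clear_channels`, the 2 K departure threshold, the 50 hPa safety margin, and the channel peak pressures are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    # Pressure (hPa) at which the channel's weighting function peaks.
    # Values used below are illustrative assumptions, not instrument specifications.
    peak_pressure_hpa: float


def pixel_is_clear(obs_bt_k: float, clear_bt_k: float, threshold_k: float = 2.0) -> bool:
    """Single-channel threshold test on a window channel.

    The pixel is treated as clear when the observed brightness temperature
    departs from the simulated clear-sky value by no more than threshold_k.
    The 2 K default is an assumed, illustrative threshold.
    """
    return abs(obs_bt_k - clear_bt_k) <= threshold_k


def clear_channels(channels, cloud_top_hpa: float, margin_hpa: float = 50.0):
    """Return names of channels assumed to be unaffected by the cloud.

    A channel is kept when its weighting function peaks above the cloud top
    (i.e. at lower pressure) by at least margin_hpa. Using only the peak
    pressure is a deliberate simplification of the 3D idea.
    """
    return [
        c.name
        for c in channels
        if c.peak_pressure_hpa < cloud_top_hpa - margin_hpa
    ]


# Hypothetical channel set for demonstration only.
demo_channels = [
    Channel("stratospheric", 50.0),
    Channel("upper_tropospheric", 300.0),
    Channel("window", 1000.0),
]
```

With a cloud top at 400 hPa, `clear_channels(demo_channels, cloud_top_hpa=400.0)` keeps the stratospheric and upper-tropospheric channels and rejects the window channel, while `pixel_is_clear(285.0, 290.0)` flags the pixel as cloudy because the 5 K departure exceeds the assumed threshold.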