Kechen Song (Associate Professor)
- Supervisor of Doctorate Candidates, Supervisor of Master's Candidates
- Name (English):Kechen Song
- E-Mail:
- Education Level:With Certificate of Graduation for Doctorate Study
- Gender:Male
- Degree:Doctorate
- Status:Employed
- Alma Mater:Northeastern University (东北大学)
- Teacher College:School of Mechanical Engineering and Automation
Strip Steel Surface Defect Detection
Surface Defect Segmentation and Classification based on Few-shot Learning
Segmentation-CPANet | |
This paper proposes a simple but effective few-shot segmentation method, the cross position aggregation network (CPANet), which learns to segment strip steel surface defect (S3D) categories unseen during training from only a few labeled defective samples. Using a cross-position proxy (CPP) module, CPANet effectively aggregates long-range relationships of discrete defects, and a support auxiliary (SA) further improves the feature aggregation capability of CPP. Moreover, CPANet introduces a space-squeeze attention (SSA) module to aggregate multi-scale context information of defect features and suppress interference from background information. In addition, a novel S3D few-shot semantic segmentation dataset, FSSD-12, is proposed to evaluate CPANet. Extensive comparison and ablation experiments show that CPANet with a ResNet-50 backbone achieves state-of-the-art performance on FSSD-12. | |
Hu Feng, Kechen Song, et al. Cross Position Aggregation Network for Few-shot Strip Steel Surface Defect Segmentation [J]. IEEE Transactions on Instrumentation and Measurement, 2023, 72, 5007410. (paper) (code & dataset) | |
Segmentation-TGRNet | |
Metal surface defect segmentation plays an important role in quality control during the production and manufacturing stages. Two major challenges remain in industrial applications. One is that the number of metal surface defect samples is severely insufficient; the other is that most existing algorithms work only for a specific type of surface defect and are difficult to generalize to other metal surfaces. In this work, a theory of few-shot metal generic surface defect segmentation is introduced to address these challenges. The Triplet-Graph Reasoning Network (TGRNet) and a novel dataset, Surface Defects-4i, are proposed to realize this theory. Surface Defects-4i includes multiple categories of metal surface defect images to verify the generalization performance of TGRNet and adds non-metal categories (leather and tile) as extensions. | |
Yanqi Bao, Kechen Song, et al. Triplet-Graph Reasoning Network for Few-shot Metal Generic Surface Defect Segmentation [J]. IEEE Transactions on Instrumentation and Measurement, 2021. (paper) (code & dataset) (ESI highly cited, 5/2022) | |
Classification-GTNet | |
In this article, we propose a novel few-shot defect classification method, which aims to recognize novel defect classes from few labeled samples. Specifically, the proposed method follows a transductive paradigm and consists of two modules: a graph embedding and distribution transformation (GEDT) module and an optimal transport (OPT) module. The GEDT module not only makes full use of the correlation information between different features in the support set and the query set but also ensures a consistent distribution of the graph embedding results. Then, the OPT module is leveraged to implement few-shot classification in a transductive manner. Finally, experiments are conducted on the proposed metal surface defect dataset, and the results demonstrate that the proposed method achieves state-of-the-art performance under both one-shot and five-shot settings. | |
Weiwei Xiao, Kechen Song, et al. Graph Embedding and Optimal Transport for Few-Shot Classification of Metal Surface Defect [J]. IEEE Transactions on Instrumentation and Measurement, 2022. (paper) (code & dataset) | |
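The GEDT and OPT modules are not specified on this page. As a rough illustration of how optimal transport can assign a batch of query samples to classes transductively, here is a minimal sketch using entropy-regularized Sinkhorn iterations. The function name `sinkhorn`, the parameters `eps` and `n_iters`, the uniform marginals (balanced classes in the query set), and the toy cost matrix are all assumptions for illustration, not the authors' implementation.

```python
import math

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport plan between n queries (rows) and k classes (columns).

    Assumes uniform marginals: each query carries mass 1/n and each class
    receives mass 1/k, i.e. the query set is class-balanced.
    """
    n, k = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    row_mass = [1.0 / n] * n
    col_mass = [1.0 / k] * k
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]

# Hypothetical query-to-prototype distances for 4 queries and 2 classes.
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.9, 0.1],
    [0.7, 0.3],
]
plan = sinkhorn(cost)
# Each query takes the class that receives most of its mass.
labels = [max(range(len(row)), key=row.__getitem__) for row in plan]
```

Because the plan is computed for all queries jointly, the class-balance constraint lets confident queries influence the assignment of ambiguous ones, which is the sense in which the inference is transductive.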
Classification-FaNet | |
In this paper, we propose a feature-aware network (FaNet) for few-shot defect classification, which can effectively distinguish new classes from a small number of labeled samples. FaNet uses ResNet12 as its baseline. The feature-attention convolution (FAC) module is applied to extract comprehensive feature information from the base classes and to fuse semantic information by capturing long-range feature relationships between the upper and lower layers. During the test phase, an online feature-enhance integration (FEI) module is adopted to average the noise from the support-set and query-set defect images, further enhancing image features among the different tasks. In addition, we construct a large-scale few-shot classification dataset of strip steel surface defects (FSC-20) with 20 different types. Experimental results show that the proposed method achieves the best performance compared with state-of-the-art methods on the 5-way 1-shot and 5-way 5-shot tasks. |
|
Wenli Zhao, Kechen Song, et al. FaNet: Feature-aware Network for Few Shot Classification of Strip Steel Surface Defects [J]. Measurement, 2023. (paper) (code & dataset) |
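For readers unfamiliar with the "5-way 1-shot" and "5-way 5-shot" terminology used above, the following minimal sketch shows how one evaluation episode is typically drawn: N classes are sampled, then K labeled support samples and some query samples per class. The function `sample_episode`, its parameters, and the placeholder file names are hypothetical; the 20 classes merely mirror the number of defect types in FSC-20.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=2, seed=None):
    """Draw one N-way K-shot episode from a {class_name: [samples]} mapping."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        # Sample without replacement so support and query never overlap.
        items = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query

# Placeholder dataset: 20 classes with 10 hypothetical image names each.
dataset = {f"class_{i}": [f"img_{i}_{j}" for j in range(10)] for i in range(20)}
support, query = sample_episode(dataset, n_way=5, k_shot=1, n_query=2, seed=0)
```

The classifier sees the labels of `support` only and is scored on its predictions for `query`; accuracy is then averaged over many such episodes.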
Surface Defect Detection, Segmentation and Classification based on Supervised/Semi-supervised Learning
Detection | |
In this paper, we propose a novel defect detection system based on deep learning, focused on a practical industrial application: steel plate defect inspection. To achieve strong classification ability, the system employs a baseline convolutional neural network (CNN) to generate feature maps at each stage. The proposed multilevel-feature fusion network (MFN) then combines multiple hierarchical features into one feature, which retains more location details of defects. Based on these multilevel features, a region proposal network (RPN) is adopted to generate regions of interest (ROIs). For each ROI, a detector consisting of a classifier and a bounding box regressor produces the final detection results. Finally, we set up a defect detection dataset, NEU-DET, for training and evaluating our method. On NEU-DET, our method achieves 74.8/82.3 mAP with baseline networks ResNet34/50 using 300 proposals. In addition, using only 50 proposals, our method can detect at 20 fps on a single GPU and reach 92% of the above performance, which shows its potential for real-time detection. | |
Yu He, Kechen Song, et al. An End-to-end Steel Surface Defect Detection Approach via Fusing Multiple Hierarchical Features [J]. IEEE Transactions on Instrumentation and Measurement, 2020. (paper) (dataset) (Popular Articles, 12/2020--1/2023) (ESI highly cited, 12/2020-1/2023) (ESI Hot Paper, 5/2022) | |
Segmentation | |
This article proposes a pyramid feature fusion and global context attention network for pixel-wise detection of surface defects, called PGA-Net. In the framework, multiscale features are first extracted from the backbone network. Then the pyramid feature fusion module fuses these features into five resolutions through efficient dense skip connections. Finally, the global context attention module is applied to the fusion feature maps of adjacent resolutions, which allows effective information to propagate from low-resolution fusion feature maps to high-resolution ones. In addition, a boundary refinement block is added to the framework to refine defect boundaries and improve the prediction. The final prediction is the fusion of the feature maps at the five resolutions. Evaluation on four real-world defect datasets demonstrates that the proposed method outperforms state-of-the-art methods on mean intersection over union and mean pixel accuracy (NEU-Seg: 82.15%, DAGM 2007: 74.78%, MT_defect: 71.31%, Road_defect: 79.54%). |
|
Hongwen Dong, Kechen Song, et al. PGA-Net: Pyramid Feature Fusion and Global Context Attention Network for Automated Surface Defect Detection [J]. IEEE Transactions on Industrial Informatics, 2020,16(12),7448-7458. (paper) (dataset)(ESI highly cited, 7/2021--1/2023) | |
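The two metrics reported for PGA-Net can be computed from a pixel-level confusion matrix. The sketch below uses the common definitions (per-class IoU and per-class recall, each averaged over classes); whether the paper averages in exactly this way is an assumption, and the function name `segmentation_scores` and the toy counts are hypothetical.

```python
def segmentation_scores(confusion):
    """Return (mean IoU, mean pixel accuracy) from a square confusion matrix.

    confusion[i][j] = number of pixels of true class i predicted as class j.
    """
    n = len(confusion)
    ious, accs = [], []
    for c in range(n):
        tp = confusion[c][c]
        true_total = sum(confusion[c])                       # TP + FN
        pred_total = sum(confusion[r][c] for r in range(n))  # TP + FP
        union = true_total + pred_total - tp
        if union > 0:
            ious.append(tp / union)
        if true_total > 0:
            accs.append(tp / true_total)
    return sum(ious) / len(ious), sum(accs) / len(accs)

# Toy two-class example (rows: true class, columns: predicted class).
confusion = [
    [90, 10],  # background pixels
    [5, 45],   # defect pixels
]
miou, mpa = segmentation_scores(confusion)
```

In this toy case the background IoU is 90/105 and the defect IoU is 45/60, so the mean IoU is noticeably lower than the mean pixel accuracy of 0.9, which illustrates why both numbers are usually reported.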
Semi-supervised Classification | |
Defect inspection is very important for guaranteeing the surface quality of industrial steel products, but related methods are based primarily on supervised learning, which requires ample labeled samples for training. However, inspecting defects on steel surfaces is always a data-limited task because samples are difficult to collect and expert labeling is expensive. Unlike previous works, in which only labeled samples are handled by supervised classifiers, we propose a semi-supervised learning (SSL) defect classification approach based on multi-training of two different networks: a categorized generative adversarial network (GAN) and a residual network. This method uses the GAN to generate a large number of unlabeled samples. A multi-training algorithm that uses two classifiers based on different learning strategies is then proposed to integrate both labeled and unlabeled samples into the SSL process. Finally, through the multiple training process, our SSL method achieves higher accuracy and better robustness than the supervised one using only limited labeled samples. Experimental results demonstrate the effectiveness of the proposed method, which achieves a classification accuracy of 99.56%. | |
Yu He, Kechen Song, et al. Semi-supervised Defect Classification of Steel Surface Based on Multi-training and Generative Adversarial Network [J]. Optics and Lasers in Engineering, 2019, 122: 294-302. (paper) |
Surface Defect Feature Extraction and Recognition based on Traditional Methods
Adjacent Evaluation Completed Local Binary Patterns (AECLBP): | |
Automatic recognition of hot-rolled steel strip surface defects is important to steel surface inspection systems. To improve the recognition rate, a new, simple, yet noise-robust feature descriptor named adjacent evaluation completed local binary patterns (AECLBP) is proposed for defect recognition. In the proposed approach, an adjacent evaluation window around each neighbor is constructed to modify the threshold scheme of the completed local binary pattern (CLBP). Experimental results demonstrate that the proposed approach maintains its defect recognition performance under intra-class feature variations and under illumination and grayscale changes. Even in the toughest situation with additive Gaussian noise, AECLBP still achieves moderate recognition accuracy. In addition, the adjacent evaluation window strategy can also be applied to other local binary pattern (LBP) variants. | |
Kechen Song and Yunhui Yan. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects [J]. Applied Surface Science, 2013, 285: 858-864. (database) |
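As background for the AECLBP entry above, this minimal sketch computes the plain LBP code of one pixel: each of the eight neighbors is thresholded against the center value and the results are packed into a byte. It is the basic operator that CLBP and AECLBP build on, not AECLBP itself; the adjacent evaluation window that modifies the threshold is not reproduced here. The function name `lbp_code`, the neighbor ordering, and the toy patch are assumptions.

```python
def lbp_code(patch):
    """8-bit LBP code of the centre pixel of a 3x3 grayscale patch.

    Neighbours are visited clockwise from the top-left corner; a neighbour
    contributes a 1 bit when its value is >= the centre value.
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code

# Toy patch: brighter values along the bottom and left, darker at top-right.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch)  # bits 0, 4, 5, 6, 7 are set
```

A texture descriptor is then the histogram of these codes over an image region. Because a single noisy center pixel can flip many bits at once, schemes that change how the threshold is chosen are a natural way to gain robustness to noise.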
|
Scattering Convolution Network(SCN): | |
To improve the tolerance of current feature extraction methods to local deformations, a scattering operator is applied to extract features for defect recognition. First, a scattering transform builds non-linear invariant representations by cascading wavelet transforms and modulus pooling operators, which average the amplitude of iterated wavelet coefficients. Then, an improved network named the scattering convolution network (SCN) is introduced to build large-scale invariants. Finally, a surface defect database, the Northeastern University (NEU) surface defect database, is constructed to evaluate the effectiveness of feature extraction methods for defect recognition. Experimental results demonstrate that the SCN method achieves excellent defect recognition performance under intra-class feature variations and under illumination and grayscale changes. Even with fewer training samples, the SCN method still achieves moderate recognition accuracy. | |
Kechen Song, Shaopeng Hu and Yunhui Yan. Automatic Recognition of Surface Defects on Hot-Rolled Steel Strip Using Scattering Convolution Network [J]. Journal of Computational Information Systems, 2014, 10(7):3049-3055. (paper) |
Surface Defect Detection based on Traditional Methods
Saliency Linear Scanning Morphology(SLSM): | |
Surface defect detection of silicon steel strip is an important part of non-destructive testing systems in the iron and steel industry. To detect defect objects of interest on silicon steel strip under oil pollution interference, a new detection method based on saliency linear scanning morphology is proposed. In the proposed method, visual saliency extraction is employed to suppress the cluttered background, and a saliency map is obtained to highlight potential objects. Then, a linear scanning operation is proposed to obtain the region of oil pollution. Finally, morphological edge processing is proposed to remove the edges of oil pollution interference and of reflective pseudo-defects. Experimental results demonstrate that the proposed method performs well in detecting surface defects including wipe-crack defects, scratch defects and small defects. | |
Kechen Song, Shaopeng Hu, Yunhui Yan and Jun Li. Surface defect detection method using saliency linear scanning morphology for silicon steel strip under oil pollution interference[J]. ISIJ International, 2014, 54(11):2598-2607 . |
|
Saliency Convex Active Contour Model(SCACM): | |
Accurate detection of surface defects is an indispensable part of a steel surface inspection system. To detect micro surface defects of silicon steel strip, a new detection method based on the saliency convex active contour model is proposed. In the proposed method, visual saliency extraction is employed to suppress the cluttered background and highlight potential objects. The extracted saliency map is then exploited as a feature, which is fused into a convex energy minimization function of a local-based active contour. A numerical minimization algorithm is introduced to separate the micro surface defects from the cluttered background. Experimental results demonstrate that the proposed method performs well in detecting micro surface defects including spot defects and steel-pit defects. Even against a cluttered background, the proposed method detects almost all of the micro defects without any false objects. | |
Kechen Song and Yunhui Yan. Micro surface defect detection method for silicon steel strip based on saliency convex active contour model[J].Mathematical Problems in Engineering, 2013, (paper) |
Surface Defect Image Segmentation based on Traditional Methods
Convex Active Contour Segmentation Model: | |
To address problems of the Chan-Vese model and the Local Binary Fitting (LBF) model, such as sensitivity to the initial contour position and slow segmentation of strip steel defect images, a novel local information-based convex active contour (LICAC) model is proposed. By converting the non-convex optimization problem into a convex one and applying the Split Bregman method for fast solution, the sensitivity to the initial contour position seen in the Chan-Vese and LBF models is resolved. With the introduction of local information, the new model efficiently segments strip surface defect images with non-uniform gray levels. The model is tested on single-target defect images from four common defect categories (weld, rust, holes and scratches), and the results show that its segmentation quality and running time are better than those of the other two models. The model can also segment defect images with multiple target regions; experiments on four common categories (scratches, inclusion, pitting and wrinkles) verify its validity. | |
SONG Kechen, YAN Yunhui, PENG Yishu, DONG Dewei. Convex Active Contour Segmentation Model of Strip Steel Defects Image Based on Local Information[J].JOURNAL OF MECHANICAL ENGINEERING,2012,48(20):1-7. (Chinese) |
|
Structure Tensor and Active Contour: | |
In order to address the segmentation problem for cold rolled silicon steel surface defect based on the texture background, a novel method based on structure tensor and active contour model is proposed. Firstly, image local information is introduced to the structure tensor. In the extracted feature space of structure tensor, KL distance is treated as a regional similarity measure of the probability density to establish active contour model for image segmentation. Finally the numerical solution of Split-Bregman is used to solve the model. The proposed method is introduced to segment silicon steel surface defects, which are longitudinal scratches, horizontal scratches, foreign bodies, and holes. The experimental results show that this method can segment the silicon steel surface defect areas accurately. | |
SONG Kechen, YAN Yunhui,WANG Zhan, HU Changfa. Research on segmentation method for silicon steel surface defect based on structure tensor and active contour[J].Computer Engineering and Applications,2012,48(32):224-228.(Chinese) |
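The structure tensor entry above uses the KL distance as a regional similarity measure between probability densities. As a minimal sketch of that measure alone, the following computes the Kullback-Leibler divergence between two discrete histograms, for example feature histograms inside and outside a contour. Working on raw histograms, the function name `kl_divergence`, the smoothing constant `eps`, and the toy counts are assumptions; the paper's feature space and energy functional are not reproduced.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two histograms given as lists of non-negative counts."""
    p_sum, q_sum = sum(p), sum(q)
    p = [x / p_sum for x in p]  # normalise counts to probabilities
    q = [x / q_sum for x in q]
    # eps guards against log(0) when a bin of q is empty.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical feature histograms inside and outside a candidate contour.
inside = [8, 1, 1]
outside = [1, 1, 8]
kl = kl_divergence(inside, outside)
```

A large value means the two regions have clearly different feature statistics, which is the situation a region-based contour model tries to reach when the contour sits on the defect boundary. Note that the KL divergence is not symmetric, so a symmetrized form is often used when a distance-like quantity is needed.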
Rail Surface Defect Detection
In-service and No-service Rail
No-service Rail Surface Defect Detection based on Stereoscopic Images (RGB-D)
Unsupervised Saliency Detection | |
An unsupervised stereoscopic saliency detection method based on a binocular line-scanning system is proposed in this article. This method simultaneously obtains a highly precise image and profile information while avoiding the decoding distortion of the structured light reconstruction method. | |
Menghui Niu, Kechen Song, et al. Unsupervised Saliency Detection of Rail Surface Defects using Stereoscopic Images [J]. IEEE Transactions on Industrial Informatics, 2021,17(3),2271-2281. (paper) (code)(RSDDS-113 dataset) (ESI highly cited, 7/2021-1/2023) Reported by 《Imaging & Machine Vision Europe》 | |
Collaborative Learning Attention Network | |
We propose a neural network named collaborative learning attention network (CLANet) for no-service rail surface defect inspection. The proposed method consists of three main stages: feature extraction, cross-modal information fusion, and defect location and segmentation. A multimodal attention block is proposed to highlight complex defect objects with a new cross-modal fusion strategy. Furthermore, a dual-stream decoder enriches the representation of advanced features and avoids the dilution of information in the decoding stage. To address the scarcity of defective data, an industrial RGB-D dataset, NEU RSDDS-AUG, is built. Finally, ablation studies verify the effectiveness of the proposed method. |
|
Jingpeng Wang, Kechen Song, et al. Collaborative Learning Attention Network Based on RGB Image and Depth Image for Surface Defect Inspection of No-Service Rail [J]. IEEE/ASME Transactions on Mechatronics, 2022 . (paper) |
No-service Rail Surface Defect Detection based on RGB Images
MCnet | |
In this article, we propose an acquisition scheme with two lamps and a color line-scan charge-coupled device (CCD) to alleviate uneven illumination. Then, a multiple context information segmentation network is proposed to improve no-service rail surface defect (NRSD) segmentation. The network makes full use of context information based on a dense block, a pyramid pooling module, and multi-information integration. In addition, an attention mechanism is applied to optimize the extracted information by filtering noise. To address the shortage of real samples, we propose training the network with artificial samples. An NRSD dataset, NRSD-MN, is built with artificial and natural NRSDs. Experimental results show that our method is feasible and segments both artificial and natural NRSDs well. | |
Defu Zhang, Kechen Song, et al. MCnet: Multiple Context Information Segmentation Network of No-service Rail Surface Defects [J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70, 5004309. (paper) (code) (dataset) | |
Image-level weakly supervised segmentation | |
A novel image-level weakly supervised segmentation formulation is proposed for no-service rail surface defects. These defects are decomposed into three sub-categories (strip-shaped, spot-shaped, block-shaped) according to the size prior information (area and shape). Then, a method is presented with a pooling combination module. The pooling combination module makes full use of the size attributes of the sub-category by utilizing different pooling functions for different sub-categories. Experimental results demonstrate that our method is effective and outperforms the state-of-the-art methods. |
|
Defu Zhang, Kechen Song, et al. An image-level weakly supervised segmentation method for No-service rail surface defect with size prior [J]. Mechanical Systems and Signal Processing, 2022, 165, 108334. (paper) (code) |
In-service Rail Surface Defect Detection
Line-Level Label | |
A novel inspection scheme for rail surface defects (RSDs) is presented for limited samples with line-level labels, which regards defect images as sequence data and classifies pixel lines. Thousands of pixel lines are easy to collect, and line-level labeling is a simple annotation task. Then two methods, OC-IAN and OC-TD, are designed for inspecting express rail defects and common/heavy rail defects, respectively. Both employ a one-dimensional convolutional neural network (ODCNN) to extract features and a long short-term memory (LSTM) network to extract context information. The main differences are that OC-TD applies a double-branch structure and removes the attention module. | |
Defu Zhang, Kechen Song, et al. Two Deep Learning Networks for Rail Surface Defect Inspection of Limited Samples with Line-Level Label [J]. IEEE Transactions on Industrial Informatics, 2021,17(10),6731-6741. (paper) |
|
In-service and No-service Rail Surface Defect Detection
SC-OSDA | |
We propose a novel one-shot unsupervised domain adaptation framework. Specifically, we introduce a shape consistent style transfer module that performs pixel-level distribution alignment between the training and test images. Based on the one-shot test image, the training image is reconstructed to have the same appearance as the test image. Meanwhile, we employ a multi-task learning strategy to prevent content distortion of the reconstructed images. To improve the robustness of the model to distribution differences, we design an edge-aware defect segmentation model and train the model using the reconstructed training images. The experimental results show that our method effectively improves the robustness of the model to distribution differences and achieves satisfying results in the task of rail surface defect segmentation. | |
Shuai Ma, Kechen Song, et al. Shape Consistent One-Shot Unsupervised Domain Adaptation for Rail Surface Defect Segmentation [J]. IEEE Transactions on Industrial Informatics, 2022 . (paper) | |
CFDANet | |
We propose a cross-scale fusion and domain adversarial network (CFDANet) to improve the generalization ability of deep neural networks on unseen datasets. To alleviate the domain shift caused by defect scale differences, we design a dual-encoder to extract multi-scale features from images of different resolutions. Then, those features are adaptively fused through a cross-scale fusion module. For the domain shift caused by inconsistent rail appearance, we introduce transferable-aware domain adversarial learning to extract domain invariant features from different datasets. Moreover, we further propose a transferable curriculum to suppress the negative impact of images with low transferability. Experimental results show that our CFDANet can accurately segment defects in unseen datasets and surpass other state-of-the-art domain generalization methods in all five target domain settings. | |
Shuai Ma, Kechen Song, et al. Cross-scale Fusion and Domain Adversarial Network for Generalizable Rail Surface Defect Segmentation on Unseen Datasets [J]. Journal of Intelligent Manufacturing, 2022 . (paper) |
|
Anomaly Detection | |
An innovative generative adversarial network based on adaptive pyramid graph (APG) and variation residuals (APGVR-GAN) is proposed, aiming to improve the robustness of anomaly detection in railway products and other complex industrial supplies. First, the APG module is embedded in the encoder–decoder–encoder pattern, capturing the correlation description between neighbor regions, which is utilized to enhance the detection of abnormal defects with weak texture. Next, the variation residual module is employed to enhance the expression of various normal samples in the latent space and improve the identification ability for abnormal samples. Then, the dual-probability prototype loss is proposed to make different normal samples have more concentrated expression and more similar probability distribution centers in latent space. Finally, an adaptive focal-gate loss and a regularized log-likelihood loss are designed to overcome the imbalance problem in training samples with different background information. The effectiveness of the model is verified on three new railway datasets and three other industrial public datasets. | |
Menghui Niu, Kechen Song, et al. An Adaptive Pyramid Graph and Variation Residual-Based Anomaly Detection Network for Rail Surface Defects [J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70, 5020013. (paper) |
Pavement Distress Detection
Automatic Inspection and Evaluation System for Pavement Distress
We propose a three-stage automatic inspection and evaluation system for pavement distress based on improved deep convolutional neural networks (CNNs). First, the system integrates multi-level context information from the CNN classification model to construct discriminative super-features that determine whether the pavement image contains distress and, if so, its type, achieving rapid detection of pavement distress. Then, the pavement images with distress are fed into the CNN segmentation model to highlight the distress region at the pixel level. In the segmentation model, a novel pyramid feature extraction module and a novel guidance attention mechanism are introduced. Finally, we evaluate the degree of pavement damage according to the results of the CNN segmentation model. | |
Hongwen Dong, Kechen Song, et al. Automatic Inspection and Evaluation System for Pavement Distress [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8),12377-12387. (paper) |
Detection and Classification for Pavement Distress Images
Few-Shot Classification | |
We propose a new few-shot pavement distress detection method based on metric learning, which can effectively learn new categories from a few labeled samples. We adopt the backend network (ResNet18) to extract multilevel feature information from the base classes and then send the extracted features into the metric module. In the metric module, we introduce an attention mechanism to learn the feature attributes of "what" and "where" and focus the model on the desired characteristics. We also introduce a new metric loss function that maximizes the distance between different categories while minimizing the distance within the same category. In the testing stage, we calculate the cosine similarity between the support set and the query set to complete novel category detection. | |
Hongwen Dong, Kechen Song, et al. Deep metric learning-based for multi-target few-shot pavement distress classification [J]. IEEE Transactions on Industrial Informatics, 2022,18(3),1801-1810. (paper) (code) (ESI highly cited, 7/2022-1/2023) | |
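The testing stage described above compares support and query features by cosine similarity. A minimal sketch of that step follows, assuming that features have already been extracted and that each class is summarized by the mean of its support features; the averaging, the function names `cosine` and `classify`, the class names, and the two-dimensional toy features are all assumptions rather than details from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(query, support):
    """Assign the query to the class whose mean support feature is most similar.

    support maps a class name to a list of equally sized feature vectors.
    """
    prototypes = {}
    for label, feats in support.items():
        dim = len(feats[0])
        prototypes[label] = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))

# Hypothetical 2-D features for a 2-way 2-shot task.
support = {
    "crack": [[1.0, 0.1], [0.9, 0.2]],
    "pothole": [[0.1, 1.0], [0.2, 0.8]],
}
prediction = classify([0.8, 0.3], support)
```

Cosine similarity ignores feature magnitude and compares direction only, which pairs naturally with a metric loss that pulls same-category features together and pushes different categories apart.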
Patch-aware Mutual Reasoning Network (PMRN) | |
We propose a novel Patch-aware Mutual Reasoning Network (PMRN) that utilizes only the prior knowledge of non-defective samples for defect detection. Concretely, a patch-aware mutual reasoning module and a spatial shuffle perception module are devised to reason about mutual dependencies and explore dislocation relationships. In addition, an adaptive soft-gated anomaly measurement function is developed to calculate reconstruction deviations, which softly controls the information flow according to the complexity of the current scenario. | |
Yanyan Wang, Kechen Song, et al. Unsupervised defect detection with patch-aware mutual reasoning network in image data [J]. Automation in Construction, 2022, 142, 104472. (paper) | |
RENet | |
RENet is proposed for accurate and robust pavement crack detection. The rectangular convolution pyramid module is first built on deep layers so that the features can describe defects with different structures. The optimized contextual information and features of shallower layers are gradually merged into three resolutions. Subsequently, the hierarchical feature fusion refinement module and the boundary refinement module are applied to each branch. These two modules effectively promote the seamless fusion of features at various scales and make the model pay more attention to boundaries. Finally, the outputs of the three branches are integrated to obtain the final prediction map. |
|
Yanyan Wang, Kechen Song, et al. RENet: Rectangular Convolution Pyramid and Edge Enhancement Network for Salient Object Detection of Pavement Cracks [J]. Measurement, 2021, 170, 108698. (paper) | |
Relevance-aware and Cross-reasoning Network (RCN) | |
This paper proposes a relevance-aware and cross-reasoning network (RCN) for anomaly segmentation of pavement defects, which can segment defects using only non-defective images for training. A relevance-aware transformer-based encoder is first devised to model intrinsic interdependencies across local features, thus improving representations of complex non-defective images. Next, a dual decoder strategy is proposed to remap the encoder-generated latent dependencies at the local semantic and global detailed levels, respectively. Specifically, a cross-reasoning refinement module is built in the local decoder to reason about the cross-relationship between spatial and channel dimensions. Finally, a context-aware abnormal distillation measurement is developed to evaluate the semantic reconstruction deviations during inference. Under the guidance of semantic affinity, this measurement allows the model to highlight defective areas adaptively. Extensive experimental results on four datasets indicate that RCN outperforms other leading anomaly segmentation methods. |
|
Yanyan Wang, Menghui Niu, Kechen Song, et al. Normal-knowledge-based Pavement Defect Segmentation Using Relevance-aware and Cross-reasoning Mechanisms [J]. IEEE Transactions on Intelligent Transportation Systems, 2022. (paper) |
Multi-Modal Image Analysis and Application
A Novel Visible-Depth-Thermal Image Dataset of Salient Object Detection
Visual perception plays an important role in the industrial information field, especially in robotic grasping applications. To detect the object to be grasped quickly and accurately, salient object detection (SOD) is employed. Although existing SOD methods have achieved impressive performance, they still have limitations in the complex interference environments of practical applications. To better deal with such environments, a novel triple-modal image fusion strategy, visible-depth-thermal (VDT) SOD, is proposed to implement SOD for robotic visual perception. We also build an image acquisition system for variable lighting scenes and construct a novel benchmark dataset for VDT SOD (the VDT-2048 dataset). Images of multiple modalities assist each other in highlighting the salient regions, but they inevitably introduce interference as well. To achieve effective cross-modal feature fusion while suppressing information interference, a hierarchical weighted suppress interference (HWSI) method is proposed. Comprehensive experimental results show that our method achieves better performance than state-of-the-art methods. | |
Kechen Song, et al. A Novel Visible-Depth-Thermal Image Dataset of Salient Object Detection for Robotic Visual Perception [J]. IEEE/ASME Transactions on Mechatronics, 2022. (paper) (Dataset & Code) |
RGB-T Image Analysis Technology and Application: A Survey
RGB-Thermal infrared (RGB-T) image analysis has been actively studied in recent years. Over the past decade, it has received wide attention and made important research progress in many applications. This paper provides a comprehensive review of RGB-T image analysis technology and application, covering several active fields: image fusion, salient object detection, semantic segmentation, pedestrian detection, object tracking, and person re-identification. The first two are preprocessing technologies for many computer vision tasks, and the rest are application directions. This paper extensively reviews 400+ papers spanning more than 10 different application tasks. Furthermore, for each specific task, this paper comprehensively analyzes the various methods and presents the performance of the state-of-the-art methods. This paper also provides an in-depth analysis of the challenges for RGB-T image analysis as well as some potential technical improvements in the future. | |
Kechen Song, Ying Zhao, et al. RGB-T Image Analysis Technology and Application: A Survey [J]. Engineering Applications of Artificial Intelligence, 2023, 120, 105919. (paper) | |
RGB-T Salient Object Detection
A Variable Illumination Dataset: VI-RGBT1500 | |
We propose a variable illumination dataset named VI-RGBT1500 for RGBT image SOD. This is the first time that different illuminations are taken into account in the construction of the RGBT SOD dataset. Three illumination conditions, which are sufficient illumination, uneven illumination and insufficient illumination, are adopted to collect 1500 pairs of RGBT images. |
|
Kechen Song, Liming Huang, et al. Multiple Graph Affinity Interactive Network and A Variable Illumination Dataset for RGBT Image Salient Object Detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023. (paper)(code & dataset) | |
TAGFNet: | |
Current RGB-T datasets contain only a tiny amount of low-illumination data, so RGB-T SOD methods trained on them do not detect salient objects well in extremely low-illumination scenes. Detection performance on low-illumination data could be improved by spending a lot of labor on labeling such data, but we instead address the problem by making full use of the characteristics of thermal (T) images. We therefore propose a T-aware guided early fusion network for cross-illumination salient object detection. | |
Han Wang, Kechen Song, et al. Thermal Images-Aware Guided Early Fusion Network for Cross-Illumination RGB-T Salient Object Detection [J]. Engineering Applications of Artificial Intelligence, 2023, 118, 105640. (paper)(code) | |
CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection | |
A novel Cross-Guided Fusion Network (CGFNet) for RGB-T salient object detection is proposed. Specifically, a Cross-Scale Alternate Guiding Fusion (CSAGF) module is proposed to mine the high-level semantic information and provide global context support. Subsequently, we design a Guidance Fusion Module (GFM) to achieve sufficient cross-modality fusion by using one modality as the main guidance and the other as auxiliary. Finally, the Cross-Guided Fusion Module (CGFM) is presented and serves as the main decoding block. | |
Jie Wang, Kechen Song, et al. CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(5),2949-2961. (paper)(code)(ESI highly cited, 11/2022-1/2023) | |
Multi-graph Fusion and Learning for RGBT Image Saliency Detection | |
This research presents an unsupervised RGBT saliency detection method based on multi-graph fusion and learning. Firstly, RGB images and T images are adaptively fused based on boundary information to produce more accurate superpixels. Next, a multi-graph fusion model is proposed to selectively learn useful information from multi-modal images. Finally, we implement the theory of finding good neighbors in the graph affinity and propose different algorithms for two stages of saliency ranking. Experimental results on three RGBT datasets show that the proposed method is effective compared with the state-of-the-art algorithms. | |
Liming Huang, Kechen Song, et al. Multi-graph Fusion and Learning for RGBT Image Saliency Detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022,32(3),1366-1377. (paper)(code)(ESI highly cited, 1/2023) | |
Unidirectional RGB-T salient object detection | |
The U-shaped encoder–decoder architecture based on CNNs is well established in salient object detection (SOD) tasks, and it has revealed two drawbacks while driving the rapid development of saliency detection. (1) The inherent characteristics of CNNs make it difficult to learn long-range dependencies and model global correlations. (2) For the common purpose of improving the performance of saliency detection, the encoder and decoder should complement each other and work together. However, the existing encoder–decoder architecture treats the encoder and decoder independently of each other. Specifically, the encoder is responsible for extracting features and the decoder fuses multi-level or multi-modal features to produce prediction maps. That is, the encoder alone serves the decoder, while the valuable information obtained after decoder fusion does not facilitate feature extraction. Therefore, we propose a unidirectional RGB-T salient object detection network with intertwined driving of encoding and fusion to solve the above problems. |
|
Jie Wang, Kechen Song, et al. Unidirectional RGB-T salient object detection with intertwined driving of encoding and fusion [J]. Engineering Applications of Artificial Intelligence, 2022, 114, 105162. (paper) | |
MCFNet | |
We propose a novel Modal Complementary Fusion Network (MCFNet) to alleviate the contamination effect of low-quality images from both global and local perspectives. Specifically, we design a modal reweight module (MRM) to evaluate the global quality of images and adaptively reweight RGB-T features by explicitly modelling interdependencies between RGB and thermal images. Furthermore, we propose a spatial complementary fusion module (SCFM) to explore the complementary local regions between RGB-T images and selectively fuse multi-modal features. Finally, multi-scale features are fused to obtain the salient detection result. | |
Shuai Ma, Kechen Song, et al. Modal Complementary Fusion Network for RGB-T Salient Object Detection [J]. Applied Intelligence, 2023, 53, 9038-9055. (paper) (code) | |
Low-rank Tensor Learning and Unified Collaborative Ranking | |
We propose a novel RGB-T saliency detection method in this letter. To this end, we first regard superpixels as graph nodes and calculate the affinity matrix for each feature. Then, we propose a low-rank tensor learning model for the graph affinity, which can suppress redundant information and improve the relevance of similar image regions. Finally, a novel ranking algorithm is proposed to jointly obtain the optimal affinity matrix and saliency values under a unified structure. Test results on two RGB-T datasets show that the proposed method performs well against state-of-the-art algorithms. | |
Liming Huang, Kechen Song, et al. RGB-T Saliency Detection via Low-rank Tensor Learning and Unified Collaborative Ranking [J]. IEEE Signal Processing Letters, 2020, 27,1585-1589. (paper) (code and datasets) | |
IFFNet | |
A novel information flow fusion network (IFFNet) is proposed for RGB-T cross-modal images. The proposed IFFNet consists of an information filtering module and a novel information flow paradigm. Validation on three available RGB-T salient object detection datasets shows that the proposed method is more competitive than state-of-the-art methods. | |
Kechen Song, Liming Huang, et al. A Potential Vision-Based Measurements Technology: Information Flow Fusion Detection Method Using RGB-Thermal Infrared Images [J]. IEEE Transactions on Instrumentation & Measurement, 2023, 72, 5004813. (paper) (code) | |
GRNet: Cross-Modality Salient Object Detection Network with Universality and Anti-interference | |
Although cross-modality salient object detection has achieved excellent results, current methods need to be improved in terms of universality and anti-interference. Therefore, we propose a cross-modality salient object detection network with universality and anti-interference. First, we offer a feature extraction strategy to enhance the features in the feature extraction stage. It promotes the mutual improvement of different modal information and avoids the influence of interference on the subsequent process. Then we use the graph mapping reasoning module (GMRM) to infer the high-level semantics and obtain valuable information. It enables our method to accurately locate objects under different scenes and interference, improving its universality and anti-interference. Finally, we adopt a mutual guidance fusion module (MGFM), including a modality adaptive fusion module (MAFM) and an across-level mutual guidance fusion module (ALMGFM), to carry out an efficient and reasonable fusion of multi-scale and multi-modality information. | |
Hongwei Wen, Kechen Song, et al. Cross-Modality Salient Object Detection Network with Universality and Anti-interference [J]. Knowledge-Based Systems, 2023, 264, 110322. (paper) (code) | |
RGB-T Few-shot Semantic Segmentation
V-TFSS | |
Few-shot semantic segmentation (FSS) has drawn great attention in the computer vision community due to its remarkable potential for segmenting novel objects with few pixel-annotated samples. However, interference factors such as insufficient illumination and complex backgrounds pose a greater challenge to segmentation performance than in the fully supervised setting when the number of samples is insufficient. Therefore, we propose the visible and thermal (V-T) few-shot semantic segmentation task, which utilizes the complementary and similar information of visible and thermal images to boost few-shot segmentation performance. | |
Yanqi Bao, Kechen Song, et al. Visible and Thermal Images Fusion Architecture for Few-shot Semantic Segmentation [J]. Journal of Visual Communication and Image Representation, 2021, 80, 103306. (paper)(code and datasets) |
|
RGB-T Object Tracking
Learning Discriminative Update Adaptive Spatial-Temporal Regularized Correlation Filter for RGB-T Tracking | |
We propose a novel adaptive spatial-temporal regularized correlation filter model that learns an appropriate regularization for robust tracking, together with a relative peak discriminative method for model updating that avoids model degradation. In addition, to better integrate the unique advantages of the two modalities and adapt to the changing appearance of the target, an adaptive weighting ensemble scheme and a multi-scale search mechanism are adopted, respectively. To optimize the proposed model, we design an efficient ADMM algorithm, which greatly improves efficiency. Extensive experiments on two available datasets, RGBT234 and RGBT210, indicate that the proposed tracker performs favorably in both accuracy and robustness against state-of-the-art RGB-T trackers. | |
Mingzheng Feng, Kechen Song, et al. Learning Discriminative Update Adaptive Spatial-Temporal Regularized Correlation Filter for RGB-T Tracking [J]. Journal of Visual Communication and Image Representation, 2020, 72, 102881. (paper) |
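The abstract above does not define its relative peak discriminative measure. As a rough illustration of peak-based update gating in general, the sketch below uses the peak-to-sidelobe ratio (PSR), a standard confidence measure for correlation filter responses, as a stand-in. The function names, the 1-D response, the threshold, and the learning rate are all hypothetical and are not taken from the paper.

```python
import statistics
from typing import List, Sequence


def peak_to_sidelobe_ratio(response: Sequence[float], exclude: int = 1) -> float:
    """PSR of a 1-D response: (peak - sidelobe mean) / sidelobe std.

    The sidelobe is every sample farther than `exclude` positions from the peak.
    """
    peak_idx = max(range(len(response)), key=response.__getitem__)
    sidelobe = [v for i, v in enumerate(response) if abs(i - peak_idx) > exclude]
    mean = statistics.fmean(sidelobe)
    std = statistics.pstdev(sidelobe)
    # Small constant avoids division by zero on a perfectly flat sidelobe.
    return (response[peak_idx] - mean) / (std + 1e-12)


def maybe_update(model: Sequence[float], candidate: Sequence[float],
                 response: Sequence[float], threshold: float = 5.0,
                 lr: float = 0.02) -> List[float]:
    """Blend the candidate into the model only when the response is confident."""
    if peak_to_sidelobe_ratio(response) < threshold:
        # Ambiguous response (e.g. occlusion): keep the old model unchanged.
        return list(model)
    return [(1.0 - lr) * m + lr * c for m, c in zip(model, candidate)]


sharp = [0.1, 0.0, 0.1, 1.0, 0.1, 0.0, 0.1, 0.0]   # one clear peak
flat = [0.5, 0.6, 0.5, 0.6, 0.5, 0.6, 0.5, 0.6]    # no dominant peak
old, new = [1.0, 1.0], [2.0, 0.0]
updated = maybe_update(old, new, sharp)   # model moves slightly toward `new`
frozen = maybe_update(old, new, flat)     # model is left as-is
```

Skipping the update on low-confidence frames is one common way to limit model degradation; the actual criterion used in the paper may differ.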
|
Robotic Visual Grasping Detection
Journal Papers (incomplete):
1. Ling Tong, Kechen Song*, Hongkun Tian, Yi Man, Yunhui Yan and Qinggang Meng. A Novel RGB-D Cross-Background Robot Grasp Detection Dataset and Background-Adaptive Grasping Network [J]. IEEE Transactions on Instrumentation & Measurement, 2024, 73, 9511615.
2. Hongkun Tian, Kechen Song*, Ling Tong, Yi Man and Yunhui Yan. Robot Unknown Objects Instance Segmentation based on Collaborative Weight Assignment RGB-Depth Fusion Strategy [J]. IEEE/ASME Transactions on Mechatronics, 2024, 29(3), 2032-2043.
3. Yunhui Yan, Hongkun Tian, Kechen Song*, Yuntian Li, Yi Man, Ling Tong. Transparent Object Depth Perception Network for Robotic Manipulation Based on Orientation-aware Guidance and Texture Enhancement [J]. IEEE Transactions on Instrumentation & Measurement, 2024, 73, 7505711.
4. Yunhui Yan, Ling Tong, Kechen Song*, Hongkun Tian, Yi Man, Wenkang Yang. SISG-Net: Simultaneous Instance Segmentation and Grasp Detection for Robot Grasp in Clutter [J]. Advanced Engineering Informatics, 2023, 58, 102189.
5. Hongkun Tian, Kechen Song*, Jing Xu, Shuai Ma, Yunhui Yan. Antipodal-Points-aware Dual-decoding Network for Robotic Visual Grasp Detection Oriented to Multi-object Clutter Scenes [J]. Expert Systems With Applications, 2023, 230, 120545.
6. Ling Tong, Kechen Song*, Hongkun Tian, Yi Man, Yunhui Yan, Qinggang Meng. SG-Grasp: Semantic Segmentation Guided Robotic Grasp Oriented to Weakly Textured Objects Based on Visual Perception Sensors [J]. IEEE Sensors Journal, 2023, 23(22), 28430-28441.
7. Hongkun Tian, Kechen Song*, Song Li, Shuai Ma, Jing Xu and Yunhui Yan. Data-driven Robotic Visual Grasping Detection for Unknown Objects: A Problem-oriented Review [J]. Expert Systems With Applications, 2023, 211, 118624.
8. Hongkun Tian, Kechen Song*, Song Li, Shuai Ma, Yunhui Yan. Rotation Adaptive Grasping Estimation Network Oriented to Unknown Objects Based on Novel RGB-D Fusion Strategy [J]. Engineering Applications of Artificial Intelligence, 2023, 120, 105842.
9. Hongkun Tian, Kechen Song*, Song Li, Shuai Ma and Yunhui Yan. Light-weight Pixel-wise Generative Robot Grasping Detection Based on RGB-D Dense Fusion [J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71, 5017912.
RGB-D cross-background robot grasp detection dataset (CBRGD) | |
This paper presents a novel RGB-D cross-background robot grasp detection dataset (CBRGD). The dataset aims to enhance the performance of intelligent robots when confronted with diverse and dynamic scenarios. The CBRGD dataset consists of seven different common backgrounds and 58 common objects, covering various grasping scenarios. Alongside the dataset, we also propose a background-adaptive robot multiobject grasp detection network named BA-Grasp. Using the bimodal foreground activation strategy (BFAS), we can suppress background noise while highlighting the target object in the foreground. The multiscale foreground region features enhancement module (MFRFEM), on the other hand, allows the network to adapt to objects of different sizes while highlighting important areas in the foreground. Extensive experiments indicate that our robotic grasping method demonstrates significant advantages when facing various scenarios and complex, variable objects. The proposed method achieved the accuracies of 97.8% and 96.7% on the publicly available Cornell dataset and 94.6% on the Jacquard dataset, which are comparable to the SOTA method. Moreover, on the CBRGD dataset, the proposed method demonstrated an average accuracy improvement of 5%–10% over the SOTA method. Robot grasping experiments validate that our method can maintain high grasping accuracy in real-world applications. Experimental videos and the CBRGD dataset can be found at the following address: https://github.com/meiguiz/BA-Grasp. | |
Ling Tong, Kechen Song, Hongkun Tian, Yi Man, Yunhui Yan and Qinggang Meng. A Novel RGB-D Cross-Background Robot Grasp Detection Dataset and Background-Adaptive Grasping Network [J]. IEEE Transactions on Instrumentation & Measurement, 2024, 73, 9511615. (paper) (code) | |
Robot Unknown Objects Instance Segmentation based on Collaborative Weight Assignment RGB-Depth Fusion Strategy | |
This paper proposes a collaborative weight assignment (CWA) fusion strategy for fusing RGB and depth (RGB-D) data. It contains three carefully designed modules: a motivational pixel weight assignment (MPWA) module, a dual-direction spatial weight assignment (DSWA) module, and a stepwise global feature aggregation (SGFA) module. Our method aims to adaptively assign fusion weights between the two modalities to better exploit RGB-D features from multiple dimensions. On the popular GraspNet-1Billion and WISDOM RGB-D robot operation datasets, the proposed method achieves performance competitive with state-of-the-art techniques, showing that our approach makes good use of the information in the two modalities. Furthermore, we have deployed the fusion model on the AUBO i5 robotic manipulation platform to test its segmentation and grasping optimization effects on unknown objects. The proposed method achieves robust performance in qualitative and quantitative analysis experiments. | |
Hongkun Tian, Kechen Song, Ling Tong, Yi Man and Yunhui Yan. Robot Unknown Objects Instance Segmentation based on Collaborative Weight Assignment RGB-Depth Fusion Strategy [J]. IEEE/ASME Transactions on Mechatronics, 2024, 29(3), 2032-2043. (paper) | |
Transparent Object Depth Perception Network for Robotic Manipulation Based on Orientation-Aware Guidance and Texture Enhancement | |
Industrial robots frequently encounter transparent objects in their work environments. Unlike conventional objects, transparent objects often lack distinct texture features in RGB images and result in incomplete and inaccurate depth images. This presents a significant challenge to robotic perception and operation. As a result, many studies have focused on reconstructing depth data by encoding and decoding RGB and depth information. However, current research faces two limitations: insufficiently addressing challenges posed by textureless transparent objects during the encoding-decoding process and inadequate emphasis on capturing shallow characteristics and cross-modal interaction of RGB-D bimodal data. To overcome these limitations, this study proposes a depth perception network based on orientation-aware guidance and texture enhancement for robots to perceive transparent objects. The backbone network incorporates an orientation-aware guidance module to integrate shallow RGB-D features, providing prior direction. In addition, this study designs a multibranch, multisensory field interactive texture nonlinear enhancement architecture, inspired by human vision, to tackle the challenges presented by textureless transparent objects. The proposed approach is extensively validated on both public datasets and industrial robotics platforms, demonstrating highly competitive performance. |
|
Yunhui Yan, Hongkun Tian, Kechen Song, Yuntian Li, Yi Man, Ling Tong. Transparent Object Depth Perception Network for Robotic Manipulation Based on Orientation-aware Guidance and Texture Enhancement [J]. IEEE Transactions on Instrumentation & Measurement, 2024, 73, 7505711. (paper) | |
SISG-Net: Simultaneous Instance Segmentation and Grasp Detection | |
Robots have always found it challenging to grasp in cluttered scenes because of the complex background information and changing operating environment. Therefore, to enable robots to perform multi-object grasping tasks in a wider range of application scenarios, such as object sorting on industrial production lines and object manipulation by home service robots, we integrate segmentation and grasp detection into the same framework and design a simultaneous instance segmentation and grasp detection network (SISG-Net). Using this network, robots can better interact with complex environments and perform grasp tasks. To address the insufficient fusion of existing RGB-D fusion strategies in the robot field, we propose a lightweight RGB-D fusion module called SMCF to make modal fusion more efficient. To address the inaccurate perception of small objects in different scenes, we propose the FFASP module. Finally, we use the AFF module to adaptively fuse multi-scale features. Segmentation can remove noise information from the background, enabling the robot to grasp robustly against different backgrounds. Using the segmentation result, we refine grasp detection and find the best grasp pose for robot grasping in complex scenes. Our grasp detection model performs similarly to state-of-the-art grasp detection algorithms on the Cornell Dataset and achieves state-of-the-art performance on the OCID Dataset. We show that the method is stable and robust in real-world grasping experiments. The code and video of our experiment can be found at: https://github.com/meiguiz/SISG-Net |
|
Yunhui Yan, Ling Tong, Kechen Song, Hongkun Tian, Yi Man, Wenkang Yang. SISG-Net: Simultaneous Instance Segmentation and Grasp Detection for Robot Grasp in Clutter [J]. Advanced Engineering Informatics, 2023, 58, 102189. (paper) (code) | |
Antipodal-Points-aware Dual-decoding Network for Robotic Visual Grasp Detection Oriented to Multi-object Clutter Scenes | |
It is challenging for robots to detect grasps with high accuracy and efficiency in multi-object clutter scenes, especially scenes containing objects of very different scales. Effective grasping representation, full utilization of data, and formulation of grasping strategies are critical to solving the problem. To this end, this paper proposes an antipodal-points grasping representation model. Based on this, the Antipodal-Points-aware Dual-decoding Network (APDNet) is presented for grasping detection in multi-object scenes. APDNet employs an encoding–decoding architecture. A shared encoding strategy based on an Adaptive Gated Fusion Module (AGFM) is proposed in the encoder to fuse RGB-D multimodal data. Two decoding branches, namely StartpointNet and EndpointNet, are presented to detect antipodal points. To better focus on objects at different scales in multi-object scenes, a global multi-view cumulative attention mechanism, called Global Accumulative Attention Mechanism (GAAM), is also designed for StartpointNet. The proposed method is comprehensively validated and compared using a public dataset and a real robot platform. On the GraspNet-1Billion dataset, the proposed method achieves 30.7%, 26.4%, and 12.7% accuracy at a speed of 88.4 FPS for seen, unseen, and novel objects, respectively. On the AUBO robot platform, the detection and grasp success rates are 100.0% and 95.0% on single-object scenes and 97.0% and 90.3% on multi-object scenes, respectively. These results demonstrate that the proposed method exhibits state-of-the-art performance with well-balanced accuracy and efficiency. |
|
Hongkun Tian, Kechen Song, Jing Xu, Shuai Ma, Yunhui Yan. Antipodal-Points-aware Dual-decoding Network for Robotic Visual Grasp Detection Oriented to Multi-object Clutter Scenes [J]. Expert Systems With Applications, 2023, 230, 120545. (paper) | |
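To make the idea of an antipodal-points representation concrete, the sketch below converts a pair of detected contact points in image coordinates into the more familiar center, angle, and width of a parallel-jaw grasp. It is a geometric illustration only: the `Grasp` type, the function name, and the angle convention are assumptions, and the paper's exact parameterization may differ.

```python
import math
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class Grasp(NamedTuple):
    center: Point   # midpoint between the two contact points (pixels)
    angle: float    # gripper axis angle in radians, folded into (-pi/2, pi/2]
    width: float    # distance between the contact points (pixels)


def grasp_from_antipodal_points(start: Point, end: Point) -> Grasp:
    """Derive center, angle, and opening width from two antipodal points."""
    (x1, y1), (x2, y2) = start, end
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    width = math.hypot(x2 - x1, y2 - y1)
    angle = math.atan2(y2 - y1, x2 - x1)
    # A parallel gripper is symmetric, so swapping start and end must give
    # the same grasp; fold the angle into a half-open interval of length pi.
    if angle > math.pi / 2:
        angle -= math.pi
    elif angle <= -math.pi / 2:
        angle += math.pi
    return Grasp(center, angle, width)


g = grasp_from_antipodal_points((0.0, 0.0), (4.0, 0.0))
g_swapped = grasp_from_antipodal_points((4.0, 0.0), (0.0, 0.0))
g_vertical = grasp_from_antipodal_points((0.0, 0.0), (0.0, 3.0))
```

Here `g` and `g_swapped` describe the same horizontal grasp of width 4 centered at (2, 0), which is the symmetry the angle folding is meant to preserve.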
SG-Grasp: Semantic Segmentation Guided Robotic Grasp Oriented to Weakly Textured Objects Based on Visual Perception Sensors | |
Weakly textured objects are frequently manipulated by industrial and domestic robots, and the most common two types are transparent and reflective objects; however, their unique visual properties present challenges even for advanced grasp detection algorithms. Many existing algorithms heavily rely on depth information, which is not accurately provided by ordinary red-green-blue and depth (RGB-D) sensors for transparent and reflective objects. To overcome this limitation, we propose an innovative solution that uses semantic segmentation to effectively segment weakly textured objects and guide grasp detection. By using only red-green-blue (RGB) images from RGB-D sensors, our segmentation algorithm (RTSegNet) achieves state-of-the-art performance on the newly proposed TROSD dataset. Importantly, our method enables robots to grasp transparent and reflective objects without requiring retraining of the grasp detection network (which is trained solely on the Cornell dataset). Real-world robot experiments demonstrate the robustness of our approach in grasping commonly encountered weakly textured objects; furthermore, results obtained from various datasets validate the effectiveness and robustness of our segmentation algorithm. Code and video are available at: https://github.com/meiguiz/SG-Grasp. | |
Ling Tong, Kechen Song, Hongkun Tian, Yi Man, Yunhui Yan, Qinggang Meng. SG-Grasp: Semantic Segmentation Guided Robotic Grasp Oriented to Weakly Textured Objects Based on Visual Perception Sensors [J]. IEEE Sensors Journal, 2023, 23(22), 28430-28441. (paper) (code) | |
Data-driven robotic visual grasping detection for unknown objects | |
This paper presents a comprehensive survey of data-driven robotic visual grasping detection (DRVGD) for unknown objects. We review both object-oriented and scene-oriented aspects, using the DRVGD for unknown objects as a guide. Object-oriented DRVGD targets the physical information of unknown objects, such as shape, texture, and rigidity, which can be used to classify objects as conventional or challenging. Scene-oriented DRVGD focuses on unstructured scenes, which are explored in two aspects based on object-to-object position relationships: grasping isolated objects and grasping stacked objects. In addition, this paper provides a detailed review of associated grasping representations and datasets. Finally, the challenges of DRVGD and future directions are pointed out. | |
Hongkun Tian, Kechen Song, et al. Data-driven Robotic Visual Grasping Detection for Unknown Objects: A Problem-oriented Review [J]. Expert Systems With Applications, 2023, 211, 118624. (paper) | |
Rotation Adaptive Grasping Estimation Network Oriented to Unknown Objects Based on Novel RGB-D Fusion Strategy | |
This paper proposes a framework for rotation adaptive grasping estimation based on a novel RGB-D fusion strategy. Specifically, the RGB-D data is fused with shared weights in stages based on the proposed Multi-step Weight-learning Fusion (MWF) strategy. The spatial position encoding is learned autonomously by the proposed Rotation Adaptive Conjoin (RAC) encoder to achieve spatial and rotational adaptiveness for unknown objects with unknown poses. In addition, the Multi-dimensional Interaction-guided Attention (MIA) decoding strategy based on the fused multiscale features is proposed to highlight the practical elements and suppress the invalid ones. The method has been validated on the Cornell and Jacquard grasping datasets with cross-validation accuracies of 99.3% and 94.6%. The single-object and multi-object scene grasping success rates on the robot platform are 95.625% and 87.5%, respectively. | |
Hongkun Tian, Kechen Song, et al. Rotation Adaptive Grasping Estimation Network Oriented to Unknown Objects Based on Novel RGB-D Fusion Strategy [J]. Engineering Applications of Artificial Intelligence, 2023, 120, 105842. (paper) |
|
Lightweight Pixel-Wise Generative Robot Grasping Detection | |
Grasping detection is one of the essential tasks for robots to achieve automation and intelligence. Existing grasp detection mainly relies on data-driven discriminative and generative strategies, and generative strategies have significant advantages over discriminative strategies in terms of efficiency. RGB and depth (RGB-D) data are widely used as grasping data sources because they carry sufficient information and are cheap to acquire, and RGB-D fusion has shown advantages over using RGB or depth alone. However, existing research has mainly focused on early fusion and late fusion, which makes it challenging to fully utilize information from both modalities. It is crucial to improve grasping accuracy by leveraging both modalities while keeping the model lightweight and real-time. Therefore, this article proposes a pixel-wise RGB-D dense fusion method based on a generative strategy. The method is validated experimentally on both public datasets and a real robot platform. Accuracy rates of 98.9% and 94.0% are achieved on the Cornell and Jacquard datasets, with a processing time of only 15 ms per image. The average success rate on the AUBO i5 robotic platform with a DH-AG-95 parallel gripper reached 94.0% for single-object scenes, 86.7% for three-object scenes, and 84% for five-object scenes. Our approach outperforms existing state-of-the-art methods. | |
Hongkun Tian, Kechen Song, et al. Light-weight Pixel-wise Generative Robot Grasping Detection Based on RGB-D Dense Fusion [J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71, 5017912. (paper) (video) |
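Pixel-wise generative grasp detectors commonly output per-pixel maps of grasp quality, angle, and width, and the final grasp is read off at the highest-quality pixel. The abstract above does not list the network's output heads, so the sketch below simply assumes three such maps of equal size; the function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Union

Map = List[List[float]]


def decode_best_grasp(quality: Map, angle: Map, width: Map) -> Dict[str, Union[int, float]]:
    """Return the grasp parameters stored at the highest-quality pixel."""
    best_q = float("-inf")
    best_r = best_c = 0
    for r, row in enumerate(quality):
        for c, q in enumerate(row):
            # Strict comparison keeps the first pixel when qualities tie.
            if q > best_q:
                best_q, best_r, best_c = q, r, c
    return {
        "row": best_r,
        "col": best_c,
        "quality": best_q,
        "angle": angle[best_r][best_c],
        "width": width[best_r][best_c],
    }


# Toy 2x2 maps standing in for network outputs.
quality_map = [[0.1, 0.2], [0.9, 0.3]]
angle_map = [[0.0, 0.1], [0.5, -0.2]]
width_map = [[10.0, 20.0], [30.0, 40.0]]
best = decode_best_grasp(quality_map, angle_map, width_map)
```

Because decoding is a single pass over the maps, its cost is small compared with the network forward pass, which is one reason generative strategies tend to be efficient.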
Multi-Exposure Fusion for Curved Workpieces
CW-MEF dataset: | |
To fill the gap in multi-exposure fusion (MEF) datasets for the industrial field, a novel curved workpieces dataset called CW-MEF is proposed for the MEF task. The samples in the dataset have been carefully selected to cover mainstream mechanical workpieces, critical parts such as engine blades, and hardware tools that are common in daily life. All samples are prone to producing reflections under light, so the selected workpieces are highly representative. The CW-MEF dataset is divided into 44 categories and contains a total of 4113 images, all of which are 1280 × 1024 in size. The dataset is available at: https://github.com/VDT-2048/CW-MEF. | |
Chongyan Sun, Kechen Song, et al. A Multi-Exposure Fusion Method for Reflection Suppression of Curved Workpieces [J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71, 5021104. (paper) (code & dataset) |
|