
Analysis and Visualization of Metro Construction Hazards Based on Text Mining

Published: January 6, 2021



黑永健


 

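Metro construction hazards are a direct cause of safety accidents and receive close attention from the state and from metro enterprises. As management has become information-based, metro enterprises across the country have built their own hazard inspection systems. With large-scale metro construction, these systems have accumulated large volumes of unstructured hazard text records that are impractical to process manually. At present, however, this text is only stored and queried, and the valuable information it contains about patterns in metro construction hazards has not been mined.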

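First, this thesis proposes a topic mining method for metro construction hazard texts that combines Chinese word segmentation, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, and the LDA (Latent Dirichlet Allocation) topic model, so that hazards that objectively exist during construction can be identified from real historical data. Second, using hazard text records from one city's metro for the three years 2016-2018 as the data source, it applies the proposed method to identify the hazard categories present in actual construction and the key inspection points for each category. Finally, it matches every hazard inspection record in the data source to a hazard category through field extraction and manual review, then uses social network analysis to find the hazards with high recurrence across years, the hazards spread over the most construction parts, and the number of occurrences of each hazard category at each construction part, and presents the results visually.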
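The topic mining pipeline named here (segmentation, TF-IDF, LDA) can be illustrated with a minimal standard-library sketch. Everything in it is an assumption made for illustration, not the thesis's implementation: the toy `records` stand in for already-segmented hazard texts, TF-IDF is used only to drop terms that occur in every record, and `lda_gibbs` is a bare-bones collapsed Gibbs sampler with hypothetical parameter values.

```python
import math
import random
from collections import Counter

# Hypothetical, already-segmented hazard records. A real pipeline would first
# run a Chinese word segmenter over the raw inspection text.
records = [
    ["scaffold", "guardrail", "missing", "site"],
    ["scaffold", "plank", "loose", "site"],
    ["cable", "insulation", "damaged", "site"],
    ["cable", "junction", "box", "exposed", "site"],
]

def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

def filter_terms(docs, min_weight=1e-9):
    """Drop terms whose TF-IDF weight is ~0, i.e. terms present in every record."""
    return [[t for t in doc if w[t] > min_weight] for doc, w in zip(docs, tfidf(docs))]

def lda_gibbs(docs, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (topic-term counts, doc-topic counts)."""
    rng = random.Random(seed)
    v = len({t for doc in docs for t in doc})  # vocabulary size
    topic_term = [Counter() for _ in range(k)]
    topic_total = [0] * k
    doc_topic = [[0] * k for _ in docs]
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for t in doc:
            z = rng.randrange(k)
            zs.append(z)
            topic_term[z][t] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                topic_term[z][t] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Resample its topic from the conditional distribution.
                probs = [
                    (doc_topic[d][j] + alpha)
                    * (topic_term[j][t] + beta) / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                z = rng.choices(range(k), weights=probs)[0]
                assign[d][i] = z
                topic_term[z][t] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_term, doc_topic

filtered = filter_terms(records)
topic_term, doc_topic = lda_gibbs(filtered)
for j, counts in enumerate(topic_term):
    print(j, [t for t, _ in counts.most_common(3)])
```

On real data the top terms of each topic would be read as candidate hazard categories and inspection points. A tested library implementation of LDA would normally replace the hand-written sampler.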

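For massive volumes of unstructured metro construction hazard text, this thesis proposes applying text mining and visualization techniques to hazard analysis, turning abstract text data into intuitive visual information. The results provide data support for future hazard inspection work, can help metro enterprises compile hazard inspection yearbooks, and can be used in worker safety training, so the approach has practical application value.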



Keywords: safety management; metro construction hazards; text mining; LDA topic model; data visualization



Abstract

As the direct cause of safety accidents, metro construction safety hazards receive close attention from the state and from metro enterprises. In the context of information-based management, metro enterprises in different regions have successively built their own hazard inspection systems. With the large-scale construction of metros, these systems have accumulated numerous unstructured text records of safety hazards that are difficult to process manually. Currently, however, these records are used only for storage and querying. The valuable information they contain, which can reflect the patterns of metro construction safety hazards, has not been mined.

First, combining Chinese word segmentation, Term Frequency-Inverse Document Frequency (TF-IDF), and Latent Dirichlet Allocation (LDA), this thesis proposes a topic mining method that identifies hazards in the metro construction process on the basis of real historical data. Second, taking one city's metro construction hazard text records from 2016 to 2018 as the data source, the thesis applies the proposed method to identify the hazard categories and the key inspection points for each category in actual construction. Finally, using field extraction and manual review, the thesis matches each hazard inspection record in the data source to an identified hazard category. With the help of social network analysis, it then reveals which hazards recur frequently across years, which hazards are spread across many construction parts, and how many times each hazard category occurs in each construction part, and presents the results visually.
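One way to picture the matching and network analysis step is as a two-mode network linking hazard categories to construction parts. The sketch below assumes that representation, which the abstract does not spell out, and uses hypothetical records and category names. It computes the three quantities the paragraph mentions: occurrence counts per hazard and part, spread across parts, and recurrence across years.

```python
from collections import Counter, defaultdict

# Hypothetical matched records: (year, hazard category, construction part).
matched = [
    (2016, "edge protection", "foundation pit"),
    (2016, "temporary power", "station hall"),
    (2017, "edge protection", "foundation pit"),
    (2017, "edge protection", "tunnel section"),
    (2018, "edge protection", "station hall"),
    (2018, "temporary power", "station hall"),
]

# Edge weights of the hazard-part network: occurrences of each hazard per part.
edge_weight = Counter((hazard, part) for _, hazard, part in matched)

# Collect the distinct parts and years each hazard category is linked to.
parts_by_hazard = defaultdict(set)
years_by_hazard = defaultdict(set)
for year, hazard, part in matched:
    parts_by_hazard[hazard].add(part)
    years_by_hazard[hazard].add(year)

# Degree of a hazard node = number of distinct parts where it was recorded.
spread = {h: len(p) for h, p in parts_by_hazard.items()}
# Recurrence = number of distinct years in which the hazard was recorded.
recurrence = {h: len(y) for h, y in years_by_hazard.items()}

for hazard in sorted(spread, key=spread.get, reverse=True):
    print(hazard, "parts:", spread[hazard], "years:", recurrence[hazard])
```

The `edge_weight` counts could feed a weighted network drawing or a heat map of hazards against construction parts.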

The thesis addresses the analysis of massive unstructured metro construction hazard texts. The proposed approach combines text mining and visualization techniques to turn abstract text data into intuitive visual information. It can provide data support for future hazard inspection, help metro enterprises compile hazard inspection yearbooks, and supply visualized analysis results for worker safety training, which gives it practical application value.


Keywords: safety management; metro construction safety hazards; text mining; Latent Dirichlet Allocation topic model; data visualization