Research Brief

Geological Time Information Extraction from Chinese Text Based on BiLSTM-CRF

  • LIU Wencong (刘文聪)
  • ZHANG Chunju (张春菊)
  • WANG Chen (汪陈)
  • ZHANG Xueying (张雪英)
  • ZHU Yueqin (朱月琴)
  • JIAO Shoutao (焦守涛)
  • LU Yanxu (鲁艳旭)
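  • 1. School of Civil and Hydraulic Engineering, Hefei University of Technology, Hefei, Anhui 230009
    2. Key Laboratory of Virtual Geographic Environment (Ministry of Education), Nanjing Normal University, Nanjing, Jiangsu 210023
    3. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, Guangdong 518034
    4. Development Research Center, China Geological Survey, Beijing 100037
LIU Wencong (1998-), female, from Hefei, Anhui; master's student; works mainly on cartography and geographic information engineering. E-mail: 2019110618@mail.hfut.edu.cn
ZHANG Chunju (1984-), female, from Suzhou, Anhui; associate professor; works mainly on intelligent processing and services for geographic information. E-mail: zcjtwz@sina.com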

Received: 2020-11-04

  Revised: 2021-01-15

  Published online: 2021-04-19

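Funding

Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, "The spatial semantic computing for qualitative location of land resources information perception" (KF-2020-05-084); National Natural Science Foundation of China project "Geographical knowledge graph construction method combining text and map" (41971337)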

Geological Time Information Extraction from Chinese Text Based on BiLSTM-CRF

  • Wencong LIU
  • Chunju ZHANG
  • Chen WANG
  • Xueying ZHANG
  • Yueqin ZHU
  • Shoutao JIAO
  • Yanxu LU
  • 1. School of Civil Engineering, Hefei University of Technology, Hefei 230009, China
    2. MOE Key Laboratory of Virtual Geographical Environment, Nanjing Normal University, Nanjing 210023, China
    3. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, Guangdong 518034, China
    4. Development Research Center, China Geological Survey, Beijing 100037, China
LIU Wencong (1998-), female, Hefei City, Anhui Province, Master's student. Research areas include cartography and geographical information engineering. E-mail: 2019110618@mail.hfut.edu.cn
ZHANG Chunju (1984-), female, Suzhou City, Anhui Province, Associate professor. Research areas include the intelligent processing and service of geographic information. E-mail:zcjtwz@sina.com

Received date: 2020-11-04

  Revised date: 2021-01-15

  Online published: 2021-04-19

Supported by

the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, "The spatial semantic computing for qualitative location of land resources information perception" (KF-2020-05-084); the National Natural Science Foundation of China, "Geographical knowledge graph construction method combining text and map" (41971337)

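Abstract (Chinese version)

Time information runs through the entire process by which geological phenomena and events emerge, develop, and disappear, and it reflects their state and evolution. In particular, geological time expressions are usually related to the intrinsic mechanisms of mineralization and to the laws of spatiotemporal evolution. A deep-learning-based method for extracting general time and geological time information was designed and implemented. Based on how time information is described in geological and mineral texts, time information was divided into two types, general time information and geological time information, and each type was further subdivided; a geological time information corpus was built with the self-developed "Interactive Mineral Information Annotation Software", using cross-validation and opinion feedback; a time information extraction method based on a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) was implemented; and its extraction results were compared with those of the mainstream convolutional neural network (CNN) and conditional random field (CRF) models. The experimental results show that the BiLSTM-CRF method gives the best extraction performance, with an F1 score of 95.49% for overall time extraction, and that it handles the normalized expression and structured extraction of time information in geological text well.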
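As an illustration of how an F1 value for time extraction might be computed, the following minimal sketch scores predicted spans against gold spans at the entity level. It assumes a character-level BIO tagging scheme and two hypothetical labels, `GEO` (geological time) and `GEN` (general time); the paper's actual tag set and evaluation script are not given here and may differ.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labeled spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None  # "O" or a stray "I-" tag
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a predicted span counts only on an exact match."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical 12-character sentence: three context characters, a four-character
# geological time expression, then a five-character general time expression.
gold = ["O", "O", "O", "B-GEO", "I-GEO", "I-GEO", "I-GEO",
        "B-GEN", "I-GEN", "I-GEN", "I-GEN", "I-GEN"]
# The prediction truncates the general time expression by one character.
pred = ["O", "O", "O", "B-GEO", "I-GEO", "I-GEO", "I-GEO",
        "B-GEN", "I-GEN", "I-GEN", "I-GEN", "O"]

scores = precision_recall_f1(gold, pred)
```

Under exact-match scoring, the truncated span counts as both a false positive and a false negative, which is why boundary errors weigh heavily on entity-level F1.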

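Cite this article

LIU Wencong, ZHANG Chunju, WANG Chen, ZHANG Xueying, ZHU Yueqin, JIAO Shoutao, LU Yanxu. Geological Time Information Extraction from Chinese Text Based on BiLSTM-CRF[J]. Advances in Earth Science, 2021, 36(2): 211-220. DOI: 10.11867/j.issn.1001-8166.2021.017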

Abstract

Time information runs through the entire process by which geological phenomena and events emerge, develop, and disappear, and it reflects their state and evolution. In particular, geological time expressions are usually related to metallogenic mechanisms and to the laws of spatiotemporal evolution. This paper designs and implements a deep-learning-based method for extracting general time and geological time information. Based on how time information is described in Chinese geological and mineral resources texts, the time information in geological reports and documents is divided into two types, general time information and geological time information, and each type is further subdivided. A geological time information corpus is built with self-developed annotation software, using cross-validation and opinion feedback. A time information extraction method based on BiLSTM-CRF is implemented and compared with CNN and CRF models. The experimental results show that the BiLSTM-CRF model outperforms these mainstream models in time information extraction, with an F1 score of 95.49% for overall time extraction, and that it handles the standardized expression and structured extraction of time information in geological text well.
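To illustrate what the CRF layer adds on top of per-character BiLSTM scores, the sketch below runs Viterbi decoding over a toy example. The tag set, emission scores, and transition scores are made-up values for illustration, not taken from the paper; in a trained BiLSTM-CRF the emissions would come from the BiLSTM and the transitions would be learned.

```python
from typing import List

TAGS = ["O", "B-TIME", "I-TIME"]  # hypothetical tag set


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j] score of moving from tag i to tag j (CRF parameters)
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptrs: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at this position.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    return path[::-1]


# Toy scores for a three-character input.
emissions = [
    [2.0, 0.5, 0.0],
    [0.0, 1.0, 1.5],
    [0.1, 0.0, 2.0],
]
# All transitions score 0 except O -> I-TIME, which is strongly penalized
# because an inside tag should not directly follow an outside tag.
transitions = [
    [0.0, 0.0, -100.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

greedy = [TAGS[max(range(3), key=lambda j: e[j])] for e in emissions]
decoded = [TAGS[i] for i in viterbi(emissions, transitions)]
```

Picking the best tag independently at each position gives the invalid sequence O, I-TIME, I-TIME, while decoding with transition scores yields O, B-TIME, I-TIME. This sequence-level consistency is the usual motivation for placing a CRF on top of a BiLSTM.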
