Advances in Earth Science, 2021, Vol. 36, Issue (2): 211-220. doi: 10.11867/j.issn.1001-8166.2021.017

Research Brief

Geological Time Information Extraction from Chinese Text Based on BiLSTM-CRF

Wencong LIU (刘文聪) 1, Chunju ZHANG (张春菊) 1,3, Chen WANG (汪陈) 1, Xueying ZHANG (张雪英) 2, Yueqin ZHU (朱月琴) 4, Shoutao JIAO (焦守涛) 4, Yanxu LU (鲁艳旭) 2

  1. School of Civil Engineering, Hefei University of Technology, Hefei, Anhui 230009, China
  2. MOE Key Laboratory of Virtual Geographical Environment, Nanjing Normal University, Nanjing, Jiangsu 210023, China
  3. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, Guangdong 518034, China
  4. Development Research Center, China Geological Survey, Beijing 100037, China
  • Received: 2020-11-04 Revised: 2021-01-15 Online: 2021-04-13 Published: 2021-04-19
  • Contact: Chunju ZHANG E-mail: 2019110618@mail.hfut.edu.cn; zcjtwz@sina.com
  • About author: LIU Wencong (1998-), female, from Hefei City, Anhui Province, Associate professor. Research areas include cartography and geographical information engineering. E-mail: 2019110618@mail.hfut.edu.cn
  • Supported by:
    the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, "The spatial semantic computing for qualitative location of land resources information perception" (KF-2020-05-084); the National Natural Science Foundation of China, "Geographical knowledge graph construction method combining text and map" (41971337)

Time information runs through the entire process by which geological phenomena and events arise, develop, and disappear, and it reflects their state and evolution. In particular, geological time expressions are usually related to the intrinsic mechanisms of mineralization and to the patterns of spatiotemporal evolution. This paper designs and implements a deep-learning-based method for extracting both universal (general-purpose) time information and geological time information. Based on how time information is described in Chinese geological and mineral-resource texts, time information is divided into these two types, and each type is further subdivided. A geological time information corpus is built with the self-developed "Interactive Mineral Information Annotation Software", using cross-validation and opinion feedback. A time information extraction method based on a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) is implemented, and its results are compared with those of mainstream convolutional neural network (CNN) and conditional random field (CRF) models. The experiments show that BiLSTM-CRF performs best of the three, reaching an F1 score of 95.49% for overall time extraction, and that it handles the standardized expression and structured extraction of time information in geological text well.
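To make the reported F1 score concrete, the following is a minimal Python sketch of how the output of a sequence labeler such as BiLSTM-CRF is commonly scored: per-character BIO tags are decoded into labeled spans, and precision, recall, and F1 are computed over exact span matches. The abstract does not state the paper's tag set or scoring procedure, so the labels `GEO` (geological time) and `GEN` (universal time), the function names, and the example sentence are all assumptions made for illustration only.

```python
from typing import List, Optional, Set, Tuple

# A span is (start index, end index exclusive, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into a set of labeled spans.

    An "I-X" tag that does not continue an open "X" span is ignored.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1 between gold and predicted span sets."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up sentence (not from the paper's corpus), tagged per character:
# "该岩体形成于晚侏罗世，2019年完成勘查"
# (The rock body formed in the Late Jurassic; exploration finished in 2019.)
GOLD_TAGS = (["O"] * 6 + ["B-GEO"] + ["I-GEO"] * 3 + ["O"]
             + ["B-GEN"] + ["I-GEN"] * 4 + ["O"] * 4)
# Hypothetical prediction that misses the first character of "晚侏罗世".
PRED_TAGS = (["O"] * 7 + ["B-GEO"] + ["I-GEO"] * 2 + ["O"]
             + ["B-GEN"] + ["I-GEN"] * 4 + ["O"] * 4)

if __name__ == "__main__":
    gold_spans = bio_to_spans(GOLD_TAGS)
    pred_spans = bio_to_spans(PRED_TAGS)
    print(sorted(gold_spans))
    print(sorted(pred_spans))
    print(span_f1(gold_spans, pred_spans))
```

In this toy case the universal time span matches exactly while the geological time span is off by one character, so one of two predicted spans is correct and exact-match precision, recall, and F1 are all 0.5. Exact matching is a strict choice; a partial-overlap criterion would score the same prediction more generously.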

