Advances in Earth Science, 2011, Vol. 26, Issue (4): 449-459. doi: 10.11867/j.issn.1001-8166.2011.04.0449

Research Brief

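Review of Research Progress in Web Spatio-temporal Data Mining
Sun Jia 1,2, Pei Tao 3, Gong Xi 2,3, Zhou Chenghu 1,3
  1. Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China; 3. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2010-10-28  Revised: 2011-02-20  Online: 2011-04-10  Published: 2011-04-10
  • Corresponding author: Sun Jia  E-mail: sunnie.nju@gmail.com
  • Funding:

    Supported by the Chinese Academy of Sciences Young Talent Project "Knowledge Discovery from Spatio-temporal Trajectory Data" (No. KZCX2-YW-QN303); the Innovation Project of the Institute of Geographic Sciences and Natural Resources Research, CAS, "Pattern Mining of Spatio-temporal Trajectory Data" (No. 200905004); and the National High Technology Research and Development Program of China (863 Program) project "Mining Unstructured Emergency Multimedia Data" (No. 2009AA12Z227).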


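With the rapid development of the Internet, the Web has penetrated every corner of human society and holds a large amount of information about society, the economy, and daily life. Mining this content for spatio-temporal information, that is, information describing the spatial and temporal extent of events, can provide rich material for exploring the spatio-temporal patterns and knowledge of social and natural events and of the actors involved in them. This paper systematically reviews the theory, methods, and applications of Web spatio-temporal data mining. It first introduces the concept and classification of Web spatio-temporal data mining and describes in detail the characteristics of Web spatio-temporal information and the methods used to extract it. It then reviews the content, methods, and applications of three categories of Web spatio-temporal data mining. Finally, it discusses the open problems, current research hotspots, and future directions of the field.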

Web spatio-temporal information describes the spatio-temporal scope of events or actors. From it, one can discover spatio-temporal knowledge such as the service scope of network resources, the geographical distribution of search behavior, and Web page-based descriptions of disasters. This paper systematically reviews Web spatio-temporal data mining technologies and services. It first introduces the distinctive characteristics of Web spatio-temporal data and discusses methods for extracting Web spatio-temporal information. It then introduces each of the three types of Web spatio-temporal data mining methods. Finally, it discusses open challenges and future directions.
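Extracting Web spatio-temporal information, as described in the abstract, starts with finding place and time references in Web text. The sketch below is a minimal illustration under loud assumptions, not the method reviewed in the paper: it uses a hypothetical two-entry gazetteer with approximate coordinates, recognizes only ISO-style `YYYY-MM-DD` dates, and the names `GAZETTEER`, `SpatioTemporalTag`, and `extract_tags` are invented for illustration. Real systems also need toponym disambiguation and normalization of free-form time expressions.

```python
import re
from datetime import date
from typing import NamedTuple

# Hypothetical toy gazetteer: place name -> approximate (lat, lon), for illustration only.
# Real gazetteers are large and must handle ambiguous names.
GAZETTEER = {
    "Beijing": (39.9, 116.4),
    "Yantai": (37.5, 121.4),
}

# Assumption: only ISO-style dates (YYYY-MM-DD) are recognized in this sketch.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


class SpatioTemporalTag(NamedTuple):
    place: str
    coords: tuple[float, float]
    when: date


def _parse_dates(text: str) -> list[date]:
    """Return every valid ISO date in the text, skipping impossible ones."""
    found = []
    for y, m, d in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(y), int(m), int(d)))
        except ValueError:
            continue  # e.g. "2010-13-45" matches the pattern but is not a real date
    return found


def extract_tags(text: str) -> list[SpatioTemporalTag]:
    """Pair every gazetteer place found in the text with every date found in it."""
    places = [
        name for name in GAZETTEER
        if re.search(rf"\b{re.escape(name)}\b", text)
    ]
    dates = _parse_dates(text)
    return [SpatioTemporalTag(p, GAZETTEER[p], d) for p in places for d in dates]


if __name__ == "__main__":
    page = "A storm reached Yantai on 2010-10-28, and flights from Beijing were delayed."
    for tag in extract_tags(page):
        print(tag)
```

Pairing every place with every date in the same document is the crudest possible scoping rule: it assumes the whole text refers to a single event, which is why practical systems score and filter candidate place and time references before assigning a scope.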


