Similar literature
Found 18 similar articles (search time: 312 ms)
1.
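Text clustering is an important research branch of text mining and applies clustering methods to text processing. This paper first discusses and summarizes, in some depth, the text clustering process based on the vector space model. It then reviews existing text clustering algorithms and the metrics commonly used to evaluate text clustering quality. Building on prior results, the paper uses the 20Newsgroup text corpus and the vector space representation model to implement text preprocessing and the k-means clustering algorithm on the open-source data mining platform WEKA. Based on the clustering results obtained, it proposes optimizations for text representation, feature selection, and feature dimensionality reduction.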
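
The pipeline this abstract describes (represent documents as vectors, then run k-means) can be sketched in a few lines. This is only an illustration in plain Python, not the paper's WEKA implementation: the toy documents, the function names `tf_vectors` and `kmeans`, the use of raw term frequencies, and the fixed starting centroids are all assumptions made for the example.

```python
import math
from collections import Counter

def tf_vectors(docs):
    """Map tokenized documents to L2-normalized term-frequency vectors."""
    vocab = sorted({t for d in docs for t in d})
    vectors = []
    for d in docs:
        counts = Counter(d)
        v = [counts[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against empty docs
        vectors.append([x / norm for x in v])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init_idx, max_iter=50):
    """Lloyd-style k-means; init_idx lists the documents used as starting centroids."""
    centroids = [list(vectors[i]) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: two finance documents, two sports documents.
docs = [
    ["stock", "market", "price"],
    ["stock", "price", "trading"],
    ["football", "match", "goal"],
    ["football", "goal", "league"],
]
vocab, vectors = tf_vectors(docs)
labels = kmeans(vectors, init_idx=[0, 2])
```

Fixed starting centroids keep the sketch deterministic; a real run would typically use random or k-means++ initialization and a weighting scheme such as TF-IDF, which is where the feature selection and dimensionality reduction mentioned in the abstract come in.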

2.
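Wang Kaili, 《价值工程》 (Value Engineering), 2010, 29(13): 182-183
With the spread of the Internet and the development of e-commerce and personalized recommendation technology, Web usage mining has become a new research focus in data mining. For clustering Web user sessions, the paper proposes a real-time user session clustering method based on sets of sequence pairs. It analyzes and compares the clustering algorithms, gives their time and space complexity, experimentally compares the efficiency of the BOM and BOC algorithms, and verifies the effectiveness and timeliness of the BOC algorithm.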

3.
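Text mining of stock market information   Total citations: 1 (self-citations: 0, citations by others: 1)
Faced with the massive volume of information in the stock market, this paper proposes using text mining to obtain preliminary results quickly and then analyzing them with a financial evaluation indicator system for listed companies and clustering methods from data mining. Drawing on the characteristics of stock market information, the paper builds a text mining framework and workflow, applies text mining and cluster analysis to the annual reports of 22 listed companies as an example, and presents a method for comprehensively analyzing and evaluating the financial position and operating performance of listed companies.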

4.
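《价值工程》 (Value Engineering), 2018, (14): 216-218
Text mining and text visualization are important applied computing techniques that can summarize the core content of text vividly and at a high level, helping readers quickly understand and absorb its central ideas. This paper outlines the preprocessing workflow for text mining and then describes how to carry out text mining with R, including term clustering, document clustering, and drawing term word clouds and term network graphs, to uncover information hidden in the text and present it visually. It closes with a summary of, and outlook for, text mining and text visualization.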

5.
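Mining meaningful user access patterns and potential customer groups from Web server log files and customer transaction data helps enterprises provide personalized information services and run targeted e-commerce activities. Based on clustering techniques from Web mining, this paper proposes a concrete implementation scheme for a personalized recommendation system in e-commerce.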

6.
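To address current problems in corporate financial credit, this paper proposes a method for classifying financial credit grades that combines fuzzy C-means clustering with comprehensive evaluation. It examines the application of cluster analysis to financial credit classification in two respects: the fuzzy clustering method itself, and comprehensive evaluation based on the cluster centers. An empirical analysis on real data indicates that the method can deliver good classification results tailored to practical needs, and that evaluating by cluster centers reflects credit grades more clearly.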
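
For orientation, the standard fuzzy C-means formulation alternates two updates that minimize a weighted within-cluster distance. The notation below is an assumption, since the abstract defines no symbols and does not say which variant the authors use: $x_j$ are the $n$ data points (for example, a company's financial indicators), $c_i$ the $C$ cluster centers, $u_{ij}$ the membership of point $j$ in cluster $i$, and $m > 1$ the fuzzifier.

```latex
J_m = \sum_{i=1}^{C} \sum_{j=1}^{n} u_{ij}^{m} \, \lVert x_j - c_i \rVert^{2},
\qquad \sum_{i=1}^{C} u_{ij} = 1 \ \text{for every } j

u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_j - c_i \rVert}{\lVert x_j - c_k \rVert} \right)^{2/(m-1)}},
\qquad
c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} \, x_j}{\sum_{j=1}^{n} u_{ij}^{m}}
```

Each center $c_i$ lives in the same space as the indicators, so the centers themselves can be scored and ranked; this is presumably what evaluation by cluster centers refers to.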

7.
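This paper applies systems thinking and fuzzy dynamic clustering to evaluating the quality of listed companies' shell resources. The main approach is as follows: establish evaluation indicators in five areas (technological innovation, financial position, management capability, market image, and social benefit); build the hierarchy of indicators with interpretive structural modeling; assign indicator weights with the Klee method (古林法); process financial data with piecewise functions according to financial standards; construct a fuzzy similarity relation matrix; and obtain the classification with direct fuzzy dynamic clustering. Both the intermediate and final results of the evaluation can be used for enterprise diagnosis, corporate restructuring, regional capital market planning, and similar purposes.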

8.
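The paper discusses the importance of evaluating logistics parks, introduces the basic principles of the Delphi method and its application in logistics park evaluation, and sets out the method's strengths and weaknesses. On that basis, it proposes determining evaluation indicator weights by fuzzy cluster analysis, and finally introduces the basic principles and solution procedure of fuzzy cluster analysis.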

9.
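Information retrieval studies how to organize and retrieve information in large text collections. The typical retrieval problem is locating relevant texts from a user's input, but textual relevance is a fuzzy concept. To measure it objectively, this paper proposes a new method for computing relevance between texts. Using a term-frequency matrix and a fuzzy similarity matrix, the method applies the transitive closure method from fuzzy clustering to a set of documents retrieved by relevance and computes the degree of relevance among them. A worked example illustrates the method and yields fairly objective results.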
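
The transitive closure method named in this abstract can be illustrated as follows. This is a minimal sketch, assuming the fuzzy similarity matrix is already given (reflexive and symmetric); how the paper derives it from the term-frequency matrix is not stated in the abstract, so the matrix values and function names here are hypothetical.

```python
def maxmin_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def transitive_closure(r):
    """Square R under max-min composition until it stops changing.

    For a reflexive, symmetric R the fixed point is a fuzzy equivalence relation.
    Only max and min are applied, so exact float comparison is safe.
    """
    while True:
        r2 = maxmin_compose(r, r)
        if r2 == r:
            return r
        r = r2

def lambda_cut_clusters(closure, lam):
    """Group documents i and j whenever closure[i][j] >= lam."""
    clusters, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        clusters.append(group)
    return clusters

# Hypothetical similarity matrix for three documents.
similarity = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
closure = transitive_closure(similarity)
tight = lambda_cut_clusters(closure, 0.8)
loose = lambda_cut_clusters(closure, 0.5)
```

In the closure, the entry for documents 0 and 2 rises from 0.3 to 0.5 because they are linked through document 1. Lowering the threshold `lam` merges clusters, which gives a graded notion of how related the retrieved documents are.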

10.
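Jiang Ning, Zhao Qingzhen, 《价值工程》 (Value Engineering), 2007, 26(2): 23-26
Fuzzy model recognition, fuzzy comprehensive evaluation, and fuzzy clustering are currently three of the most widely applied areas of fuzzy mathematics. Because objective things are inherently fuzzy, and people's perception of them introduces further fuzziness, classical recognition and evaluation methods increasingly fail to meet practical needs; fuzzy recognition and evaluation emerged and developed in response. The paper introduces the methods and current state of research on fuzzy model recognition and fuzzy comprehensive evaluation for decision-making, and illustrates the application of both with real-world problems. It also studies and summarizes the fuzzy cluster analysis used in fuzzy model recognition, and obtains clustering results with SPSS to round out the theory presented.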

11.
Textual data has become increasingly common in business analytics data sets. While concept-based text mining offers a way to extract meaningful information from text data, methods for monitoring customer perceptions of the business processes and products discussed in customer-generated documents are not readily available. We explore the results of two text-mining algorithms and review issues observed in the data that affect uploading the results onto a newly proposed methodological monitoring platform analogous to statistical process control charts. Finally, we discuss several topics for future research in text mining.

12.
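On the shift in the focus of accounting functions in a networked environment   Total citations: 1 (self-citations: 0, citations by others: 1)
Network technology has transformed the enterprise's organizational, production, logistics, and management environments, and with them the accounting environment. Starting from this observation, the paper first describes the focus of traditional accounting functions and its limitations, then analyzes the major impact of network technology on accounting functions, and finally proposes how the focus of accounting functions shifts in a networked environment and the forms in which those functions are exercised.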

13.
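Based on extensive investigation, the paper establishes a credit evaluation indicator system for small and medium-sized enterprises (SMEs) and applies the triangular whitenization weight function clustering decision method from grey statistical theory to SME credit evaluation. By redesigning the grey clustering decision steps, it overcomes the clustering errors, or distortion of within-class ranking, that non-normalized triangular whitenization weight function clustering coefficients can cause. A case analysis indicates that the method makes evaluation results more objective and reliable and better supports sound decisions.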

14.
Information explosion is a critical challenge to the development of modern information systems. In particular, when an information system is applied over the Internet, the amount of information on the web has been increasing exponentially. Search engines, such as Google and Baidu, are essential tools for finding information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results from those tools. By automatically clustering the results into groups based on subject, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results show that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.
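
To make the first step concrete, here is what extracting a longest common substring from two snippets looks like. Note the substitution: the paper uses a generalised suffix tree, whereas this sketch uses a simple dynamic-programming table, which gives the same answer for a pair of strings but scales worse. The function name and the sample snippets are hypothetical.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b.

    Dynamic programming over prefix pairs, O(len(a) * len(b)) time.
    This stands in for the generalised suffix tree named in the abstract.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

# Hypothetical search snippets; works on Chinese text character by character.
shared = longest_common_substring("文本聚类算法", "网页聚类算法研究")
```

Working at the character level sidesteps word segmentation, which is one reason substring-based features suit Chinese snippets; the later steps in the abstract (HowNet similarity, vocabulary chains, Chameleon) are not covered by this sketch.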

15.
In today's business environment, enterprises are increasingly under pressure to process the vast amount of data produced every day within enterprises. One method is to focus on business intelligence (BI) applications and increase commercial added value through such business analytics activities. Term weighting, which is used to convert documents into vectors in the term space, is a vital task in enterprise Information Retrieval (IR), text categorisation, text analytics, etc. When determining a term's weight in a document, the traditional TF-IDF scheme considers only the term's occurrence frequency within the document and in the entire set of documents, so some meaningful terms do not receive an appropriate weight. In this article, we propose a new term weighting scheme called Term Frequency – Function of Document Frequency (TF-FDF) to address this issue. Instead of using a monotonically decreasing function such as Inverse Document Frequency, FDF presents a convex function that dynamically adjusts weights according to the significance of the words in a document set. This function can be manually tuned based on the distribution of the most meaningful words which semantically represent the document set. Our experiments show that TF-FDF can achieve a higher value of Normalised Discounted Cumulative Gain in IR than TF-IDF and its variants, and improve the accuracy of relevance ranking of the IR results.
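
The TF-IDF baseline that this abstract argues against can be written down compactly. The sketch below is an illustration only: it uses the common `tf * log(N / df)` form, and the `df_weight` parameter is a hypothetical hook showing where a different function of document frequency would plug in. The paper's actual FDF function is not given in the abstract, so it is not reproduced here.

```python
import math
from collections import Counter

def idf(df, n_docs):
    """Classic inverse document frequency: decreases monotonically as df grows."""
    return math.log(n_docs / df)

def term_weights(docs, df_weight=idf):
    """Weight each term in each document as tf * df_weight(df, N).

    docs is a list of token lists; df_weight is any function of
    (document frequency, number of documents).
    """
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))  # documents containing each term
    return [
        {t: tf * df_weight(df[t], n_docs) for t, tf in Counter(d).items()}
        for d in docs
    ]

# Hypothetical toy corpus.
docs = [
    ["data", "mining", "data"],
    ["data", "retrieval"],
    ["text", "mining"],
]
weights = term_weights(docs)
```

With IDF, a term's weight depends on document frequency only through one fixed decreasing curve. Passing a different `df_weight` changes that curve without touching the rest of the pipeline, which is the kind of substitution the abstract describes.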

16.
This paper proposes an approach for creating and utilizing keyword-based patent maps for use in new technology creation activity. The proposed approach comprises the following sub-modules. First, text mining is used to transform patent documents into structured data to identify keyword vectors. Second, principal component analysis is employed to reduce the number of keyword vectors so that they are suitable for use on a two-dimensional map. Third, patent 'vacancies', defined as blank areas in the map that are sparse in patent density but large in size, are identified. The validity of each vacancy is then tested against such criteria as technological criticality and technological trends. If a vacancy is judged to be meaningful, its technological features are investigated in detail to identify the potential for new technology creation. The procedure of the proposed approach is described in detail using an illustrative patent database and is implemented in an expert system for new technology creation.

17.
Foreseeing the advent of new technologies and their socio-economic impact is a necessity for academia, governments, and private enterprises alike. In futures studies, the identification of future signals is one of the best-known techniques for analyzing trends and emerging issues and for gaining insight into the future. In the Big Data era, recent scholars have proposed text mining procedures focused on web data such as new social media and academic papers. However, the detection of future signals is still a developing area of research, and there is much room to improve existing methodology and to develop theoretical foundations. The present study reviews previous literature on identifying emerging issues based on the weak signal detection approach. The authors then propose a revised framework that incorporates quantitative and qualitative text mining for assessing the strength of future signals. They apply the framework to a case study on the ethical issues of artificial intelligence (hereafter AI). From the EBSCOhost database, the authors collected text data covering ethical issues in AI and conducted a text mining analysis. The results reveal that emerging ethical issues can be classified as strong signals, weak signals, well-known but not so strong signals, and latent signals. The revised methodology can provide insights for government and business stakeholders by identifying future signals and their meanings in various fields.

18.
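Gu Hongyan, 《价值工程》 (Value Engineering), 2014, (19): 318-319
The grey clustering method treats the urban pressure pipeline system as a grey system and applies grey clustering theory to evaluating the degree of disaster damage to pressure pipelines. Taking pressure pipelines in China as an example, the paper builds membership functions for the different clustering indicators, determines the clustering weights, and derives the maximum clustering coefficient and the clustering result. It concludes that evaluating pressure pipeline losses by grey clustering both reflects the fuzziness of the boundaries between accident loss grades and makes full use of the available information, avoiding the shortcomings of the fuzzy mathematics approach. Grey clustering is therefore a simple, objective, and reliable method.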
