
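Short text topic model based on word vector and variational autoencoder
Cite this article: ZHANG Qing, HAN Lixin, GOU Zhinan. Short text topic model based on word vector and variational autoencoder[J]. Hebei Journal of Industrial Science & Technology, 2018, 35(6): 441-447.
Authors: ZHANG Qing, HAN Lixin, GOU Zhinan
Affiliation: College of Computer and Information, Hohai University (all three authors)
Funding: Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0486); Fundamental Research Funds for the Central Universities (2017B708X14); Hebei Province Human Resources and Social Security Research Project (JRSHZ-2018-08018)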
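Abstract: To alleviate the sparsity of short texts and improve the performance of topic models, this paper proposes a topic model that incorporates word vectors. First, each document is assumed to contain only one topic. Second, word vectors are used to expand and adjust the topics at every iteration: for each topic, a non-parametric probabilistic sampling method draws a set of words, word vectors are used to find words similar to them, and the weights of those similar words under the topic are increased. Finally, the topic distribution is replaced by a Laplace approximation so that it can be used effectively in variational autoencoder training, which speeds up training. Experimental results show that the topics learned by the proposed model are more interpretable and that the model outperforms other mainstream models, offering new possibilities for topic extraction from short texts. Using word vectors to intervene in the topic-word distribution during training yields higher-quality topics, and the variational autoencoder speeds up training; the work therefore offers some novelty and reference value for natural language processing research.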

Keywords: computer neural network; topic model; word vector; variational autoencoder; short text
Received: 2018/5/23
Revised: 2018/9/12
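The abstract's first step assumes that each short document is generated by a single topic, as in Dirichlet multinomial mixture models. The sketch below shows what that assumption implies for inference. It assumes that topic proportions `theta` and per-topic word distributions `phi` are already available; those names, the function `single_topic_posterior`, and the `floor` smoothing for unseen words are hypothetical and are not taken from the paper. The posterior over the document's one topic is proportional to `theta[k]` times the product of `phi[k][w]` over the document's words, computed in log space so long documents do not underflow.

```python
import math
from typing import Dict, List, Sequence

def single_topic_posterior(doc: Sequence[str],
                           theta: Sequence[float],
                           phi: Sequence[Dict[str, float]],
                           floor: float = 1e-12) -> List[float]:
    """Return p(z = k | doc) for every topic k, assuming the whole document
    was generated by one topic: p(z = k | doc) ∝ theta[k] * prod_w phi[k][w]."""
    log_scores = []
    for k, topic in enumerate(phi):
        score = math.log(theta[k])
        for w in doc:
            # Words missing from a topic get a tiny floor probability (hypothetical smoothing).
            score += math.log(topic.get(w, floor))
        log_scores.append(score)
    # Normalise with the log-sum-exp trick to avoid numerical underflow.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [x / total for x in weights]

if __name__ == "__main__":
    theta = [0.5, 0.5]
    phi = [
        {"apple": 0.4, "banana": 0.4, "goal": 0.2},  # a "fruit" topic
        {"apple": 0.1, "goal": 0.6, "match": 0.3},   # a "football" topic
    ]
    post = single_topic_posterior(["apple", "banana"], theta, phi)
    print(max(range(len(post)), key=post.__getitem__))  # prints 0
```

Because a short document is assigned one topic, every word in it contributes evidence to the same topic, which is the usual argument for why this assumption helps with sparse short texts.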
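The abstract's second step samples some words for each topic, uses word vectors to find similar words, and raises the weight of those similar words under the topic. The abstract does not specify the non-parametric sampler, the similarity measure, the number of neighbours, or the size of the weight increase. This sketch therefore assumes plain categorical sampling in proportion to the topic-word probabilities, cosine similarity between word vectors, a fixed multiplicative `boost`, and renormalisation afterwards. The functions `cosine` and `expand_topic` and all of their parameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_topic(phi_k: Dict[str, float],
                 vectors: Dict[str, List[float]],
                 n_seeds: int = 2,
                 n_neighbors: int = 2,
                 boost: float = 1.5,
                 rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Return a reweighted copy of one topic's word distribution phi_k."""
    rng = rng or random.Random(0)
    words = list(phi_k)
    # 1. Draw seed words in proportion to their probability under the topic
    #    (an assumed stand-in for the paper's non-parametric sampler).
    seeds = set(rng.choices(words, weights=[phi_k[w] for w in words], k=n_seeds))
    # 2. For each seed, collect its nearest neighbours by cosine similarity.
    similar = set()
    for s in seeds:
        if s not in vectors:
            continue
        candidates = [w for w in words if w != s and w in vectors]
        candidates.sort(key=lambda w: cosine(vectors[s], vectors[w]), reverse=True)
        similar.update(candidates[:n_neighbors])
    # 3. Raise the weight of the similar words, then renormalise to sum to 1.
    boosted = {w: p * (boost if w in similar else 1.0) for w, p in phi_k.items()}
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}
```

Renormalising after the boost keeps each topic a valid probability distribution, so the adjusted topic can be fed straight into the next training iteration. In practice the vectors would come from pretrained word embeddings rather than hand-written lists.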
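The abstract's final step replaces the topic distribution with a Laplace approximation so that the model can be trained as a variational autoencoder. The abstract does not give the paper's formulas. One widely used form comes from ProdLDA (Srivastava and Sutton, 2017). It approximates a Dirichlet prior with parameters $\alpha_1,\dots,\alpha_K$ by a Gaussian over a pre-softmax vector $h$, with $\theta = \operatorname{softmax}(h)$, $\mu_k = \log\alpha_k - \frac{1}{K}\sum_{i=1}^{K}\log\alpha_i$ and $\Sigma_{kk} = \frac{1}{\alpha_k}\left(1 - \frac{2}{K}\right) + \frac{1}{K^2}\sum_{i=1}^{K}\frac{1}{\alpha_i}$. The sketch below assumes that form. The function names are hypothetical. `laplace_dirichlet_prior` computes $\mu$ and the diagonal of $\Sigma$. `sample_theta` draws a topic vector with the reparameterisation trick. `kl_to_prior` is the closed-form KL divergence between a diagonal Gaussian posterior and that prior, which serves as the regularisation term of the VAE objective.

```python
import math
import random
from typing import List, Sequence, Tuple

def laplace_dirichlet_prior(alpha: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Mean and diagonal variance of the Gaussian (in softmax basis) that
    approximates Dirichlet(alpha), using the ProdLDA-style formula."""
    K = len(alpha)
    mean_log = sum(math.log(a) for a in alpha) / K
    inv_sum = sum(1.0 / a for a in alpha)
    mu = [math.log(a) - mean_log for a in alpha]
    var = [(1.0 / a) * (1.0 - 2.0 / K) + inv_sum / K ** 2 for a in alpha]
    return mu, var

def softmax(h: Sequence[float]) -> List[float]:
    m = max(h)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in h]
    s = sum(e)
    return [x / s for x in e]

def sample_theta(mean: Sequence[float], log_var: Sequence[float],
                 rng: random.Random) -> List[float]:
    """Reparameterisation trick: h = mean + sigma * eps, eps ~ N(0, I); theta = softmax(h)."""
    h = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]
    return softmax(h)

def kl_to_prior(mean: Sequence[float], log_var: Sequence[float],
                prior_mu: Sequence[float], prior_var: Sequence[float]) -> float:
    """KL( N(mean, exp(log_var)) || N(prior_mu, prior_var) ) for diagonal Gaussians."""
    return 0.5 * sum(
        math.exp(lv) / pv + (pm - m) ** 2 / pv - 1.0 + math.log(pv) - lv
        for m, lv, pm, pv in zip(mean, log_var, prior_mu, prior_var)
    )
```

In a full model, `mean` and `log_var` would be produced by an encoder network from a document's bag of words. Here they are plain lists so the sketch runs with the standard library only. The Gaussian form is what allows gradients to flow through the sampling step, which a Dirichlet sample does not permit directly.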

This article is indexed in CNKI and other databases.