Ex_treme's blog.

Article Search Engine: Integrating the LDA Algorithm

2018/04/04

Article Search Engine (Part 4)

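This chapter mainly integrates the LDA algorithm into the search engine, which is very simple, and then implements and shows the front end as a whole.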

Integrating the LDA Algorithm

LDA implementation


import json
import gensim
from article.models import ArticleModel

# reslist: list of search results; the first element of each entry is the article id
def getTheme(reslist):
    fencelist = list()
    for res in reslist:
        article_id = res[0]
        article = ArticleModel.objects.get(id=article_id)
        # file_fence holds the article's tokenized words, stored as JSON
        filefence = json.loads(article.file_fence)
        fencelist.append(filefence)
    # Dictionary that maps each distinct word to an integer id
    dictionary = gensim.corpora.Dictionary(fencelist)
    # Document-term matrix: one bag-of-words vector per article
    corpus = [dictionary.doc2bow(words) for words in fencelist]

    ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=3, id2word=dictionary, passes=20)
    return ldamodel.print_topics(num_topics=1, num_words=3)
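
To make the dictionary and document-term matrix step less of a black box, here is a minimal pure-Python sketch of the bag-of-words conversion. It only illustrates the idea and is not gensim's implementation: the helper names `build_dictionary` and `doc2bow` and the toy documents are made up for this example, and the ids gensim assigns may come out in a different order.

```python
from collections import Counter

def build_dictionary(docs):
    """Assign an integer id to each distinct token, in first-seen order."""
    token2id = {}
    for doc in docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def doc2bow(token2id, doc):
    """Turn one tokenized document into a sorted list of (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

# Toy tokenized documents (hypothetical stand-ins for the file_fence lists)
docs = [
    ["search", "engine", "search"],
    ["topic", "model", "engine"],
]
token2id = build_dictionary(docs)
corpus = [doc2bow(token2id, doc) for doc in docs]
print(corpus)
```

Each inner list is one article: a pair such as (0, 2) means the word with id 0 occurs twice in that article. This is the shape of input the LDA model is trained on.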

Showing the matching paragraphs


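import re

from article.models import ArticleModel

# Get the paragraphs of each article that contain any of the keywords
def GetWrodContent(reslist, keylist):
    wordcontent = dict()
    for res in reslist:
        article_id = res[0]
        article = ArticleModel.objects.get(id=article_id)
        filepath = article.file_path
        with open(filepath, 'r', encoding='utf-8') as f:
            text = f.read()
        # Treat each line of the file as one paragraph
        textlist = re.split('\n', text)

        showlist = []
        for texts in textlist:
            for key in keylist:
                if key in texts:
                    showlist.append(texts)
                    # One hit is enough; avoid adding the same paragraph twice
                    break

        filename = article.file_name
        # Keep at most the first five matching paragraphs per article
        wordcontent[filename] = showlist[0:5]

    return wordcontent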
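
The filtering logic can be tried without the database or any files. The sketch below applies the same idea to an in-memory string; the function name `matching_paragraphs`, its `limit` parameter and the sample text are assumptions made for illustration, not part of the project.

```python
def matching_paragraphs(text, keylist, limit=5):
    """Return up to `limit` lines of `text` that contain any keyword."""
    showlist = []
    for line in text.split("\n"):
        # any() stops at the first keyword found, like the break in the inner loop
        if any(key in line for key in keylist):
            showlist.append(line)
    return showlist[:limit]

sample = "LDA is a topic model\nnothing here\nTF-IDF ranks articles\nLDA and TF-IDF together"
print(matching_paragraphs(sample, ["LDA", "TF-IDF"]))
```

The last line of the sample contains both keywords but is only listed once, which is what the break is for.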

Backend view code (views)

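import re

from django.shortcuts import render
from django.views.generic import View

# ArticleForm, tfidf, getTheme and GetWrodContent are defined elsewhere in this project


class ArticleView(View):
    def get(self, request):
        return render(request, 'index.html')

    def post(self, request):
        article_from = ArticleForm(request.POST)
        if article_from.is_valid():
            # initdata()
            word = request.POST.get('keyword', '')
            # split() with no argument drops the empty strings left by repeated spaces
            keylist = word.split()
            reslist = tfidf(keylist)
            if len(reslist) > 1:
                theme = getTheme(reslist)
                contentdict = GetWrodContent(reslist, keylist)
                # Pull the quoted words out of the topic string returned by print_topics
                themelist = re.compile('"(.*?)"').findall(theme[0][1])
                return render(
                    request, 'index.html', {
                        'contentdict': contentdict,
                        'themelist': themelist,
                    }
                )
        # Invalid form or too few results: render the empty page
        return render(request, 'index.html')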
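
The regular expression in the view is easier to follow with a concrete input. print_topics returns (topic id, string) pairs in which each word is quoted and prefixed by its weight; the string below is a made-up example in that format, not real output from this project.

```python
import re

# Made-up example of the string part of one print_topics() entry
topic_string = '0.045*"search" + 0.032*"engine" + 0.021*"index"'

# The non-greedy group captures each quoted word separately
themelist = re.compile('"(.*?)"').findall(topic_string)
print(themelist)
```

The weights are dropped and only the topic words remain, which is the list the template loops over.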

Front-end HTML template

<div class="theme">
    Topics
    {% for info in themelist %}
    <span>{{ info }}</span>&nbsp;&nbsp;&nbsp;
    {% endfor %}
</div>
{% for key,value in contentdict.items %}
<div class="resultArea">Result display area
    <div class="resultArea resultList itemHead">Title display area
        {{ key }}
    </div>
    {% for content in value %}
    <div class="resultArea resultList itemBody">Body display area
        {{ content }}<br>
    </div>
    {% endfor %}
</div>
{% endfor %}
</div>

It is super ugly because there is no styling. I really should pick jQuery back up when I get the chance...
