Google Colab: Installing a Korean Font (Natural Language Processing Example)

Install the required packages

!pip install konlpy
!pip install wordcloud

Install a Korean font.


!apt-get install fonts-nanum
font_path = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
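Every word cloud in this post passes `font_path` to `WordCloud`. If that file is missing, the Korean words will not render correctly. The sketch below checks which candidate font file actually exists before you use it. The helper name `find_korean_font` is hypothetical. The second candidate path is also an assumption; only the NanumBarunGothic path comes from this post.

```python
from pathlib import Path
from typing import Iterable, Optional

# The first path is the one this post uses after `apt-get install fonts-nanum`.
# The second is a hypothetical fallback; adjust both for your environment.
DEFAULT_CANDIDATES = (
    "/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf",
    "/usr/share/fonts/truetype/nanum/NanumGothic.ttf",
)


def find_korean_font(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first candidate that is an existing regular file, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None
```

In Colab you would run `font_path = find_korean_font()` after the install cell. A `None` result means the font is not at the expected location, so it is worth stopping there rather than producing an unreadable cloud.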


Mount Google Drive.
from google.colab import drive
drive.mount('/content/drive')

## Natural language processing of movie reviews
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import collections

1) Load the data.
df = pd.read_table('/content/drive/MyDrive/Data/ICT-Python-Data/ratings_train.txt', encoding='cp949')
df

2) Preprocess the data: handle missing values.
df.isnull().sum()
df.dropna(inplace=True)  # Drop rows that contain null values.

Normalize the Korean text.
# Remove everything except Hangul and spaces (regex=True is required in pandas 2.x).
df['document'] = df['document'].str.replace('[^ㄱ-ㅎㅏ-ㅣ가-힣 ]', '', regex=True)

df = df[0:3000]  # Keep only the first 3,000 reviews.
df

Tokenize into morphemes.
from konlpy.tag import Okt
okt = Okt()
w_list = []
for sentence in df['document']:
    s_list = okt.pos(sentence)  # list of (word, tag) pairs
    for word, tag in s_list:
        if tag in ['Noun', 'Adjective']:
            w_list.append(word)
counts = collections.Counter(w_list)
tag = counts.most_common(50)
tag
# The analysis shows that '영화' (movie) is the most frequent word.

3) Create and visualize a word cloud.
from wordcloud import WordCloud
wc = WordCloud(font_path=font_path, background_color='pink')
cloud = wc.generate_from_frequencies(dict(tag))  # expects a word -> frequency dict
cloud

import matplotlib.pyplot as plt
plt.figure(figsize=(15, 10))
# plt.axis('off')
plt.imshow(cloud)

Remove irrelevant words.
w_list = []
stopword = ['정', '왜', '말갈', '다', '도', '건', '화', '전', '는']
for sentence in df['document']:
    s_list = okt.pos(sentence)
    for word, tag in s_list:
        if tag in ['Noun', 'Adjective'] and word not in stopword:
            w_list.append(word)
counts = collections.Counter(w_list)
tag = counts.most_common(50)
tag
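The normalization, part-of-speech filtering and stopword steps above can be sketched without konlpy or Java. The sketch works on pre-tagged `(word, tag)` pairs, which is the shape `okt.pos` returns. The function names `normalize` and `top_words` are hypothetical. `SAMPLE_TAGGED` is hand-written illustrative data, not real Okt output.

```python
import re
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Same character class as the pandas step: Hangul jamo, Hangul syllables, space.
NON_HANGUL = re.compile(r"[^ㄱ-ㅎㅏ-ㅣ가-힣 ]")

TaggedSentence = Sequence[Tuple[str, str]]  # shape of one okt.pos(...) result


def normalize(text: str) -> str:
    """Delete every character that is not Hangul or a space."""
    return NON_HANGUL.sub("", text)


def top_words(
    tagged_sentences: Iterable[TaggedSentence],
    keep_tags: Sequence[str] = ("Noun", "Adjective"),
    stopwords: Iterable[str] = (),
    top_n: int = 50,
) -> List[Tuple[str, int]]:
    """Count words whose tag is in keep_tags and that are not stopwords."""
    stop = set(stopwords)
    counts = Counter(
        word
        for sentence in tagged_sentences
        for word, tag in sentence
        if tag in keep_tags and word not in stop
    )
    return counts.most_common(top_n)


# Hand-written (word, tag) pairs standing in for okt.pos output; hypothetical.
SAMPLE_TAGGED = [
    [("영화", "Noun"), ("배우", "Noun"), ("좋다", "Adjective")],
    [("영화", "Noun"), ("왜", "Noun"), ("이", "Josa")],
]
```

For example, `top_words(SAMPLE_TAGGED, stopwords=["왜"])` keeps '영화' twice and drops both the stopword and the `Josa` particle. Building the stopword list as a `set` keeps each membership test constant-time, which matters once the loop runs over thousands of reviews.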
from wordcloud import WordCloud
import matplotlib.pyplot as plt
wc = WordCloud(font_path=font_path, background_color='white', max_font_size=60)
cloud = wc.generate_from_frequencies(dict(tag))
plt.figure(figsize=(15, 10))
plt.axis('off')
plt.imshow(cloud)

Visualizing reviews by movie.
from konlpy.tag import Okt
import collections
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import pandas as pd

1) Load the file.
df = pd.read_csv('/content/Drive/Data/ICT-Python-Data/comment_rank.csv')

2) Explore the data.
df.info()

3) Check the number of movies.
df['movie'].unique()

4) Check the average rating and the number of reviews for each movie.
# Average rating per movie
mean = df.groupby('movie')['rank'].mean()
grade = pd.DataFrame(mean)
grade.sort_values('rank', ascending=False)
# The highest-rated movies are '롱리브 더 킹' (Long Live the King) and '매트릭스' (The Matrix).

# Number of reviews per movie
temp = df.groupby('movie')['comment'].count()
review = pd.DataFrame(temp)
review.sort_values('comment', ascending=False)
# The movie with the most reviews is not the top-rated one but '고양이 집사'.
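Step 4 uses two `groupby` aggregations. The sketch below is a dependency-free illustration of what they compute, so the logic can be checked without pandas; it is not how pandas implements them. Like `mean()` and `count()`, it skips missing values. The column names `movie`, `rank` and `comment` follow the post. The function names and `SAMPLE_ROWS` are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Union

Stats = Dict[str, Union[float, int, None]]


def summarize_by_movie(rows: Iterable[Mapping]) -> Dict[str, Stats]:
    """Per movie: mean of non-missing 'rank' values and count of non-missing 'comment's."""
    seen: List[str] = []  # movies in first-seen order
    rank_sum: Dict[str, float] = {}
    rank_n: Dict[str, int] = {}
    review_n: Dict[str, int] = {}
    for row in rows:
        movie = row["movie"]
        if movie not in rank_n:
            seen.append(movie)
            rank_sum[movie], rank_n[movie], review_n[movie] = 0.0, 0, 0
        if row.get("rank") is not None:
            rank_sum[movie] += row["rank"]
            rank_n[movie] += 1
        if row.get("comment") is not None:
            review_n[movie] += 1
    return {
        m: {
            "mean_rank": rank_sum[m] / rank_n[m] if rank_n[m] else None,
            "reviews": review_n[m],
        }
        for m in seen
    }


def order_by(summary: Dict[str, Stats], field: str) -> List[str]:
    """Movie names sorted by `field`, highest first; missing (None) values go last."""
    return sorted(
        summary,
        key=lambda m: (summary[m][field] is not None, summary[m][field] or 0),
        reverse=True,
    )


# Hypothetical rows; "B" has more reviews but a lower average than "A".
SAMPLE_ROWS = [
    {"movie": "A", "rank": 10, "comment": "좋아요"},
    {"movie": "A", "rank": 8, "comment": "괜찮아요"},
    {"movie": "B", "rank": 9, "comment": "최고"},
    {"movie": "B", "rank": None, "comment": "감동"},
    {"movie": "B", "rank": 7, "comment": "별로"},
]
```

With this sample, `order_by(..., "mean_rank")` ranks "A" first, while `order_by(..., "reviews")` ranks "B" first. This mirrors the post's observation that the most-reviewed movie is not the top-rated one.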
5) Morphological analysis of one movie's reviews.
cat_df = df[df['movie'] == '고양이 집사']
cat_df
# '고양이 집사' has 190 reviews in total.

from konlpy.tag import Okt
import collections
okt = Okt()
w_list = []
for sentence in cat_df['comment'].dropna():  # skip missing reviews
    s_list = okt.pos(sentence)
    for word, tag in s_list:
        if tag in ['Noun', 'Adjective']:
            w_list.append(word)
counts = collections.Counter(w_list)
tag = counts.most_common(50)
tag
# The most frequently mentioned words are '아이' (child), '감동' (being moved), and '사람' (person).

from wordcloud import WordCloud
wc = WordCloud(font_path=font_path, background_color='white')
cloud = wc.generate_from_frequencies(dict(tag))

import matplotlib.pyplot as plt
plt.figure(figsize=(15, 10))
plt.axis('off')
plt.imshow(cloud)

## Final analysis
# The words mentioned most often were 어린이 (children), 사람 (people), 동물보호법 (animal protection law), and 감동 (being moved).
# Based on these results, '고양이 집사' appears to be a film about children and animals that leaves a strong emotional impression.
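Step 5 combines a row filter on `movie` with the same counting loop used earlier. It then hands `WordCloud.generate_from_frequencies` a word-to-count dictionary. The sketch below shows that flow with the tagger passed in as a function, so it runs without konlpy. In Colab you would pass `okt.pos` instead. The names `movie_word_frequencies`, `whitespace_tagger` and `SAMPLE_REVIEWS` are hypothetical. The whitespace tagger is only a stand-in and does no real morphological analysis.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Mapping, Sequence, Tuple

Tagger = Callable[[str], List[Tuple[str, str]]]


def movie_word_frequencies(
    rows: Iterable[Mapping],
    movie: str,
    tag_fn: Tagger,
    keep_tags: Sequence[str] = ("Noun", "Adjective"),
    top_n: int = 50,
) -> Dict[str, int]:
    """Word -> count over one movie's non-missing comments, ready for a word cloud."""
    counts: Counter = Counter()
    for row in rows:
        if row["movie"] != movie or row.get("comment") is None:
            continue
        for word, tag in tag_fn(row["comment"]):
            if tag in keep_tags:
                counts[word] += 1
    return dict(counts.most_common(top_n))


def whitespace_tagger(text: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for okt.pos: tags every whitespace token as a Noun."""
    return [(token, "Noun") for token in text.split()]


# Hypothetical review rows using the post's column names.
SAMPLE_REVIEWS = [
    {"movie": "고양이 집사", "comment": "아이 감동"},
    {"movie": "고양이 집사", "comment": "감동 사람"},
    {"movie": "고양이 집사", "comment": None},
    {"movie": "매트릭스", "comment": "액션"},
]
```

Returning a `dict` rather than the list from `most_common` matches what `generate_from_frequencies` expects. That is also why the cells above wrap `tag` in `dict(...)`.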