
Information processing device, learning method, and storage medium

Patent number
US11176327B2
Publication date
2021-11-16
Applicant
FUJITSU LIMITED (Kawasaki, JP)
Inventor
Yuji Mizobuchi
IPC classification
G06F40/58; G06F40/30; G06F16/00; G06F40/45; G06F40/216; G06F40/284; G06N20/00
Technical field
word, learning, language, words, parameter, in, section, target, space, vector
Region: Kawasaki

Abstract

A non-transitory computer-readable storage medium stores a program that causes a computer to execute a process. The process includes: learning distributed representations of words included in a word space of a first language, using a learner for learning the distributed representations; classifying words included in a word space of a second language, different from the first language, into words common to the word space of the first language and words not common to it; and replacing the distributed representations of the common words in the word space of the second language with the distributed representations of the corresponding words in the first language, while adjusting a parameter of the learner.
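In concrete terms, the classification-and-replacement step of the abstract can be sketched as follows. This is a minimal illustration in Python, assuming precomputed first-language vectors and a word-alignment dictionary that maps second-language words to first-language words; all names (`classify_and_seed`, `bilingual_dict`, and so on) are hypothetical and not taken from the patent.

```python
import numpy as np

def classify_and_seed(vocab_l2, emb_l1, bilingual_dict, dim=100, seed=0):
    """Split second-language words into common/uncommon and seed the
    common ones with the corresponding first-language vectors."""
    rng = np.random.default_rng(seed)
    common = {w for w in vocab_l2 if bilingual_dict.get(w) in emb_l1}
    uncommon = set(vocab_l2) - common
    # Common words are *replaced* by their first-language representations.
    emb_l2 = {w: emb_l1[bilingual_dict[w]].copy() for w in common}
    # Uncommon words start from random vectors and are learned afterwards.
    for w in uncommon:
        emb_l2[w] = rng.uniform(-0.5, 0.5, dim)
    return emb_l2, common, uncommon
```

In the claimed process, the learner is then trained on the second-language corpus so that the uncommon words' representations are learned while the learner's parameter is adjusted around the seeded common-word vectors.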

說明書

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation application of International Application PCT/JP2016/079545 filed on Oct. 4, 2016 and designated the U.S., the entire contents of which are incorporated herein by reference.

FIELD

The embodiment discussed herein is related to an information processing device, a learning method, and a storage medium.

BACKGROUND

Producing highly accurate representations of words is important in text processing, and a number of studies have addressed it. In recent years, Word2Vec has become a well-known technique for producing representations of words (refer to, for example, Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013).

Word2Vec is a system for producing distributed representations of words based on the idea (the distribution hypothesis) that similar words appear in similar sentences. The distributed representations of the words are expressed by vectors indicating semantic relationships between the words. Word2Vec trains a neural network composed of an input layer, a hidden layer, and an output layer in a supervised manner: it learns the relationships between a given word appearing in a sentence and the words in its vicinity, and in doing so obtains a distributed representation of the given word. For Word2Vec, the Continuous Bag-of-Words model and the Skip-gram model have been proposed. The Skip-gram model takes as input a vector corresponding to a given word and predicts the words in its vicinity.
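To make the Skip-gram structure concrete, the following is a toy NumPy sketch of a single training step: the rows of `W_in` play the role of the hidden layer (the distributed representations), and `W_out` maps the hidden layer to scores over the vocabulary for predicting a context word. It uses a full softmax for clarity; real Word2Vec implementations use hierarchical softmax or negative sampling instead, and all names here are illustrative.

```python
import numpy as np

def skipgram_step(W_in, W_out, center, context, lr=0.025):
    """One Skip-gram update for a (center, context) word-index pair."""
    h = W_in[center]                       # hidden layer = center word's vector
    scores = W_out.T @ h                   # one score per vocabulary word
    p = np.exp(scores - scores.max())
    p /= p.sum()                           # softmax over the vocabulary
    err = p.copy()
    err[context] -= 1.0                    # gradient of the cross-entropy loss
    grad_in = W_out @ err
    W_out -= lr * np.outer(h, err)         # update hidden->output weights
    W_in[center] -= lr * grad_in           # update the word's representation
    return -np.log(p[context])             # loss, for monitoring

# Tiny usage example: vocabulary of 5 words, 3-dimensional vectors.
rng = np.random.default_rng(0)
W_in = rng.uniform(-0.5, 0.5, (5, 3))
W_out = rng.uniform(-0.5, 0.5, (3, 5))
loss = skipgram_step(W_in, W_out, center=2, context=4)
```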

Claims

What is claimed is:

1. A non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process comprising:
causing a learner to learn distributed representations of words included in a word space of a first language;
classifying words included in a word space of a second language different from the first language into common words common to words included in the word space of the first language and uncommon words not common to words included in the word space of the first language, language resources of the word space of the first language being larger than the language resources of the word space of the second language;
replacing distributed representations of the common words included in the word space of the second language with distributed representations of the words included in the word space of the first language corresponding to the common words;
adjusting a weight for obtaining an output result based on the replaced distributed representations to be used in the learner;
inputting the adjusted weight to the learner; and
causing the learner to learn distributed representations of the uncommon words among the words included in the word space of the second language.

2. The storage medium according to claim 1, the process further comprising:
outputting the distributed representations of the common words and the distributed representations of the uncommon words as a result of the learning.

3. The storage medium according to claim 1, wherein the learner learns the distributed representations of the words included in the word space of the first language and the distributed representations of the uncommon words included in the word space of the second language using the Skip-gram model of Word2Vec.

4. The storage medium according to claim 3, wherein the replacing includes replacing a hidden layer of the Skip-gram model with the distributed representations of the words included in the word space of the first language corresponding to the common words.

5. The storage medium according to claim 4, wherein the adjusting includes adjusting the weight that is a parameter between the hidden layer and an output layer of the Skip-gram model.

6. The storage medium according to claim 1, wherein the classifying includes:
executing morphological analysis on a corpus for learning the second language,
outputting words indicated by results of the morphological analysis,
using the words indicated by the results of the morphological analysis and an alignment dictionary to acquire correspondence relationships between words in the second language and words in the first language, and
classifying the words included in the word space of the second language based on the acquired correspondence relationships.

7. An information processing device comprising:
a memory; and
a processor coupled to the memory and configured to:
cause a learner to learn distributed representations of words included in a word space of a first language;
classify words included in a word space of a second language different from the first language into common words common to words included in the word space of the first language and uncommon words not common to words included in the word space of the first language, language resources of the word space of the first language being larger than the language resources of the word space of the second language;
replace distributed representations of the common words included in the word space of the second language with distributed representations of the words included in the word space of the first language corresponding to the common words;
adjust a weight for obtaining an output result based on the replaced distributed representations to be used in the learner;
input the adjusted weight to the learner; and
cause the learner to learn distributed representations of the uncommon words among the words included in the word space of the second language.

8. The information processing device according to claim 7, wherein the processor is configured to:
output the distributed representations of the common words and the distributed representations of the uncommon words as a result of learning the distributed representations of the uncommon words.

9. The information processing device according to claim 7, wherein the learner learns the distributed representations of the words included in the word space of the first language and the distributed representations of the uncommon words included in the word space of the second language using the Skip-gram model of Word2Vec.

10. A learning method to be executed by a computer, the learning method comprising:
causing a learner to learn distributed representations of words included in a word space of a first language;
classifying words included in a word space of a second language different from the first language into common words common to words included in the word space of the first language and uncommon words not common to words included in the word space of the first language, language resources of the word space of the first language being larger than the language resources of the word space of the second language;
replacing distributed representations of the common words included in the word space of the second language with distributed representations of the words included in the word space of the first language corresponding to the common words;
adjusting a weight for obtaining an output result based on the replaced distributed representations to be used in the learner;
inputting the adjusted weight to the learner; and
causing the learner to learn distributed representations of the uncommon words among the words included in the word space of the second language.
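Claims 4 and 5 amount to seeding the Skip-gram hidden layer (the input-side vectors) with first-language representations for the common words and then adjusting the weight between the hidden layer and the output layer while the uncommon words are learned. The following is a hedged sketch of that training loop under those assumptions, reusing the toy full-softmax step from the description above; `train_l2`, `common_ids`, and the other names are illustrative, not the patent's.

```python
import numpy as np

def train_l2(pairs, W_in, W_out, common_ids, lr=0.025):
    """One pass over (center, context) index pairs from the second-language
    corpus. Rows of W_in were seeded from the first language for common words."""
    common_ids = set(common_ids)
    for center, context in pairs:
        h = W_in[center]                      # hidden layer (cf. claim 4)
        scores = W_out.T @ h
        p = np.exp(scores - scores.max())
        p /= p.sum()
        err = p.copy()
        err[context] -= 1.0
        grad_in = W_out @ err
        W_out -= lr * np.outer(h, err)        # hidden->output weight (cf. claim 5)
        if center not in common_ids:          # common words keep their L1 vectors;
            W_in[center] -= lr * grad_in      # only uncommon words are learned
```

In this sketch the output-side weight is adjusted on every step, while the input-side vectors of common words are frozen, which mirrors the claimed division between replaced representations and the learned uncommon words.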