
Systems and methods for generating recitation items

Patent number
US10867525B1
Publication date
2020-12-15
Applicant
Educational Testing Service (Princeton, NJ, US)
Inventors
Su-Youn Yoon; Lei Chen; Keelan Evanini; Klaus Zechner
IPC classification
G09B19/04; G10L13/08; G06F40/211
Technical field
text,phoneme,prosodic,metric,recitation,phonetic,texts,syntactic,native,language
Region: Princeton, NJ

Abstract

Computer-implemented systems and methods are provided for automatically generating recitation items. For example, a computer performing the recitation item generation can receive one or more text sets that each includes one or more texts. The computer can determine a value for each text set using one or more metrics, such as a vocabulary difficulty metric, a syntactic complexity metric, a phoneme distribution metric, a phonetic difficulty metric, and a prosody distribution metric. Then the computer can select a final text set based on the value associated with each text set. The selected final text set can be used as the recitation items for a speaking assessment test.
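
The selection flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the tokenization, the rule that metric values are summed, and the rule that the lowest total wins are all hypothetical choices made for the example. The only metric shown is vocabulary difficulty, computed as the proportion of low-frequency words, which is how the claims define it.

```python
from typing import Callable, Dict, List, Sequence

TextSet = List[str]
Metric = Callable[[TextSet], float]


def vocabulary_difficulty(
    text_set: TextSet, word_freq: Dict[str, int], threshold: int
) -> float:
    """Proportion of words whose reference-corpus frequency is below threshold."""
    # Assumption: whitespace tokenization with basic punctuation stripping.
    words = [w.lower().strip(".,;:!?\"'") for text in text_set for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    low = sum(1 for w in words if word_freq.get(w, 0) < threshold)
    return low / len(words)


def select_text_set(text_sets: Sequence[TextSet], metrics: Sequence[Metric]) -> TextSet:
    """Pick the text set with the lowest summed metric value.

    Assumption: lower values are better and metrics are combined by a plain sum.
    """
    if not text_sets:
        raise ValueError("need at least one text set")
    return min(text_sets, key=lambda ts: sum(m(ts) for m in metrics))


if __name__ == "__main__":
    # Hypothetical reference-corpus word counts.
    freq = {"the": 1000, "cat": 50, "sat": 40, "on": 900, "mat": 30}
    sets = [["The cat sat on the mat."], ["The ocelot perambulated."]]
    metric = lambda ts: vocabulary_difficulty(ts, freq, threshold=10)
    print(select_text_set(sets, [metric]))
```

Other metrics named in the abstract (syntactic complexity, phoneme distribution, phonetic difficulty, prosody distribution) would plug in as further callables with the same signature.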

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional application of U.S. patent application Ser. No. 14/215,124, filed on Mar. 17, 2014, entitled “Systems and Methods for Generating Recitation Items,” which claims priority to U.S. Provisional Patent Application No. 61/802,904, filed Mar. 18, 2013, entitled “An Automated Recitation Item Generation Method,” both of which are herein incorporated by reference in their entireties.

FIELD

The technology described herein relates generally to text selection and more specifically to automatic generation of recitation items for speaking assessment.

BACKGROUND

Assessment of a person's speaking proficiency is often performed in education and in other domains. Such speaking assessment typically takes the form of texts (e.g., essays, passages, articles, etc.) being presented to and read by the person being assessed. The texts used in the assessments are usually selected from a large pool of texts collected from diverse resources (e.g., textbooks, journals, websites, and manual generation). The selection process, however, is often performed manually, which is costly, time-consuming, and lacks objectivity.

SUMMARY

Claims

It is claimed:

1. A computer-readable medium encoded with instructions for commanding one or more data processors to execute steps of a method of generating recitation items in a target language, the method comprising:
extracting a target language phoneme distribution from a target language corpus, wherein the target language corpus is in the target language;
extracting a first native language phoneme distribution from a first native language corpus, wherein the first native language corpus is in a first native language of one or more users;
extracting a second native language phoneme distribution from a second native language corpus, wherein the second native language corpus is in a second native language of one or more users;
comparing the target language phoneme distribution to the first native language phoneme distribution to identify any first phoneme n-gram whose frequency of appearing in the target language corpus satisfies a first predetermined criterion and whose frequency of appearing in the first native language corpus satisfies a second predetermined criterion, wherein any such identified first phoneme n-gram is included in first list;
comparing the target language phoneme distribution to the second native language phoneme distribution to identify any second phoneme n-gram whose frequency of appearing in the target language corpus satisfies a third predetermined criterion and whose frequency of appearing in the second native language corpus satisfies a fourth predetermined criterion, wherein any such identified second phoneme n-gram is included in second list;
accessing one or more candidate recitation items; and
generating recitation items based on candidate recitation items having any phoneme n-gram that appears in both the first list and the second list.

2. A system for performing a method of generating recitation items in a target language, the system comprising:
one or more data processors;
a computer-readable medium encoded with instructions for commanding the one or more data processors to execute steps, the steps comprising:
extracting a target language phoneme distribution from a target language corpus, wherein the target language corpus is in the target language;
extracting a first native language phoneme distribution from a first native language corpus, wherein the first native language corpus is in a first native language of one or more users;
extracting a second native language phoneme distribution from a second native language corpus, wherein the second native language corpus is in a second native language of one or more users;
comparing the target language phoneme distribution to the first native language phoneme distribution to identify any first phoneme n-gram whose frequency of appearing in the target language corpus satisfies a first predetermined criterion and whose frequency of appearing in the first native language corpus satisfies a second predetermined criterion, wherein any such identified first phoneme n-gram is included in first list;
comparing the target language phoneme distribution to the second native language phoneme distribution to identify any second phoneme n-gram whose frequency of appearing in the target language corpus satisfies a third predetermined criterion and whose frequency of appearing in the second native language corpus satisfies a fourth predetermined criterion, wherein any such identified second phoneme n-gram is included in second list;
accessing one or more candidate recitation items; and
generating recitation items based on candidate recitation items having any phoneme n-gram that appears in both the first list and the second list.

3. A computer-implemented method of generating recitation items in a target language, the method comprising:
extracting a target language phoneme distribution from a target language corpus, wherein the target language corpus is in the target language;
extracting a first native language phoneme distribution from a first native language corpus, wherein the first native language corpus is in a first native language of one or more users;
extracting a second native language phoneme distribution from a second native language corpus, wherein the second native language corpus is in a second native language of one or more users;
comparing the target language phoneme distribution to the first native language phoneme distribution to identify any first phoneme n-gram whose frequency of appearing in the target language corpus satisfies a first predetermined criterion and whose frequency of appearing in the first native language corpus satisfies a second predetermined criterion, wherein any such identified first phoneme n-gram is included in first list;
comparing the target language phoneme distribution to the second native language phoneme distribution to identify any second phoneme n-gram whose frequency of appearing in the target language corpus satisfies a third predetermined criterion and whose frequency of appearing in the second native language corpus satisfies a fourth predetermined criterion, wherein any such identified second phoneme n-gram is included in second list;
accessing one or more candidate recitation items; and
generating recitation items based on candidate recitation items having any phoneme n-gram that appears in both the first list and the second list.

4. The method of claim 3, further comprising:
filtering texts in a text pool, wherein the candidate recitation items included selected texts in the filtered text pool.

5. The method of claim 4, wherein the step of filtering includes:
identifying any word or phrase that appears in both a pre-determined fairness list and a text in the text pool;
determining a fairness metric value associated with the text;
determining whether to filter out the text from the text pool based on at least the fairness metric value.

6. The method of claim 4, wherein the step of filtering includes:
determining a syntactic complexity value associated with a text in the text pool;
determining whether to filter out the text from the text pool based on at least the syntactic complexity value.

7. The method of claim 4, wherein the step of filtering includes:
determining a vocabulary difficulty value associated with a text;
determining whether to filter out the text from the text pool based on at least the vocabulary difficulty value.

8. The method of claim 7, wherein the vocabulary difficulty value is determined by determining a proportion of low frequency words appearing in the text, wherein a low frequency word is a word whose frequency of appearing in a reference corpus is below a pre-determined threshold.

9. The method of claim 3, further comprising:
generating, using the one or more processing systems, a value for each candidate recitation item using one or more of:
a vocabulary difficulty metric,
a syntactic complexity metric,
a phoneme distribution metric,
a phonetic difficulty metric, and
a prosody distribution metric; and
wherein the candidate recitation items are selected from one or more potential recitation items based on the value associated with each potential recitation item.

10. The method of claim 9, wherein the prosody distribution metric is generated by:
generating prosodic annotations for each sentence in the potential recitation item;
extracting prosodic patterns from the prosodic annotations;
determining a prosodic distribution from the prosodic patterns;
comparing the prosodic distribution with an ideal prosodic distribution and computing a prosodic distribution similarity value;
wherein determining the value for the potential recitation item is based on at least the prosodic distribution similarity value.

11. The method of claim 9, wherein generating the phonetic difficulty metric includes:
generating phoneme sequences for the potential recitation item;
comparing the phoneme sequences with a challenge list;
computing a phonetic difficulty value based on the comparison;
wherein determining the value for the potential recitation item is based on at least the phonetic difficulty value.

12. The method of claim 1, wherein generating the syntactic complexity metric includes:
determining a syntactic complexity value associated with the potential recitation item;
wherein determining the value for the potential recitation item is based on at least the syntactic complexity value associated with the potential recitation item.

13. The method of claim 9, wherein generating the phoneme distribution metric includes:
generating phoneme sequences for the potential recitation item;
determining a phoneme distribution of the phoneme sequences;
comparing the phoneme distribution with an ideal phoneme distribution and computing a phoneme distribution similarity value;
wherein determining the value for the potential recitation item is based on at least the phoneme distribution similarity value.

14. The method of claim 13, wherein the potential recitation item is in the target language, and the ideal phoneme distribution is determined based on a reference data set that includes utterances in the target language spoken by native speakers of the target language.

15. The method of claim 9, wherein generating the vocabulary difficulty metric includes:
determining a vocabulary difficulty value associated with the potential recitation item;
wherein determining a value for the potential recitation item is based on at least the vocabulary difficulty value associated with the potential recitation item.

16. The method of claim 15, wherein determining the vocabulary difficulty value includes determining a proportion of low frequency words appearing in the potential recitation item, wherein a low frequency word is a word whose frequency of appearing in a reference corpus is below a pre-determined threshold.

17. The method of claim 9, further comprising:
filtering the one or more potential recitation items;
wherein each potential recitation item for which a value is determined is selected from the filtered one or more potential recitation items.

18. The method of claim 17, wherein the step of filtering includes:
identifying a potential recitation item in the one or more potential recitation items;
identifying any word or phrase that appears in both a pre-determined fairness list and a text in the potential recitation item;
determining a fairness metric value associated with the text;
determining whether to filter out the potential recitation item from the one or more potential recitation items based on at least the fairness metric value.

19. The method of claim 17, wherein the step of filtering includes:
identifying a potential recitation item in the one or more potential recitation items;
determining a syntactic complexity value associated with a text in the potential recitation item;
determining whether to filter out the potential recitation item from the one or more potential recitation items based on at least the syntactic complexity value.

20. The method of claim 17, wherein the step of filtering includes:
identifying a potential recitation item in the one or more potential recitation items;
determining a vocabulary difficulty value associated with a text in the potential recitation item;
determining whether to filter out the potential recitation item from the one or more potential recitation items based on at least the vocabulary difficulty value.

21. The method of claim 20, wherein the vocabulary difficulty value is determined by determining a proportion of low frequency words appearing in the text, wherein a low frequency word is a word whose frequency of appearing in a reference corpus is below a pre-determined threshold.
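
For readers who want to see the steps of the independent claims in concrete form, the following Python sketch may help. It is an illustration only and rests on several assumptions that the claims leave open: corpora are already transcribed into phoneme sequences (for example by a pronunciation dictionary), n-grams are bigrams, the "predetermined criteria" are a minimum relative frequency in the target language and a maximum relative frequency in the native language, and all function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Phonemes = Sequence[str]
NGram = Tuple[str, ...]


def ngram_distribution(corpus: Iterable[Phonemes], n: int = 2) -> Dict[NGram, float]:
    """Relative frequency of each phoneme n-gram in a corpus of phoneme sequences."""
    counts: Counter = Counter()
    for seq in corpus:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def challenge_list(
    target: Dict[NGram, float],
    native: Dict[NGram, float],
    min_target: float,
    max_native: float,
) -> Set[NGram]:
    """N-grams frequent in the target language but rare in the native language.

    Assumption: the criteria are simple thresholds on relative frequency.
    """
    return {
        gram
        for gram, freq in target.items()
        if freq >= min_target and native.get(gram, 0.0) <= max_native
    }


def select_items(
    candidates: Dict[str, Phonemes],
    first_list: Set[NGram],
    second_list: Set[NGram],
    n: int = 2,
) -> List[str]:
    """Keep candidates containing any n-gram that appears in both lists."""
    shared = first_list & second_list
    selected = []
    for text, phonemes in candidates.items():
        grams = {tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)}
        if grams & shared:
            selected.append(text)
    return selected


if __name__ == "__main__":
    # Toy corpora; real corpora would be far larger.
    target = ngram_distribution([["TH", "IH", "S"], ["TH", "IH", "N"]])
    native_1 = ngram_distribution([["S", "IH", "N"]])
    native_2 = ngram_distribution([["IH", "S", "IH"]])
    first = challenge_list(target, native_1, min_target=0.2, max_native=0.1)
    second = challenge_list(target, native_2, min_target=0.2, max_native=0.1)
    candidates = {"thin": ["TH", "IH", "N"], "sin": ["S", "IH", "N"]}
    print(select_items(candidates, first, second))
```

Taking the intersection of the two lists means a selected item contains at least one n-gram that is expected to be challenging for speakers of both native languages, which matches the final step of claims 1 to 3.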