WO1994017508A1 - Computerized system for teaching speech - Google Patents
Computerized system for teaching speech
- Publication number
- WO1994017508A1 (PCT/US1994/000815)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- specimen
- user
- response
- audio specimen
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B7/00—Electrically-operated teaching apparatus or devices working with questions and answers
- G09B7/02—Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
- G09B7/04—Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student characterised by modifying the teaching programme in response to a wrong answer, e.g. repeating the question, supplying a further explanation
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/04—Speaking
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
Definitions
- the present invention relates to educational systems generally and more particularly to computerized systems for teaching speech.
- the present invention seeks to provide an improved computerized system for speech and pronunciation teaching in which recorded reference speech specimens are presented to a student and in which a quantification of the similarity between the student's repetitions and the originally presented reference speech specimens is displayed to the user.
- the present invention also seeks to provide a speech and pronunciation teaching system which is particularly suited for independent speech study and does not require presence of a trained human speech and pronunciation expert.
- the system of the present invention includes verbal prompts which guide a user through a teaching system without requiring recourse to a human teacher.
- student performance is monitored and the verbal prompt sequence branches to take student performance into account.
- predetermined types of student errors such as repeatedly mispronouncing a particular phoneme
- the verbal prompt sequence may branch to take into account the presence or absence of each type of student error.
- the present invention also seeks to provide a speech and pronunciation teaching system which is particularly suited to teaching preferred pronunciation of a foreign language to a speaker of a native language.
- the system of the present invention includes an initial menu presented in a plurality of languages and a multi-language message prompting the user to select the menu option representing his native language.
- the system is preferably operative to present subsequent verbal messages to the user in his own native language, and/or to branch the sequence of verbal messages so as to take into account speech characteristics, such as pronunciation errors, which are known to occur frequently in speakers of the user's native language. For example, when speaking English, native speakers of Japanese typically confuse the L and R sounds, and also the short I and long E sounds, as in the words "ship" and "sheep". Native speakers of Arabic and German do not have either of these problems.
- apparatus for interactive speech training including an audio specimen generator for playing a pre-recorded reference audio specimen to a user for attempted repetition thereby, and an audio specimen scorer for scoring a user's repetition audio specimen.
- the audio specimen scorer includes a reference-to-response comparing unit for comparing at least one feature of a user's repetition audio specimen to at least one feature of the reference audio specimen, and a similarity indicator for providing an output indication of the degree of similarity between at least one feature of the repetition audio specimen and at least one feature of the reference audio specimen.
- the apparatus also includes a user response memory to which the reference-to-response comparing unit has access, for storing a user's repetition of a reference audio specimen.
- the reference-to-response comparing unit includes a volume/duration normalizer for normalizing the volume and duration of the reference and repetition audio specimens.
- the reference-to-response comparing unit includes a parameterization unit for extracting audio signal parameters from the reference and repetition audio specimens. Additionally in accordance with a preferred embodiment of the present invention, the reference-to-response comparing unit also includes apparatus for comparing the reference audio specimen parameters to the repetition audio specimen parameters.
- the apparatus for comparing includes a parameter score generator for providing a score representing the degree of similarity between the audio signal parameters of the reference and repetition audio specimens.
- the output indication includes a display of the score.
- the output indication includes a display of at least one audio waveform.
- the interactive speech training apparatus includes a prompt sequencer operative to generate a sequence of prompts to a user.
- the interactive speech training apparatus also includes a reference audio specimen library in which reference audio specimens are stored and to which the audio specimen generator has access.
- the reference audio specimen library includes a multiplicity of recordings of audio specimens produced by a plurality of speech models.
- the plurality of speech models differ from one another in at least one of the following characteristics: sex, age, and dialect.
- apparatus for interactive speech training including a prompt sequencer operative to generate a sequence of prompts to a user, prompting the user to produce a corresponding sequence of audio specimens, and a reference-to-response comparing unit for comparing at least one feature of each of the sequence of audio specimens generated by the user, to a reference.
- the reference to which an individual user-generated audio specimen is compared includes a corresponding stored reference audio specimen.
- the sequence of prompts branches in response to user performance.
- the sequence of prompts is at least partly determined by a user's designation of his native language.
- the prompt sequencer includes a multi-language prompt sequence library in which a plurality of prompt sequences in a plurality of languages is stored and wherein the prompt sequencer is operative to generate a sequence of prompts in an individual one of the plurality of languages in response to a user's designation of the individual language as his native language.
- apparatus for interactive speech training including an audio specimen recorder for recording audio specimens generated by a user, and a reference-to-response comparing unit for comparing at least one feature of a user-generated audio specimen to a reference, the comparing unit including an audio specimen segmenter for segmenting a user-generated audio specimen into a plurality of segments, and a segment comparing unit for comparing at least one feature of at least one of the plurality of segments to a reference.
- the audio specimen segmenter includes a phonetic segmenter for segmenting a user-generated audio specimen into a plurality of phonetic segments.
- At least one of the phonetic segments includes a phoneme such as a vowel or consonant.
- at least one of the phonetic segments may include a syllable.
- apparatus for interactive speech training including an audio specimen recorder for recording audio specimens generated by a user, and a speaker-independent audio specimen scorer for scoring a user-generated audio specimen based on at least one speaker-independent parameter.
- at least one speaker-independent parameter includes a threshold value for the amount of energy at a predetermined frequency.
- Fig. 1 is a generalized pictorial illustration of an interactive speech teaching system constructed and operative in accordance with a preferred embodiment of the present invention
- Fig. 2 is a simplified block diagram illustration of the system of Fig. 1
- Fig. 3 is a simplified block diagram illustration of one of the components of the system of Fig. 1
- Fig. 4 is a simplified flow chart illustrating preparation of pre-recorded material for use in the invention
- Figs. 5A and 5B, taken together, are a simplified flow chart illustrating operation of the apparatus of Figs. 1 and 2;
- Fig. 6 is a graphic representation (audio amplitude vs. time in secs) of a speech model's rendition of the word "CAT" over 0.5 seconds;
- Fig. 7 is a graphic representation (audio amplitude vs. time in secs), derived from Fig. 6, of a speech model's rendition of the vowel "A" over 0.128 seconds;
- Fig. 8 is a graphic representation (audio amplitude vs. time in secs) of a student's attempted rendition of the word "CAT" over 0.5 seconds;
- Fig. 9 is a graphic representation (audio amplitude vs. time in secs), derived from Fig. 8, of a student's attempted rendition of the vowel "A" over 0.128 seconds;
- Fig. 10 is a graphic representation (audio amplitude vs. time in secs) of a student's attempted rendition of the word "CAT" over 0.35 seconds;
- Fig. 11 is a graphic representation (audio amplitude vs. time in secs), derived from Fig. 10, of a student's attempted rendition of the vowel "A" over 0.128 seconds.
- Figs. 1 and 2 illustrate an interactive speech teaching system constructed and operative in accordance with a preferred embodiment of the present invention.
- the system of Figs. 1 and 2 is preferably based on a conventional personal computer 10, such as an IBM PC-AT, preferably equipped with an auxiliary audio module 12.
- a suitable audio module 12 is the DS201, manufactured by Digispeech Inc. of Palo Alto, CA, USA and commercially available from IBM Educational Systems.
- a headset 14 is preferably associated with audio module 12. As may be seen from Fig.
- a display 30 is optionally provided which represents normalized audio waveforms of both a pre-recorded reference audio specimen 32 and a student's attempted repetition 34 thereof.
- a score 40 quantifying the similarity over time between the repetition and reference audio specimens, is typically displayed, in order to provide feedback to the student. Any suitable method may be employed to generate the similarity score 40, such as conventional correlation methods. One suitable method is described in the above-referenced article by Itakura, the disclosure of which is incorporated herein by reference. To use the distance metric described by Itakura, first linear prediction coefficients are extracted from the speech signal.
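By way of illustration only, one conventional correlation method for generating a similarity score may be sketched as follows. The function name `correlation_score`, the 0 to 100 scale and the clamping of negative correlation to zero are assumptions made for this sketch and are not taken from the disclosure; the two specimens are assumed to have already been normalized to equal length.

```python
import math


def correlation_score(reference, repetition):
    """Return a 0-100 similarity score (hypothetical scale) based on the
    normalized correlation of two equal-length sample sequences."""
    if len(reference) != len(repetition) or not reference:
        raise ValueError("sequences must be non-empty and equal in length")
    # Remove the mean so that a constant offset does not affect the score.
    mean_ref = sum(reference) / len(reference)
    mean_rep = sum(repetition) / len(repetition)
    ref = [x - mean_ref for x in reference]
    rep = [y - mean_rep for y in repetition]
    # Geometric mean of the two signal energies normalizes for volume.
    energy = math.sqrt(sum(x * x for x in ref) * sum(y * y for y in rep))
    if energy == 0.0:
        return 0.0
    corr = sum(x * y for x, y in zip(ref, rep)) / energy
    # Negative correlation is treated as "no similarity" in this sketch.
    return round(100.0 * max(0.0, corr), 1)
```

Because the correlation is normalized by signal energy, a quieter but otherwise identical repetition receives the same score as the reference itself.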
- a dynamic programming algorithm may be employed to compute the distance between a student's repetition and a set of models, i.e., the extent to which the student's repetitions correspond to the models.
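A minimal sketch of such a dynamic programming computation, in the form commonly known as dynamic time warping, is given below. It is offered under stated assumptions: each specimen is represented by a sequence of scalar feature values and the local distance is an absolute difference, whereas the Itakura article uses a distance derived from linear prediction coefficients. The names `dtw_distance` and `closest_model` are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Accumulated distance along the best time alignment of two sequences."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])  # assumed local distance
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b
                                     cost[i][j - 1],      # stretch seq_a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def closest_model(repetition, models):
    """Return (name, distance) of the model closest to the repetition."""
    name = min(models, key=lambda k: dtw_distance(repetition, models[k]))
    return name, dtw_distance(repetition, models[name])
```

A smaller returned distance indicates that the student's repetition corresponds more closely to the model in question.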
- appropriate software is loaded in computer 10 of Fig. 1 to carry out the operations set forth in the functional block diagram of Fig. 2.
- the structure of Fig. 2 may be embodied in a conventional hard-wired circuit.
- the apparatus of Fig. 2 comprises a reference audio specimen player 100 which is operative to play a reference audio specimen to a student 110.
- Reference audio specimens for each of a multiplicity of phonemes, words and/or phrases are typically prerecorded by each of a plurality of speech models and are stored in a reference audio specimen library 120.
- Reference audio specimen player 100 has access to reference audio specimen library 120.
- the student 110 attempts to reproduce each reference audio specimen. His spoken attempts are received by student response specimen receiver 130 and are preferably digitized by a digitizer 140 and stored in a student response specimen memory 150.
- each stored student response from memory 150 is played back to the student on a student response specimen player 154.
- Players 100 and 154 need not, of course, be separate elements and are shown as separate blocks merely for clarity.
- a student response specimen scoring unit 160 is operative to evaluate the reference audio specimens by accessing student response specimen receiver 130.
- Scores are computed by comparing student responses to the corresponding reference audio specimen, accessed from library 120. Evaluation of student responses in terms of a reference specimen sometimes gives less than optimal results because a single reference specimen produced by a single speech model may not accurately represent the optimal pronunciation of that specimen. Therefore, alternatively or in addition, student response scores may be computed by evaluating student responses in terms of a speaker independent reference such as a set of speaker independent parameters stored in a speaker independent parameter database 170.
- the speaker independent parameters in database 170 are specific as to age, gender and/or dialect of the speaker. In other words, the parameters are speaker independent within each individual category of individuals of a particular age, gender and/or dialect.
- the CAT waveform includes first and third high frequency, low energy portions and a second portion interposed between the first and third portions which is characterized by medium frequency and high energy.
- the first and third portions correspond to the C and T sounds in CAT.
- the second portion corresponds to the A sound.
- Frequency analysis may be employed to evaluate the response specimen.
- Speaker dependent parameters such as resonant frequencies or linear predictor coefficients may be computed, and the computed values may be compared with known normal ranges therefor.
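One simple form of frequency analysis, measuring the energy of a specimen at a single predetermined frequency and checking the result against a range, may be sketched as follows. The Goertzel algorithm used here is a substitute chosen for this illustration and is not named in the disclosure; the function names, the sample rate and any range limits are likewise assumptions.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    if n == 0:
        raise ValueError("samples must be non-empty")
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def in_normal_range(value, low, high):
    """True if a computed value lies within an assumed normal range."""
    return low <= value <= high
```

For a pure tone of unit amplitude lying exactly on a bin, the returned value is (N/2) squared, where N is the number of samples, so thresholds would have to be scaled to the specimen length.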
- Student response specimen scoring unit 160 is described in more detail below with reference to Fig. 3.
- the student response score or evaluation derived by scorer unit 160 is displayed to the student on a display 180 such as a television screen.
- the score or evaluation is also stored in a student follow-up database 190 which accumulates information regarding the progress of each individual student for follow-up purposes.
- the interface of the system with the student is preferably mediated by a prompt sequencer 200 which is operative to generate prompts to the student, such as verbal prompts, which may either be displayed on display 180 or may be audibly presented to the student.
- the prompt sequencer receives student scores from scoring unit 160 and is operative to branch the sequence of prompts and presented reference audio specimens to correspond to the student's progress as evidenced by his scores.
- the prompt sequencer initially presents the student with a menu via which a student may designate his native language.
- the prompt sequencer preferably takes the student's native language designation into account in at least one of the following ways:
  (a) Verbal prompts are supplied to the user in his native language. Each prompt is stored in each of a plurality of native languages supported by the system, in a multilanguage prompt library 210 to which prompt sequencer 200 has access.
  (b) The sequence of prompts and reference audio specimens is partially determined by the native language designation. For example, native speakers of Hebrew generally have difficulty in pronouncing the English R sound. Therefore, for Hebrew speakers, the sequence of prompts and reference audio specimens might include substantial drilling of the R sound.
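The two uses of the native language designation may be sketched as a pair of table lookups. This is an illustrative sketch only: the table names, the prompt key `repeat`, the prompt wording and the fallback to English are assumptions; the drill entries for Japanese and Hebrew reflect the pronunciation difficulties mentioned in the description.

```python
# Hypothetical multilanguage prompt library: language -> prompt key -> text.
PROMPT_LIBRARY = {
    "english": {"repeat": "Please repeat the word."},
    "german": {"repeat": "Bitte wiederholen Sie das Wort."},
}

# Hypothetical table of sounds to drill, keyed by native language.
EXTRA_DRILLS = {
    "japanese": ["L/R", "short I / long E"],
    "hebrew": ["R"],
}


def prompt_for(native_language, key, fallback="english"):
    """Return the prompt text in the user's native language, if supported."""
    table = PROMPT_LIBRARY.get(native_language, PROMPT_LIBRARY[fallback])
    return table[key]


def drills_for(native_language):
    """Return the sounds to be drilled for speakers of the given language."""
    return list(EXTRA_DRILLS.get(native_language, []))
```

A language with no entry in the drill table simply receives no additional drilling, as would be the case for German speakers with respect to the L and R sounds.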
- Fig. 3 is a simplified block diagram of a preferred implementation of student specimen scorer 160 of Fig. 2.
- scoring unit 160 receives student response specimens as input, either directly from student response specimen receiver 130 or indirectly, via student response specimen memory 150.
- the volume and duration of the responses are preferably normalized by a volume/duration normalizer unit 250, using conventional methods. If the linear predictive coding method of parameter extraction described herein is employed, volume normalization is not necessary because volume is separated from the other parameters during parameter extraction. Duration may be normalized using the time warping method described in the above-referenced article by Itakura.
- a segmentation unit 260 segments each response specimen, if it is desired to analyze only a portion of a response specimen, or if it is desired to separately analyze a plurality of portions of the response specimen.
- Each segment or portion may comprise a phonetic unit such as a syllable or phoneme.
- the consonants C and T may be stripped from a student's utterance of the word CAT, in order to allow the phoneme A to be separately analyzed.
- each segment or portion may comprise a time unit. If short, fixed length segments are employed, duration normalization is not necessary.
- the silence-speech boundary is first identified as the point at which the energy increases to several times the background level and remains high.
- any suitable technique may be employed to identify the silence-speech boundary, such as that described in the above-referenced article by Rabiner and Sambur, the disclosure of which is incorporated herein by reference.
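The energy criterion for the silence-speech boundary may be sketched as follows. This is a simplified illustration and not the Rabiner and Sambur algorithm; the frame length, the number of leading frames used to estimate the background level, the factor of 4 standing in for "several times", and the number of frames for which the energy must remain high are all assumed values.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples for each complete, non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def find_speech_start(samples, frame_len=80, noise_frames=5,
                      factor=4.0, hold_frames=3):
    """Return the sample index at which speech begins, or None if not found."""
    energies = frame_energies(samples, frame_len)
    if len(energies) <= noise_frames:
        return None
    # The first few frames are assumed to contain background noise only.
    background = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * max(background, 1e-12)
    for i in range(noise_frames, len(energies) - hold_frames + 1):
        # Energy must exceed the threshold and remain high for hold_frames.
        if all(e > threshold for e in energies[i:i + hold_frames]):
            return i * frame_len
    return None
```

Requiring the energy to stay above the threshold for several consecutive frames keeps a brief click from being mistaken for the onset of speech.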
- consonant-vowel boundaries are identified by identifying points at which the energy remains high but the dominant speech frequency decreases to a range of about 100 to 200 Hz.
- the dominant frequency may be measured by a zero crossing counter which is operative to count the number of times in which the waveform crosses the horizontal axis.
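A zero crossing counter of this kind, together with the frequency estimate that follows from it, may be sketched as shown below. The function names and the sample rate are assumptions; the estimate relies on the fact that a periodic waveform crosses the horizontal axis twice per cycle.

```python
def count_zero_crossings(samples):
    """Count sign changes in the waveform, ignoring samples exactly at zero."""
    crossings = 0
    prev_sign = 0
    for x in samples:
        sign = (x > 0) - (x < 0)
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            crossings += 1
        prev_sign = sign
    return crossings


def dominant_frequency_hz(samples, sample_rate):
    """Estimate the dominant frequency from the zero crossing count."""
    if not samples:
        raise ValueError("samples must be non-empty")
    duration = len(samples) / sample_rate
    # Two zero crossings correspond to one full cycle.
    return count_zero_crossings(samples) / (2.0 * duration)
```

In practice the estimate would be computed frame by frame, so that a drop of the dominant frequency into the range of about 100 to 200 Hz can be located in time.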
- specimen segmentation unit 260 may be bypassed or eliminated and each response specimen may be analyzed in its entirety as a single unit.
- a parameter comparison unit 280 is operative to score student responses by evaluating the student responses in terms of speaker independent parameters stored in speaker independent parameter database 170 of Fig. 2.
- the score for an individual student response preferably represents the degree of similarity between the parameters derived from the individual student response by parameterization unit 270, and the corresponding speaker-independent parameters stored in database 170.
- the system may, for example, compare the student's response specimen with a corresponding plurality of stored reference specimens, thereby to obtain a plurality of similarity values, and may use the highest of these similarity values, indicating the most similarity, as the score for the student's response.
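The selection of the highest of a plurality of similarity values may be sketched as follows. The similarity function shown, which maps a mean absolute parameter difference onto a scale of 0 to 100, is a placeholder assumed for this illustration, as are the function names and the keys of the reference library.

```python
def feature_similarity(a, b):
    """Placeholder similarity on a 0-100 scale: 100 means identical parameters."""
    if len(a) != len(b) or not a:
        raise ValueError("parameter lists must be non-empty and equal in length")
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 100.0 / (1.0 + mean_abs_diff)


def best_reference_score(response, reference_library, similarity=feature_similarity):
    """Compare the response with every stored reference; keep the highest value."""
    scores = {name: similarity(response, ref)
              for name, ref in reference_library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Taking the maximum means that the student is scored against whichever stored reference his response most resembles.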
- the student response scores computed by parameter comparison unit 280 are preferably provided to each of the following units of Fig.
- step 1 (a) display 180, for display to the student. Alternatively, the student may be provided with an audio message indicating the score; (b) student follow-up database 190, for storage; and (c) prompt sequencer 200, to enable the prompt sequencer to adapt the subsequent sequence of prompts and recorded reference audio specimens to the user's progress as evidenced by the scores.
- a preferred method for preparation, during system set-up, of pre-recorded material for storage in reference audio specimen library 120 is now described with reference to Fig. 4.
- a reference audio specimen is recorded for each word, phoneme or other speech unit to be learned.
- step 300 a set of words, phonemes, phrases or other audio specimens is selected.
- a plurality of speech models are employed so that a range of sexes, ages and regional or national dialects may be represented.
- the plurality of speech models employed in a system designed to teach pronunciation of the English language may include the following six speech models:
  - Man, British dialect
  - Woman, British dialect
  - Child, British dialect
  - Man, American dialect
  - Woman, American dialect
  - Child, American dialect
- a plurality of speech models is selected. Each audio specimen selected in step 300 is produced by each of the speech models.
- each recorded audio specimen is recorded, digitized and stored in memory by the system.
- the amplitude of each recorded audio specimen is normalized.
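Amplitude normalization may be sketched, for example, as scaling each specimen so that its largest absolute sample value equals a common target. Peak normalization is an assumption made for this sketch; the disclosure does not specify the method, and the function name and the target value are hypothetical.

```python
def normalize_amplitude(samples, target_peak=1.0):
    """Scale samples so that the largest absolute value equals target_peak."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        # A silent (or empty) specimen cannot be scaled; return it unchanged.
        return list(samples)
    gain = target_peak / peak
    return [x * gain for x in samples]
```

After normalization, specimens recorded at different volumes can be displayed and compared on the same amplitude scale.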
- each recorded audio specimen is preferably divided into time segments or phonetic segments.
- each recorded audio specimen is characterized by extracting at least one parameter therefrom.
- a typical user session using the system of Figs. 1-3, is now described with reference to the flowchart of Figs. 5A - 5B.
- the user is provided with a menu of languages and is prompted to designate his native language. Alternatively, the user may be prompted to speak a few words in his native language and the system may be operative to analyze the spoken words and to identify the native language.
- the user is provided with a speech model menu whose options correspond to the plurality of speech models described above, and is prompted to select the speech model most suitable for him.
- step 410 the user is prompted to select an initial reference audio specimen, such as a phoneme, word or phrase, to be practiced.
- the specimen to be practiced may be selected by the system, preferably partially in accordance with the user's designation of his native language in step 400.
- Step 420 The reference audio specimen is played to the user and, optionally, the waveform thereof is simultaneously displayed to the user.
- Step 430 The user's attempted repetition of the reference audio specimen is received, digitized and stored in memory by the system.
- Step 450 The system normalizes the audio level and duration of the repetition audio specimen.
- Step 460 Optionally, the repetition audio specimen is replayed and the normalized waveform of the repetition audio specimen is displayed to the user.
- Step 490 The system extracts audio features such as linear predictor coefficients from the repetition audio specimen by parameterization of the specimen. Suitable audio feature extraction methods are described in the above-referenced article by Itakura and in the references cited therein, the disclosures of which are incorporated herein by reference.
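One well-known way of obtaining linear predictor coefficients is the autocorrelation method solved by the Levinson-Durbin recursion, sketched below. The choice of this method is an assumption for illustration, since the disclosure refers only to the cited literature; the function names and the prediction order are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (coeffs, error), where coeffs[k-1] is a[k] in the predictor
    x[n] ~ sum over k of a[k] * x[n-k], and error is the residual energy."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # signal is silent or already perfectly predicted
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:], error
```

For example, `levinson_durbin(autocorrelation(frame, 10), 10)` would yield ten coefficients for a frame, which could then serve as the parameters compared in the subsequent step.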
- Step 500 The system compares the parameters extracted in step 490 to stored features of the reference audio specimen and computes a similarity score.
- Step 510 The system displays the similarity score.
- Step 520 Preferably, the system plays back the reference and repetition specimens for audio comparison by the user.
- Step 530 Optionally, the system stores the similarity score and/or the repetition specimen itself for later follow-up.
- Step 540 Unless the system or the student determine that the session is to terminate, the system returns to step 410.
- system choices of reference specimens take into account student performance. For example, if the similarity score for a particular reference audio specimen is low, indicating poor user performance, the reference audio specimen may be repeated until a minimum level of performance is obtained. Subsequently, a similar reference audio specimen may be employed to ensure that the level of performance obtained generalizes to similar speech tasks. For example, if the user experiences difficulty in reproducing A in CAT, the specimen CAT may be repeatedly presented and may be followed by other specimens including A, such as BAD.
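The branching just described, in which a specimen is repeated until a minimum level of performance is reached and is then followed by a similar specimen, may be sketched as follows. The function name, the passing score of 70 and the table of related specimens are assumptions made for this sketch.

```python
def next_specimen(current, score, related, pass_score=70.0):
    """Choose the next reference specimen from the latest similarity score.

    The current specimen is repeated while the score is below pass_score.
    Otherwise the first related specimen is returned, or None if there is
    no further related specimen to practice.
    """
    if score < pass_score:
        return current
    followups = related.get(current, [])
    return followups[0] if followups else None


# Hypothetical table linking a specimen to others containing the same vowel.
RELATED = {"CAT": ["BAD"]}
```

With this table, a low score on CAT causes CAT to be presented again, while a passing score leads on to BAD.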
- Figs. 6 - 11 are graphic representations of the waveforms of speech specimens produced by speech models and students.
- Fig. 6 represents a speech model's rendition of the word "CAT" over 0.5 seconds.
- Fig. 7 is a graphic representation of a speech model's rendition of the vowel "A" over 0.128 seconds, obtained by "stripping" the consonants from the speech model's rendition of the word "CAT" illustrated in Fig. 6.
- the starting point of the vowel "A" is identified by finding the consonant-vowel boundaries in "CAT", as described above.
- the duration of each vowel is predetermined.
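Given the starting point of the vowel and its predetermined duration, the "stripping" of the consonants reduces to taking a fixed-length slice of the recorded samples, as in the following sketch. The function name and the sample rate are assumptions; the default duration of 0.128 seconds is the vowel duration shown in the figures.

```python
def extract_vowel(samples, sample_rate, vowel_start_sec, vowel_duration_sec=0.128):
    """Return the fixed-duration vowel portion of a recorded word."""
    start = round(vowel_start_sec * sample_rate)
    length = round(vowel_duration_sec * sample_rate)
    return samples[start:start + length]
```

At an assumed sample rate of 8000 samples per second, the vowel segment contains 1024 samples, regardless of how long the whole word was.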
- Fig. 8 is a graphic representation of a student's attempted rendition of the word "CAT" over 0.5 seconds.
- Fig. 9 is a graphic representation of a student's attempted rendition of the vowel "A" over 0.128 seconds, obtained by "stripping" the consonants from the student's rendition of the word "CAT" illustrated in Fig. 8.
- Fig. 10 is a graphic representation of a student's attempted rendition of the word "CAT" over 0.35 seconds.
- Fig. 11 is a graphic representation of a student's attempted rendition of the vowel "A" over 0.128 seconds, obtained by "stripping" the consonants from the student's rendition of the word "CAT" illustrated in Fig. 10.
- It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention is defined only by the claims that follow:
Abstract
An improved computerized system (10) for speech and pronunciation teaching in which recorded reference speech specimens (32) are presented to a student and in which a quantification of the similarity between the student's repetitions (34) and the originally presented reference speech specimens is displayed to the user.
Description
COMPUTERIZED SYSTEM FOR TEACHING SPEECH
The present invention relates to educational systems generally and more particularly to computerized systems for teaching speech.
In recent years there have been developments in the art of computerized teaching of speech. Speech laboratories in which prompts and cues such as pre-recorded sounds and words are presented to a student and the students' speech productions are recorded or monitored are well known.
The Speech Viewer II, marketed by IBM, is a speech therapy product which provides visual and auditory feedback from a student's sound productions.
Known methods and apparatus for computerized speech recognition are described in the following publications, the disclosures of which are incorporated herein by reference:
- Flanagan, J. L., "Computers that talk and listen: Man-machine communication by voice", Proc IEEE, Vol. 64, 1976, pp. 405 - 415;
- Itakura, F., "Minimum prediction residual principle applied to speech recognition", IEEE Trans. Acoustics, Speech and Signal Processing, Feb. 1975 (describes a temporal alignment algorithm and a method for computing a distance metric);
- Le Roux, J. and Gueguen, C., "A fixed point computation of partial correlation coefficients", IEEE ASSP, June, 1977;
- Peacocke, R. D. and Graf, D. H., "An introduction to speech and speaker recognition", IEEE Computer, Vol. 23(8), Aug. 1990, pp. 26 - 33;
- Rabiner, L. R. et al., "Speaker-independent recognition of isolated words using clustering techniques", IEEE Trans. Acoustics, Speech and Signal Processing, Vol. ASSP-27, No. 4, Aug. 1979, pp. 336 - 349;
- Rabiner, L. R., Levinson, S. E. and Sondhi, M. M., "On the application of vector quantization and hidden Markov models to speaker-independent, isolated word recognition", Bell Systems Tech J, Vol. 62(4), Apr. 1983, pp. 1075 - 1105;
- Rabiner, L. R. and Sambur, M. R., "An algorithm for determining the endpoints of isolated utterances", Bell Systems Tech J, Feb., 1975;
- Rabiner, L. R. and Wilpon, J. G., "A simplified, robust training procedure for speaker trained isolated word recognition systems", J Acoustical Society of America, Nov. 1980.
The disclosures of all the above publications are incorporated herein by reference.
The present invention seeks to provide an improved computerized system for speech and pronunciation teaching in which recorded reference speech specimens are presented to a student and in which a quantification of the similarity between the student's repetitions and the originally presented reference speech specimens is displayed to the user.
The present invention also seeks to provide a speech and pronunciation teaching system which is particularly suited for independent speech study and does not require presence of a trained human speech and pronunciation expert. Preferably, the system of the present invention includes verbal prompts which guide a user through a teaching system without requiring recourse to a human teacher. Preferably, student performance is monitored and the verbal prompt sequence branches to take student performance into account. For example, predetermined types of student errors, such as repeatedly mispronouncing a particular phoneme, may be extracted from student speech responses and the verbal prompt sequence may branch to take into account the presence or absence of each type of student error.
The present invention also seeks to provide a speech and pronunciation teaching system which is particularly suited to teaching preferred pronunciation of a foreign language to a speaker of a native language. Preferably, the system of the present invention includes an initial menu presented in a plurality of languages and a multi-language message prompting the user to select the menu option representing his native language. In response to the user's selection of a native language, the system is preferably operative to present subsequent verbal messages to the user in his own native language, and/or to branch the sequence of verbal messages so as to take into account speech characteristics, such as pronunciation errors, which are known to occur frequently in speakers of the user's native language. For example, when speaking English, native speakers of Japanese typically confuse the L and R sounds, and also the short I and long E sounds, as in the words "ship" and "sheep". Native speakers of Arabic and German do not have either of these problems.
There is thus provided, in accordance with a preferred embodiment of the present invention, apparatus for interactive speech training including an audio specimen generator for playing a pre-recorded reference audio specimen to a user for attempted repetition thereby, and an audio specimen scorer for scoring a user's repetition audio specimen.
Further in accordance with a preferred embodiment of the present invention the audio specimen scorer includes a reference-to-response comparing unit for comparing at least one feature of a user's repetition audio specimen to at least one feature of the reference audio specimen, and a similarity indicator for providing an output indication of the degree of similarity between at least one feature of the repetition audio specimen and at least one feature of the reference audio specimen.
Still further in accordance with a preferred embodiment of the present invention, the apparatus also includes a user response memory to which the reference-to-response comparing unit has access, for storing a user's repetition of a reference audio specimen.
Additionally in accordance with a preferred embodiment of the present invention, the reference-to-response comparing unit includes a volume/duration normalizer for normalizing the volume and duration of the reference and repetition audio specimens.
Still further in accordance with a preferred embodiment of the present invention, the reference-to-response comparing unit includes a parameterization
unit for extracting audio signal parameters from the reference and repetition audio specimens.

Additionally in accordance with a preferred embodiment of the present invention, the reference-to-response comparing unit also includes apparatus for comparing the reference audio specimen parameters to the repetition audio specimen parameters.

Further in accordance with a preferred embodiment of the present invention, the apparatus for comparing includes a parameter score generator for providing a score representing the degree of similarity between the audio signal parameters of the reference and repetition audio specimens.

Still further in accordance with a preferred embodiment of the present invention, the output indication includes a display of the score.

In accordance with one alternative embodiment of the present invention, the output indication includes a display of at least one audio waveform.

Further in accordance with a preferred embodiment of the present invention, the interactive speech training apparatus includes a prompt sequencer operative to generate a sequence of prompts to a user.

Still further in accordance with a preferred embodiment of the present invention, the interactive speech training apparatus also includes a reference audio specimen library in which reference audio specimens are stored and to which the audio specimen generator has access.

Additionally in accordance with a preferred embodiment of the present invention, the reference audio specimen library includes a multiplicity of recordings of audio specimens produced by a plurality of speech models.

Still further in accordance with a preferred embodiment of the present invention, the plurality of speech models differ from one another in at least one of the following characteristics: sex, age, and dialect.

There is also provided in accordance with another preferred embodiment of the present invention, apparatus for interactive speech training including a prompt sequencer operative to generate a sequence of prompts to a user, prompting the user to produce a corresponding sequence of audio specimens, and a reference-to-response comparing unit for comparing at least one feature of each of the sequence of audio specimens generated by the user, to a reference.

Further in accordance with a preferred embodiment of the present invention, the reference to which an individual user-generated audio specimen is compared includes a corresponding stored reference audio specimen.

Still further in accordance with a preferred embodiment of the present invention, the sequence of prompts branches in response to user performance.

Additionally in accordance with a preferred embodiment of the present invention, the sequence of prompts is at least partly determined by a user's designation of his native language.

Still further in accordance with a preferred embodiment of the present invention, the prompt sequencer includes a multi-language prompt sequence library in which a plurality of prompt sequences in a plurality of languages is stored and wherein the prompt sequencer is operative to generate a sequence of prompts in an individual one of the plurality of languages in response to a user's designation of the individual language as his native language.

There is also provided, in accordance with another preferred embodiment of the present invention, apparatus for interactive speech training including an audio specimen recorder for recording audio specimens generated by a user, and a reference-to-response comparing unit for comparing at least one feature of a user-generated audio specimen to a reference, the
comparing unit including an audio specimen segmenter for segmenting a user-generated audio specimen into a plurality of segments, and a segment comparing unit for comparing at least one feature of at least one of the plurality of segments to a reference.

Still further in accordance with a preferred embodiment of the present invention, the audio specimen segmenter includes a phonetic segmenter for segmenting a user-generated audio specimen into a plurality of phonetic segments.

Additionally in accordance with a preferred embodiment of the present invention, at least one of the phonetic segments includes a phoneme such as a vowel or consonant.

In accordance with one alternative embodiment of the present invention, at least one of the phonetic segments may include a syllable.

There is also provided in accordance with yet a further preferred embodiment of the present invention, apparatus for interactive speech training including an audio specimen recorder for recording audio specimens generated by a user, and a speaker-independent audio specimen scorer for scoring a user-generated audio specimen based on at least one speaker-independent parameter.

Further in accordance with a preferred embodiment of the present invention, at least one speaker-independent parameter includes a threshold value for the amount of energy at a predetermined frequency.

Still further in accordance with a preferred embodiment of the present invention, the apparatus also includes a conventional personal computer.
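By way of illustration only, a speaker-independent parameter of the kind mentioned above (a threshold on the amount of energy at a predetermined frequency) might be evaluated as in the following minimal Python sketch. The specification does not say how the energy is measured; the Goertzel algorithm is used here as one conventional choice, and the function names `energy_at_frequency` and `passes_energy_threshold`, as well as the threshold value passed in, are hypothetical.

```python
import math


def energy_at_frequency(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # index of the nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def passes_energy_threshold(samples, sample_rate, target_hz, threshold):
    """True if the specimen has at least `threshold` energy at target_hz.

    The threshold is an assumed, length-dependent value; a real system would
    calibrate it per audio specimen and per speaker category.
    """
    return energy_at_frequency(samples, sample_rate, target_hz) >= threshold
```

Because the measured energy grows with the number of samples, a threshold chosen this way would apply only to specimens of a fixed, normalized duration.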
The present invention will be understood and appreciated from the following detailed description, taken in conjunction with the drawings in which:

Fig. 1 is a generalized pictorial illustration of an interactive speech teaching system constructed and operative in accordance with a preferred embodiment of the present invention;

Fig. 2 is a simplified block diagram illustration of the system of Fig. 1;

Fig. 3 is a simplified block diagram illustration of one of the components of the system of Fig. 1;

Fig. 4 is a simplified flow chart illustrating preparation of pre-recorded material for use in the invention;

Figs. 5A and 5B, taken together, are a simplified flow chart illustrating operation of the apparatus of Figs. 1 and 2;

Fig. 6 is a graphic representation (audio amplitude vs. time in secs) of a speech model's rendition of the word "CAT" over 0.5 seconds;

Fig. 7 is a graphic representation (audio amplitude vs. time in secs), derived from Fig. 6, of a speech model's rendition of the vowel "A" over 0.128 seconds;

Fig. 8 is a graphic representation (audio amplitude vs. time in secs) of a student's attempted rendition of the word "CAT" over 0.5 seconds;

Fig. 9 is a graphic representation (audio amplitude vs. time in secs), derived from Fig. 8, of a student's attempted rendition of the vowel "A" over 0.128 seconds;

Fig. 10 is a graphic representation (audio amplitude vs. time in secs) of a student's attempted rendition of the word "CAT" over 0.35 seconds; and

Fig. 11 is a graphic representation (audio amplitude vs. time in secs), derived from Fig. 10, of a student's attempted rendition of the vowel "A" over 0.128 seconds.
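The derivation of the 0.128 second vowel figures from the word-length figures can be sketched in Python as follows. This is a simplified illustration, not the implementation of the system: it looks for the first analysis frame in which the energy is several times an assumed background level and the dominant frequency, estimated by counting zero crossings, has dropped to about 200 Hz or below, and then cuts a fixed-length window from that point. The function names, the 16 ms frame length and the factor of 4 over background are assumptions made for the example, and the check that energy "remains high" is omitted.

```python
import math


def frame_stats(samples, sample_rate, frame_len):
    """Yield (start_index, mean_energy, zero_crossing_frequency_hz) per frame."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        # A periodic waveform crosses the horizontal axis twice per cycle.
        yield start, energy, crossings * sample_rate / (2.0 * frame_len)


def find_vowel_start(samples, sample_rate, background_energy,
                     energy_factor=4.0, max_vowel_hz=200.0, frame_seconds=0.016):
    """Index of the first high-energy, low-dominant-frequency frame, or None."""
    frame_len = round(sample_rate * frame_seconds)
    for start, energy, freq in frame_stats(samples, sample_rate, frame_len):
        if energy >= energy_factor * background_energy and freq <= max_vowel_hz:
            return start
    return None


def extract_vowel(samples, sample_rate, background_energy, vowel_seconds=0.128):
    """Cut a fixed-duration vowel window starting at the detected boundary."""
    start = find_vowel_start(samples, sample_rate, background_energy)
    if start is None:
        return None
    return samples[start:start + round(sample_rate * vowel_seconds)]
```

With a fixed window, the boundary is only resolved to the nearest frame, so a shorter frame gives a more precise starting point at the cost of a coarser zero-crossing frequency estimate.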
Reference is now made to Figs. 1 and 2 which illustrate an interactive speech teaching system constructed and operative in accordance with a preferred embodiment of the present invention. The system of Figs. 1 and 2 is preferably based on a conventional personal computer 10, such as an IBM PC-AT, preferably equipped with an auxiliary audio module 12. For example, a suitable audio module 12 is the DS201, manufactured by Digispeech Inc. of Palo Alto, CA, USA and commercially available from IBM Educational Systems. A headset 14 is preferably associated with audio module 12.

As may be seen from Fig. 1, a display 30 is optionally provided which represents normalized audio waveforms of both a pre-recorded reference audio specimen 32 and a student's attempted repetition 34 thereof. A score 40, quantifying the similarity over time between the repetition and reference audio specimens, is typically displayed, in order to provide feedback to the student.

Any suitable method may be employed to generate the similarity score 40, such as conventional correlation methods. One suitable method is described in the above-referenced article by Itakura, the disclosure of which is incorporated herein by reference. To use the distance metric described by Itakura, first linear prediction coefficients are extracted from the speech signal. Then a dynamic programming algorithm may be employed to compute the distance between a student's repetition and a set of models, i.e., the extent to which the student's repetitions correspond to the models.

Preferably, appropriate software is loaded in computer 10 of Fig. 1 to carry out the operations set forth in the functional block diagram of Fig. 2. Alternatively, the structure of Fig. 2 may be embodied in a
conventional hard-wired circuit.

Reference is now made specifically to the block diagram of Fig. 2. The apparatus of Fig. 2 comprises a reference audio specimen player 100 which is operative to play a reference audio specimen to a student 110. Reference audio specimens for each of a multiplicity of phonemes, words and/or phrases are typically prerecorded by each of a plurality of speech models and are stored in a reference audio specimen library 120. Reference audio specimen player 100 has access to reference audio specimen library 120.

The student 110 attempts to reproduce each reference audio specimen. His spoken attempts are received by student response specimen receiver 130 and are preferably digitized by a digitizer 140 and stored in a student response specimen memory 150. Optionally, each stored student response from memory 150 is played back to the student on a student response specimen player 154. Players 100 and 154 need not, of course, be separate elements and are shown as separate blocks merely for clarity.

A student response specimen scoring unit 160 is operative to evaluate the student response specimens by accessing student response specimen receiver 130. Scores are computed by comparing student responses to the corresponding reference audio specimen, accessed from library 120.

Evaluation of student responses in terms of a reference specimen sometimes gives less than optimal results because a single reference specimen produced by a single speech model may not accurately represent the optimal pronunciation of that specimen. Therefore, alternatively or in addition, student response scores may be computed by evaluating student responses in terms of a speaker independent reference such as a set of speaker independent parameters stored in a speaker independent parameter database 170. According to a preferred embodiment of the
present invention, the speaker independent parameters in database 170 are specific as to age, gender and/or dialect of the speaker. In other words, the parameters are speaker independent within each individual category of individuals of a particular age, gender and/or dialect.

One example of a speaker independent parameter is the presence of high energy at a particular frequency which depends on the audio specimen. For example, in Fig. 6, the CAT waveform includes first and third high frequency, low energy portions and a second portion interposed between the first and third portions which is characterized by medium frequency and high energy. The first and third portions correspond to the C and T sounds in CAT. The second portion corresponds to the A sound.

Frequency analysis may be employed to evaluate the response specimen. Speaker dependent parameters such as resonant frequencies or linear predictor coefficients may be computed, and the computed values may be compared with known normal ranges therefor.

Student response specimen scoring unit 160 is described in more detail below with reference to Fig. 3.

The student response score or evaluation derived by scorer unit 160 is displayed to the student on a display 180 such as a television screen. Preferably, the score or evaluation is also stored in a student follow-up database 190 which accumulates information regarding the progress of each individual student for follow-up purposes.

The interface of the system with the student is preferably mediated by a prompt sequencer 200 which is operative to generate prompts to the student, such as verbal prompts, which may either be displayed on display 180 or may be audibly presented to the student. Preferably, the prompt sequencer receives student scores from scoring unit 160 and is operative to branch
the sequence of prompts and presented reference audio specimens to correspond to the student's progress as evidenced by his scores.

According to a preferred embodiment of the present invention, the prompt sequencer initially presents the student with a menu via which a student may designate his native language. The prompt sequencer preferably takes the student's native language designation into account in at least one of the following ways:

(a) Verbal prompts are supplied to the user in his native language. Each prompt is stored in each of a plurality of native languages supported by the system, in a multilanguage prompt library 210 to which prompt sequencer 200 has access.

(b) The sequence of prompts and reference audio specimens is partially determined by the native language designation. For example, native speakers of Hebrew generally have difficulty in pronouncing the English R sound. Therefore, for Hebrew speakers, the sequence of prompts and reference audio specimens might include substantial drilling of the R sound.

Reference is now made to Fig. 3 which is a simplified block diagram of a preferred implementation of student specimen scorer 160 of Fig. 2.

As explained above, scoring unit 160 receives student response specimens as input, either directly from student response specimen receiver 130 or indirectly, via student response specimen memory 150. The volume and duration of the responses are preferably normalized by a volume/duration normalizer unit 250, using conventional methods. If the linear predictive coding method of parameter extraction described herein is employed, volume normalization is not necessary because volume is separated from the other parameters during parameter extraction.

Duration may be normalized using the time warping method described in the above-referenced article by Itakura.

A segmentation unit 260 segments each response specimen, if it is desired to analyze only a portion of a response specimen, or if it is desired to separately analyze a plurality of portions of the response specimen. Each segment or portion may comprise a phonetic unit such as a syllable or phoneme. For example, the consonants C and T may be stripped from a student's utterance of the word CAT, in order to allow the phoneme A to be separately analyzed. Alternatively, each segment or portion may comprise a time unit. If short, fixed length segments are employed, duration normalization is not necessary.

To segment a response specimen, the silence-speech boundary is first identified as the point at which the energy increases to several times the background level and remains high. Any suitable technique may be employed to identify the silence-speech boundary, such as that described in the above-referenced article by Rabiner and Sambur, the disclosure of which is incorporated herein by reference.

Next, consonant-vowel boundaries are identified by identifying points at which the energy remains high but the dominant speech frequency decreases to a range of about 100 to 200 Hz. The dominant frequency may be measured by a zero crossing counter which is operative to count the number of times in which the waveform crosses the horizontal axis.

Alternatively, specimen segmentation unit 260 may be bypassed or eliminated and each response specimen may be analyzed in its entirety as a single unit.

A parameter comparison unit 280 is operative to score student responses by evaluating the student responses in terms of speaker independent parameters stored in speaker independent parameter database 170 of Fig. 2. The score for an individual student response preferably represents the degree of similarity between the parameters derived from the individual student
response by parameterization unit 270, and the corresponding speaker-independent parameters stored in database 170.

The system may, for example, compare the student's response specimen with a corresponding plurality of stored reference specimens, thereby to obtain a plurality of similarity values, and may use the highest of these similarity values, indicating the most similarity, as the score for the student's response.

The student response scores computed by parameter comparison unit 280 are preferably provided to each of the following units of Fig. 2:

(a) display 180, for display to the student. Alternatively, the student may be provided with an audio message indicating the score;

(b) student follow-up database 190, for storage; and

(c) prompt sequencer 200, to enable the prompt sequencer to adapt the subsequent sequence of prompts and recorded reference audio specimens to the user's progress as evidenced by the scores.

A preferred method for preparation, during system set-up, of pre-recorded material for storage in reference audio specimen library 120 is now described with reference to Fig. 4.

As explained above, during system set-up, a reference audio specimen is recorded for each word, phoneme or other speech unit to be learned. In step 300, a set of words, phonemes, phrases or other audio specimens is selected. Preferably, a plurality of speech models are employed so that a range of sexes, ages and regional or national dialects may be represented. For example, the plurality of speech models employed in a system designed to teach pronunciation of the English language may include the following six speech models:

Man, British dialect
Woman, British dialect
Child, British dialect
Man, American dialect
Woman, American dialect
Child, American dialect

In step 310, a plurality of speech models is selected. Each audio specimen selected in step 300 is produced by each of the speech models.

In step 320, each audio specimen is recorded, digitized and stored in memory by the system.

In step 330, the amplitude of each recorded audio specimen is normalized.

In step 340, each recorded audio specimen is preferably divided into time segments or phonetic segments.

In step 350, each recorded audio specimen is characterized by extracting at least one parameter therefrom.

A typical user session, using the system of Figs. 1-3, is now described with reference to the flowchart of Figs. 5A - 5B.

In step 400, the user is provided with a menu of languages and is prompted to designate his native language. Alternatively, the user may be prompted to speak a few words in his native language and the system may be operative to analyze the spoken words and to identify the native language.

In step 405, the user is provided with a speech model menu whose options correspond to the plurality of speech models described above, and is prompted to select the speech model most suitable for him.

In step 410, the user is prompted to select an initial reference audio specimen, such as a phoneme, word or phrase, to be practiced. Alternatively, the specimen to be practiced may be selected by the system, preferably partially in accordance with the user's designation of his native language in step 400.

Step 420: The reference audio specimen is
played to the user and, optionally, the waveform thereof is simultaneously displayed to the user.

Step 430: The user's attempted repetition of the reference audio specimen is received, digitized and stored in memory by the system.

Step 450: The system normalizes the audio level and duration of the repetition audio specimen.

Step 460: Optionally, the repetition audio specimen is replayed and the normalized waveform of the repetition audio specimen is displayed to the user.

Step 490: The system extracts audio features such as linear predictor coefficients from the repetition audio specimen by parameterization of the specimen. Suitable audio feature extraction methods are described in the above-referenced article by Itakura and in the references cited therein, the disclosures of which are incorporated herein by reference.

Step 500: The system compares the parameters extracted in step 490 to stored features of the reference audio specimen and computes a similarity score.

Step 510: The system displays the similarity score.

Step 520: Preferably, the system plays back the reference and repetition specimens for audio comparison by the user.

Step 530: Optionally, the system stores the similarity score and/or the repetition specimen itself for later follow-up.

Step 540: Unless the system or the student determines that the session is to terminate, the system returns to step 410. Preferably, system choices of reference specimens take into account student performance. For example, if the similarity score for a particular reference audio specimen is low, indicating poor user performance, the reference audio specimen may be repeated until a minimum level of performance is obtained. Subsequently, a similar reference audio
specimen may be employed to ensure that the level of performance obtained generalizes to similar speech tasks.

For example, if the user experiences difficulty in reproducing A in CAT, the specimen CAT may be repeatedly presented and may be followed by other specimens including A, such as BAD.

Figs. 6 - 11 are graphic representations of the waveforms of speech specimens produced by speech models and students.

Fig. 6 represents a speech model's rendition of the word "CAT" over 0.5 seconds. Fig. 7 is a graphic representation of a speech model's rendition of the vowel "A" over 0.128 seconds, obtained by "stripping" the consonants from the speech model's rendition of the word "CAT" illustrated in Fig. 6. The starting point of the vowel "A" is identified by finding the consonant-vowel boundaries in "CAT", as described above. According to one embodiment of the present invention, the duration of each vowel is predetermined. A predetermined vowel duration of 0.128 secs has been found to provide satisfactory results; however, this value is not intended to be limiting.

According to another embodiment of the present invention, the duration of each vowel is not predetermined. Instead, vowel-consonant boundaries are identified by suitable analysis of the speech specimen.

Fig. 8 is a graphic representation of a student's attempted rendition of the word "CAT" over 0.5 seconds. Fig. 9 is a graphic representation of a student's attempted rendition of the vowel "A" over 0.128 seconds, obtained by "stripping" the consonants from the student's rendition of the word "CAT" illustrated in Fig. 8.

Fig. 10 is a graphic representation of a student's attempted rendition of the word "CAT" over 0.35 seconds. Fig. 11 is a graphic representation of a student's attempted rendition of the vowel "A" over 0.128 seconds, obtained by "stripping" the consonants from the student's rendition of the word "CAT" illustrated in Fig. 10.

It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention is defined only by the claims that follow:
Claims
1. Apparatus for interactive speech training comprising: an audio specimen generator for playing a pre-recorded reference audio specimen to a user for attempted repetition thereby; and an audio specimen scorer for scoring a user's repetition audio specimen.
2. Apparatus according to claim 1 wherein the audio specimen scorer comprises: a reference-to-response comparing unit for comparing at least one feature of a user's repetition audio specimen to at least one feature of the reference audio specimen; and a similarity indicator for providing an output indication of the degree of similarity between at least one repetition audio specimen feature and at least one reference audio specimen feature.
3. Apparatus according to claim 2 and also comprising a user response memory to which the reference-to-response comparing unit has access, for storing a user's repetition of a reference audio specimen.
4. Apparatus according to claim 2 wherein said reference-to-response comparing unit comprises a volume/duration normalizer for normalizing the volume and duration of the reference and repetition audio specimens.
5. Apparatus according to claim 2 wherein said reference-to-response comparing unit comprises a parameterization unit for extracting audio signal parameters from the reference and repetition audio specimens.
6. Apparatus according to claim 5 and wherein said reference-to-response comparing unit also comprises means for comparing the reference audio specimen parameters to the repetition audio specimen parameters.
7. Apparatus according to claim 6 wherein said means for comparing comprises a parameter score generator for providing a score representing the degree of similarity between the audio signal parameters of the reference and repetition audio specimens.
8. Apparatus according to claim 7 wherein said output indication comprises a display of said score.
9. Apparatus according to claim 2 wherein said output indication comprises a display of at least one audio waveform.
10. Apparatus according to claim 1 and also comprising a prompt sequencer operative to generate a sequence of prompts to a user.
11. Apparatus for interactive speech training according to claim 1 and also comprising a reference audio specimen library in which reference audio specimens are stored and to which the audio specimen generator has access.
12. Apparatus according to claim 11 wherein said reference audio specimen library comprises a multiplicity of recordings of audio specimens produced by a plurality of speech models.
13. Apparatus according to claim 12 wherein the plurality of speech models differ from one another in at least one of the following characteristics: sex; age; and dialect.
14. Apparatus for interactive speech training comprising: a prompt sequencer operative to generate a sequence of prompts to a user, prompting the user to produce a corresponding sequence of audio specimens; and a reference-to-response comparing unit for comparing at least one feature of each of the sequence of audio specimens generated by the user, to a reference.
15. Apparatus according to claim 14 wherein the reference to which an individual user-generated audio specimen is compared comprises a corresponding stored reference audio specimen.
16. Apparatus according to claim 14 wherein the sequence of prompts branches in response to user performance.
17. Apparatus according to claim 14 wherein the sequence of prompts is at least partly determined by a user's designation of his native language.
18. Apparatus according to claim 14 wherein the prompt sequencer comprises a multi-language prompt sequence library in which a plurality of prompt sequences in a plurality of languages is stored and wherein the prompt sequencer is operative to generate a sequence of prompts in an individual one of the plurality of languages in response to a user's designation of the individual language as his native language.
19. Apparatus for interactive speech training comprising: an audio specimen recorder for recording audio specimens generated by a user; and a reference-to-response comparing unit for comparing at least one feature of a user-generated audio specimen to a reference, the comparing unit comprising: an audio specimen segmenter for segmenting a user-generated audio specimen into a plurality of segments; and a segment comparing unit for comparing at least one feature of at least one of the plurality of segments to a reference.
20. Apparatus according to claim 19 wherein said audio specimen segmenter comprises a phonetic segmenter for segmenting a user-generated audio specimen into a plurality of phonetic segments.
21. Apparatus according to claim 20 wherein at least one of the phonetic segments comprises a phoneme.
22. Apparatus according to claim 20 wherein at least one of the phonetic segments comprises a syllable.
23. Apparatus according to claim 21 wherein the phoneme comprises a vowel.
24. Apparatus according to claim 21 wherein the phoneme comprises a consonant.
25. Apparatus for interactive speech training comprising: an audio specimen recorder for recording audio specimens generated by a user; and a speaker-independent audio specimen scorer for scoring a user-generated audio specimen based on at least one speaker-independent parameter.
26. Apparatus according to claim 25 wherein at least one speaker-independent parameter comprises a threshold value for the amount of energy at a predetermined frequency.
27. Apparatus according to claim 1 and also comprising a conventional personal computer.
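For illustration only, and without limiting the claims, the volume/duration normalizer and the score generator recited above might be sketched in Python as follows. The sketch assumes a conventional correlation method for the similarity score; it is not the Itakura distance metric, and the linear resampling used for duration normalization is a crude stand-in for time warping. All function names and the 0 to 100 score range are hypothetical.

```python
import math


def normalize_volume(samples):
    """Scale the specimen so that its peak absolute amplitude is 1."""
    peak = max(abs(x) for x in samples)
    return [x / peak for x in samples] if peak > 0 else list(samples)


def normalize_duration(samples, target_len):
    """Resample to target_len points (target_len >= 2) by linear interpolation."""
    scale = (len(samples) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def similarity_score(reference, response, target_len=512):
    """Normalized correlation of the two specimens, mapped to 0..100."""
    ref = normalize_duration(normalize_volume(reference), target_len)
    rsp = normalize_duration(normalize_volume(response), target_len)
    dot = sum(a * b for a, b in zip(ref, rsp))
    norm = math.sqrt(sum(a * a for a in ref) * sum(b * b for b in rsp))
    if norm == 0:
        return 0.0
    # Negative correlation is treated as no similarity.
    return 100.0 * max(0.0, dot / norm)
```

The normalized correlation is itself insensitive to overall volume, so the explicit volume step is kept only to mirror the normalizer; it matters when the normalized waveforms are also displayed side by side.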
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU60939/94A AU6093994A (en) | 1993-01-21 | 1994-01-19 | Computerized system for teaching speech |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/007,242 US5487671A (en) | 1993-01-21 | 1993-01-21 | Computerized system for teaching speech |
US007,242 | 1993-01-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1994017508A1 (en) | 1994-08-04 |
Family
ID=21725036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1994/000815 WO1994017508A1 (en) | 1993-01-21 | 1994-01-19 | Computerized system for teaching speech |
Country Status (6)
Country | Link |
---|---|
US (2) | US5487671A (en) |
KR (1) | KR940018741A (en) |
CN (1) | CN1101446A (en) |
AU (1) | AU6093994A (en) |
TW (1) | TW277120B (en) |
WO (1) | WO1994017508A1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BE1007899A3 (en) * | 1993-12-22 | 1995-11-14 | Philips Electronics Nv | Information system with means for user interactions, an informationprocessing and information management system and the means for operating thesaid system and the responses to the user |
FR2730579A1 (en) * | 1995-02-10 | 1996-08-14 | 2B Technology Sarl | PORTABLE DEVICE INTENDED FOR VOCAL STIMULI DICTION EXERCISE |
GB2298514A (en) * | 1995-03-03 | 1996-09-04 | Ghazala Shaheen Jamil Malik | Learning the Quran Karim |
WO1997021201A1 (en) * | 1995-12-04 | 1997-06-12 | Bernstein Jared C | Method and apparatus for combined information from speech signals for adaptive interaction in teaching and testing |
WO1998032111A1 (en) * | 1996-12-27 | 1998-07-23 | Ewa Braun | Device for phonological training |
WO1999013446A1 (en) * | 1997-09-05 | 1999-03-18 | Idioma Ltd. | Interactive system for teaching speech pronunciation and reading |
US6151577A (en) * | 1996-12-27 | 2000-11-21 | Ewa Braun | Device for phonological training |
US6157913A (en) * | 1996-11-25 | 2000-12-05 | Bernstein; Jared C. | Method and apparatus for estimating fitness to perform tasks based on linguistic and other aspects of spoken responses in constrained interactions |
WO2019075825A1 (en) * | 2017-10-20 | 2019-04-25 | 深圳市鹰硕技术有限公司 | Internet teaching platform-based accompanying teaching method and system |
WO2019095447A1 (en) * | 2017-11-17 | 2019-05-23 | 深圳市鹰硕技术有限公司 | Guided teaching method having remote assessment function |
Families Citing this family (123)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5540589A (en) * | 1994-04-11 | 1996-07-30 | Mitsubishi Electric Information Technology Center | Audio interactive tutor |
US6283760B1 (en) | 1994-10-21 | 2001-09-04 | Carl Wakamoto | Learning and entertainment device, method and system and storage media therefor |
KR980700637A (en) * | 1994-12-08 | 1998-03-30 | 레이어스 닐 | METHOD AND DEVICE FOR ENHANCER THE RECOGNITION OF SPEECHAMONG SPEECH-IMPAI RED INDIVIDUALS |
US5717828A (en) * | 1995-03-15 | 1998-02-10 | Syracuse Language Systems | Speech recognition apparatus and method for learning |
US6109923A (en) | 1995-05-24 | 2000-08-29 | Syracuse Language Systems | Method and apparatus for teaching prosodic features of speech |
CN1045854C (en) * | 1995-06-16 | 1999-10-20 | 曹敏娟 | Infrared two-direction speech sound transmission system |
JPH09231225A (en) * | 1996-02-26 | 1997-09-05 | Fuji Xerox Co Ltd | Language information processor |
US5893720A (en) * | 1996-03-25 | 1999-04-13 | Cohen; Hannah R. | Development language system for infants |
US5766015A (en) * | 1996-07-11 | 1998-06-16 | Digispeech (Israel) Ltd. | Apparatus for interactive language training |
US5832441A (en) * | 1996-09-16 | 1998-11-03 | International Business Machines Corporation | Creating speech models |
WO1998014934A1 (en) | 1996-10-02 | 1998-04-09 | Sri International | Method and system for automatic text-independent grading of pronunciation for language instruction |
US5915001A (en) * | 1996-11-14 | 1999-06-22 | Vois Corporation | System and method for providing and using universally accessible voice and speech data files |
US5857173A (en) * | 1997-01-30 | 1999-01-05 | Motorola, Inc. | Pronunciation measurement device and method |
US5811791A (en) * | 1997-03-25 | 1998-09-22 | Sony Corporation | Method and apparatus for providing a vehicle entertainment control system having an override control switch |
US6109107A (en) * | 1997-05-07 | 2000-08-29 | Scientific Learning Corporation | Method and apparatus for diagnosing and remediating language-based learning impairments |
US5920838A (en) * | 1997-06-02 | 1999-07-06 | Carnegie Mellon University | Reading and pronunciation tutor |
US6603835B2 (en) | 1997-09-08 | 2003-08-05 | Ultratec, Inc. | System for text assisted telephony |
US6159014A (en) * | 1997-12-17 | 2000-12-12 | Scientific Learning Corp. | Method and apparatus for training of cognitive and memory systems in humans |
US6019607A (en) * | 1997-12-17 | 2000-02-01 | Jenkins; William M. | Method and apparatus for training of sensory and perceptual systems in LLI systems |
US5927988A (en) * | 1997-12-17 | 1999-07-27 | Jenkins; William M. | Method and apparatus for training of sensory and perceptual systems in LLI subjects |
US6134529A (en) * | 1998-02-09 | 2000-10-17 | Syracuse Language Systems, Inc. | Speech recognition apparatus and method for learning |
US7203649B1 (en) * | 1998-04-15 | 2007-04-10 | Unisys Corporation | Aphasia therapy system |
US6305942B1 (en) | 1998-11-12 | 2001-10-23 | Metalearning Systems, Inc. | Method and apparatus for increased language fluency through interactive comprehension, recognition and generation of sounds, words and sentences |
AU3910500A (en) * | 1999-03-25 | 2000-10-09 | Planetlingo, Inc. | Method and system for computer assisted natural language instruction with adjustable speech recognizer |
US6224383B1 (en) | 1999-03-25 | 2001-05-01 | Planetlingo, Inc. | Method and system for computer assisted natural language instruction with distracters |
US6397185B1 (en) * | 1999-03-29 | 2002-05-28 | Betteraccent, Llc | Language independent suprasegmental pronunciation tutoring system and methods |
US6296489B1 (en) * | 1999-06-23 | 2001-10-02 | Heuristix | System for sound file recording, analysis, and archiving via the internet for language training and other applications |
US6468084B1 (en) * | 1999-08-13 | 2002-10-22 | Beacon Literacy, Llc | System and method for literacy development |
US7149690B2 (en) * | 1999-09-09 | 2006-12-12 | Lucent Technologies Inc. | Method and apparatus for interactive language instruction |
US6434518B1 (en) * | 1999-09-23 | 2002-08-13 | Charles A. Glenn | Language translator |
EP1139318A4 (en) * | 1999-09-27 | 2002-11-20 | Kojima Co Ltd | Pronunciation evaluation system |
US7330815B1 (en) | 1999-10-04 | 2008-02-12 | Globalenglish Corporation | Method and system for network-based speech recognition |
US8170538B2 (en) * | 1999-12-06 | 2012-05-01 | Solocron Media, Llc | Methods and apparatuses for programming user-defined information into electronic devices |
US6496692B1 (en) | 1999-12-06 | 2002-12-17 | Michael E. Shanahan | Methods and apparatuses for programming user-defined information into electronic devices |
US7149509B2 (en) * | 1999-12-06 | 2006-12-12 | Twenty Year Innovations, Inc. | Methods and apparatuses for programming user-defined information into electronic devices |
JP3520022B2 (en) * | 2000-01-14 | 2004-04-19 | 株式会社国際電気通信基礎技術研究所 | Foreign language learning device, foreign language learning method and medium |
US6847931B2 (en) | 2002-01-29 | 2005-01-25 | Lessac Technology, Inc. | Expressive parsing in computerized conversion of text to speech |
US6865533B2 (en) * | 2000-04-21 | 2005-03-08 | Lessac Technology Inc. | Text to speech |
WO2001082291A1 (en) * | 2000-04-21 | 2001-11-01 | Lessac Systems, Inc. | Speech recognition and training methods and systems |
US6963841B2 (en) * | 2000-04-21 | 2005-11-08 | Lessac Technology, Inc. | Speech training method with alternative proper pronunciation database |
US7280964B2 (en) * | 2000-04-21 | 2007-10-09 | Lessac Technologies, Inc. | Method of recognizing spoken language with recognition of language color |
US6850882B1 (en) | 2000-10-23 | 2005-02-01 | Martin Rothenberg | System for measuring velar function during speech |
US6971993B2 (en) | 2000-11-15 | 2005-12-06 | Logometrix Corporation | Method for utilizing oral movement and related events |
US7203840B2 (en) * | 2000-12-18 | 2007-04-10 | Burlingtonspeech Limited | Access control for interactive learning system |
WO2002050798A2 (en) * | 2000-12-18 | 2002-06-27 | Digispeech Marketing Ltd. | Spoken language teaching system based on language unit segmentation |
US7996321B2 (en) * | 2000-12-18 | 2011-08-09 | Burlington English Ltd. | Method and apparatus for access control to language learning system |
US6732076B2 (en) | 2001-01-25 | 2004-05-04 | Harcourt Assessment, Inc. | Speech analysis and therapy system and method |
WO2002059856A2 (en) * | 2001-01-25 | 2002-08-01 | The Psychological Corporation | Speech transcription, therapy, and analysis system and method |
US6711544B2 (en) | 2001-01-25 | 2004-03-23 | Harcourt Assessment, Inc. | Speech therapy system and method |
US6714911B2 (en) | 2001-01-25 | 2004-03-30 | Harcourt Assessment, Inc. | Speech transcription and analysis system and method |
US6523007B2 (en) * | 2001-01-31 | 2003-02-18 | Headsprout, Inc. | Teaching method and system |
US6882707B2 (en) * | 2001-02-21 | 2005-04-19 | Ultratec, Inc. | Method and apparatus for training a call assistant for relay re-voicing |
US20020010715A1 (en) * | 2001-07-26 | 2002-01-24 | Garry Chinn | System and method for browsing using a limited display device |
US8416925B2 (en) | 2005-06-29 | 2013-04-09 | Ultratec, Inc. | Device independent text captioned telephone service |
US7881441B2 (en) * | 2005-06-29 | 2011-02-01 | Ultratec, Inc. | Device independent text captioned telephone service |
KR20030078493A (en) * | 2002-03-29 | 2003-10-08 | 박성기 | Foreign language study apparatus |
US20030235806A1 (en) * | 2002-06-19 | 2003-12-25 | Wen Say Ling | Conversation practice system with dynamically adjustable play speed and the method thereof |
US7219059B2 (en) * | 2002-07-03 | 2007-05-15 | Lucent Technologies Inc. | Automatic pronunciation scoring for language learning |
US7752045B2 (en) * | 2002-10-07 | 2010-07-06 | Carnegie Mellon University | Systems and methods for comparing speech elements |
EP1565899A1 (en) * | 2002-11-27 | 2005-08-24 | Visual Pronunciation Software Ltd. | A method, system and software for teaching pronunciation |
US20040148226A1 (en) * | 2003-01-28 | 2004-07-29 | Shanahan Michael E. | Method and apparatus for electronic product information and business transactions |
US7524191B2 (en) * | 2003-09-02 | 2009-04-28 | Rosetta Stone Ltd. | System and method for language instruction |
US7113981B2 (en) * | 2003-12-29 | 2006-09-26 | Mixxer, Inc. | Cellular telephone download locker |
US20050142522A1 (en) * | 2003-12-31 | 2005-06-30 | Kullok Jose R. | System for treating disabilities such as dyslexia by enhancing holistic speech perception |
US20050153267A1 (en) * | 2004-01-13 | 2005-07-14 | Neuroscience Solutions Corporation | Rewards method and apparatus for improved neurological training |
US20050175972A1 (en) * | 2004-01-13 | 2005-08-11 | Neuroscience Solutions Corporation | Method for enhancing memory and cognition in aging adults |
US8515024B2 (en) | 2010-01-13 | 2013-08-20 | Ultratec, Inc. | Captioned telephone service |
WO2005081511A1 (en) * | 2004-02-18 | 2005-09-01 | Ultratec, Inc. | Captioned telephone service |
EP1721302A1 (en) * | 2004-03-02 | 2006-11-15 | AUBERT, Christian | Method for teaching verbs of foreign language |
NZ534092A (en) * | 2004-07-12 | 2007-03-30 | Kings College Trustees | Computer generated interactive environment with characters for learning a language |
US20060057545A1 (en) * | 2004-09-14 | 2006-03-16 | Sensory, Incorporated | Pronunciation training method and apparatus |
US7258660B1 (en) | 2004-09-17 | 2007-08-21 | Sarfati Roy J | Speech therapy method |
US20060084047A1 (en) * | 2004-10-20 | 2006-04-20 | Inventec Corporation | System and method of segmented language learning |
US8478597B2 (en) * | 2005-01-11 | 2013-07-02 | Educational Testing Service | Method and system for assessing pronunciation difficulties of non-native speakers |
US11258900B2 (en) | 2005-06-29 | 2022-02-22 | Ultratec, Inc. | Device independent text captioned telephone service |
CN101223565B (en) * | 2005-07-15 | 2013-02-27 | 理查德·A·莫 | Voice pronunciation training device, method and program |
JP2009525492A (en) * | 2005-08-01 | 2009-07-09 | 一秋 上川 | A system of expression and pronunciation techniques for English sounds and other European sounds |
US7657221B2 (en) * | 2005-09-12 | 2010-02-02 | Northwest Educational Software, Inc. | Virtual oral recitation examination apparatus, system and method |
TWI277947B (en) * | 2005-09-14 | 2007-04-01 | Delta Electronics Inc | Interactive speech correcting method |
US20090220926A1 (en) * | 2005-09-20 | 2009-09-03 | Gadi Rechlis | System and Method for Correcting Speech |
JP2007140200A (en) * | 2005-11-18 | 2007-06-07 | Yamaha Corp | Language learning device and program |
WO2007135605A1 (en) * | 2006-05-22 | 2007-11-29 | Philips Intellectual Property & Standards Gmbh | System and method of training a dysarthric speaker |
WO2008083689A1 (en) * | 2007-01-14 | 2008-07-17 | The Engineering Company For The Development Of Computer Systems ; (Rdi) | System and method for qur'an recitation rules |
TW200838035A (en) | 2007-03-08 | 2008-09-16 | Cirocomm Technology Corp | Improved miniature digital antenna with multi-bandwidth switch |
US7659856B2 (en) | 2007-05-09 | 2010-02-09 | Cirocomm Technology Corp. | Extremely miniaturized digital antenna having switchable multiple bandwidths |
WO2009006433A1 (en) * | 2007-06-29 | 2009-01-08 | Alelo, Inc. | Interactive language pronunciation teaching |
JP2009128675A (en) * | 2007-11-26 | 2009-06-11 | Toshiba Corp | Device, method and program, for recognizing speech |
CN101465259B (en) | 2007-12-19 | 2011-12-21 | 清华大学 | Field emission electronic device |
US8030568B2 (en) * | 2008-01-24 | 2011-10-04 | Qualcomm Incorporated | Systems and methods for improving the similarity of the output volume between audio players |
GB2458461A (en) * | 2008-03-17 | 2009-09-23 | Kai Yu | Spoken language learning system |
WO2010003068A1 (en) * | 2008-07-03 | 2010-01-07 | The Board Of Trustees Of The University Of Illinois | Systems and methods for identifying speech sound features |
KR20100022243A (en) * | 2008-08-19 | 2010-03-02 | 현대자동차주식회사 | Foreign language studying system using bluetooth and method thereof |
US20100105015A1 (en) * | 2008-10-23 | 2010-04-29 | Judy Ravin | System and method for facilitating the decoding or deciphering of foreign accents |
CN101510423B (en) * | 2009-03-31 | 2011-06-15 | 北京志诚卓盛科技发展有限公司 | Multilevel interactive pronunciation quality estimation and diagnostic system |
US8412531B2 (en) * | 2009-06-10 | 2013-04-02 | Microsoft Corporation | Touch anywhere to speak |
GB0920480D0 (en) * | 2009-11-24 | 2010-01-06 | Yu Kai | Speech processing and learning |
TWI431563B (en) * | 2010-08-03 | 2014-03-21 | Ind Tech Res Inst | Language learning system, language learning method, and computer product thereof |
US9691289B2 (en) * | 2010-12-22 | 2017-06-27 | Brightstar Learning | Monotonous game-like task to promote effortless automatic recognition of sight words |
US8744856B1 (en) | 2011-02-22 | 2014-06-03 | Carnegie Speech Company | Computer implemented system and method and computer program product for evaluating pronunciation of phonemes in a language |
CN104285249B (en) * | 2012-05-09 | 2018-06-01 | 皇家飞利浦有限公司 | Device and method for supporting a behavior change of a person |
WO2014002391A1 (en) * | 2012-06-29 | 2014-01-03 | テルモ株式会社 | Information processing device and information processing method |
TWI508033B (en) * | 2013-04-26 | 2015-11-11 | Wistron Corp | Method and device for learning language and computer readable recording medium |
JP6244658B2 (en) * | 2013-05-23 | 2017-12-13 | 富士通株式会社 | Audio processing apparatus, audio processing method, and audio processing program |
US20150031010A1 (en) * | 2013-07-24 | 2015-01-29 | Aspen Performance Technologies | Improving neuroperformance |
US10878721B2 (en) | 2014-02-28 | 2020-12-29 | Ultratec, Inc. | Semiautomated relay method and apparatus |
US10748523B2 (en) | 2014-02-28 | 2020-08-18 | Ultratec, Inc. | Semiautomated relay method and apparatus |
US10389876B2 (en) | 2014-02-28 | 2019-08-20 | Ultratec, Inc. | Semiautomated relay method and apparatus |
US20180034961A1 (en) | 2014-02-28 | 2018-02-01 | Ultratec, Inc. | Semiautomated Relay Method and Apparatus |
US20180270350A1 (en) | 2014-02-28 | 2018-09-20 | Ultratec, Inc. | Semiautomated relay method and apparatus |
JP2016045420A (en) * | 2014-08-25 | 2016-04-04 | カシオ計算機株式会社 | Pronunciation learning support device and program |
CN109872727B (en) * | 2014-12-04 | 2021-06-08 | 上海流利说信息技术有限公司 | Voice quality evaluation device, method and system |
CN104505103B (en) * | 2014-12-04 | 2018-07-03 | 上海流利说信息技术有限公司 | Voice quality assessment equipment, method and system |
CN106156905A (en) * | 2015-03-25 | 2016-11-23 | 肖圣林 | Mobile interchange learning platform |
JP2017142353A (en) * | 2016-02-10 | 2017-08-17 | 株式会社World Talk Box | Language learning device, language learning method, and language learning program |
US20180197438A1 (en) * | 2017-01-10 | 2018-07-12 | International Business Machines Corporation | System for enhancing speech performance via pattern detection and learning |
US10916154B2 (en) | 2017-10-25 | 2021-02-09 | International Business Machines Corporation | Language learning and speech enhancement through natural language processing |
US11210968B2 (en) * | 2018-09-18 | 2021-12-28 | International Business Machines Corporation | Behavior-based interactive educational sessions |
CN109637543A (en) * | 2018-12-12 | 2019-04-16 | 平安科技(深圳)有限公司 | The voice data processing method and device of sound card |
JP7195593B2 (en) * | 2018-12-13 | 2022-12-26 | 株式会社Ecc | Language learning devices and language learning programs |
US11288974B2 (en) * | 2019-03-20 | 2022-03-29 | Edana Croyle | Speech development system |
US11282402B2 (en) * | 2019-03-20 | 2022-03-22 | Edana Croyle | Speech development assembly |
CN109979433A (en) * | 2019-04-02 | 2019-07-05 | 北京儒博科技有限公司 | Voice read-along processing method, device, equipment and storage medium |
US11539900B2 (en) | 2020-02-21 | 2022-12-27 | Ultratec, Inc. | Caption modification and augmentation systems and methods for use by hearing assisted user |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4615680A (en) * | 1983-05-20 | 1986-10-07 | Tomatis Alfred A A | Apparatus and method for practicing pronunciation of words by comparing the user's pronunciation with the stored pronunciation |
US4641343A (en) * | 1983-02-22 | 1987-02-03 | Iowa State University Research Foundation, Inc. | Real time speech formant analyzer and display |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4507750A (en) | 1982-05-13 | 1985-03-26 | Texas Instruments Incorporated | Electronic apparatus from a host language |
EP0094502A1 (en) | 1982-05-13 | 1983-11-23 | Texas Instruments Incorporated | Electronic learning aid for assistance in speech pronunciation |
GB8817705D0 (en) | 1988-07-25 | 1988-09-01 | British Telecomm | Optical communications system |
WO1990001202A1 (en) | 1988-07-28 | 1990-02-08 | John Harold Dunlavy | Improvements to aircraft collision avoidance |
FR2674660A1 (en) * | 1989-06-26 | 1992-10-02 | Bozadjian Edouard | COMPARATIVE EVALUATION SYSTEM FOR IMPROVING PRONUNCIATION. |
1993
- 1993-01-21 US US08/007,242 patent/US5487671A/en not_active Ceased
1994
- 1994-01-19 WO PCT/US1994/000815 patent/WO1994017508A1/en active Application Filing
- 1994-01-19 AU AU60939/94A patent/AU6093994A/en not_active Abandoned
- 1994-01-20 CN CN94102645A patent/CN1101446A/en active Pending
- 1994-01-21 KR KR1019940001082A patent/KR940018741A/en not_active Application Discontinuation
- 1994-03-25 TW TW083102645A patent/TW277120B/zh active
1997
- 1997-05-09 US US08/854,251 patent/USRE37684E1/en not_active Expired - Fee Related
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BE1007899A3 (en) * | 1993-12-22 | 1995-11-14 | Philips Electronics Nv | Information system with means for user interactions, an information processing and information management system and the means for operating the said system and the responses to the user |
FR2730579A1 (en) * | 1995-02-10 | 1996-08-14 | 2B Technology Sarl | PORTABLE DEVICE INTENDED FOR VOCAL STIMULI DICTION EXERCISE |
WO1996024917A1 (en) * | 1995-02-10 | 1996-08-15 | 2B Technology S.A.R.L. | Portable apparatus for practising the pronunciation of voice stimuli |
GB2298514A (en) * | 1995-03-03 | 1996-09-04 | Ghazala Shaheen Jamil Malik | Learning the Quran Karim |
WO1997021201A1 (en) * | 1995-12-04 | 1997-06-12 | Bernstein Jared C | Method and apparatus for combined information from speech signals for adaptive interaction in teaching and testing |
US5870709A (en) * | 1995-12-04 | 1999-02-09 | Ordinate Corporation | Method and apparatus for combining information from speech signals for adaptive interaction in teaching and testing |
US6157913A (en) * | 1996-11-25 | 2000-12-05 | Bernstein; Jared C. | Method and apparatus for estimating fitness to perform tasks based on linguistic and other aspects of spoken responses in constrained interactions |
WO1998032111A1 (en) * | 1996-12-27 | 1998-07-23 | Ewa Braun | Device for phonological training |
US6151577A (en) * | 1996-12-27 | 2000-11-21 | Ewa Braun | Device for phonological training |
WO1999013446A1 (en) * | 1997-09-05 | 1999-03-18 | Idioma Ltd. | Interactive system for teaching speech pronunciation and reading |
WO2019075825A1 (en) * | 2017-10-20 | 2019-04-25 | 深圳市鹰硕技术有限公司 | Internet teaching platform-based accompanying teaching method and system |
WO2019095447A1 (en) * | 2017-11-17 | 2019-05-23 | 深圳市鹰硕技术有限公司 | Guided teaching method having remote assessment function |
Also Published As
Publication number | Publication date |
---|---|
AU6093994A (en) | 1994-08-15 |
US5487671A (en) | 1996-01-30 |
USRE37684E1 (en) | 2002-04-30 |
KR940018741A (en) | 1994-08-18 |
TW277120B (en) | 1996-06-01 |
CN1101446A (en) | 1995-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5487671A (en) | Computerized system for teaching speech | |
US5717828A (en) | Speech recognition apparatus and method for learning | |
US6134529A (en) | Speech recognition apparatus and method for learning | |
KR100312060B1 (en) | Speech recognition enrollment for non-readers and displayless devices | |
US7392187B2 (en) | Method and system for the automatic generation of speech features for scoring high entropy speech | |
JP3520022B2 (en) | Foreign language learning device, foreign language learning method and medium | |
Gerosa et al. | A review of ASR technologies for children's speech | |
US7840404B2 (en) | Method and system for using automatic generation of speech features to provide diagnostic feedback | |
US5634086A (en) | Method and apparatus for voice-interactive language instruction | |
US20020086269A1 (en) | Spoken language teaching system based on language unit segmentation | |
US20090004633A1 (en) | Interactive language pronunciation teaching | |
US8972259B2 (en) | System and method for teaching non-lexical speech effects | |
AU2003300130A1 (en) | Speech recognition method | |
JPH10222190A (en) | Sounding measuring device and method | |
JPH11143346A (en) | Method and device for evaluating language practicing speech and storage medium storing speech evaluation processing program | |
JP2002040926A (en) | Foreign language-pronunciation learning and oral testing method using automatic pronunciation comparing method on internet | |
US20060053012A1 (en) | Speech mapping system and method | |
Wester et al. | Evaluating comprehension of natural and synthetic conversational speech | |
Akahane-Yamada et al. | Computer-based second language production training by using spectrographic representation and HMM-based speech recognition scores | |
CN109697975B (en) | Voice evaluation method and device | |
WO1999013446A1 (en) | Interactive system for teaching speech pronunciation and reading | |
Chen et al. | Automatic pronunciation assessment for mandarin chinese: Approaches and system overview | |
EP3979239A1 (en) | Method and apparatus for automatic assessment of speech and language skills | |
Komatsu et al. | Perceptual discrimination of prosodic types | |
Malucha | Computer Based Evaluation of Speech Voicing for Training English Pronunciation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AT AU BB BG BR BY CA CH CN CZ DE DK ES FI GB HU JP KP KR KZ LK LU LV MG MN MW NL NO NZ PL PT RO RU SD SE SK UA US UZ VN |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
122 | Ep: pct application non-entry in european phase | |
NENP | Non-entry into the national phase | Ref country code: CA |