WO2002059856A2 - Speech transcription, therapy, and analysis system and method - Google Patents

Speech transcription, therapy, and analysis system and method

Publication number
WO2002059856A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
word
processor
error
pronunciation
Prior art date
Application number
PCT/US2002/002258
Other languages
French (fr)
Other versions
WO2002059856A3 (en
Inventor
Julie Masterson
Barbara Bernhardt
Valarie Spiser-Albert
Carol Waryas
Pam Parmer
James H. Segapeli
Jan C. Laurent
Laurie Labbe
Original Assignee
The Psychological Corporation
Priority claimed from US09/769,776 (patent US6732076B2)
Priority claimed from US09/770,093 (patent US6711544B2)
Priority claimed from US09/999,249 (patent US6714911B2)
Application filed by The Psychological Corporation
Priority to AU2002237945A (patent AU2002237945A1)
Publication of WO2002059856A2
Publication of WO2002059856A3

Classifications

    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/06Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00Teaching not covered by other main groups of this subclass
    • G09B19/04Speaking

Definitions

  • the present invention relates to systems and methods for analyzing and remediating speech pathologies, and, more particularly, to such systems and methods that are computer-based.
  • pictures and/or words stored in a database could be sorted using a desired criterion such as a particular phoneme and presented to the student under software control for facilitating the acquisition or remediation of speech or language skills. No analysis or scoring is performed; rather, the product is intended for use by one or more students, either alone or in concert with a pathologist/teacher.
  • a previously known method of diagnosing articulation or phonology disorders included a "pencil and paper" test wherein a student is asked to speak a word. The therapist grades the word subjectively, based upon the therapist's ear and the local standards.
  • a first aspect of which comprises a method and system for providing speech therapy.
  • the method comprises the steps of selecting a problem speech sound and searching a database that comprises a plurality of records. Each record comprises a picture and a word associated with the picture. Next a set of records is automatically generated from the plurality of records.
  • Each record in the set contains a word specific to the problem speech sound.
  • the set of records is next automatically presented to a user sequentially on a display device, and the user is prompted to pronounce the displayed word. Finally, the pronunciation of each word is scored.
  • the system of the first aspect of the present invention comprises a processor, an input device in communication with the processor having means for selecting a problem speech sound, and a display device in communication with the processor.
  • the database as described above is resident on the processor, as are software means.
  • the software is adapted to automatically generate a set of records from the plurality of records, with each record containing a word specific to the problem speech sound.
  • the software is also adapted to automatically present at least a portion of each record in the set of records to a user sequentially on the display device and to prompt the user to pronounce the displayed word.
  • the software is adapted to receive via the input device a score for the pronunciation of each word.
  • Another aspect of the present invention is a system and method for analyzing a speech problem by performing a test of articulation, phonology, and sound features that is administered and analyzed with the use of an electronic processor.
  • This method comprises the steps of presenting to a student/user a symbol representative of a word and prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor. Next the therapist enters a phonetic representation of the user pronunciation into the processor. It is then automatically determined whether an error exists in the user pronunciation. If an error exists, the error is automatically categorized.
  • the therapist enters the phonetic representation of the user pronunciation into an input and storage device that is not in signal communication with the processor. At a later time the phonetic representation is downloaded into the processor, whereupon the automatic determining and categorizing steps proceed.
  • the system of the second aspect of the invention evaluates an articulation disorder.
  • the system comprises a processor and an output device and an input device, each in signal communication with the processor.
  • Software installable on the processor is adapted to present on the output device, typically a display device, although this is not intended as a limitation, a symbol representative of a word.
  • the software then is adapted to prompt a user via the output device to pronounce the word represented by the symbol and to receive from the therapist via the input device a phonetic representation of the user's pronunciation.
  • the software automatically determines whether an error exists in the user pronunciation, and, if an error exists, automatically categorizes the error.
  • system comprises a processor and an output device and a user input device, each in signal communication with the processor.
  • system further comprises an operator input and storage device that is not in signal communication with the processor, but is connectable thereto for downloading operator-entered data thereinto, the data comprising the phonetic representation.
  • the software then receives, as data downloaded from the operator input and storage device, the phonetic representation of the user's pronunciation.
  • the software automatically determines whether an error exists in the user pronunciation, and, if an error exists, automatically categorizes the error.
  • the system and method of this second feature of the invention may be adapted for presentation of a single word, a plurality of words having a predetermined feature desired to be tested, a pretest for screening for potential articulation disorders, and an analysis of connected speech with the use of a moving picture to elicit a narrative from the student.
  • An additional aspect of the present invention is directed to the transcription of a student's speech by the therapist using a computerized process. This method comprises the steps of prompting the student to produce at least one phoneme orally.
  • the system related to this aspect of the invention comprises a processor and display means in signal communication with the processor.
  • the display means are for prompting a student to produce at least one phoneme orally, displaying a correct production of the at least one phoneme to a therapist, and displaying at least one incorrect production of the at least one phoneme to the therapist.
  • the therapist uses input means in signal communication with the processor to select from among the displayed correct and incorrect productions based upon the student-produced at least one phoneme, thus obviating the need for the therapist to enter the incorrect production symbol by symbol, unless it is desired to do so, or unless the actual production is not found among the displayed production selections.
  • FIGS. 1A,1B is a flow chart for an exemplary embodiment of the speech therapy method of the invention.
  • FIG. 2 is a schematic diagram of the speech therapy and analysis system.
  • FIGS. 3A,3B is a flow chart for an exemplary embodiment of the speech analysis method of the invention.
  • FIG. 4 is a section of a flow chart for another embodiment of the speech analysis method of the invention.
  • FIG. 5 is a schematic diagram of an alternate embodiment of the speech analysis system.
  • FIGS. 6A,6B is a flow chart for an additional embodiment of the speech analysis method of the invention.
  • FIG. 7 is an exemplary phonemic profile or individualized phonological evaluation screen.
  • FIG. 8 is an exemplary basic IPA production transcription screen.
  • FIG. 9 is an exemplary parent letter report.
  • FIG. 10 is an exemplary student production report option selection screen.
  • FIGS. 11A-11E is an exemplary level 1 treatment suggestion report.
  • FIGS. 12A-12E is an exemplary level 2 treatment suggestion report.
  • FIGS. 13A,13B is an exemplary level 3 treatment suggestion report.
  • FIG. 14 is an exemplary level 4 treatment suggestion report.
  • FIG. 15 is an exemplary connected speech sample transcription screen.
  • A flow chart of an exemplary embodiment of the automated speech therapy/intervention method is given in FIGS. 1A,1B, and a schematic of the system in FIG. 2.
  • the system and method are also contemplated for use in the acquisition of a language skill as well as in a remediation setting.
  • the "professional" version 10 of the invention (block 100)
  • typically two people, who will be referred to as "therapist" 11 and "student" 12, are present, although this is not intended as a limitation.
  • This version is contemplated for use in such settings 32 as a hospital, clinic, rehabilitation center, school, or private facility.
  • the "student" 12 may be working alone, or in the presence of a nonprofessional such as a parent.
  • the therapist 11 may be, for example, a speech therapist or a teacher; the student 12 may be a user who is learning a second language or a school attendee who is being tested for, or who is already known to have, an articulation problem or phonological disorder.
  • the method comprises the steps of providing access to an electronic database that includes a plurality of records (block 101).
  • Each record comprises a word, a picture representative of the word, and a recommended pronunciation of the word.
  • the record may also include a digitized video clip to represent motion or a verb to impart a concept of action.
  • the record may further include a digitized sound that is associated with the word.
  • the record for the word dog might contain a picture of a dog, a video clip of a dog running, and/or a barking sound. It is believed that such multiple stimuli appeal to a multiplicity of cognitive areas, thereby optimizing the student's improvement.
  • Each record may further contain data useful for performing sorting functions, such as at least one category and/or concept.
  • An exemplary set of categories comprises: animals, art, babies, celebrations, global images, environment, family, food, garden, health and exercise, home, leisure, medical, money, music, pets, play, school, shopping, signs/symbols, sports, technical, vacations, and work.
  • An exemplary set of concepts comprises: activities, objects, places, people, ideas, and events.
  • the record also typically comprises a vocabulary level associated with the word and a length of the word.
  • the method next comprises the step of inputting or accessing previously input demographic information for the student (block 102). Then a problem speech sound that is desired to be improved upon is selected that is known from a prior diagnosis (block 103).
  • the problem speech sound may be selected from a group consisting of a phoneme and a "feature."
  • the feature comprises at least one of a place, a manner, and a voicing characteristic. Searching on a feature yields matches in all positions of words.
  • the database is electronically searched (block 106) for records containing words that include the problem speech sound to generate a set of records.
  • a filter may be applied if desired (block 104) to further limit the set (block 105). Filtering options include selecting a category or concept; using the demographic information to limit the set (for example, eliminating words that are intended for students over 7 years of age when the student is 5 years old); setting a desired vocabulary level; or selecting a word length.
  • the set of records may also be sorted (block 108) in various ways to produce a desired sequence, including, but not limited to, putting the words in alphabetical order, random order, or some other chosen sequence.
  • all the words in the database contain at least one of the letters "r," "l," and "s," since these are known to present a problem most frequently.
  • a decision may be made whether to present the set of records or store/transmit them (block 109). If the former, the set of records is next presented sequentially to the student in the predetermined sequence on a display device (block 111), and the student is prompted to pronounce the word (block 112).
  • the display style may be selected (block 110) from a word only, a picture only, or a word plus a picture.
  • if the student can read, he or she can use the displayed word to form a pronunciation; if the student cannot yet read, or cannot read the currently presented language, the picture will also aid in acquisition of reading skills as well as pronunciation.
  • the therapist scores the student's pronunciation (block 113) by inputting, for example, "correct," "incorrect," "skip," or "re-present"; the last of these records an indication to re-present the record at a later time, such as after all the other items in the set have been presented.
  • the student or therapist can also elect (block 114) to hear the word pronounced (block 115) in a recommended manner by making an appropriate selection on an input device.
  • the scores are received by the system, and an aggregate score is calculated (block 116) for the problem speech sound.
  • the database also comprises a historical record of all sessions for each of the students, and the database is then accessed to store the current score thereinto (block 117).
  • the therapist may choose to calculate a historical change (block 118) from previously saved scores to provide an indication of the student's progress.
  • Such scores may also be used to calculate statistics (block 119) for a group of students, using, for example, a demographic filter.
  • the "personal version" of the system and method does not accept scoring, nor is there a database from which sets of records may be created.
  • the professional version is adapted to download a selected set of records onto a storage medium, such as a diskette, or to transmit the set of records to a remote site (block 109).
  • a remote site may comprise, but is not intended to be limited to, a room remote from the main processor accessible via intranet, or a different building accessible via internet.
  • This version then enables the student to perform (block 120) the steps in blocks 110-112 and 115 as desired on his or her own.
  • the system 10 comprises a processor 14, on which are resident the software package 15 of the present invention adapted to perform the functions as outlined above and a database 16 comprising the plurality of records 17 and demographic and historical data on the users 12.
  • An input device is in communication with the processor 14 that has means for selecting a problem speech sound. Such means may comprise any of the devices known in the art such as a keyboard 18 or pointing device such as a mouse 19 or touch screen.
  • a display device such as a display screen 20 is also in communication with the processor 14.
  • Optional elements that are also in communication with the processor 14 may include a microphone 21 and a speaker 22, both under processor 14 control, as well as means for performing analog-to-digital 23 and digital-to-analog 24 conversions.
  • the system 10 also has means for transferring records from the database to a storage medium such as a disk drive 25, under control of the software 15, or to a remote site such as another location 26 via a modem 27 over the internet 28 or such as another room 29 at the same location via an intranet 30.
  • a printer 31 under processor control may also be provided for furnishing a hard copy of any portion of the session as desired.
  • a secondary system 40 for use of the personal version of the invention at the remote location 26,29 comprises a processor 41, an input device 42 and a display device 43 in communication with the processor 41, and either or both of a modem 44 for receiving a set of records and a storage device reader 45 for reading a stored set of records.
  • the software package 46 for this version is adapted to read the records, present them to the student 12 sequentially, and prompt the student 12 to pronounce the word associated with the record.
  • A flow chart of an exemplary embodiment of the automated speech therapy/intervention method is given in FIGS. 3A,3B.
  • the schematic of the system is substantially the same as that in FIG. 2.
  • the method comprises the steps of selecting the type of evaluation desired to be performed (block 501): screening, single word analysis, "deep" test, or connected speech analysis.
  • the screening, or pre-evaluation comprises the steps of presenting to a user a symbol representative of a word (block 502) and prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor (block 503).
  • the symbol presentation may comprise, for example, a picture on a display screen, although this is not intended as a limitation.
  • the therapist then enters a phonetic representation of the user pronunciation into the processor (block 504).
  • the therapist enters the phonetic representation of the user pronunciation into a separate operator input and storage device 47, such as, but not intended to be limited to, a personal data assistant (block 520).
  • the user pronunciation data are downloaded into the processor (block 521) to complete the steps of the method.
  • A schematic of the system (FIG. 5) illustrates the addition of the operator input and storage device 47, which is connectable to the system 10 when desired for downloading into the processor 14 the data that have been entered thereinto by the therapist 11.
  • the advantages of this embodiment include the user and the operator being able to use separate pieces of hardware, thereby eliminating the physical constraints imposed by attempting to share equipment. Further, during the session the user cannot view the operator's scoring information, the sight of which might inhibit the user. In addition, the operator's hardware may retain data for downloading into more than one processor if desired for subsequent collection and analysis.
  • the software installed upon the processor then automatically determines whether an error exists in the user pronunciation (block 506).
  • the determination may additionally include the application of a dialectical filter (block 505) that is adapted to discriminate between that which is deemed to be a true error and a predetermined normal dialect word pronunciation. If an error exists, the software automatically categorizes the error (block 507).
  • An error may be, for example, a substitution, a mispronunciation, or an omission. These steps are repeated a predetermined number of times n, for example, 20 times (block 510).
  • the software automatically generates a set of symbols, wherein each symbol is representative of a word containing at least one of the errors determined in the pre-evaluation. Then the steps as above are performed using the generated set of symbols, and an evaluation is made of articulation errors for the whole set.
  • the steps in blocks 502-509 are performed once for the desired word.
  • the therapist may decide to display a frequency spectrum of the user's pronunciation (block 508). If desired, a sample of a correct pronunciation of the word may be broadcast via a speaker in signal communication with the processor (block 509).
  • the evaluating step also comprises automatically recognizing an underlying commonality by correlating the errors detected. This pattern recognition permits the software to achieve an overarching diagnosis of a problem speech sound (block 511).
  • a report can be issued detailing the user's error(s) (block 512). Additionally, the error may be saved in a database that is accessible by the processor (block 513). If a search shows that a previous entry for this user already exists, the error found in the present test may be compared with an error previously found, and a change over time determined for that user (block 514), to note whether an improvement has occurred. Again, if desired, a report may be issued (block 515) as to the change determined.
  • An additional feature of this invention is the ability, once a categorization has been made of an error, of recommending a therapeutic program to address the error
  • Such a recommendation formulation may comprise, for example, creating a set of records as detailed above in FIGS. 1A-2.
  • the "symbol" comprises a motion picture representative of an action, and the user is prompted to provide a narration on the action into a microphone in signal communication with a processor.
  • the therapist then enters a phonetic representation of the user's pronunciation of the narration into the processor.
  • Software resident in the processor automatically determines whether an error exists in the user pronunciation, and, if an error exists, automatically categorizes the error.
  • Another aspect of the present invention relates to a system and method for transcribing student-produced speech by a therapist (FIGS. 6A-15), for analyzing the transcribed speech, and for producing a report and recommendations based upon the analysis. The steps of the method are illustrated in flow-chart form in FIGS.
  • the system of the invention is substantially as illustrated schematically in FIG.2 within the "professional site" 32.
  • the method of the present invention includes the steps of entering student and therapist information (block 601), such as demographic information.
  • the therapist 11 is then permitted to choose (block 602) between administering a "phonemic profile" (block 603) or a "connected speech sample" (block 604), and also whether or not to record the student's production. In either case, the therapist 11 may select between basic English International Phonetic Alphabet (IPA) or full IPA.
  • a stimulus is presented to the student (block 605), such as by displaying a picture on the screen 20 to elicit a particular sound, which may comprise one or more phonemes. For example, a picture of a cat would prompt the student to say "cat."
  • the correct target word, e.g., "cat"
  • predicted incorrect productions, e.g., "tat"
  • the therapist 11 is then permitted, if a match occurs (block 607), to select from among the displayed options based upon the student's production (block 608) or to enter the student's production in IPA format (block 609).
  • the selection of block 608 is made, for example, by a "point and click" method using the mouse 19 on a screen such as FIG. 7; the production entering of block 609 may also be made by a "point and click" method using the mouse 19 on a transcribing screen such as in FIG. 8.
  • the software package 15 performs an automatic analysis for the student (block 611), displays the results of the analysis on the screen 20 or prints the analysis results on the printer 31 (block 612), applies a filter such as an age and/or a dialect filter (block 613), and displays the results of the analysis with applied filter(s) on the screen 20 or prints the analysis results on the printer 31 if one or more filters were applied (block 614).
  • An example of available student production report selections is shown in FIG. 10. Additional report selections include descriptions of student productions; word length, stress pattern, and word shape inventories; and consonant and vowel inventories.
  • the therapist 11 can proceed to an individualized phonological evaluation (IPE).
  • the stimuli for this evaluation are determined based upon the results of the phonemic profile, and there are four levels of evaluation possible, as reflected in the treatment reports discussed below. For example, if the student pronounced "tat" for "cat," words such as "can," "call," "cad," or "cast" may be selected for presentation to the student 12.
  • stimuli, transcription, and analyses are performed analogous to blocks 605-611
  • Exemplary treatment suggestion reports for four levels of IPEs are shown in FIGS. 11A-11E, 12A-12E, 13A-13B, and 14.
  • a stimulus is presented to the student 12 (block 619), such as a video clip on the screen 20 or other external stimulus.
  • the therapist 11 determines an intended target sentence (block 620) as the student's production is made.
  • the therapist 11 enters the target production on the keyboard 18 in orthographic format (block 621; FIG. 15), and the system 15 converts it into IPA format (block 622).
  • the student's production is defaulted to be the target production (block 623), and the therapist 11 edits the production fields in order to convert it into the actual student production (block 624).
  • the production is analyzed (block 611), with a comparison being made between the target and actual productions. Reports are then displayed (block 612) on the student's production and the comparison.
  • the remaining blocks are substantially the same as with the phonemic profile.
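The connected speech sample steps above (blocks 621 to 624 and 611) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the lookup table IPA_LEXICON, the helper names to_ipa and compare, and the sample sentence are all hypothetical, and a real system would need a full pronunciation lexicon rather than three entries.

```python
# Hypothetical orthographic-to-IPA lookup (assumption for illustration only).
IPA_LEXICON = {"the": "ðə", "cat": "kæt", "ran": "ræn"}


def to_ipa(sentence):
    """Convert an orthographic sentence into a list of IPA word forms.

    Raises KeyError for any word missing from the tiny sample lexicon.
    """
    return [IPA_LEXICON[word] for word in sentence.lower().split()]


def compare(target, actual):
    """Return (position, target form, actual form) for each word that differs."""
    return [(i, t, a) for i, (t, a) in enumerate(zip(target, actual)) if t != a]


# The therapist types the intended target sentence in ordinary spelling.
target = to_ipa("The cat ran")

# The student's production defaults to the target production.
actual = list(target)

# The therapist edits only the field that differed from the target.
actual[1] = "tæt"

differences = compare(target, actual)
print(differences)
```

Defaulting the production to the target means the therapist only touches the words the student actually changed, which is the labor-saving idea described in the text.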

Abstract

A speech therapy method includes selecting a problem speech sound and searching a database that houses records, each containing a picture and an associated word. A set of records is generated, each containing a word specific to the problem speech sound. At least a portion of each record is presented to a user sequentially for pronunciation, which is scored. A speech problem is analyzed by presenting a symbol representative of a word and prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor. Next the therapist enters a phonetic representation of the user pronunciation into the processor. Alternatively, the therapist can enter a phonetic representation of the user pronunciation into an operator input and storage device, the phonetic representation subsequently downloadable into the processor. It is then automatically determined whether an error exists in the pronunciation. If an error exists, the error is automatically categorized. A transcription method uses a computerized process to prompt a student to produce at least one phoneme orally. Next a correct and at least one incorrect production of the phoneme are displayed. The therapist then uses an input device in signal communication with the processor to select from among the displayed correct and incorrect productions based upon the student-produced phoneme, thus obviating the need for the therapist to enter the incorrect production symbol by symbol.

Description

SPEECH TRANSCRIPTION, THERAPY, AND ANALYSIS SYSTEM AND METHOD
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to systems and methods for analyzing and remediating speech pathologies, and, more particularly, to such systems and methods that are computer-based.
Description of Related Art
Articulation and phonology disorders are the most common of the speech and language disorders. The prevalence of this disorder is, at the time of writing, approximately 10% of the school-age population. In addressing a perceived articulation issue in a student, speech/language pathologists have in the past used an initial test based upon a series of cards. Each card contains a picture and a word, and the student is asked to pronounce the word associated with the card. The pathologist then determines whether the student's pronunciation is "right" or "wrong." It may be recognized that such a system can be cumbersome, owing to the cards' having to be placed in a desired order and sorted manually. An intervention system designed to automate this process, Picture Gallery I, was presented by the owner of the current application. In this system pictures and/or words stored in a database could be sorted using a desired criterion such as a particular phoneme and presented to the student under software control for facilitating the acquisition or remediation of speech or language skills. No analysis or scoring is performed; rather, the product is intended for use by one or more students, either alone or in concert with a pathologist/teacher.
A previously known method of diagnosing articulation or phonology disorders included a "pencil and paper" test wherein a student is asked to speak a word. The therapist grades the word subjectively, based upon the therapist's ear and the local standards.
Other systems known in the art that address speech/language analysis and therapy methodologies include those of Neuhaus (U.S. Pat. No. 6,113,393), Parry et al. (U.S. Pat. No. 6,077,085), UCSF and Rutgers (U.S. Pat. Nos. 5,813,862 and 6,071,123), Neumeyer et al. (U.S. Pat. No. 6,055,498), Jenkins et al. (U.S. Pat. Nos. 5,927,988 and 6,019,607), Siegel (U.S. Pat. No. 6,009,397), Beard et al. (U.S. Pat. No. 5,857,173), Aaron et al. (U.S. Pat. No. 5,832,441), Russell et al. (U.S. Pat. Nos. 5,679,001 and 5,791,904), Rothenberg (U.S. Pat. No. 5,717,828), Wen (U.S. Pat. No. 5,562,453), Ezawa et al. (U.S. Pat. No. 4,969,194), Sturner et al. (U.S. Pat. No. 5,303,327), Shpiro (U.S. Pat. No. 5,766,015), and Siegel (U.S. Pat. No. 6,148,286).
Commercial software products in the field of articulation, phonology, or speech sound production include SpeechViewer, Interactive System for Phonological Analysis, Speech Master, Visi-pitch, and Computerized Profiling. Commercial print products include the Goldman-Fristoe Test of Articulation (American Guidance Service), Khan-Lewis Test of Phonology (American Guidance Service), Photo Articulation Test (Pro-Ed), and Fisher-Logeman Test of Articulation (Pro-Ed).
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a system and method for eliciting a desired sound from a user.
It is a further object to provide such a system and method adapted to generate a report.
It is another object to provide a system and method for testing a user's articulation.
It is an additional object to provide such a system and method that is adapted to analyze a group of problematic sounds.
It is also an object to provide such a system and method that recommends a therapeutic program responsive to the analysis.
It is yet a further object to provide such a system and method that includes a prescreening feature.
It is yet another object to provide a system and method for facilitating a therapist to transcribe speech of a student/client.
These and other objects are achieved by the present invention, a first aspect of which comprises a method and system for providing speech therapy. The method comprises the steps of selecting a problem speech sound and searching a database that comprises a plurality of records. Each record comprises a picture and a word associated therewith. Next a set of records is automatically generated from the plurality of records. Each record contains a word specific to the problem speech sound. The set of records is next automatically presented to a user sequentially on a display device, and the user is prompted to pronounce the displayed word. Finally, the pronunciation of each word is scored.
The system of the first aspect of the present invention comprises a processor, an input device in communication with the processor having means for selecting a problem speech sound, and a display device in communication with the processor. The database as described above is resident on the processor, as are software means. The software is adapted to automatically generate a set of records from the plurality of records, with each record containing a word specific to the problem speech sound. The software is also adapted to automatically present at least a portion of each record in the set of records to a user sequentially on the display device and to prompt the user to pronounce the displayed word. Finally, the software is adapted to receive via the input device a score for the pronunciation of each word.
Another aspect of the present invention is a system and method for analyzing a speech problem by performing a test of articulation, phonology, and sound features that is administered and analyzed with the use of an electronic processor. This method comprises the steps of presenting to a student/user a symbol representative of a word and prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor. Next the therapist enters a phonetic representation of the user pronunciation into the processor. It is then automatically determined whether an error exists in the user pronunciation. If an error exists, the error is automatically categorized.
In an alternate embodiment of the method, the therapist enters the phonetic representation of the user pronunciation into an input and storage device that is not in signal communication with the processor. At a later time the phonetic representation is downloaded into the processor, whereupon the automatic determining and categorizing steps proceed. The system of the second aspect of the invention evaluates an articulation disorder. The system comprises a processor and an output device and an input device, each in signal communication with the processor.
Software installable on the processor is adapted to present on the output device, typically a display device, although this is not intended as a limitation, a symbol representative of a word. The software then is adapted to prompt a user via the output device to pronounce the word represented by the symbol and to receive from the therapist via the input device a phonetic representation of the user's pronunciation. The software automatically determines whether an error exists in the user pronunciation, and, if an error exists, automatically categorizes the error.
In the alternate embodiment the system comprises a processor and an output device and a user input device, each in signal communication with the processor. The system further comprises an operator input and storage device that is not in signal communication with the processor, but is connectable thereto for downloading operator-entered data thereinto, the data comprising the phonetic representation.
The software then receives from the operator input and storage device downloaded data comprising the phonetic representation of the user's pronunciation. The software automatically determines whether an error exists in the user pronunciation, and, if an error exists, automatically categorizes the error.
The system and method of this second feature of the invention may be adapted for presentation of a single word, a plurality of words having a predetermined feature desired to be tested, a pretest for screening for potential articulation disorders, and an analysis of connected speech with the use of a moving picture to elicit a narrative from the student.
An additional aspect of the present invention is directed to the transcription of a student's speech by the therapist using a computerized process. This method comprises the steps of prompting the student to produce at least one phoneme orally. Next a correct production of the at least one phoneme is displayed to the therapist, as well as at least one incorrect production of the at least one phoneme. The therapist is then permitted to select from among the displayed correct and incorrect productions based upon the student-produced at least one phoneme.
The system related to this aspect of the invention comprises a processor and display means in signal communication with the processor. The display means are for prompting a student to produce at least one phoneme orally, displaying a correct production of the at least one phoneme to a therapist, and displaying at least one incorrect production of the at least one phoneme to the therapist.
The therapist then uses input means in signal communication with the processor to select from among the displayed correct and incorrect productions based upon the student-produced at least one phoneme, thus obviating the need for the therapist to enter the incorrect production symbol by symbol, unless it is desired to do so, or unless the actual production is not found among the displayed production selections.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGS. 1A,1B is a flow chart for an exemplary embodiment of the speech therapy method of the invention.
FIG. 2 is a schematic diagram of the speech therapy and analysis system.
FIGS. 3A,3B is a flow chart for an exemplary embodiment of the speech analysis method of the invention.
FIG. 4 is a section of a flow chart for another embodiment of the speech analysis method of the invention.
FIG. 5 is a schematic diagram of an alternate embodiment of the speech analysis system.
FIGS. 6A,6B is a flow chart for an additional embodiment of the speech analysis method of the invention.
FIG. 7 is an exemplary phonemic profile or individualized phonological evaluation screen.
FIG. 8 is an exemplary basic IPA production transcription screen.
FIG. 9 is an exemplary parent letter report.
FIG. 10 is an exemplary student production report option selection screen.
FIGS. 11A-11E is an exemplary level 1 treatment suggestion report.
FIGS. 12A-12E is an exemplary level 2 treatment suggestion report.
FIGS. 13A,13B is an exemplary level 3 treatment suggestion report.
FIG. 14 is an exemplary level 4 treatment suggestion report.
FIG. 15 is an exemplary connected speech sample transcription screen.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
A description of the preferred embodiments of the present invention will now be presented with reference to FIGS. 1A-15.
A flow chart of an exemplary embodiment of the automated speech therapy/intervention method is given in FIGS. 1A,1B, and a schematic of the system in FIG. 2. The system and method are also contemplated for use in the acquisition of a language skill as well as in a remediation setting. There are two versions of the system and method: In the "professional" version 10 of the invention (block 100), typically two people who will be referred to as "therapist" 11 and "student" 12 are present, although this is not intended as a limitation. This version is contemplated for use in such settings 32 as a hospital, clinic, rehabilitation center, school, or private facility. In the "personal" version 40 of the invention, the "student" 12 may be working alone, or in the presence of a nonprofessional such as a parent. The therapist 11 may be, for example, a speech therapist or a teacher; the student 12 may be a user who is learning a second language or a school attendee who is being tested for, or who is already known to have, an articulation problem or phonological disorder.
The method comprises the steps of providing access to an electronic database that includes a plurality of records (block 101). Each record comprises a word, a picture representative of the word, and a recommended pronunciation of the word. In an alternate embodiment, the record may also include a digitized video clip to represent motion or a verb to impart a concept of action. In another embodiment the record may further include a digitized sound that is associated with the word. For example, the record for the word dog might contain a picture of a dog, a video clip of a dog running, and/or a barking sound. It is believed that such multiple stimuli appeal to a multiplicity of cognitive areas, thereby optimizing the student's improvement.
Each record may further contain data useful for performing sorting functions, such as at least one category and/or concept. An exemplary set of categories comprises: animals, art, babies, celebrations, global images, environment, family, food, garden, health and exercise, home, leisure, medical, money, music, pets, play, school, shopping, signs/symbols, sports, technical, vacations, and work. An exemplary set of concepts comprises: activities, objects, places, people, ideas, and events. The record also typically comprises a vocabulary level associated with the word and a length of the word.
The method next comprises the step of inputting or accessing previously input demographic information for the student (block 102). Then a problem speech sound that is known from a prior diagnosis and is desired to be improved upon is selected (block 103). The problem speech sound may be selected from a group consisting of a phoneme and a "feature." The feature comprises at least one of a place, a manner, and a voicing characteristic. Searching on a feature yields matches in all positions of words. The database is electronically searched (block 106) for records containing words that include the problem speech sound to generate a set of records. A filter may be applied if desired (block 104) to further limit the set (block 105), including selecting a category or concept, using the demographic information to limit the set, such as eliminating words that are intended for students over 7 years of age for a 5-year-old student, setting a desired vocabulary level, or selecting a word length.
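By way of illustration only, the search and filter steps of blocks 104-106 might be sketched as follows. This is a minimal sketch and not the implementation of the invention: the `Record` fields, the ASCII phoneme symbols, the `generate_set` function, and the sample entries are all assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical database record; field names are assumed for this sketch."""
    word: str
    picture: str            # assumed file name of the associated picture
    phonemes: tuple         # assumed phoneme transcription of the word
    category: str
    concept: str
    vocabulary_level: int


def generate_set(records, problem_sound, category=None,
                 max_level=None, max_length=None):
    """Return the records whose word contains the problem sound,
    optionally limited by category, vocabulary level, and word length."""
    selected = []
    for record in records:
        if problem_sound not in record.phonemes:
            continue
        if category is not None and record.category != category:
            continue
        if max_level is not None and record.vocabulary_level > max_level:
            continue
        if max_length is not None and len(record.word) > max_length:
            continue
        selected.append(record)
    return selected


# Illustrative entries only; not taken from any actual database.
DATABASE = [
    Record("rabbit", "rabbit.png", ("r", "ae", "b", "ih", "t"),
           "animals", "objects", 1),
    Record("carrot", "carrot.png", ("k", "ae", "r", "ah", "t"),
           "food", "objects", 1),
    Record("refrigerator", "fridge.png",
           ("r", "ih", "f", "r", "ih", "jh", "er", "ey", "t", "er"),
           "home", "objects", 3),
    Record("cat", "cat.png", ("k", "ae", "t"), "animals", "objects", 1),
]

for record in generate_set(DATABASE, "r", max_level=1):
    print(record.word)
```

Because the sound is matched anywhere in the transcription, the search yields matches in all positions of words, as described above. An age-based filter could be added in the same manner as the vocabulary-level filter.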
If desired (block 107), the set of records may also be sorted (block 108) in various ways to produce a desired sequence, including, but not limited to, putting the words in alphabetical order, random order, or some other chosen sequence. In a preferred embodiment, all the words in the database contain at least one of the letters "r," "l," and "s," since these are known to present a problem most frequently.
For a professional therapy session, a decision may be made whether to present the set of records or store/transmit them (block 109). If the former, the set of records is next presented sequentially to the student in the predetermined sequence on a display device (block 111), and the student is prompted to pronounce the word (block 112). The display style may be selected (block 110) from a word only, a picture only, or a word plus a picture.
If the student can read, he or she can use the displayed word to form a pronunciation; if the student cannot yet read, or cannot read the currently presented language, the picture will also aid in acquisition of reading skills as well as pronunciation. In the professional setting, the therapist scores the student's pronunciation (block 113) by inputting, for example, "correct," "incorrect," "skip," or "re-present," which will record an indication to re-present the record at a later time, such as after all the other items in the set have been presented. The student or therapist can also elect (block 114) to hear the word pronounced (block 115) in a recommended manner by making an appropriate selection on an input device.
The scores are received by the system, and an aggregate score is calculated (block 116) for the problem speech sound. The database also comprises a historical record of all sessions for each of the students, and the database is then accessed to store the current score thereinto (block 117). The therapist may choose to calculate a historical change (block 118) from previously saved scores to provide an indication of the student's progress. Such scores may also be used to calculate statistics (block 119) for a group of students, using, for example, a demographic filter. The "personal version" of the system and method does not accept scoring, nor is there a database from which sets of records may be created. Rather, the professional version is adapted to download a selected set of records onto a storage medium, such as a diskette, or to transmit the set of records to a remote site (block 109). Such a remote site may comprise, but is not intended to be limited to, a room remote from the main processor accessible via intranet, or a different building accessible via internet. This version then enables the student to perform (block 120) the steps in blocks 110-112 and 115 as desired on his or her own.
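The score aggregation and historical change of blocks 116-118 might be sketched as below. The specification does not state how the aggregate is computed; this sketch assumes a percentage of correct responses, with "skip" and "re-present" entries excluded from the count. The function names are hypothetical.

```python
def aggregate_score(responses):
    """Percent correct over scored items (assumed formula).
    Entries other than "correct" and "incorrect" are ignored."""
    scored = [r for r in responses if r in ("correct", "incorrect")]
    if not scored:
        return None  # nothing was scored in this session
    return 100.0 * scored.count("correct") / len(scored)


def historical_change(previous_score, current_score):
    """Positive values indicate improvement since the saved score."""
    return current_score - previous_score


session = ["correct", "incorrect", "skip", "correct", "correct"]
current = aggregate_score(session)
print(current, historical_change(60.0, current))
```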
The system 10, as schematically illustrated in FIG. 2, comprises a processor 14, on which are resident the software package 15 of the present invention adapted to perform the functions as outlined above and a database 16 comprising the plurality of records 17 and demographic and historical data on the users 12. An input device is in communication with the processor 14 that has means for selecting a problem speech sound. Such means may comprise any of the devices known in the art such as a keyboard 18 or pointing device such as a mouse 19 or touch screen. A display device such as a display screen 20 is also in communication with the processor 14. Optional elements that are also in communication with the processor 14 may include a microphone 21 and a speaker 22, both under processor 14 control, as well as means for performing analog-to-digital 23 and digital-to-analog 24 conversions. The system 10 also has means for transferring records from the database to a storage medium such as a disk drive 25, under control of the software 15, or to a remote site such as another location 26 via a modem 27 over the internet 28 or such as another room 29 at the same location via an intranet 30. A printer 31 under processor control may also be provided for furnishing a hard copy of any portion of the session as desired.
A secondary system 40 for use of the personal version of the invention at the remote location 26,29 comprises a processor 41, input device 42 and display device 43 in communication with the processor 41, and either or both of a modem 44 for receiving a set of records and a storage device reader 45 for reading a stored set of records. The software package 46 for this version is adapted to read the records, present them to the student 12 sequentially, and prompt the student 12 to pronounce the word associated with the record.
A flow chart of an exemplary embodiment of the automated speech analysis method is given in FIGS. 3A,3B. The schematic of the system is substantially the same as that in FIG. 2. The method comprises the steps of selecting the type of evaluation desired to be performed (block 501): screening, single word analysis, "deep" test, or connected speech analysis. The screening, or pre-evaluation, comprises the steps of presenting to a user a symbol representative of a word (block 502) and prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor (block 503). The symbol presentation may comprise, for example, a picture on a display screen, although this is not intended as a limitation. The therapist then enters a phonetic representation of the user pronunciation into the processor (block 504).
In an alternate embodiment of the method, the altered portion of which is illustrated in FIG. 4, the therapist enters the phonetic representation of the user pronunciation into a separate operator input and storage device 47, such as, but not intended to be limited to, a personal data assistant (block 520). At a later time, the user pronunciation data are downloaded into the processor (block 521) to complete the steps of the method.
A schematic of the system (FIG. 5) illustrates the addition of the operator input and storage device 47, which is connectable to the system 10 when desired for downloading data into the processor 14 that has been entered thereinto by the therapist 11.
The advantages of this embodiment include the user and the operator being able to use separate pieces of hardware, thereby eliminating physical restraints imposed by attempting to share equipment. Further, during the session the user cannot view the operator's scoring information, which may inhibit the user. In addition, the operator's hardware may retain data for downloading into more than one processor if desired for subsequent collection and analysis.
In both embodiments, the software installed upon the processor then automatically determines whether an error exists in the user pronunciation (block 506). The determination may additionally include the application of a dialectical filter (block 505) that is adapted to discriminate between that which is deemed to be a true error and a predetermined normal dialect word pronunciation. If an error exists, the software automatically categorizes the error (block 507). An error may be, for example, a substitution, a mispronunciation, or an omission. These steps are repeated a predetermined number of times n, for example, 20 times (block 510).
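One way the error determination, dialect filter, and categorization of blocks 505-507 could be sketched is by aligning the target and entered phoneme sequences. This is a minimal sketch under stated assumptions: the specification does not disclose the comparison algorithm, so the alignment via Python's standard `difflib.SequenceMatcher`, the "addition" category, the ASCII phoneme symbols, and the form of the `dialect_ok` set are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Assumed mapping from alignment operations to error categories.
CATEGORY = {"replace": "substitution", "delete": "omission", "insert": "addition"}


def categorize_errors(target, produced, dialect_ok=frozenset()):
    """Align two phoneme sequences and list the differences as
    (category, target_phonemes, produced_phonemes) tuples.
    Differences listed in dialect_ok are treated as normal dialect
    pronunciations and are not reported as errors."""
    errors = []
    matcher = SequenceMatcher(a=list(target), b=list(produced), autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        expected = tuple(target[i1:i2])
        actual = tuple(produced[j1:j2])
        if (expected, actual) in dialect_ok:
            continue  # hypothetical dialect filter (block 505)
        errors.append((CATEGORY[tag], expected, actual))
    return errors


print(categorize_errors(["k", "ae", "t"], ["t", "ae", "t"]))
print(categorize_errors(["k", "ae", "t"], ["k", "ae"]))
```

An empty result indicates that no error was found for the word, or that every difference was accepted by the dialect filter.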
It may then be desired to perform the "deep test," which may be performed with the knowledge gained from a pre-evaluation as above or de novo. If the pre-evaluation has been performed, the software automatically generates a set of symbols, wherein each symbol is representative of a word containing at least one of the errors determined in the pre-evaluation. Then the steps as above are performed using the generated set of symbols, and an evaluation is made of articulation errors for the whole set.
If a single word is desired to be analyzed, the steps in blocks 502-509 are performed once for the desired word. Once a word has been pronounced and the phonetic representation entered into the processor, the therapist may decide to display a frequency spectrum of the user's pronunciation (block 508). If desired, a sample of a correct pronunciation of the word may be broadcast via a speaker in signal communication with the processor (block 509).
When a plurality of words have been tested, the evaluating step also comprises automatically recognizing an underlying commonality by correlating the errors detected. This pattern recognition permits the software to achieve an overarching diagnosis of a problem speech sound (block 511).
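The correlation of errors across several words (block 511) might, for example, be sketched by counting how often substitutions share the same change in place of articulation. The specification does not describe the pattern recognition method; the `PLACE` table, the `common_pattern` function, and the `min_count` threshold below are assumptions made purely for illustration.

```python
from collections import Counter

# Small illustrative table of place of articulation; not exhaustive.
PLACE = {
    "k": "velar", "g": "velar",
    "t": "alveolar", "d": "alveolar",
    "p": "bilabial", "b": "bilabial",
}


def common_pattern(substitutions, min_count=2):
    """Given (target, produced) phoneme pairs collected over many words,
    return the most frequent (target place, produced place) change,
    or None if no change occurs at least min_count times."""
    counts = Counter(
        (PLACE.get(target, "unknown"), PLACE.get(produced, "unknown"))
        for target, produced in substitutions
    )
    if not counts:
        return None
    pattern, occurrences = counts.most_common(1)[0]
    return pattern if occurrences >= min_count else None


errors = [("k", "t"), ("g", "d"), ("k", "t")]
print(common_pattern(errors))
```

In this example every substitution replaces a velar sound with an alveolar one, which suggests a single underlying problem rather than three unrelated errors.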
Following the error categorization, if desired, a report can be issued detailing the user's error(s) (block 512). Additionally, the error may be saved in a database that is accessible by the processor (block 513). If a previous entry for this user already exists, which is determined by a search, the error found in the present test may be compared with an error previously found, and a change over time determined for that user (block 514), to note whether an improvement has occurred. Again, if desired, a report may be issued (block 515) as to the change determined.
An additional feature of this invention is the ability, once a categorization has been made of an error, of recommending a therapeutic program to address the error (block 516). Such a recommendation formulation may comprise, for example, creating a set of records as detailed above in FIGS. 1A-2.
If connected speech analysis is desired to be performed, the "symbol" comprises a motion picture representative of an action, and the user is prompted to provide a narration on the action into a microphone in signal communication with a processor. The therapist then enters a phonetic representation of the user's pronunciation of the narration into the processor. Software resident in the processor automatically determines whether an error exists in the user pronunciation, and, if an error exists, automatically categorizes the error.
Another aspect of the present invention relates to a system and method for transcribing student-produced speech by a therapist (FIGS. 6A-15), for analyzing the transcribed speech, and for producing a report and recommendations based upon the analysis. The steps of the method are illustrated in flow-chart form in FIGS. 6A,6B, and exemplary screens, letters, and reports in FIGS. 7-15. The system of the invention is substantially as illustrated schematically in FIG. 2 within the "professional site" 32. The method of the present invention includes the steps of entering student and therapist information (block 601), such as demographic information. The therapist 11 is then permitted to choose (block 602) between administering a "phonemic profile" (block 603) or a "connected speech sample" (block 604), and also whether or not to record the student's production. In either case, the therapist 11 may select between basic English International Phonetic Alphabet (IPA) or full IPA. If the phonemic profile selection is made, a stimulus is presented to the student (block 605), such as by displaying a picture on the screen 20 to elicit a particular sound, which may comprise one or more phonemes. For example, a picture of a cat would elicit the student to say "cat."
For each stimulus, the correct target word (e.g., "cat") and predicted incorrect productions (e.g., "tat") are displayed on the screen 20 to the therapist 11 (block 606) in, for example, IPA format. The therapist 11 is then permitted, if a match occurs (block 607), to select from among the displayed options based upon the student's production (block 608) or to enter the student's production in IPA format (block 609).
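A minimal sketch of blocks 606-609 follows, assuming that the predicted incorrect productions are derived from a table of common substitutions. The specification does not say how the predictions are generated; the `COMMON_SUBSTITUTIONS` table, the ASCII phoneme symbols, and both function names are hypothetical.

```python
# Assumed table of frequently observed substitutions (illustrative only).
COMMON_SUBSTITUTIONS = {"k": ["t"], "g": ["d"], "r": ["w"]}


def predicted_productions(target):
    """Build candidate incorrect productions by applying one common
    substitution at a time to the target phoneme sequence."""
    options = []
    for index, phoneme in enumerate(target):
        for substitute in COMMON_SUBSTITUTIONS.get(phoneme, []):
            options.append(target[:index] + (substitute,) + target[index + 1:])
    return options


def record_production(target, options, choice=None, manual_entry=None):
    """Return the production chosen by the therapist: one of the displayed
    options (choice 0 is the correct target), or a manually entered
    transcription when no displayed option matches."""
    if manual_entry is not None:
        return tuple(manual_entry)
    displayed = [target] + options
    return displayed[choice]


target = ("k", "ae", "t")
options = predicted_productions(target)
print(options)
print(record_production(target, options, choice=1))
```

Selecting from the displayed list spares the therapist from entering the production symbol by symbol, while the manual entry path remains available when the actual production is not among the options.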
The selection of block 608 is made, for example, by a "point and click" method using the mouse 19 on a screen such as FIG. 7; the production entering of block 609 may also be made by a "point and click" method using the mouse 19 on a transcribing screen such as in FIG. 8. Once the phonemic profile is complete (block 610), the software package 15 performs an automatic analysis for the student (block 611), displays the results of the analysis on the screen 20 or prints the analysis results on the printer 31 (block 612), applies a filter such as an age and/or a dialect filter (block 613), and displays the results of the analysis with applied filter(s) on the screen 20 or prints the analysis results on the printer 31 if one or more filters were applied (block 614). Then the analysis is used to prepare a narrative parent letter and/or report that includes problem sounds (FIG. 9) and recommendations for treatment (block 615). An example of available student production report selections is shown in FIG. 10. Additional report selections include descriptions of student productions; word length, stress pattern, and word shape inventories; and consonant and vowel inventories.
If desired (block 616), the therapist 11 can proceed to an individualized phonological evaluation (IPE). The stimuli for this evaluation are determined based upon the results of the phonemic profile, and there are four levels of evaluation possible, as will be reflected in the treatment reports to be discussed in the following. For example, if the student pronounced "tat" for "cat," words such as "can," "call," "cad," or "cast" may be selected for presentation to the student 12. Once again, stimuli, transcription, and analyses are performed analogous to blocks 605-611 (block 617), with the analysis based upon both the phonemic profile and the IPE, and a report, a letter, and treatment recommendations provided analogous to blocks 612-615 (block 618). Exemplary treatment suggestion reports for four levels of IPEs are shown in FIGS. 11A-11E, 12A-12E, 13A-13B, and 14.
If the connected speech option was selected (block 604), a stimulus is presented to the student 12 (block 619), such as a video clip on the screen 20 or other external stimulus. The therapist 11 determines an intended target sentence (block 620) as the student's production is made. The therapist 11 enters the target production on the keyboard 18 in orthographic format (block 621; FIG. 15), and the software package 15 converts it into IPA format (block 622). The student's production is defaulted to be the target production (block 623), and the therapist 11 edits the production fields in order to convert it into the actual student production (block 624).
Once the editing is complete, the production is analyzed (block 611), with a comparison being made between the target and actual productions. Reports are then displayed (block 612) on the student's production and the comparison. The remaining blocks are substantially the same as with the phonemic profile.
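The connected speech steps of blocks 621-624, followed by the comparison of target and actual productions, might be sketched as follows. The small `LEXICON` lookup, the broad transcriptions it contains, and the function names are assumptions made for this example; the specification does not describe how the orthographic-to-IPA conversion is carried out.

```python
# Hypothetical word-to-IPA lookup with broad, illustrative transcriptions.
LEXICON = {"the": "ðə", "cat": "kæt", "ran": "ræn"}


def to_ipa(sentence):
    """Convert an orthographic sentence into a list of IPA words.
    Words missing from the lookup are marked "?" for manual entry."""
    return [LEXICON.get(word, "?") for word in sentence.lower().split()]


def compare(target, production):
    """List (position, target word, produced word) for each mismatch."""
    return [
        (index, expected, actual)
        for index, (expected, actual) in enumerate(zip(target, production))
        if expected != actual
    ]


target = to_ipa("The cat ran")
production = list(target)   # the production defaults to the target
production[1] = "tæt"       # the therapist edits only what differed
print(compare(target, production))
```

Defaulting the production to the target means that the therapist edits only the portions the student actually produced differently.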
It may be appreciated by one skilled in the art that additional embodiments may be contemplated, including alternate forms of presentation of the symbols and sounds.

Claims

What is claimed is:
1. A method for providing speech therapy comprising the steps of: selecting a problem speech sound; searching a database comprising a plurality of records, each record comprising a picture and a word associated therewith; automatically generating a set of records from the plurality of records, each record containing a word specific to the problem speech sound; automatically presenting at least a portion of each record in the set of records to a user sequentially on a display device; prompting the user to pronounce a word associated with the displayed record portion; and scoring the pronunciation of each word.
2. The method recited in Claim 1, wherein the picture comprises a moving picture for imparting a concept of action.
3. The method recited in Claim 1, wherein the record further comprises a sound associated with the word and the presenting step further comprises broadcasting the sound to the user along with the picture and the word.
4. The method recited in Claim 1, wherein the record further comprises data on the word, including a word length and a vocabulary level, and the generating step comprises selecting records containing at least one of a desired word length and a desired vocabulary level.
5. The method recited in Claim 1, further comprising the step, prior to the selecting step, of diagnosing an articulation problem of the user to determine a problem speech sound.
6. The method recited in Claim 1, further comprising the step, following the record presenting step, of permitting the user to hear the word associated with the presented record pronounced in a recommended manner.
7. The method recited in Claim 1, wherein each record further comprises at least one category to which the record belongs, and further comprising, prior to the generating step, the step of permitting the user to select a category, and wherein the generating step comprises generating a set of records, each record a member of the selected category.
8. The method recited in Claim 1, wherein each record further comprises at least one concept to which the record belongs, and further comprising, prior to the generating step, the step of permitting the user to select a concept, and wherein the generating step comprises generating a set of records, each record a member of the selected concept.
9. The method recited in Claim 1, further comprising the steps of: saving the generated set of records on a storage medium; and automatically presenting the set of records to a user sequentially on a second display device; and prompting a user situated adjacent the second display device to pronounce the word displayed on the second display device.
10. The method recited in Claim 1, further comprising the steps of: transmitting the generated set of records electronically to a remote display device; and prompting a user situated adjacent the remote display device to pronounce the word displayed on the remote display device.
11. The method recited in Claim 1, wherein the presenting step comprises presenting the set of records in a predetermined sequence.
12. The method recited in Claim 11, wherein the predetermined sequence is selected from a group consisting of alphabetical order, random order, and a chosen sequence.
13. The method recited in Claim 1, wherein each word in the records comprises at least one of the letters "r," "l," and "s."
14. The method recited in Claim 1, further comprising the steps of: receiving each score; and calculating a final aggregate score for the problem speech sound.
15. The method recited in Claim 14, wherein the database further comprises a previously saved score for the user, and further comprising the steps of: saving the aggregate score in the database; and calculating a historical change from the previously saved score to the aggregate score.
16. The method recited in Claim 15, wherein the saving step further comprises storing demographic information on the user.
17. The method recited in Claim 16, further comprising the step of calculating historical data based upon a selected demographic filter.
18. The method recited in Claim 1, wherein the problem speech sound is selected from a group consisting of a phoneme and a feature, the feature comprising at least one of a place, a manner, and a voicing characteristic.
19. The method recited in Claim 1, further comprising the step, prior to the presenting step, of selecting a portion of each record to be presented, the portion selected from a group consisting of a word, a picture, and a word and a picture.
20. A system for providing speech therapy comprising: a processor; an input device in communication with the processor having means for selecting a problem speech sound; a display device in communication with the processor; a database resident on the processor comprising a plurality of records, each record comprising a picture and a word associated therewith; and software means resident on the processor adapted to: automatically generate a set of records from the plurality of records, each record containing a word specific to the problem speech sound; automatically present the set of records to a user sequentially on the display device; prompt the user to pronounce the displayed word; and receive via the input device a score for the pronunciation of each word.
21. The system recited in Claim 20, wherein the picture comprises a moving picture for imparting a concept of action.
22. The system recited in Claim 20, further comprising an audio speaker in communication with the processor, and wherein: the record further comprises digitized data representative of a sound associated with the word; and the software means is further adapted to direct the speaker to broadcast the sound to the user along with the picture and the word.
23. The system recited in Claim 20, further comprising a microphone in communication with the processor, and wherein the software means is further adapted to receive input from the microphone and diagnose an articulation problem of the user based upon a pronounced word to determine a problem speech sound.
24. The system recited in Claim 20, wherein: the record further comprises digitized data representative of a recommended pronunciation of the word associated with the presented record; and the software means is further adapted to broadcast via the speaker the recommended pronunciation of the word to the user.
25. The system recited in Claim 20, wherein: each record further comprises at least one category to which the record belongs; and the software means is further adapted to permit the user to select a category via the input device and to generate a set of records wherein each record is a member of the selected category.
26. The system recited in Claim 20, wherein: each record further comprises at least one concept to which the record belongs; and the software means is further adapted to permit the user to select via the input device a concept and to generate a set of records wherein each record is a member of the selected concept.
27. The system recited in Claim 20, further comprising a storage medium removably affixable in communication with the processor, and wherein the software means is further adapted to store the set of records on the storage medium for subsequent presentation to the user on a second display device.
28. The system recited in Claim 20, wherein the software means is further adapted to present the set of records in a predetermined sequence.
29. The system recited in Claim 28, wherein the predetermined sequence is selected from a group consisting of alphabetical order, random order, and a chosen sequence.
30. The system recited in Claim 20, wherein each word in the records comprises at least one of the letters "r," "l," and "s."
31. The system recited in Claim 20, wherein the software means is further adapted to: receive each score via the input device; and calculate a final aggregate score for the problem speech sound.
32. The system recited in Claim 31, wherein: the database further comprises a previously saved score for the user; and the software means is further adapted to save the aggregate score in the database, calculate a historical change from the previously saved score to the aggregate score, and display the historical change on the display device.
33. The system recited in Claim 32, wherein the software means is further adapted to store demographic information on the user entered via the input device.
34. The system recited in Claim 33, wherein the software means is further adapted to calculate historical data based upon a selected demographic filter.
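The record selection and scoring recited in Claims 20 through 34 can be illustrated with a minimal sketch. Everything named here (`Record`, `generate_set`, `aggregate_score`, `historical_change`, the sample database, and the file names) is hypothetical. The claims do not specify how a word is matched to a problem speech sound or how scores are aggregated, so the sketch assumes a spelling-based match and a simple mean.

```python
from dataclasses import dataclass


@dataclass
class Record:
    word: str
    picture: str        # hypothetical path to an image file
    category: str = ""  # optional category, as in Claim 25


def generate_set(records, problem_sound, category=None):
    """Select records whose word contains the problem sound.

    Assumption: the sound is matched against the spelling of the word;
    a real system would more likely match against a phonemic form.
    """
    chosen = [
        r for r in records
        if problem_sound in r.word.lower()
        and (category is None or r.category == category)
    ]
    # Alphabetical order is one of the sequences named in Claim 29.
    chosen.sort(key=lambda r: r.word)
    return chosen


def aggregate_score(scores):
    """Mean of the per-word scores (assumed formula, not from the claims)."""
    return sum(scores) / len(scores) if scores else 0.0


def historical_change(previous, current):
    """Difference between the new aggregate and a previously saved one."""
    return current - previous


DATABASE = [
    Record("rabbit", "rabbit.png", "animals"),
    Record("sun", "sun.png", "nature"),
    Record("car", "car.png", "vehicles"),
    Record("lamp", "lamp.png", "household"),
]

session = generate_set(DATABASE, "r")
words = [r.word for r in session]        # ['car', 'rabbit']
scores = [1, 0]                          # hypothetical scores entered by an operator
total = aggregate_score(scores)          # 0.5
change = historical_change(0.25, total)  # 0.25
```

Passing `category="animals"` to `generate_set` would narrow the same set to records in that category, in the manner of Claim 25.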
35. A method for evaluating an articulation disorder comprising the steps of: presenting to a user a symbol representative of a word; prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor; entering a phonetic representation of the user pronunciation into the processor; automatically determining whether an error exists in the user pronunciation; and if an error exists, automatically categorizing the error.
36. The method recited in Claim 35, further comprising the step, following the prompting step, of displaying a frequency spectrum of the user pronunciation.
37. The method recited in Claim 35, further comprising the step, following the prompting step, of broadcasting a sample of a correct pronunciation of the word.
38. The method recited in Claim 35, further comprising the step of issuing a report on an error in user pronunciation.
39. The method recited in Claim 35, further comprising the steps of: saving the error in a database accessible by the processor; searching the database to determine whether a previous entry for the user exists; and if a previous entry exists, comparing the error with an error in the previous entry and determining a change with time.
40. The method recited in Claim 39, further comprising the step of issuing a report on the determined change.
41. The method recited in Claim 35, further comprising the step, if an error exists, of recommending a therapeutic program to address the error.
42. The method recited in Claim 41, wherein the program recommending step comprises the steps of: searching a database comprising a plurality of records, each record comprising a picture and a word associated therewith; and automatically generating a set of records from the plurality of records, each record containing a word containing a problem speech sound representative of the error, the set of records for subsequent display and pronunciation by the user.
43. The method recited in Claim 35, wherein the presenting step comprises displaying a picture on a display screen.
44. The method recited in Claim 35, wherein the error is selected from a group consisting of a substitution, a mispronunciation, and an omission.
45. The method recited in Claim 35, wherein the determining step comprises applying a dialectical filter adapted to discriminate between an error and a predetermined normal dialect word pronunciation.
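One way to picture the error determination, categorization, and dialect filter of Claims 35, 44, and 45 is the sketch below. It is not the method disclosed in the application: it assumes that the target and the entered pronunciation are lists of simplified phoneme labels, aligns them with the standard-library `difflib.SequenceMatcher`, and maps any difference that is neither a substitution nor an omission to "mispronunciation". The function name `categorize_errors` and the `DIALECT_VARIANTS` table are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical dialect table: pronunciations treated as normal for a dialect.
DIALECT_VARIANTS = {
    "car": {("k", "a")},  # an r-less variant accepted here purely for illustration
}


def categorize_errors(word, target, produced, dialect=DIALECT_VARIANTS):
    """Return a list of (category, target_segment, produced_segment) tuples."""
    # Dialect filter: an accepted variant is not reported as an error.
    if tuple(produced) in dialect.get(word, set()):
        return []
    errors = []
    matcher = SequenceMatcher(a=target, b=produced, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Assumed mapping from alignment operations to the claimed categories.
        kind = {"replace": "substitution", "delete": "omission"}.get(
            tag, "mispronunciation"
        )
        errors.append((kind, target[i1:i2], produced[j1:j2]))
    return errors


substitution = categorize_errors(
    "rabbit", ["r", "a", "b", "i", "t"], ["w", "a", "b", "i", "t"]
)  # [('substitution', ['r'], ['w'])]
omission = categorize_errors(
    "stop", ["s", "t", "o", "p"], ["t", "o", "p"]
)  # [('omission', ['s'], [])]
filtered = categorize_errors("car", ["k", "a", "r"], ["k", "a"])  # []
```

The dialect table is checked before alignment so that an accepted variant never reaches the categorization step.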
46. A method for evaluating an articulation disorder comprising the steps of: performing a pre-evaluation comprising the steps of:
(a) presenting to a user a symbol representative of a word;
(b) prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor;
(c) entering a phonetic representation of the user pronunciation into the processor;
(d) automatically determining whether an error exists in the user pronunciation; and
(e) if an error exists, automatically categorizing the error; repeating steps (a)-(e) a predetermined number of times; automatically generating a set of symbols, each symbol representative of a word containing at least one of the errors determined in the pre-evaluation; and performing an evaluation comprising performing steps (a)-(e) using the generated set of symbols.
47. The method recited in Claim 46, further comprising automatically generating a report summarizing the errors detected in the evaluation performing step.
48. The method recited in Claim 46, wherein the evaluation performing step comprises automatically recognizing an underlying commonality in the errors to achieve a diagnosis of a problem speech sound.
49. The method recited in Claim 48, further comprising the step of recommending a therapeutic program to address the diagnosed problem speech sound.
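The two-stage flow of Claim 46 (a pre-evaluation, followed by an evaluation built from the errors it found) might be sketched as follows. The helper names `pre_evaluate` and `build_follow_up` and the sample word lists are hypothetical. Error detection is deliberately reduced to "a target sound is absent from the entered pronunciation", which is an assumption made for brevity.

```python
def pre_evaluate(items, transcribe):
    """Collect target sounds that are missing from the user's productions.

    items: dict mapping each word to its list of target phoneme labels.
    transcribe: callable returning the entered phoneme list for a word.
    """
    missed = set()
    for word, target in items.items():
        produced = transcribe(word)
        missed.update(p for p in target if p not in produced)
    return missed


def build_follow_up(word_bank, missed):
    """Pick words from the bank that contain at least one missed sound."""
    return sorted(w for w, target in word_bank.items() if missed & set(target))


screening = {
    "rabbit": ["r", "a", "b", "i", "t"],
    "sun": ["s", "u", "n"],
}
# Hypothetical phonetic representations entered for each screening word.
responses = {
    "rabbit": ["w", "a", "b", "i", "t"],
    "sun": ["s", "u", "n"],
}
word_bank = {
    "car": ["k", "a", "r"],
    "red": ["r", "e", "d"],
    "hat": ["h", "a", "t"],
}

missed = pre_evaluate(screening, responses.get)  # {'r'}
follow_up = build_follow_up(word_bank, missed)   # ['car', 'red']
```

The follow-up list would then be run through the same present, prompt, enter, determine, and categorize steps as the pre-evaluation.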
50. A method for evaluating an articulation disorder comprising the steps of:
(a) presenting to a user a symbol representative of a word;
(b) prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor;
(c) entering a phonetic representation of the user pronunciation into the processor;
(d) automatically determining whether an error exists in the user pronunciation;
(e) if an error exists, automatically categorizing the error; repeating steps (a)-(e) a predetermined number of times; and correlating the categorized errors to determine an existence of an articulation disorder.
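The correlating step of Claim 50 is not spelled out in the claim itself. A minimal sketch, assuming that correlation means computing a per-sound error rate over the repeated trials and flagging sounds above a cutoff, is given below. The function `correlate`, the data layout, and the 0.5 threshold are all hypothetical and carry no clinical meaning.

```python
from collections import Counter


def correlate(errors, trials, threshold=0.5):
    """Flag sounds whose error rate meets or exceeds the threshold.

    errors: list of (category, target_sound) pairs from the categorizing step.
    trials: dict mapping each target sound to the number of times it was attempted.
    """
    counts = Counter(sound for _, sound in errors)
    flagged = {}
    for sound, n_errors in counts.items():
        attempts = trials.get(sound, 0)
        if attempts and n_errors / attempts >= threshold:
            flagged[sound] = n_errors / attempts
    return flagged


errors = [
    ("substitution", "r"),
    ("substitution", "r"),
    ("omission", "s"),
    ("substitution", "r"),
]
trials = {"r": 4, "s": 10}

flagged = correlate(errors, trials)  # {'r': 0.75}
```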
51. A method for evaluating an articulation disorder comprising the steps of: presenting to a user a motion picture representative of an action; prompting the user to provide a narration on the action into a microphone in signal communication with a processor; entering a phonetic representation of the user pronunciation of the narration into the processor; automatically determining whether an error exists in the user pronunciation; and if an error exists, automatically categorizing the error.
52. A system for evaluating an articulation disorder comprising: a processor; an output device and an input device, each in signal communication with the processor; software means installable on the processor adapted to: present on the output device a symbol representative of a word; prompt a user via the output device to pronounce the word represented by the symbol; receive via the input device a phonetic representation of the user pronunciation; automatically determine whether an error exists in the user pronunciation; and if an error exists, automatically categorize the error.
53. The system recited in Claim 52, wherein the display device comprises at least one of a printer and a display screen and the input device comprises at least one of a keyboard, a pointing device, and a microphone.
54. The system recited in Claim 52, wherein the software means is further adapted to display on the display device a frequency spectrum of the user pronunciation.
55. The system recited in Claim 52, further comprising broadcasting means in signal communication with the processor and wherein the software means is further adapted to direct a sample of a correct pronunciation of the word to be broadcast via the broadcast means.
56. The system recited in Claim 55, wherein the broadcasting means comprises an audio speaker.
57. The system recited in Claim 52, wherein the software means is further adapted to issue a report on an error in user pronunciation via the display device.
58. The system recited in Claim 52, wherein the software means is further adapted, if an error exists, to recommend a therapeutic program to address the error.
59. The system recited in Claim 58, further comprising a database resident on the processor comprising a plurality of records, each record comprising a picture and a word associated therewith; and wherein the software means is further adapted to automatically generate a set of records from the plurality of records, each record containing a word containing a problem speech sound representative of the error, the set of records for subsequent display and pronunciation by the user.
60. The system recited in Claim 52, wherein the symbol comprises a picture and the output device comprises a display screen.
61. The system recited in Claim 52, wherein the output device comprises a display screen, and wherein the software is adapted to direct a presentation of a motion picture representative of an action on the display screen.
62. A method for evaluating an articulation disorder comprising the steps of: presenting to a user a symbol representative of a word; prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor; entering a phonetic representation of the user pronunciation into an input and storage device not in signal communication with the processor; transferring the phonetic representation from the input and storage device to the processor; automatically determining whether an error exists in the user pronunciation; and if an error exists, automatically categorizing the error.
63. The method recited in Claim 62, further comprising the step, following the prompting step, of displaying a frequency spectrum of the user pronunciation.
64. The method recited in Claim 62, further comprising the step, following the prompting step, of broadcasting a sample of a correct pronunciation of the word.
65. The method recited in Claim 62, further comprising the step of issuing a report on an error in user pronunciation.
66. The method recited in Claim 62, further comprising the steps of: saving the error in a database accessible by the processor; searching the database to determine whether a previous entry for the user exists; and if a previous entry exists, comparing the error with an error in the previous entry and determining a change with time.
67. The method recited in Claim 66, further comprising the step of issuing a report on the determined change.
68. The method recited in Claim 62, further comprising the step, if an error exists, of recommending a therapeutic program to address the error.
69. The method recited in Claim 68, wherein the program recommending step comprises the steps of: searching a database comprising a plurality of records, each record comprising a picture and a word associated therewith; and automatically generating a set of records from the plurality of records, each record containing a word containing a problem speech sound representative of the error, the set of records for subsequent display and pronunciation by the user.
70. The method recited in Claim 62, wherein the presenting step comprises displaying a picture on a display screen.
71. The method recited in Claim 62, wherein the error is selected from a group consisting of a substitution, a mispronunciation, and an omission.
72. The method recited in Claim 62, wherein the determining step comprises applying a dialectical filter adapted to discriminate between an error and a predetermined normal dialect word pronunciation.
73. A method for evaluating an articulation disorder comprising the steps of: performing a pre-evaluation comprising the steps of:
(a) presenting to a user a symbol representative of a word;
(b) prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor;
(c) entering a phonetic representation of the user pronunciation into an input and storage device not in signal communication with the processor;
(d) transferring the phonetic representation to the processor;
(e) automatically determining whether an error exists in the user pronunciation; and
(f) if an error exists, automatically categorizing the error; repeating steps (a)-(f) a predetermined number of times; automatically generating a set of symbols, each symbol representative of a word containing at least one of the errors determined in the pre-evaluation; and performing an evaluation comprising performing steps (a)-(f) using the generated set of symbols.
74. The method recited in Claim 73, further comprising automatically generating a report summarizing the errors detected in the evaluation performing step.
75. The method recited in Claim 73, wherein the evaluation performing step comprises automatically recognizing an underlying commonality in the errors to achieve a diagnosis of a problem speech sound.
76. The method recited in Claim 75, further comprising the step of recommending a therapeutic program to address the diagnosed problem speech sound.
77. A method for evaluating an articulation disorder comprising the steps of:
(a) presenting to a user a symbol representative of a word;
(b) prompting the user to pronounce the word represented by the symbol into a microphone in signal communication with a processor;
(c) entering a phonetic representation of the user pronunciation into an input and storage device not in signal communication with the processor;
(d) transferring the phonetic representation from the input and storage device to the processor;
(e) automatically determining whether an error exists in the user pronunciation;
(f) if an error exists, automatically categorizing the error; repeating steps (a)-(f) a predetermined number of times; and correlating the categorized errors to determine an existence of an articulation disorder.
78. A method for evaluating an articulation disorder comprising the steps of: presenting to a user a motion picture representative of an action; prompting the user to provide a narration on the action into a microphone in signal communication with a processor; entering a phonetic representation of the user pronunciation of the narration into an input and storage device not in signal communication with the processor; transferring the phonetic representation to the processor; automatically determining whether an error exists in the user pronunciation; and if an error exists, automatically categorizing the error.
79. A system for evaluating an articulation disorder comprising: a processor; an output device and a user input device in signal communication with the processor; an operator input and storage device having means for receiving and storing data and connectable with the processor for downloading data thereinto; software means installable on the processor adapted to: present on the output device a symbol representative of a word; prompt a user via the output device to pronounce the word represented by the symbol into the user input device; receive from the input and storage device a phonetic representation of the user pronunciation entered thereinto by the operator and downloaded into the processor; automatically determine whether an error exists in the user pronunciation; and if an error exists, automatically categorize the error.
80. The system recited in Claim 79, wherein the display device comprises at least one of a printer and a display screen, the user input device comprises a microphone, and the input and storage device comprises at least one of a keyboard and a pointing device.
81. The system recited in Claim 79, wherein the software means is further adapted to display on the display device a frequency spectrum of the user pronunciation.
82. The system recited in Claim 79, further comprising broadcasting means in signal communication with the processor and wherein the software means is further adapted to direct a sample of a correct pronunciation of the word to be broadcast via the broadcast means.
83. The system recited in Claim 82, wherein the broadcasting means comprises an audio speaker.
84. The system recited in Claim 79, wherein the software means is further adapted to issue a report on an error in user pronunciation via the display device.
85. The system recited in Claim 79, wherein the software means is further adapted, if an error exists, to recommend a therapeutic program to address the error.
86. The system recited in Claim 85, further comprising a database resident on the processor comprising a plurality of records, each record comprising a picture and a word associated therewith; and wherein the software means is further adapted to automatically generate a set of records from the plurality of records, each record containing a word containing a problem speech sound representative of the error, the set of records for subsequent display and pronunciation by the user.
87. The system recited in Claim 79, wherein the symbol comprises a picture and the output device comprises a display screen.
88. The system recited in Claim 79, wherein the output device comprises a display screen, and wherein the software is adapted to direct a presentation of a motion picture representative of an action on the display screen.
89. A method for use by a therapist to transcribe speech of a student comprising the steps of: prompting a student to produce at least one phoneme orally; displaying a correct production of the at least one phoneme to a therapist; displaying at least one incorrect production of the at least one phoneme to the therapist; and permitting the therapist to make an electronic selection from among the displayed correct and incorrect productions based upon the student-produced at least one phoneme.
90. The method recited in Claim 89, wherein the prompting step comprises selecting from a stimulus for eliciting a unitary word and a stimulus for eliciting connected speech.
91. The method recited in Claim 89, wherein the incorrect production displaying step comprises displaying a plurality of incorrect productions.
92. The method recited in Claim 91, further comprising the step of, if the student-produced at least one phoneme does not match one of the displayed correct and incorrect productions, permitting the therapist to enter the student-produced at least one phoneme one character at a time into a processor.
93. The method recited in Claim 92, wherein the therapist is permitted to enter each character in International Phonetic Alphabet form.
94. The method recited in Claim 89, further comprising the step, following the permitting step, of performing an analysis of a student speech production difficulty based upon an incorrect production selection.
95. The method recited in Claim 94, further comprising the step, following the permitting step, of applying a filter to the incorrect production relating to at least one of age and dialect.
96. The method recited in Claim 95, further comprising the step, following the analysis performing step, of collecting a plurality of phonemes from a database based upon the incorrect production selection.
97. A system for transcribing speech of a student comprising: a processor; display means in signal communication with the processor for: prompting a student to produce at least one phoneme orally; displaying a correct production of the at least one phoneme to a therapist; and displaying at least one incorrect production of the at least one phoneme to the therapist; and input means in signal communication with the processor for permitting the therapist to select from among the displayed correct and incorrect productions based upon the student produced at least one phoneme.
98. The system recited in Claim 97, wherein the display means comprises a screen.
99. The system recited in Claim 97, wherein the input means comprises a pointing device.
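The transcription flow of Claims 89 through 93 (select from the displayed correct and incorrect productions, or fall back to character-by-character entry) can be sketched as below. The function `record_production`, its parameters, and the sample IPA strings are hypothetical; the claims do not say how the displayed incorrect productions are generated or how the selection is stored.

```python
def record_production(target, alternatives, choice=None, typed=None):
    """Record the production selected or entered by the therapist.

    target: the correct production shown to the therapist.
    alternatives: the incorrect productions shown alongside it.
    choice: index into the displayed options (0 is the correct production).
    typed: characters entered one at a time when no displayed option matches.
    """
    options = [target] + list(alternatives)
    if choice is not None:
        selected = options[choice]
    elif typed is not None:
        # Fallback of Claim 92: assemble the entry from individual characters.
        selected = "".join(typed)
    else:
        raise ValueError("either a choice or typed characters is required")
    return {"transcription": selected, "correct": selected == target}


displayed_target = "ræbɪt"
displayed_errors = ["wæbɪt", "læbɪt"]

picked = record_production(displayed_target, displayed_errors, choice=1)
# {'transcription': 'wæbɪt', 'correct': False}
entered = record_production(
    displayed_target, displayed_errors, typed=["j", "æ", "b", "ɪ", "t"]
)
# {'transcription': 'jæbɪt', 'correct': False}
```

Returning the transcription together with a correctness flag keeps the result ready for the analysis step recited in Claim 94.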
PCT/US2002/002258 2001-01-25 2002-01-25 Speech transcription, therapy, and analysis system and method WO2002059856A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002237945A AU2002237945A1 (en) 2001-01-25 2002-01-25 Speech transcription, therapy, and analysis system and method

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US09/770,093 2001-01-25
US09/769,776 2001-01-25
US09/769,776 US6732076B2 (en) 2001-01-25 2001-01-25 Speech analysis and therapy system and method
US09/770,093 US6711544B2 (en) 2001-01-25 2001-01-25 Speech therapy system and method
US09/999,249 US6714911B2 (en) 2001-01-25 2001-11-15 Speech transcription and analysis system and method
US09/997,204 2001-11-15
US09/999,249 2001-11-15
US09/997,204 US6725198B2 (en) 2001-01-25 2001-11-15 Speech analysis system and method

Publications (2)

Publication Number Publication Date
WO2002059856A2 (en) 2002-08-01
WO2002059856A3 (en) 2003-06-26

Family

ID=27505718

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/002258 WO2002059856A2 (en) 2001-01-25 2002-01-25 Speech transcription, therapy, and analysis system and method

Country Status (1)

Country Link
WO (1) WO2002059856A2 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0360909A1 (en) * 1988-09-30 1990-04-04 Siemens Audiologische Technik GmbH Speech practising apparatus
EP0504927A2 (en) * 1991-03-22 1992-09-23 Kabushiki Kaisha Toshiba Speech recognition system and method
US5393236A (en) * 1992-09-25 1995-02-28 Northeastern University Interactive speech pronunciation apparatus and method
US5487671A (en) * 1993-01-21 1996-01-30 Dsp Solutions (International) Computerized system for teaching speech
US5562453A (en) * 1993-02-02 1996-10-08 Wen; Sheree H.-R. Adaptive biofeedback speech tutor toy
US5791904A (en) * 1992-11-04 1998-08-11 The Secretary Of State For Defence In Her Britannic Majesty's Government Of The United Kingdom Of Great Britain And Northern Ireland Speech training aid
WO1999013446A1 (en) * 1997-09-05 1999-03-18 Idioma Ltd. Interactive system for teaching speech pronunciation and reading
EP1089246A2 (en) * 1999-10-01 2001-04-04 Siemens Aktiengesellschaft Method and apparatus for speech impediment therapy

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006109268A1 (en) * 2005-04-13 2006-10-19 Koninklijke Philips Electronics N.V. Automated speech disorder detection method and apparatus
US20190221317A1 (en) * 2018-01-12 2019-07-18 Koninklijke Philips N.V. System and method for providing model-based treatment recommendation via individual-specific machine learning models
US10896763B2 (en) 2018-01-12 2021-01-19 Koninklijke Philips N.V. System and method for providing model-based treatment recommendation via individual-specific machine learning models

Also Published As

Publication number Publication date
WO2002059856A3 (en) 2003-06-26

Similar Documents

Publication Publication Date Title
US6714911B2 (en) Speech transcription and analysis system and method
O’Brien et al. Directions for the future of technology in pronunciation research and teaching
US6732076B2 (en) Speech analysis and therapy system and method
Ratner et al. Fluency Bank: A new resource for fluency research and practice
US5717828A (en) Speech recognition apparatus and method for learning
US5393236A (en) Interactive speech pronunciation apparatus and method
US9378650B2 (en) System and method for providing scalable educational content
McCrocklin ASR-based dictation practice for second language pronunciation improvement
US6134529A (en) Speech recognition apparatus and method for learning
JP2003504646A (en) Systems and methods for training phonological recognition, phonological processing and reading skills
AU2003300130A1 (en) Speech recognition method
US20040176960A1 (en) Comprehensive spoken language learning system
US20060053012A1 (en) Speech mapping system and method
Leather Interrelation of perceptual and productive learning in the initial acquisition of second-language tone
Elhadj E-Halagat: An e-learning system for teaching the holy Quran.
KR100995847B1 (en) Language training method and system based sound analysis on internet
Cheatham et al. How does independent practice of multiple-criteria text influence the reading performance and development of second graders?
Chenausky et al. Review of methods for conducting speech research with minimally verbal individuals with autism spectrum disorder
Neuhaus et al. The reliability and validity of rapid automatized naming scoring software ratings for the determination of pause and articulation component durations
US6711544B2 (en) Speech therapy system and method
Bernstein et al. ARTIFICIAL INTELLIGENCE FOR SCORING ORAL READING FLUENCY
Herrera et al. The study of memorisation in piano students in higher education in Mexico
Ma et al. Pronunciation’s role in English speaking-proficiency ratings
WO1999013446A1 (en) Interactive system for teaching speech pronunciation and reading
WO2002059856A2 (en) Speech transcription, therapy, and analysis system and method

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP