US3870817A - Phonetic sound recognizer for all voices - Google Patents

Phonetic sound recognizer for all voices

Info

Publication number
US3870817A
Authority
US
United States
Prior art keywords
signals
flip
alternate
signal
trains
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US368264A
Inventor
Meguer V Kalfaian
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US368264A
Application granted
Publication of US3870817A
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition

Definitions

  • ABSTRACT In an articulated phonetic sound wherein the information-bearing group of formant resonances varies within different frequency regions in the sound spectrum, such as a phoneme spoken by differently pitched speakers, the filter-separated signals derived from said group of resonances are regrouped (shifted) in prearranged combinations sequentially until a reference regrouping is established for neutralizing (normalizing) the undesired effect of said variations for adaptation to standard analysis. A specific group of signals from said regrouped signals is then selected, and their amplitude ratios one with respect to another are further matched with standard groups of amplitude-ratio measuring means for obtaining a null output representing a good amplitude match.
  • PHONETIC SOUND RECOGNIZER FOR ALL VOICES This is a continuation of application Ser. No. 209,661, filed Dec. 20, 1971, now abandoned.
  • This invention relates to phonetic sound wave analysis, and more particularly to an improved and composite arrangement of my previously proposed systems for constructing a practical automaton capable of recognizing phonetic sounds as spoken by different speakers.
  • the group of resonances in the arriving wave are regrouped in a reference frequency region in the sound spectrum in an order that, the lowest frequency (pitch) in the group is converted to a reference pitch frequency, and the other frequencies are converted to frequencies by the same factors of multiplication from the reference frequency as they differ from the original pitch frequency. Since in transmitting techniques of information, as known and practiced, one set of parameters may be substituted for another without loss of definition as long as the independent parameters remain unchanged, it is readily seen that the converted frequencies are relocated in fixed (standard) positions in the sound spectrum, and therefore, easily adapted to any type of analytical processing that may be desired. For practical purposes, however, the method just mentioned involves critical adjustments. Accordingly, the novel switching system used herein may be briefly described, as in the following:
  • a bank of channels arranged in numerical order, such as 1, 2, 3, ..., n, each one of which is provided with a plurality of signal-admitting inputs, and a plurality of signal-switching inputs, respectively.
  • the detected signals derived from various resonances are applied to the plurality of signal-admitting inputs, so that any one of the signals can be admitted to the output of any one of the channels by the operation of a respective signal-switching input.
  • a plurality of prearranged combinations of groups of switching signals are applied to the plurality of signal-switching inputs for regrouping (shifting) the detected signals admitted to the outputs of the bank of channels sequentially until a specific group of the detected signals are regrouped along a reference, standard region of the numerically arranged channel outputs.
  • a group of signals from the regrouped signals at the channel outputs are then selected and their amplitude ratios one with respect to another are matched with a plurality of prearranged groups of amplitude-ratio measuring means, for deriving from one of the last said means a final signal representative of the originally spoken phonetic sound.
  • the time period devoted to performing all of said combinations of signal regroupings is performed within a portion of a pitch period, which may not be long enough to render the system versatile for various other relative uses.
  • the detected signals derived from the varying group of resonances are shifted to the outputs of said channels in an order that the signal derived from the lowest (pitch) frequency in the group is shifted to the number ONE output of the numerically arranged bank of channels, while the signals derived from the other resonances are shifted to the same mutually related numerical ratio separations along the channel outputs as differ in mutually related frequency ratios of the original resonances.
  • a fast signal-hunting distributor which scans all of the detected signals derived from the sub-bands starting from the lowest in the total number of sub-bands, and allows operation of the respective signal-regrouping combination only when a detected signal at the output of the pass-band filter responding to 600 cycles per second (exemplary) is present.
  • the hunting system can be made as fast as desired, so that the time consumed for hunting can be made negligible.
  • the information bearing signals are within a region above the pitch frequency (as explained in my reference patent)
  • Another object of the invention is to provide an arrangement for alternate operation during complete succeeding pitch periods, so that greater time during the pitch periods can be devoted to signal-regrouping operations. Still another object is the specific subdivision of the sound spectrum by pass-band filters, for obtaining highly precise frequency conversion of the sound wave.
  • the major peaks of the sound waves are the indicating points between which the sound waves are analyzed and re-constructed into the original sound information.
  • the waves contained between the high-peaked major peaks represent both the phonetic and quality informations of the sound.
  • the waves contained within the sub-major peaks of the sound represent only the phonetic information.
  • each puff of air enters the specifically formed mouth cavities, it forms an initial high peaked wave, representing the high-peaked major peaks.
  • the initial air force in the mouth sets up several resonances which have definite frequency and amplitude ratios one with respect to another (these resonances do not have fixed frequency distribution in the sound spectrum, as believed conventionally, but fixed ratio relationships, representing the phonetic sound).
  • the vocal folds also start vibrating at a frequency as determined by their physical structures, and the physical tension imposed upon them.
  • the time period between each high-peaked major peak is the pitch period of the sound
  • the time period between each subpeaked major peak is the fundamental time period of the sound.
  • these peaks are absolutely necessary for the brain to be able to interpret the arriving phonetic sound. This is the reason why a person who has lost his voice box by surgery, is able to produce intelligible phonetic sounds in speech, by forming his mouth slowly to the phonetic sounds that he normally would for the speech, and pressing a mechanical vibrator against his throat; the vibratory lobes representing the major peaks.
  • the major peaks are not part of the phonetic sound, but their presence is absolutely necessary for the brain to know when to start and when to conclude the analysis of the information presented.
  • FIG. 1 is partly schematic and partly block diagram of the complete system of speech typewriter according to the invention.
  • FIG. 2 is a waveform illustrating the alternate READ and WRITE operating periods of the system in FIG. 1.
  • FIG. 3 is an exemplary signal-amplitude-ratio measuring arrangement according to the invention.
  • FIG. 4 shows the center frequency sub-divisions of the pass-band filters, as used in the present invention.
  • FIG. 5 is a numerical chart showing how the detected signal outputs of the pass-band filters are switched linearly to the outputs of the numerically arranged channels, in accordance with the invention.
  • FIG. 6 is a switching arrangement using analog channel switches, showing the simple parallel connected gate electrodes of the switches in accordance with the specific sub-band divisions of FIG. 4.
  • FIG. 7 shows how these analog switches can be oriented on a planar surface for simple interconnections of terminals, in accordance with the invention.
  • the center frequencies of the sub-dividing band-pass filters are arranged as shown in FIG. 4, wherein the frequency sub-divisions are similar to the standard musical scale.
  • Such a numerical arrangement simplifies the actual channel switching arrangement, because it requires linearly sequenced numerical transfer without cross coupling of any of the pass-band outputs, such as would be by the arrangement of pass-band center frequencies of the filters shown in FIG. 11, and the chart of switching combinations in FIG. 12 of my first reference patent, as mentioned in the foregoing. This is shown in greater clarity by the chart in FIG.
  • the harmonic sequence at like intervals is considered in sub-dividing the sound spectrum, because the number of sub-band divisions and center frequencies of the pass-band filters may be arranged other than the frequencies shown in FIG. 4 without affecting the required accuracy.
  • the numerical sequence assigned to the pass-band filters may be reversed, or coded, according to any mode of analytical switching of the sub-band signals that may be desired.
  • the specific sub-band division of the sound spectrum simplifies the switching arrangement. Due to the plurality of parallel connections that are necessary in physical assembly, it is preferable to use high impedance transistors, such as the MOS FET transistor, which are particularly simple for integrated assembly on single wafers, but not necessarily, if practice indicates otherwise.
  • the circuitry of the channel switching may be arranged as in FIG. 6, wherein, the channel-1 is represented by the transistors Q21, Q23, Q26 and Q28; channel-2 is represented by the transistors Q20, Q24, Q27; and the channel-3 is represented by the transistors Q21, Q25; and the channel outputs are represented by the load resistors R1 through R3*, respectively.
  • Such an arrangement provides a plurality of inputs for each channel, for example, the input of channel-1 may be switched to either the output of DR1, DR2, DR3 arriving at the inputs of block 4 in FIG. 1; this also being applied to the nth channel.
  • the switching arrangement shown in FIG. 6 is similar to the switching arrangement shown in FIG. 10 of my first reference patent, but it is simpler in physical assembly. (which can have high impedance values, due to the use of FET transistors, in order to avoid discharge of the storage capacitors in FIG. 1)
  • the switching sequence is obtained by parallel connections of the gate electrodes in the numerical sequence as shown in FIG. 5.
  • the gate electrode of Q19 (channel-1) is connected to the gate of Q20 (channel-2), and to the gate electrode of Q21 (channel-3).
  • the gate electrode of Q23 (channel-1) is connected to the gate electrode of Q24 (channel-2), and so on.
  • each square represents the semiconductor as mentioned, three squares of which are shown in shaded lines in order to render them distinguishable in the maze of electrical connections.
  • the connecting electrodes in each square are represented by the black dots to which are terminated the required parallel connections, as illustrated by the horizontal, vertical, and 45° parallel lines; the first representing the source electrode terminals, the second representing the drain electrode terminals, and the third representing the gate electrode terminals.
  • FIG. 1 STORAGE SYSTEM OF THE SUB-BAND SIGNALS IN ALTERNATE PITCH PERIODS
  • In FIG. 1 there are shown only three pass-band filters in blocks 1, 2 and 3. The operation of all these sub-band filters in conjunction with their associated circuitry is the same, and therefore, the operation of only the filter in block 1 and its associated circuitry will be described, as a typical example.
  • the output of sub-band frequency f1 in block 1 is divided into two separate branches across the center tapped secondaries L1 and L2 of transformer T1.
  • the output of L1 is full wave rectified by the diodes D1, D2, and stored across capacitor C1 in series with the charging transistor Q1, while the output of L2 is full wave rectified by diodes D3, D4, and stored across C2 in series with Q3.
  • the normally idle transistor Q2 is connected in parallel with C1, so that the stored potential can be discharged after the READ period has ended, and the normally idle Q4 is connected in parallel with C2 for discharging the stored potential during an alternate time period with respect to the first discharge.
  • the stored signals across C1 and C2 are switched alternately during alternate READ periods to the common output of transistors Q13 and Q14, which are excited into on-and-off states alternately during the succeeding (READ) periods.
  • This common output is connected directly to terminal DR1 of the signal regrouping block 4.
  • the output of passband filter in block 1 is also applied to an amplitude threshold sensing device in block 5 (which can be obtained commercially in the form of integrated circuitry), and applied to the RS (set-reset) flip-flop in block 6 for operation.
  • When a signal appears at the output of block 1 just above a threshold level, the R-S flip-flop in block 6 operates in set state, indicating that a signal has appeared at the output of the filter in block 1.
  • This set-operating signal of block 6 is applied in 1 level to one of the inputs of AND gate in block 7, but it does not operate until its second input is also driven to 1 level.
  • When the output of AND gate 7 becomes active, it operates the one-shot (OS) in block 8, which in turn produces an output pulse of predetermined time length and applies it to one of the multiple inputs of the gate in block 9, and also to the lower left hand terminal 1 of the signal regrouping block 4.
  • OS one-shot
  • the common output of the transistors Q15, Q16 is applied to the terminals DR2 and DR3 of the block 4, and the output pulses of the one-shots in blocks 13 and 17 are applied to the inputs of gate in block 9, and also to the lower center and right hand terminals 2 and 3 of block 4, respectively.
  • the DR1, DR2 and DR3 terminals of block 4 in FIG. 1 represent the drain electrode terminals of the arrangement in FIG. 6, and the numerals of the lower end terminals of this block arrangement represent the same numerical terminals of the gate electrodes in the same arrangement.
  • the output pulse of one-shot in block 8 switches the transistors Q19-Q22 (FIG. 6) in ON states for regrouping the stored signals at the input of block 4.
  • the output pulse of the one-shot in block 13 switches the transistors Q23-Q25 in ON states for regrouping the stored signals at the inputs of block 4.
  • the transistors Q26-Q27 are switched in ON states for regrouping the stored signals.
  • the object at this point accordingly, is to provide a hunting arrangement in the form of a distributor, which starts distributing 1 level pulses to the second inputs of AND gates in blocks 6, 11 and 15, starting from block 6, or in any sequence that may be desired, so that whichever AND gate has one of its inputs driven into 1 level by the output of its associated pass-band filter, its following one-shot operates by the distributor pulse for the prearranged signal regrouping.
  • the mutually related amplitude ratios of these regrouped signals are matched with prearranged ratio measuring groups to make sure that the proper signal regrouping has been established for recognition of the sound information. If not, however, the signal regrouping continues until the correct regrouping produces an output signal representing the information sought.
  • the distributor pulses are made much shorter than the output pulses of the one-shots, and the distribution frequency is made practically high enough to skip the inoperable AND gates (blocks 6, 11, 15) in negligible length of time.
  • the distributor is inhibited for the required analytical performance.
  • the Q3 becomes in ON state for charging the capacitor C2, while the Q14 becomes in OFF state for preventing any transfer of this charging action to the common output of Q13 and Q14.
  • the READ period of the charge of C2 is shown by the ON period of Q14 being in alternate periods of the Q13.
  • the discharging of the capacitor C2 is shown by the ON (shaded) periods of the Q4 in alternate periods with respect to the Q2.
  • the voice sound wave in block 18 is applied to the pitch selector in block 19 (which may have a one-shot at its output for producing sharp edged pulses) for deriving pulse signals at pitch frequencies (major peaks).
  • These pulse signals are applied to the clock flip-flop in block 20 for producing alternate output signals in steady state steps, which represent the waves of Q1 or Q3 in FIG. 2.
  • the steady state output signals of block 20 are applied directly and alternately to the first inputs of AND gates in blocks 21 and 22, and further also applied alternately to the inputs of set-reset flip-flops in blocks 23 through 26 by way of direct and the differentiating coupling capacitors C7 through C10, respectively.
  • the purpose of using the coupling capacitors C7 through C10 is to avoid steady state excitation at the inputs of these flip-flops.
  • the second inputs of AND gates 21 and 22 are connected in parallel, and normally biased in backward direction so that signals in forward direction at their inputs alone will not render them operative.
  • When the upper terminal of clock flip-flop 20 is in positive polarity, it prepares the AND gate 21 ready for operation when a forward bias pulse upon its second input arrives.
  • the lower terminal of block 20 applies a negative pulse to the input of set-reset flip-flop 26 to operate it in set-position for producing at its output the wave of Q13 in FIG. 2.
  • a pulse signal in forward direction arrives at the parallel connected second inputs of AND gates 21 and 22, causing the gate 21 to operate and apply a pulse signal in forward direction to the inputs of set-reset flip-flops 23 and 25.
  • the flip-flop 23 is driven into reset-position (so as to terminate its READ period), and simultaneously drives the flip-flop 25 into set-position, which produces at its output the shaded portion (storage-discharge) of the wave Q2 in FIG. 2.
  • When the clock flip-flop in block 20 reverses its state of operation by the arriving pitch pulse, the input of set-reset flip-flop 25 receives a pulse in forward direction from the upper terminal of flip-flop 20, and it shifts into reset-operation for terminating the shaded portion of the wave Q2 in FIG. 2. Simultaneously, the set-reset flip-flop in block 24 operates in set-position for repeating the previous operation in alternate cycles.
  • the terminals of flip-flops 23-26 are shown connected to the transistor control gates of Q1 through Q18, in series with the independent amplifiers in block 27.
  • the diagram in FIG. 1 indicates that the transistors Q1 through Q18 are of the MOS FET type which require about l volts at the control gates for operation.
  • the flip-flops used are of the type using about volts for operation, there are also available integrated circuit amplifiers for the purpose of interfacing with high and low operating devices, such as used in the drawing of FIG. 1, although different types of available devices may also be used, if so desired.
  • the amplitude equalizer in block 28 is also shown, as used herein, for normalizing the amplitude variations of the sound wave prior to analysis, and it may be of any available type, or as described in my previous patent issues.
  • the signal distributing (hunting) system may now be described, as in the following:
  • signal regrouping occurs when the first and second inputs of any one of the AND gates 7, 12 and 16 are simultaneously driven into 1 level voltages.
  • the purpose of the distributor is to apply distributory 1 level signals to the second inputs of these gates (as a hunting process), so that the AND gate which has received 1 level signal at its first input will operate, and thereby causing its associated one-shot to follow operation for the required signal-regrouping action in block 4.
  • the distributor is represented by the blocks 28 and 29, which for simplicity of design may be two of the available four-line-to-16-line decoders connected in series for 32 distributory output pulses.
  • the outputs of a binary counter are connected to the A, B, C, D inputs of the blocks 28, 29, so that the sequential pulses applied to the input terminal 5 of the binary counter 30 are converted into sequentially distributed pulses along the total of 32 independent outputs of the blocks 28 and 29.
  • These outputs are applied individually to the second inputs of the AND gates, only three of which are shown in the drawing.
  • the use of 32 distributory outputs is shown for versatility of the arrangement in FIG. 1 for various other uses, and the number of these outputs may be more or less according to the type of complex wave analysis used for.
  • the input terminal 5 of binary counter in block 30 is driven by the clock generator in block 31 (for example, at a frequency of 300 KHZ) in series with the gate in block 32 and inverter block 33.
  • the three inputs of the gate are normally at 1 level, so that the 1 level clock pulses are admitted to the input 5 of the counter 30 for operating the distributor.
  • the first distributor pulse (at 1 level after being phase inverted, not shown) applied to the second input of AND gate 7 causes operation of the one-shot 8, which in turn applies a pulse of 0.2 millisecond at 0 level to the input of gate 32 (in series with the multi-input gate 9 and the 3-input gate 34) to render it inoperative, and thereby stop the clock pulses passing to the input of counter 30 for the first combination of signal regrouping operation in the block 4.
  • a counter is also included herein, so that it can reset the distributor after a predetermined number of counts, because more than a predetermined number of signal-regrouping counts is indicative of lack of information present.
  • the pitch pulse operates the one-shot in block 35 which triggers and applies a resetting pulse to the clear terminal 14 of binary counter 36, and to the RS flip-flop 37 which applies 1 level signal to one of the inputs of gate 32.
  • the normal output of gate 34 to one of the inputs of the gate 32 is also at 1 level, so that the clock pulses pass on to the counter 30 for operation.
  • the counter 30 triggers during the input low-to-high transition period, which allows time for resetting itself first.
  • the RS 40 also operates and resets the distributor in block 28 into normal operating state, while at the same time inhibiting the block 29.
  • the counter 30 now starts operation and the block 28 starts distributing 1 level pulses (after inversion, not shown) to the first inputs of the AND gates 7, 12, 16, ..., n.
  • the counter 30 resets itself by a pulse from its carry output 12 to the clear input 14, via gate and inverter in block 39, and triggers the R-S 40 which in turn inhibits the block 28 and activates the block 29, for a total of 32 distributory outputs.
  • After a predetermined number of counts by the binary counter in block 36 (or any other type of counter), it triggers the R-S flip-flop in block 37, which inhibits the AND gate 32, and applies 1 level signal to the inputs of AND gates in blocks 21 and 22 (in series with auxiliary gate 41) to start discharge of one of the pairs of storage capacitors (C1 or C2), as a new cycle of operation by the following pitch pulse.
  • the counter 36 also sends a pulse signal to the signal-decoder block 42, (signal-amplitude ratio measuring circuits) which when activated by an incoming sound information (speech) a typewriter key is actuated in block 43, as a final translation from sound to visible indicia.
  • the signal from the nth terminal (y) is connected to the input of gate 41 to effect discharge of the capacitors C1-C6, so that recycling of the arrangement can start on the following pitch pulse.
  • the amplitude ratio measuring arrangement shown herein in FIG. 3 is similar to the arrangement shown in FIG. 14 of my disclosure in my U.S. Pat. No. 3,622,706 Nov. 23, 1971, with modified component parts for the sake of greater stability that may be obtained for practical purposes.
  • When the amplitude ratio between the signals A and B (FIG. 3) has to be matched with that of a predetermined ratio, the A signal is applied to the transistor Q29 and the B signal is applied to the transistor Q30.
  • the selected group of signals are in ON states within only 0.2 milliseconds, which cause proportional currents through the primaries of the transformers T4 and T5.
  • One of the terminals of the secondaries of T4 and T5 are connected to ground in series with the bias source B1, and the other terminals are labeled as OUT representing the output terminals.
  • the secondaries of T4 and T5 are also shunted by diodes D13 and D14, respectively, in series with the bias source B2.
  • diodes are used to prevent oscillation in the secondaries after initiating a pulse current (but may be omitted with the B1 and B2 if not found necessary), and the voltage levels of bias sources B1 and B2 are adjusted equal to the conducting threshold gaps of the diodes D13 and D14, so that the diodes will start conducting from close to zero voltage level across the secondaries of T4 and T5.
  • the output of the secondary of T4 is connected to ground in series with one of the signal-mixing diodes D15 through D17, and resistor R4, and the output of the secondary of T5 is connected to ground in series with one of the signal-mixing diodes D18 to D20, and resistor R5.
  • the voltage gains across R4 and R5 are preadjusted, and applied to the gate electrodes of transistors Q31 and Q32, respectively.
  • the resultant effect is that, the oppositely polarized voltages across secondaries of T6 and T7 will either nullify to zero voltage for a specific gain ratio adjustments across R4 and R5, or above zero voltage (positive or negative) when the incoming information is other than the fixed gain adjustments.
  • When the output pulse from the gate 32 in FIG. 1 is applied to the one-shot in block 46 of FIG. 3, it operates with a delay pulse and further operates the one-shot in block 47 through the differentiating coupling capacitor C11.
  • the output pulse of one-shot 47 is finally applied to one of the inputs of gate in block 44 in 1 level. If at this time the other input has not received 0 level voltage from the secondary of T8, and has remained at 1 level, the gate 44 operates the one-shot in block 45 for the required typing of a letter symbol. On the other hand, if the gate 44 has received 0 level signal from T8, it remains inoperative by the arriving pulse from one-shot in block 47.
  • the diode 21 is used to prevent excessive voltage applied to the gate 44, but may be dispensed with without affecting the operation of the system.
  • amplitude-ratio measuring arrangements other than shown herein may be used as long as they serve the purpose of the present invention, for example, voltage comparing circuits such as are commercially available in the form of integrated devices may also be used.
  • the circuit shown in FIG. 3, however, is simple, and because of the transformers used for obtaining null signal, any number of transformer secondaries may be inserted in series with the secondaries of transformers T6 and T7.
  • the system for detecting and storing the peak amplitudes of said group of resonances during the time periods of said trains successively, adaptable to stored-signal regrouping as a representation of normalizing said spectral variations, and analyzing the amplitude ratios between said regrouped signals, said system comprising a plurality of band-pass filters to separate the voice signals peak amplitude resonances into corresponding channels, each filter output feeding a parallel-pair of channels of detector/gate/storage/discharge and mixer-gate channels to a common gated mixer output, said parallel pair of detector channels for alternate switching and storage of the filter output, where a plurality of detector means for detecting and storing the peak amplitudes of the resonances in said complex wave, each detector means consisting of a first pair of storage means for alternate
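
The hunting and regrouping behaviour described in the excerpts above can be illustrated in software. The following is a minimal sketch only, not the patent's circuit: it assumes the detected sub-band amplitudes are available as a list ordered from the lowest sub-band upward, that a regrouping combination is a linear shift placing the chosen sub-band on channel 1, and that the acceptance test is supplied by the caller. The names `regroup`, `hunt`, `matches` and `max_attempts` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def regroup(detected: Sequence[float], start: int) -> List[float]:
    """Shift the detected sub-band signals so that sub-band `start`
    appears at channel 1 (index 0) and the others follow in order."""
    return list(detected[start:])


def hunt(
    detected: Sequence[float],
    threshold: float,
    matches: Callable[[List[float]], bool],
    max_attempts: int = 32,  # assumed limit, echoing the 32 distributor outputs
) -> Optional[Tuple[int, List[float]]]:
    """Scan sub-bands from the lowest upward, as the distributor does.

    Sub-bands below `threshold` are skipped (their AND gate stays closed).
    Each sub-band above it triggers one regrouping attempt; the first
    regrouping accepted by `matches` is returned with its start index.
    Returns None when no regrouping is accepted within `max_attempts`.
    """
    attempts = 0
    for index, level in enumerate(detected):
        if level < threshold:
            continue  # no signal in this sub-band, keep hunting
        if attempts >= max_attempts:
            return None  # too many regroupings: treat as no information
        attempts += 1
        channels = regroup(detected, index)
        if matches(channels):
            return index, channels
    return None
```

With `detected = [0.0, 0.0, 0.8, 0.1, 0.4]` and a matcher that looks for a 2:1 ratio between channels 1 and 3, `hunt` skips the two silent sub-bands and accepts the regrouping that starts at index 2.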

Abstract

In an articulated phonetic sound wherein the information-bearing group of formant resonances varies within different frequency regions in the sound spectrum, such as a phoneme spoken by differently pitched speakers, the filter-separated signals derived from said group of resonances are regrouped (shifted) in prearranged combinations sequentially until a reference regrouping is established for neutralizing (normalizing) the undesired effect of said variations for adaptation to standard analysis. A specific group of signals from said regrouped signals is then selected, and their amplitude ratios one with respect to another are further matched with standard groups of amplitude-ratio measuring means for obtaining a null output representing a good amplitude match.
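
The null-output idea in the last sentence can be illustrated numerically. This is a rough sketch under stated assumptions, not the transformer circuit of the patent: each stored template is reduced to a single expected ratio between two signals A and B, the "null" is the residual of A minus that ratio times B, and a match is declared when the residual is small relative to the larger signal. The function names, the `tolerance` value and the template labels are hypothetical.

```python
from typing import Dict, Optional


def null_output(a: float, b: float, expected_ratio: float) -> float:
    """Residual left when B, scaled by the expected A/B ratio, is
    subtracted from A. Zero means the amplitude ratio matches exactly."""
    return a - expected_ratio * b


def best_match(
    a: float,
    b: float,
    templates: Dict[str, float],
    tolerance: float = 0.05,  # assumed acceptance band, relative to the larger signal
) -> Optional[str]:
    """Return the template label whose expected ratio nulls A against B
    most closely, or None if no residual falls within the tolerance."""
    scale = max(abs(a), abs(b))
    if scale == 0.0:
        return None  # no signal present, nothing to compare
    best_label = None
    best_residual = None
    for label, ratio in templates.items():
        residual = abs(null_output(a, b, ratio)) / scale
        if residual <= tolerance and (best_residual is None or residual < best_residual):
            best_label, best_residual = label, residual
    return best_label
```

For example, with hypothetical templates `{"ah": 2.0, "ee": 0.5}`, the pair A = 0.8, B = 0.4 nulls against the first template, while A = B = 1.0 nulls against neither.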

Description

United States Patent
Kalfaian
PHONETIC SOUND RECOGNIZER FOR ALL VOICES
Mar. 11, 1975
Primary Examiner: Kathleen H. Claffy
Assistant Examiner: E. S. Kemeny
3 Claims, 7 Drawing Figures
[Drawing sheet captions: alternate READ and WRITE operations of the storage signals for continuous READ signals; amplitude-ratio matching arrangement.]
PHONETIC SOUND RECOGNIZER FOR ALL VOICES
This is a continuation of application Ser. No. 209,661, filed Dec. 20, 1971, now abandoned.
This invention relates to phonetic sound wave analysis, and more particularly to an improved and composite arrangement of my previously proposed systems for constructing practical automata capable of recognizing phonetic sounds as spoken by different speakers.
One of the important contributions in the present invention is the improvement over my switching arrangement as disclosed in my related U.S. Pat. No. 3,622,706 issued Nov. 23, 1971, and related U.S. Pat. No. 3,678,201 issued July 18, 1972, which I had devised for modifying the characteristic components of the sound wave in such simulated manner that the spectrum variations which normally occur in oral speech are neutralized (normalized) during the analytical process of recognition.
One of the most problematic conditions in speech sound wave analysis is the spectrum variations that occur not only in different speakers, but also in the voice of a single speaker. Changes in pitch frequency change the absolute values of the frequency peak resonances (or formants, as they are called in speech signal processing), although harmonic ratios and amplitude ratios are maintained for the same speech sound (for example, a specific phoneme). Systems have been previously proposed to neutralize these variations, and the practice has generally been called frequency conversion, frequency normalization, or frequency standardization. To be specific in the terminology used herein, the term normalization will be used hereinafter throughout the specification and claims. These systems, however, have one way or another been subject to critical controls, and the practice has not been favored for acceptance in experimental systems for speech recognition automata. In carrying out an exemplary method of spectrum normalization, the group of resonances in the arriving wave are regrouped in a reference frequency region in the sound spectrum in such an order that the lowest frequency (pitch) in the group is converted to a reference pitch frequency, and the other frequencies are converted to frequencies by the same factors of multiplication from the reference frequency as they differ from the original pitch frequency. Since in transmitting techniques of information, as known and practiced, one set of parameters may be substituted for another without loss of definition as long as the independent parameters remain unchanged, it is readily seen that the converted frequencies are relocated in fixed (standard) positions in the sound spectrum, and therefore, easily adapted to any type of analytical processing that may be desired. For practical purposes, however, the method just mentioned involves critical adjustments.
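By way of illustration only, the exemplary conversion just described can be sketched numerically. The function name `normalize_resonances`, the 100 Hz reference pitch, and the sample frequencies are assumptions chosen for the example; they do not appear in the specification.

```python
def normalize_resonances(freqs_hz, ref_pitch_hz=100.0):
    """Rescale a group of resonances so the lowest one lands on ref_pitch_hz.

    Every frequency is multiplied by the same factor, so the ratios between
    the resonances are left unchanged.
    """
    pitch = min(freqs_hz)  # lowest frequency in the group is taken as the pitch
    return [f * ref_pitch_hz / pitch for f in freqs_hz]


# Hypothetical values: the same sound spoken at two different pitches.
low_voice = [120.0, 360.0, 840.0]
high_voice = [200.0, 600.0, 1400.0]
print(normalize_resonances(low_voice))
print(normalize_resonances(high_voice))
```

Both groups share the ratios 1 : 3 : 7, so after conversion they occupy the same positions in the spectrum, which is the property the normalization relies on.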
Accordingly, the novel switching system used herein may be briefly described, as in the following:
There is used a bank of channels arranged in numerical order, such as 1, 2, 3 ... n, each one of which is provided with a plurality of signal-admitting inputs, and a plurality of signal-switching inputs, respectively. The detected signals derived from various resonances (sub-band divisions in the sound) are applied to the plurality of signal-admitting inputs, so that any one of the signals can be admitted to the output of any one of the channels by the operation of a respective signal-switching input. Thus in order to obtain neutralization of the frequency variations, a plurality of prearranged combinations of groups of switching signals are applied to the plurality of signal-switching inputs for regrouping (shifting) the detected signals admitted to the outputs of the bank of channels sequentially until a specific group of the detected signals are regrouped along a reference standard region of the numerically arranged channel outputs. A group of signals from the regrouped signals at the channel outputs are then selected and their amplitude ratios one with respect to another are matched with a plurality of prearranged groups of amplitude-ratio measuring means, for deriving from one of the last said means a final signal representative of the originally spoken phonetic sound.
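The final matching step can be pictured, in software terms, as comparing the ratios of the selected signals with stored reference ratios and accepting the reference that gives a near-zero (null) difference. The following is a minimal sketch under that reading; the function names, the tolerance, and the reference values in `TEMPLATES` are hypothetical and are not taken from the specification.

```python
def ratio_mismatch(signals, template):
    """Sum of absolute differences between two sets of amplitude ratios.

    Both groups are expressed relative to their first member, so only the
    ratios matter, not the absolute levels. Zero plays the role of the null.
    """
    if len(signals) != len(template) or signals[0] == 0 or template[0] == 0:
        raise ValueError("groups must have equal length and a non-zero first member")
    return sum(abs(s / signals[0] - t / template[0])
               for s, t in zip(signals, template))


def best_match(signals, templates, tolerance=0.1):
    """Return the label of the closest template, or None if none is close enough."""
    label, error = min(((name, ratio_mismatch(signals, t))
                        for name, t in templates.items()),
                       key=lambda item: item[1])
    return label if error <= tolerance else None


# Hypothetical reference groups; the actual ratios are not given in the text.
TEMPLATES = {"ah": [1.0, 0.5, 0.25], "ee": [1.0, 0.2, 0.8]}
print(best_match([2.0, 1.0, 0.5], TEMPLATES))  # same ratios as "ah" at twice the level
```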
As explained in my reference patents, the time devoted to performing all of said combinations of signal regroupings is confined to a portion of a pitch period, which may not be long enough to render the system versatile for various other relative uses. In order to reduce the number of signal regrouping operations during each pitch period, we may first refer to the lowest (pitch) frequency in a group of resonances representing a phonetic sound, which varies from 45 to 600 cycles per second, or higher, if singing is also to be considered. In neutralizing these frequency variations, the detected signals derived from the varying group of resonances are shifted to the outputs of said channels in such an order that the signal derived from the lowest (pitch) frequency in the group is shifted to the number ONE output of the numerically arranged bank of channels, while the signals derived from the other resonances are shifted to the same mutually related numerical ratio separations along the channel outputs as differ in mutually related frequency ratios of the original resonances.
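As a non-limiting illustration, the shift described above amounts to moving the whole row of detected signals down by the index of the lowest active sub-band. The names `regroup` and `shift_pitch_to_channel_one`, and the sample amplitudes, are assumptions made for this sketch.

```python
def regroup(detected, shift):
    """Route filter output (shift + i) to channel i; unused channels read zero."""
    n = len(detected)
    return [detected[i + shift] if i + shift < n else 0.0 for i in range(n)]


def shift_pitch_to_channel_one(detected, threshold=0.0):
    """Regroup so the lowest active filter output appears on channel ONE (index 0)."""
    for k, level in enumerate(detected):
        if level > threshold:
            return regroup(detected, k)
    return [0.0] * len(detected)  # no signal present


# Hypothetical detected amplitudes: the pitch falls in the third sub-band.
detected = [0.0, 0.0, 0.9, 0.0, 0.4, 0.0, 0.7, 0.0]
print(shift_pitch_to_channel_one(detected))
```

The separations between the active outputs (two channels, then two more) are the same before and after the shift; only their absolute position changes.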
Thus if the lowest frequency in a group is 600 cycles per second, it is necessary to start from the signal-regrouping combination that will effect shifting the detected signal derived from the sub-band frequency of 600 cycles per second to the output of number ONE channel. In order to avoid any time waste in testing all the filter outputs for arriving frequencies below 600 cycles per second, a fast signal-hunting distributor is used which scans all of the detected signals derived from the sub-bands, starting from the lowest in the total number of sub-bands, and allows operation of the respective signal-regrouping combination only when a detected signal at the output of the pass-band filter responding to 600 cycles per second (exemplary) is present. With the modern available integrated circuitry, the hunting system can be made as fast as desired, so that the time consumed for hunting can be made negligible. Then again, since the information-bearing signals are within a region above the pitch frequency (as explained in my reference patent), it is only necessary to perform two or three more (or other if so desired) signal regrouping operations after the detected signal derived from the pitch frequency has been located by the distributor, for selecting a specific group of signals that represent the spoken phonetic sound. This is accomplished by a signal counter, which resets the distributor (hunting) system into normal operating state after four counts of signal-regrouping operations have been completed. Thus the object of the present invention is to provide improved and practical simplicity of operation over the system disclosed in my reference patent. Another object of the invention is to provide an arrangement for alternate operation during complete succeeding pitch periods, so that greater time during the pitch periods can be devoted to signal-regrouping operations.
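The hunting and counting behavior described above may be sketched as a loop that skips silent sub-bands, attempts a regrouping at each active one, and gives up after a fixed number of attempts. This is only a software analogy of the distributor and counter; the function name, the threshold, and the `is_match` callback are assumptions introduced for the example.

```python
def hunt_and_regroup(detected, is_match, threshold=0.1, max_attempts=4):
    """Scan sub-bands from the lowest, then try a few consecutive regroupings.

    detected     : list of detected amplitudes, one per filter, lowest band first
    is_match     : callable taking the regrouped signals, True on a good match
    max_attempts : regrouping operations allowed before the distributor is reset
    Returns the shift that matched, or None if the attempt counter ran out.
    """
    attempts = 0
    for k, level in enumerate(detected):      # distributor steps through the filters
        if level <= threshold:
            continue                          # no signal here: skipped at scan speed
        regrouped = detected[k:] + [0.0] * k  # sub-band k moved to channel ONE
        attempts += 1
        if is_match(regrouped):
            return k
        if attempts >= max_attempts:
            break                             # counter resets the distributor
    return None


# Hypothetical input: the matcher accepts a group whose first member is 0.8.
signals = [0.0, 0.5, 0.0, 0.8, 0.3]
print(hunt_and_regroup(signals, lambda group: group[0] == 0.8))
```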
Still another object is the specific subdivision of the sound spectrum by pass-band filters, for obtaining highly precise frequency conversion of the sound wave.
In my previous disclosures of making automata for phonetic sound recognition, I have described systems both for direct analysis of the sound wave, such as described in my related U.S. Pat. No. 3,322,898 issued May 30, 1967, and for modifying the frequency variations of the sound wave into a normalized form prior to analysis, so that the analyzing process can be standardized; in both cases the resulting performance being the same. In the system disclosed herein, however, normalization of frequency variations is preferred for practicability purposes, and since it is claimed herein to be free of environmental changes, it is also claimed to perform the ultimate in man-made automata for phonetic sound recognition. Thus in order to draw a distinction between the highly accurate performance of the system disclosed herein and the systems given in prior teachings, it is first necessary to show how closely the analytical process of the presently given system simulates the analytical process of the human brain even though their physical performances may differ widely.
THEORETICAL BRIEF RELATING TO THE INTERPRETIVE FUNCTION OF THE BRAIN
I have given a detailed description of the analytical function of the human brain in my first reference patent. Thus it is not necessary to repeat herein the complete text of my former description, and the following brief will be sufficient for the computer engineer to visualize a series of processing steps that will closely simulate the biological aspects involved in the interpretation of phonetic sound waves.
In order for the brain to be able to interpret sound information intelligibly, it must be informed as to when to start analysis and when to end analysis of the arriving information. In sound information, the major peaks of the sound waves are the indicating points between which the sound waves are analyzed and reconstructed into the original sound information. For example, in phonetic sounds, there are high-peaked major peaks, and sub-peaked major peaks. The waves contained between the high-peaked major peaks represent both the phonetic and quality information of the sound, whereas the waves contained within the sub-peaked major peaks of the sound represent only the phonetic information. These major peaks are initiated by puffs of air from the glottis into the mouth cavities. For example, as each puff of air enters the specifically formed mouth cavities, it forms an initial high-peaked wave, representing the high-peaked major peaks. According to the specific formation of the mouth cavities, the initial air force in the mouth sets up several resonances which have definite frequency and amplitude ratios one with respect to another (these resonances do not have fixed frequency distribution in the sound spectrum, as believed conventionally, but fixed ratio relationships, representing the phonetic sound). As each puff of air is forced into the mouth cavities, the vocal folds (vocal cords) also start vibrating at a frequency as determined by their physical structures, and the physical tension imposed upon them. Thus besides the high-peaked puffs of air from the glottis, there are also sub-peaked puffs of air from the vocal cords that enter the mouth cavities.
The time period between each high-peaked major peak is the pitch period of the sound, and the time period between each sub-peaked major peak is the fundamental time period of the sound. As indicated above, these peaks are absolutely necessary for the brain to be able to interpret the arriving phonetic sound. This is the reason why a person who has lost his voice box by surgery is able to produce intelligible phonetic sounds in speech, by forming his mouth slowly to the phonetic sounds that he normally would for the speech, and pressing a mechanical vibrator against his throat; the vibratory lobes representing the major peaks. Thus the major peaks are not part of the phonetic sound, but their presence is absolutely necessary for the brain to know when to start and when to conclude the analysis of the information presented. For only phonetic sound recognition then, those resonances that occur above the fundamental are selected and analyzed by frequency and amplitude ratio measurements; an illustrative description of which would be too long to include herein, but reference may be made to my first reference patent. For practical simulation of the brain's interpretive function, the frequency variations that normally occur in voices of different speakers are first normalized prior to analytical processing, a detailed description of which will now be given in conjunction with the accompanying drawings, wherein:
FIG. 1 is partly schematic and partly block diagram of the complete system of speech typewriter according to the invention.
FIG. 2 is a waveform illustrating the alternate READ and WRITE operating periods of the system in FIG. 1.
FIG. 3 is an exemplary signal-amplitude-ratio measuring arrangement according to the invention.
FIG. 4 shows the center frequency sub-divisions of the pass-band filters, as used in the present invention.
FIG. 5 is a numerical chart showing how the detected signal outputs of the pass-band filters are switched linearly to the outputs of the numerically arranged channels, in accordance with the invention.
FIG. 6 is a switching arrangement using analog channel switches, showing the simple parallel connected gate electrodes of the switches in accordance with the specific sub-band divisions of FIG. 4.
And FIG. 7 shows how these analog switches can be oriented on a planar surface for simple interconnections of terminals, in accordance with the invention.
In order to obtain high accuracy of signal regrouping without causing any cross switching of the input signal to the channel outputs, the center frequencies of the sub-dividing band-pass filters are arranged as shown in FIG. 4, wherein the frequency sub-divisions are similar to the standard musical scale. Such a numerical arrangement simplifies the actual channel switching arrangement, because it requires linearly sequenced numerical transfer without cross coupling of any of the pass-band outputs, such as would be by the arrangement of pass-band center frequencies of the filters shown in FIG. 11, and the chart of switching combinations in FIG. 12 of my first reference patent, as mentioned in the foregoing. This is shown in greater clarity by the chart in FIG. 6, wherein the top row of the numerals represent the channels, and the row of numerals below represent the sequence of the numerals left hand of the sub-band frequencies in FIG. 5. For example, in the first row (FIG. 6), all of the detected outputs of pass-band filters (starting from filter number-1) are applied to the inputs of the channels starting from channel number-1. In the second row, the detected filter outputs starting from the number-2 filter are applied to all of the channels starting from the channel number-1, and so on. As stated in the foregoing, such simplicity of switching sequence becomes inherently accurate, as long as the harmonic sequence at like intervals is considered in sub-dividing the sound spectrum, because the number of sub-band divisions and center frequencies of the pass-band filters may be arranged other than the frequencies shown in FIG. 4 without affecting the required accuracy. In furtherance, the numerical sequence assigned to the pass-band filters may be reversed, or coded, according to any mode of analytical switching of the sub-band signals that may be desired.
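The reason a musical-scale subdivision permits a purely linear transfer can be shown with a short calculation: when center frequencies are spaced by a constant ratio, a given frequency ratio always spans the same number of filters, wherever the pitch falls. The sketch below assumes an equal-tempered scale of 12 steps per octave and a lowest center frequency of 55 Hz; both values are assumptions for illustration, since the actual frequencies of FIG. 4 are not reproduced in this text.

```python
import math

STEPS_PER_OCTAVE = 12    # assumption: equal-tempered musical scale
LOWEST_CENTER_HZ = 55.0  # assumption: arbitrary lowest center frequency


def filter_index(freq_hz):
    """Index (0-based) of the filter whose center frequency is nearest freq_hz."""
    return round(STEPS_PER_OCTAVE * math.log2(freq_hz / LOWEST_CENTER_HZ))


def channel_offsets(freqs_hz):
    """Channel separations once the lowest member is moved to channel ONE."""
    indices = [filter_index(f) for f in freqs_hz]
    return [i - indices[0] for i in indices]


# The same frequency ratios (1 : 2 : 3) at two different pitches.
print(channel_offsets([110.0, 220.0, 330.0]))
print(channel_offsets([220.0, 440.0, 660.0]))
```

Both calls give the same list of separations, so a single row-by-row shift of the filter outputs is enough to bring either group onto the same channels.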
THE SWITCHING ARRANGEMENT OF FIG. 6
As stated in the foregoing, the specific sub-band division of the sound spectrum simplifies the switching arrangement. Due to the plurality of parallel connections that are necessary in physical assembly, it is preferable to use high impedance transistors, such as the MOS FET transistor, which are particularly simple for integrated assembly on single wafers, but not necessarily, if practice indicates otherwise. The circuitry of the channel switching may be arranged as in FIG. 6, wherein the channel-1 is represented by the transistors Q21, Q23, Q26 and Q28; channel-2 is represented by the transistors Q20, Q24, Q27; and the channel-3 is represented by the transistors Q21, Q25; and the channel outputs are represented by the load resistors R1 through R3 (which can have high impedance values, due to the use of FET transistors, in order to avoid discharge of the storage capacitors in FIG. 1), respectively. Such an arrangement provides a plurality of inputs for each channel; for example, the input of channel-1 may be switched to the output of either DR1, DR2 or DR3 arriving at the inputs of block 4 in FIG. 1; this also being applied to the nth channel. The switching arrangement shown in FIG. 6 is similar to the switching arrangement shown in FIG. 10 of my first reference patent, but it is simpler in physical assembly.
The switching sequence is obtained by parallel connections of the gate electrodes in the numerical sequence as shown in FIG. 5. For example, in the first row of parallel connections, the gate electrode of Q19 (channel-1) is connected to the gate electrode of Q20 (channel-2), and to the gate electrode of Q21 (channel-3). In the second row of parallel connections, the gate electrode of Q23 (channel-1) is connected to the gate electrode of Q24 (channel-2), and so on. When these parallel connected terminals of the gate electrodes are normally biased in backward direction, all of the switches remain in OFF states. Any one of these parallel con[...]
SWITCHING APPARATUS OF FIG. 7
In reference to the chart shown in FIG. 5, it is apparent that the use of a large number of switching transistors is required for the switching performance. While [...] of interconnecting design, however, a large number of these transistors, or any type of controllable semiconductors, may be integrated on a single wafer, such as shown in FIG. 7, wherein each square represents the semiconductor as mentioned, three squares of which are shown in shaded lines in order to render them distinguishable in the maze of electrical connections. The connecting electrodes in each square are represented by the black dots to which are terminated the required parallel connections, as illustrated by the horizontal, vertical, and 45° parallel lines; the first representing the source electrode terminals, the second representing the drain electrode terminals, and the third representing the gate electrode terminals.
Having described the details of frequency normalization, or signal-regrouping, as utilized herein, the complete arrangement of a phonetic sound recognizing apparatus will now be described by way of the part schematic and part block diagram in FIG. 1, as in the following:
PHONETIC SOUND RECOGNIZING APPARATUS OF FIG. 1
[...] periods, and these stored signals are analyzed and dis[...] pitch periods; (3) a frequency normalizing (signal-regrouping) arrangement as described in the foregoing; (4) a plurality of coupling means from the alternate storage signals to the frequency normalizing arrangement for regrouping the various stored signals in a group; (5) a signal distributor for operating the signal regrouping arrangement sequentially during the (READ) period until the proper signal regrouping is established; and (6) means for matching mutually related amplitude ratios of the regrouped signals during the (READ) period with that of prearranged amplitude ratio measuring means for final decision of the information sought; following discharge of the stored signals that had been analyzed.
STORAGE SYSTEM OF THE SUB-BAND SIGNALS IN ALTERNATE PITCH PERIODS
In FIG. 1, there are shown only three pass-band filters in blocks 1, 2 and 3. The operation of all these sub-band filters in conjunction with their associated circuitry is the same, and therefore, the operation of only the filter in block 1 and its associated circuitry will be described, as a typical example. Thus the output of sub-band frequency f1 in block 1 is divided into two separate branches across the center-tapped secondaries L1 and L2 of transformer T1. The output of L1 is full-wave rectified by the diodes D1, D2, and stored across capacitor C1 in series with the charging transistor Q1, while the output of L2 is full-wave rectified by diodes D3, D4, and stored across C2 in series with Q3. The normally idle transistor Q2 is connected in parallel with C1, so that the stored potential can be discharged after the READ period has ended, and the normally idle Q4 is connected in parallel with C2 for discharging the stored potential during an alternate time period with respect to the first discharge. The stored signals across C1 and C2 are switched alternately during alternate READ periods to the common output of transistors Q13 and Q14, which are excited into on-and-off states alternately during the succeeding (READ) periods. This common output is connected directly to terminal DR1 of the signal regrouping block 4. The output of the pass-band filter in block 1 is also applied to an amplitude threshold sensing device in block 5 (which can be obtained commercially in the form of integrated circuitry), and applied to the R-S (set-reset) flip-flop in block 6 for operation. Thus when a signal appears at the output of block 1 just above a threshold level, the R-S flip-flop in block 6 operates in set state, indicating that a signal has appeared at the output of the filter in block 1.
This set-operating signal of block 6 is applied in 1 level to one of the inputs of the AND gate in block 7, but the gate does not operate until its second input is also driven to 1 level. When the output of AND gate 7 becomes active, it operates the one-shot (OS) in block 8, which in turn produces an output pulse of predetermined time length and applies it to one of the multiple inputs of the gate in block 9, and also to the lower left hand terminal 1 of the signal regrouping block 4. In similar fashion of operating conditions of the filter blocks 2 and 3, the common output of the transistors Q15, Q16 is applied to the terminals DR2 and DR3 of the block 4, and the output pulses of the one-shots in blocks 13 and 17 are applied to the inputs of the gate in block 9, and also to the lower center and right hand terminals 2 and 3 of block 4, respectively. The DR1, DR2 and DR3 terminals of block 4 in FIG. 1 represent the drain electrode terminals of the arrangement in FIG. 6, and the numerals of the lower end terminals of this block arrangement represent the same numerical terminals of the gate electrodes in the same arrangement. Thus referring to the electrical connections just described, and starting from the lowest frequency in the sub-band division, if an arriving signal appears at the output of filter block 1, the output pulse of the one-shot in block 8 switches the transistors Q19-Q22 (FIG. 6) into ON states for regrouping the stored signals at the input of block 4. In the event that the output of the filter in block 1 is zero, and a signal appears at the output of the filter in block 2, then the output pulse of the one-shot in block 13 switches the transistors Q23-Q25 into ON states for regrouping the stored signals at the inputs of block 4. Similarly, if the outputs of blocks 1 and 2 are zero, and a signal appears at the output of block 3, then the transistors Q26-Q27 are switched into ON states for regrouping the stored signals.
The object at this point, accordingly, is to provide a hunting arrangement in the form of a distributor, which starts distributing 1 level pulses to the second inputs of AND gates in blocks 6, 11 and 15, starting from block 6, or in any sequence that may be desired, so that whichever AND gate has one of its inputs driven into 1 level by the output of its associated pass-band filter, its following one-shot operates by the distributor pulse for the prearranged signal regrouping. During any of these signal regrouping operations, the mutually related amplitude ratios of these regrouped signals are matched with prearranged ratio measuring groups to make sure that the proper signal regrouping has been established for recognition of the sound information. If not, however, the signal regrouping continues until the correct regrouping produces an output signal representing the information sought. In order to save time in these regrouping operations, the distributor pulses are made much shorter than the output pulses of the one-shots, and the distribution frequency is made practically high enough to skip the inoperable AND gates (blocks 6, 11, 15) in a negligible length of time. During the operation of any one-shot, of course, the distributor is inhibited for the required analytical performance.
Having described the general purpose of the arrangement in FIG. 1, details will now be given, as in the following:
ALTERNATE SWITCHING SIGNALS
It had been described above that the capacitors C1 and C2 are charged and discharged alternately during succeeding pitch periods in series with Q1 and Q3 respectively, and the charged capacitors are switched to the common output of Q13 and Q14 alternately for analysis. The sequence of this operation is illustrated in FIG. 2, wherein during the first part of the READ period the Q1 is in OFF state for transmitting the charge of C1 to the block 4. After the READ period has ended, the Q13 switches to OFF state, and the Q2 switches to ON (shaded) state for discharging the capacitor C1. During this READ period, the Q3 switches to ON state for charging the capacitor C2, while the Q14 switches to OFF state for preventing any transfer of this charging action to the common output of Q13 and Q14. The READ period of the charge of C2 is shown by the ON period of Q14 being in alternate periods of the Q13. Similarly, the discharging of the capacitor C2 is shown by the ON (shaded) periods of the Q4 in alternate periods with respect to the Q2. With this sequence of alternate operations of the charging and discharging transistors to the capacitors C1 through C6, it is now necessary to provide the switching signals to these transistors, as in the following:
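In software terms the alternate charge, read and discharge sequence resembles a double buffer: one cell is written during a pitch period while the other, filled during the preceding period, is read. The class below is a simplified sketch of that idea for a single filter; the class and method names are assumptions, and the discharge is performed at the pitch pulse rather than partway through the period as in FIG. 2.

```python
class AlternateStore:
    """Two peak-hold cells for one filter, written and read on alternate pitch periods."""

    def __init__(self):
        self.cells = [0.0, 0.0]  # stand-ins for capacitors C1 and C2
        self.write = 0           # index of the cell currently being charged

    def sample(self, value):
        """Peak-detect the rectified filter output into the cell being written."""
        self.cells[self.write] = max(self.cells[self.write], abs(value))

    def read(self):
        """Level held in the other cell, available for regrouping and analysis."""
        return self.cells[1 - self.write]

    def pitch_pulse(self):
        """Discharge the cell just read, then swap the roles of the two cells."""
        self.cells[1 - self.write] = 0.0
        self.write = 1 - self.write


store = AlternateStore()
for v in (0.2, -0.9, 0.5):   # hypothetical filter output during one pitch period
    store.sample(v)
store.pitch_pulse()
print(store.read())          # peak of the period just ended
```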
In FIG. 1 the voice sound wave in block 18 is applied to the amplitude-equalizer in block 38, and to the pass-band filters in blocks 1, 2, 3, and to the nth filter used in the arrangement. This sound wave is also applied to the pitch selector in block 19 (which may have a one-shot at its output for producing sharp edged pulses) for deriving pulse signals at pitch frequencies (major peaks). These pulse signals are applied to the clock flip-flop in block 20 for producing alternate output signals in steady state steps, which represent the waves of Q1 or Q3 in FIG. 2. The steady state output signals of block 20 are applied directly and alternately to the first inputs of AND gates in blocks 21 and 22, and further also applied alternately to the inputs of set-reset flip-flops in blocks 23 through 26 by way of direct and the differentiating coupling capacitors C7 through C10, respectively. The purpose of using the coupling capacitors C7 through C10 is to avoid steady state excitation at the inputs of these flip-flops. The second inputs of AND gates 21 and 22 are connected in parallel, and normally biased in backward direction so that signals in forward direction at their first inputs alone will not render them operative. Thus assuming during one pitch period of the incoming sound wave that the upper terminal of clock flip-flop 20 is in positive polarity, it prepares the AND gate 21 ready for operation when a forward bias pulse upon its second input arrives. The lower terminal of block 20 applies a negative pulse to the input of set-reset flip-flop 26 to operate it in set-position for producing at its output the wave of Q13 in FIG. 2. After a certain signal-distribution time period (the operation of which will be described further by the distributor arrangement), a pulse signal in forward direction arrives at the parallel connected second inputs of AND gates 21 and 22, causing the gate 21 to operate and apply a pulse signal in forward direction to the inputs of set-reset flip-flops 23 and 25.
The flip-flop 23 is driven into reset-position (so as to terminate its READ period), and simultaneously drives the flip-flop 25 into set-position, which produces at its output the shaded portion (storage-discharge) of the wave Q2 in FIG. 2. When the clock flip-flop in block 20 reverses its state of operation by the arriving pitch pulse, the input of set-reset flip-flop 25 receives a pulse in forward direction from the upper terminal of flip-flop 20, and it shifts into reset-operation for terminating the shaded portion of the wave Q2 in FIG. 2. Simultaneously, the set-reset flip-flop in block 24 operates in set-position for repeating the previous operation in alternate cycles.
The terminals of flip-flops 23-26 are shown connected to the transistor control gates of Q1 through Q18, in series with the independent amplifiers in block 27. By using the present day component parts and integrated devices, the diagram in FIG. 1 indicates that the transistors Q1 through Q18 are of the MOS FET type which require about l volts at the control gates for operation. Whereas, if the flip-flops used are of the type using about volts for operation, there are also available integrated circuit amplifiers for the purpose of interfacing with high and low operating devices, such as used in the drawing of FIG. 1, although different types of available devices may also be used, if so desired. The amplitude equalizer in block 38 is also shown, as used herein, for normalizing the amplitude variations of the sound wave prior to analysis, and it may be of any available type, or as described in my previous patent issues. Thus having described the alternate switching arrangement, the signal distributing (hunting) system may now be described, as in the following:
DISTRIBUTOR ARRANGEMENT OF FIG. 1.
As had been described in the foregoing, signal regrouping occurs when the first and second inputs of any one of the AND gates 7, 12 and 16 are simultaneously driven into 1 level voltages. Thus the purpose of the distributor is to apply distributory 1 level signals to the second inputs of these gates (as a hunting process), so that the AND gate which has received a 1 level signal at its first input will operate, thereby causing its associated one-shot to follow operation for the required signal-regrouping action in block 4. The distributor is represented by the blocks 28 and 29, which for simplicity of design may be two of the available four-line-to-16-line decoders connected in series for 32 distributory output pulses. The outputs of a binary counter are connected to the A, B, C, D inputs of the blocks 28, 29, so that the sequential pulses applied to the input terminal 5 of the binary counter 30 are converted into sequentially distributed pulses along the total of 32 independent outputs of the blocks 28 and 29. These outputs (phase inverted but not shown) are applied individually to the second inputs of the AND gates, only three of which are shown in the drawing. The use of 32 distributory outputs is shown for versatility of the arrangement in FIG. 1 for various other uses, and the number of these outputs may be more or less according to the type of complex wave analysis for which the arrangement is used.
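The decoding of a running count into one active line out of 32 can be sketched as follows. For brevity the sketch treats the selection between the two decoders as a fifth count bit, whereas the arrangement described uses a four-bit counter together with a separate flip-flop to enable one decoder or the other; the function names are assumptions for the example.

```python
def distributor_outputs(count):
    """One-hot pattern on 32 lines for a count of 0..31.

    The low four bits (inputs A, B, C, D) pick one of 16 lines; the fifth bit
    selects which of the two decoders is enabled.
    """
    if not 0 <= count < 32:
        raise ValueError("count must be in 0..31")
    decoder = count >> 4   # 0 selects the first decoder, 1 the second
    line = count & 0b1111  # line within the enabled decoder
    outputs = [0] * 32
    outputs[16 * decoder + line] = 1
    return outputs


def hunt(first_inputs):
    """Step the count until a distributor pulse meets a gate whose first input is 1."""
    for count in range(32):
        pulse = distributor_outputs(count)
        if pulse[count] and first_inputs[count]:  # the AND gate
            return count
    return None


# Hypothetical state: the flip-flops of filters 5 and 20 are set.
gates = [0] * 32
gates[5] = gates[20] = 1
print(hunt(gates))
```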
The input terminal 5 of the binary counter in block 30 is driven by the clock generator in block 31 (for example, at a frequency of 300 kHz) in series with the gate in block 32 and inverter block 33. The three inputs of the gate are normally at 1 level, so that the 1 level clock pulses are admitted to the input 5 of the counter 30 for operating the distributor. Thus assuming in one example that the first input of gate 7 has received a signal at 1 level from the R-S flip-flop in block 6, the first distributor pulse (at 1 level after being phase inverted, not shown) applied to the second input of AND gate 7 causes operation of the one-shot 8, which in turn applies a pulse of 0.2 millisecond at 0 level to the input of gate 32 (in series with the multi-input gate 9 and the 3-input gate 34) to render it inoperative, and thereby stop the clock pulses passing to the input of counter 30 for the first combination of signal regrouping operation in the block 4. In a second example, assuming that the first input of gate 7 is at 0 level, and the first input of the gate 12 is at 1 level, the one-shot 8 does not operate, but the second distributory pulse causes operation of the one-shot in block 13. The output pulse 0.2 millisecond long stops transmission of the clock pulses to the input of counter 30, and now the second combination of signal-regrouping is processed in the block 4. Such a performance illustrates the time saving in the high frequency hunting (distributory) system, wherein the time delay of 0.2 millisecond that would normally be required for the operation of one one-shot to the next is skipped during only 8 microseconds during hunting periods. As described in the foregoing, however, the operation of one of the one-shots 8, 13, 17 ... n may not be sufficient for the correct signal regrouping, and several more signal-regrouping operations may be necessary before the correct signal regrouping has been established.
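A rough time budget can be worked out from the figures given in the text (300 kHz clock, 0.2 millisecond one-shot pulses, 32 distributor outputs, four regrouping counts, and a pitch of up to 600 cycles per second). The calculation below is only an order-of-magnitude sketch and assumes one clock pulse per distributor step.

```python
CLOCK_HZ = 300_000   # distributor clock, from the text
ONE_SHOT_S = 0.2e-3  # one-shot pulse length, from the text
OUTPUTS = 32         # distributor outputs, from the text
REGROUPINGS = 4      # regrouping counts before reset, from the text
PITCH_HZ = 600       # highest pitch mentioned in the text

clock_period = 1 / CLOCK_HZ          # assumption: one clock pulse per distributor step
full_scan = OUTPUTS * clock_period   # worst-case hunt across every output
analysis = REGROUPINGS * ONE_SHOT_S  # time during which the one-shots hold the clock
pitch_period = 1 / PITCH_HZ

print(f"clock period      : {clock_period * 1e6:.2f} us")
print(f"full scan         : {full_scan * 1e6:.1f} us")
print(f"four regroupings  : {analysis * 1e3:.2f} ms")
print(f"pitch period      : {pitch_period * 1e3:.2f} ms")
print(f"fits in one period: {full_scan + analysis < pitch_period}")
```

Under these assumptions the hunting time is small compared with the time taken by the regrouping operations themselves, which is the point made about the fast distributor.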
Thus a counter is also included herein, so that it can reset the distributor after a predetermined number of counts, because more than a predetermined number of signal-regrouping counts is indicative of lack of information present. Thus the specific details of the counter'in connection with the distributor is explained, as in the following:
In detailing the operation of the distributor, and starting from a pitch pulse, assume that the pitch pulse operates the one-shot in block 35 which triggers and applies a resetting pulse to the clear terminal 14 of binary counter 36, and to the RS flip-flop 37 which applies a 1 level signal to one of the inputs of gate 32. The normal output of gate 34 to one of the inputs of the gate 32 is also at 1 level, so that the clock pulses pass on to the counter 30 for operation. The counter 30 triggers during the input low-to-high transition period, which allows time for resetting itself first. The R-S 40 also operates and resets the distributor in block 28 into normal operating state, while at the same time inhibiting the block 29. The counter 30 now starts operation and the block 28 starts distributing 1 level pulses (after inversion, not shown) to the first inputs of the AND gates 7, 12, 16 ... n. At the end of 16 distributory counts, the counter 30 resets itself by a pulse from its carry output 12 to the clear input 14, via gate and inverter in block 39, and triggers the R-S 40 which in turn inhibits the block 28 and activates the block 29, for a total of 32 distributory outputs.
After a predetermined number of counts by the binary counter in block 36 (or any other type of counter), it triggers the R-S flip-flop in block 37, which inhibits the AND gate 32, and applies a 1 level signal to the inputs of AND gates in blocks 21 and 22 (in series with auxiliary gate 41) to start discharge of one of the pairs of storage capacitors (C1 or C2), as a new cycle of operation by the following pitch pulse. The counter 36 also sends a pulse signal to the signal-decoder block 42 (signal-amplitude ratio measuring circuits), which, when activated by incoming sound information (speech), actuates a typewriter key in block 43, as a final translation from sound to visible indicia.
In the case that the distributor in block 29 operates to the nth output terminal without causing operation of the counter in block 36 for reset cycling, the signal from the nth terminal (y) is connected to the input of gate 41 to effect discharge of the capacitors C1-C6, so that recycling of the arrangement can start on the following pitch pulse.
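The two ways in which a cycle may end, as described above, can be summarized in a short sketch. It assumes, for illustration, that the counter in block 36 counts one-shot (signal-regrouping) operations, and that the cycle otherwise ends when the distributor reaches its last output. The function name and the returned labels are hypothetical.

```python
def run_cycle(first_inputs, max_regroupings):
    """Simulate one cycle started by a pitch pulse.

    `first_inputs` holds the level (0 or 1) at the first input of each AND
    gate, in distributor order. Returns the number of regrouping operations
    performed and the reason the cycle ended.
    """
    count = 0                              # models the counter in block 36 (assumption)
    for level in first_inputs:
        if level:
            count += 1                     # a one-shot operated: one regrouping
            if count >= max_regroupings:
                return count, "count limit"        # flip-flop 37 stops the clock
    return count, "end of distributor"             # nth output reached; capacitors discharge


if __name__ == "__main__":
    print(run_cycle([1, 0, 1, 1, 1], max_regroupings=3))
    print(run_cycle([0, 1, 0], max_regroupings=3))
```

In either case the arrangement is returned to its starting state, so that recycling can begin on the following pitch pulse.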
SIGNAL-AMPLITUDE-RATIO MEASURING CIRCUIT OF FIG. 3
signals one with respect to another are further matched with predetermined ratios for final decision if the selected group of signals represent a specific sound information. The amplitude ratio measuring arrangement shown herein in FIG. 3 is similar to the arrangement shown in FIG. 14 of my disclosure in my U.S. Pat. No. 3,622,706, Nov. 23, 1971, with modified component parts for the sake of the greater stability that can be obtained for practical purposes. Thus assuming that the amplitude ratio between the signals A and B (FIG. 3) has to be matched with that of a predetermined ratio, the A signal is applied to the transistor Q29 and the B signal is applied to the transistor Q30. As had been described by way of the one-shots 8, 13, 17 ... n, the selected group of signals are in ON states within only 0.2 milliseconds, which cause proportional currents through the primaries of the transformers T4 and T5. One of the terminals of the secondaries of T4 and T5 are connected to ground in series with the bias source B1, and the other terminals are labeled as OUT representing the output terminals. The secondaries of T4 and T5 are also shunted by diodes D13 and D14, respectively, in series with the bias source B2. These diodes are used to prevent oscillation in the secondaries after initiating a pulse current (but may be omitted with the B1 and B2 if not found necessary), and the voltage levels of bias sources B1 and B2 are adjusted equal to the conducting threshold gaps of the diodes D13 and D14, so that the diodes will start conducting from close to zero voltage level across the secondaries of T4 and T5.
The output of the secondary of T4 is connected to ground in series with one of the signal-mixing diodes D15 through D17, and resistor R4, and the output of the secondary of T5 is connected to ground in series with one of the signal-mixing diodes D18 to D20, and resistor R5. For each specific sound information the voltage gains across R4 and R5 are preadjusted, and applied to the gate electrodes of transistors Q31 and Q32, respectively. The resultant effect is that the oppositely polarized voltages across secondaries of T6 and T7 will either nullify to zero voltage for a specific gain-ratio adjustment across R4 and R5, or produce a non-zero voltage (positive or negative) when the incoming information is other than the fixed gain adjustments. Finally, the outputs of series connected secondaries of T4 and T5 are applied to the gate electrode of amplifier transistor Q33, and the secondary of T8 is full wave rectified by the diodes D21, D22 for application in 0 level to one of the inputs of the AND gate in block 44. The other input of this gate is normally biased to 0 level, so that the output of gate 44 will not operate the one-shot in block 45, until both inputs of gate 44 are at 1 levels.
Thus when the output pulse from the gate 32 in FIG. 1 is applied to the one-shot in block 46 of FIG. 3, it operates with a delay pulse and further operates the one-shot in block 47 through the differentiating coupling capacitor C11. The output pulse of one-shot 47 is finally applied to one of the inputs of the gate in block 44 at 1 level. If at this time the other input has not received 0 level voltage from the secondary of T8, and has remained at 1 level, the gate 44 operates the one-shot in block 45 for the required typing of a letter symbol. On the other hand, if the gate 44 has received a 0 level signal from T8, it remains inoperative by the arriving pulse from the one-shot in block 47. The diode 21 is used to prevent excessive voltage applied to the gate 44, but may be dispensed with without affecting the operation of the system. Similarly, amplitude-ratio measuring arrangements other than shown herein may be used as long as they serve the purpose of the present invention; for example, voltage comparing circuits such as are commercially available in the form of integrated devices may also be used. The circuit shown in FIG. 3, however, is simple, and because of the transformers used for obtaining a null signal, any number of transformer secondaries may be inserted in series with the secondaries of transformers T6 and T7.
While specific embodiments of the invention have been selected to describe the invention, it is obvious to the skilled in the art that these may be considered as exemplary, and that the invention is not limited in its utility. Accordingly, it is also obvious that various modifications, adaptations, and substitutions of parts may be made without departing from the true spirit and scope of the invention.
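The null-matching principle of FIG. 3 may be illustrated numerically. The sketch below is not a model of the transformer circuit; it assumes only that each signal is scaled by a preadjusted gain, that the two scaled signals are subtracted, and that a residual near zero is taken as a match. The function names and the tolerance value are hypothetical.

```python
def null_residual(a, b, gain_a, gain_b):
    """Difference of the two gain-scaled amplitudes; zero indicates a perfect null."""
    return gain_a * a - gain_b * b


def matches(a, b, gain_a, gain_b, tolerance=0.05):
    """Return True when the A-to-B amplitude ratio matches the preset gain ratio.

    The absolute value stands in for the full-wave rectification of the
    residual. The tolerance is relative to the larger scaled amplitude
    (an assumption), so the decision depends on the ratio and not on loudness.
    """
    residual = abs(null_residual(a, b, gain_a, gain_b))
    scale = max(abs(gain_a * a), abs(gain_b * b))
    if scale == 0.0:
        return False          # no signal present: treated as no match (assumption)
    return residual <= tolerance * scale


if __name__ == "__main__":
    # Gains preset for a 2:1 ratio between A and B.
    print(matches(2.0, 1.0, gain_a=1.0, gain_b=2.0))
    print(matches(4.0, 2.0, gain_a=1.0, gain_b=2.0))
    print(matches(3.0, 1.0, gain_a=1.0, gain_b=2.0))
```

Because the test is made on the ratio, the same preset gains match a given sound at different overall amplitudes, which corresponds to the purpose stated for the measuring circuit.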
What I claim, is:
1. In complex sound waves wherein the information is contained within the time periods of trains of waves, and said information is represented by the frequency and amplitude ratios between a group of resonances in each train of the complex wave, but wherein spectral variations occur, for different voices, the system for detecting and storing the peak amplitudes of said group of resonances during the time periods of said trains successively, adaptable to stored-signal regrouping as a representation of normalizing said spectral variations, and analyzing the amplitude ratios between said regrouped signals, said system comprising a plurality of band-pass filters to separate the voice signal's peak amplitude resonances into corresponding channels, each filter output feeding a parallel-pair of channels of detector/gate/storage/discharge and mixer-gate channels to a common gated mixer output, said parallel pair of detector channels for alternate switching and storage of the filter output, where a plurality of detector means for detecting and storing the peak amplitudes of the resonances in said complex wave, each detector means consisting of a first pair of storage means for alternate storage of said detected signals, a second pair of discharging means for said storages, and a third pair of mixing means for said stored signals to a common output terminal; means for producing unidirectional timing pulses at the frequency rate of said trains; a flip-flop (block 20), and means for operating it by said unidirectional timing pulses for producing alternate steady state output signals at the rate of said trains; coupling means from said steady state signals to said first pair of storage means alternately for effecting alternate signal storages of the detected peaks of said resonances during the successive time periods of said trains; normally inoperative A and B AND-gates (blocks 21, 22); parallel connections of the first inputs of said A and B AND-gates; direct coupling means from said steady state signals alternately to the second inputs of said A and B AND-gates; a one-shot (block 35) having operating time period not longer than the shortest time period that occurs during said trains; coupling means from said one-shot to said parallel connected first inputs of said A and B AND-gates; coupling means from said unidirectional timing pulses to said one-shot for operation, whereby operating said A and B AND-gates alternately during the first time portions of said trains; means for deriving alternate timing pulses (from block 20) at the frequency rate of said trains; first, second and third, fourth set-reset flip-flops (R-S blocks 23, 24 and 25, 26), each flip-flop having set and reset inputs and set and reset outputs, respectively; A parallel coupling means of the set input of the first flip-flop with that of the reset input of said fourth flip-flop, and B parallel coupling means of the set input of the second flip-flop with that of the reset input of the third flip-flop; means for applying said alternate timing pulses to said A and B coupling means alternately, whereby effecting operation of the first and second flip-flops in alternate set states during the operating time periods of said one-shot, and operation of the third and fourth flip-flops in alternate reset states during quiescence of said one-shot; and coupling means from the set outputs of said first and second flip-flops to said third pair of mixing means, and the reset outputs of said third and fourth flip-flops to said second pair of discharging means, for effecting storages of the detected signals of said common outputs, adaptable to said spectral normalization and analysis of said amplitude ratios.
2. The system as set forth in claim 1, wherein is included coupling means from said common outputs to a matrix of signal regrouping combinations, each combination having been prearranged for a specific regrouping of the signals at said common outputs under control of a control signal derived from a stored signal appearing at a specific output of said common outputs; coupling means from said common outputs to a plurality of signal sensing means, respectively, for deriving sensing signals from the stored signals at said common outputs; coupling means from said sensing means to the first inputs of a plurality of control AND-gates, respectively; means for applying sequential pulses to the second inputs of said plurality of AND-gates, whereby only those AND-gates which have received sensing signals at their first inputs operate by the distribution pulses, for producing said control signals; and coupling means from said control signals to said matrix for operating the respective signal regrouping combinations of said matrix, for regrouping the stored signals at said common outputs as a representation of said spectral normalization.
3. The system as covered by claim 2, which in combination includes a plurality of prearranged combinations of signal matching means; coupling means from said regrouped stored signals to said plurality of signal matching means for deriving a null signal from the combination that a match has been obtained by one of said sequential pulses, as a representation of said amplitude ratio analysis.

US368264A 1971-12-20 1973-06-08 Phonetic sound recognizer for all voices Expired - Lifetime US3870817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US368264A US3870817A (en) 1971-12-20 1973-06-08 Phonetic sound recognizer for all voices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20966171A 1971-12-20 1971-12-20
US368264A US3870817A (en) 1971-12-20 1973-06-08 Phonetic sound recognizer for all voices

Publications (1)

Publication Number Publication Date
US3870817A true US3870817A (en) 1975-03-11

Family

ID=26904366

Family Applications (1)

Application Number Title Priority Date Filing Date
US368264A Expired - Lifetime US3870817A (en) 1971-12-20 1973-06-08 Phonetic sound recognizer for all voices

Country Status (1)

Country Link
US (1) US3870817A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3919481A (en) * 1975-01-03 1975-11-11 Meguer V Kalfaian Phonetic sound recognizer
US4039754A (en) * 1975-04-09 1977-08-02 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Speech analyzer
US4343969A (en) * 1978-10-02 1982-08-10 Trans-Data Associates Apparatus and method for articulatory speech recognition
US4388495A (en) * 1981-05-01 1983-06-14 Interstate Electronics Corporation Speech recognition microcomputer
US4412098A (en) * 1979-09-10 1983-10-25 Interstate Electronics Corporation Audio signal recognition computer
WO1984000634A1 (en) * 1982-08-04 1984-02-16 Henry G Kellett Apparatus and method for articulatory speech recognition
WO1990011593A1 (en) * 1983-05-05 1990-10-04 Briar, Nellie, P. +Lf Method and apparatus for speech analysis

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3622706A (en) * 1969-04-29 1971-11-23 Meguer Kalfaian Phonetic sound recognition apparatus for all voices

