WO2009090281A1 - Method of converting 5.1 sound format to hybrid binaural format

Info

Publication number
WO2009090281A1
Authority
WO
WIPO (PCT)
Prior art keywords
effects
music
signals
channels
format
Prior art date
Application number
PCT/ES2008/070246
Other languages
Spanish (es)
French (fr)
Inventor
Ivan Portas Arrondo
Original Assignee
Auralia Emotive Media Systems, S.L.
Priority date
Filing date
Publication date
Application filed by Auralia Emotive Media Systems, S.L.
Publication of WO2009090281A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S3/004 For headphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved

Definitions

  • the main object of the present invention is a method for converting sound in 5.1 format, commonly used for digital recording and reproduction of cinematic content, into hybrid binaural format.
  • a sound system in 5.1 format represents the standard for the domestic sound reproduction of cinema.
  • a sound system in 5.1 format is composed of six audio channels where music, voice, sound effects, etc. are mixed in different proportions.
  • Each of the channels corresponds to a speaker, and in turn each of the speakers must be located in a specific location in relation to the user to achieve an optimal sound sensation.
  • the main speakers (FL and FR in Figure 1) ideally form an equilateral triangle with the user's position (O).
  • the straight lines joining each surround speaker (SL and SR) to the user (O) form an angle of approximately 110° with respect to the vertical axis (the straight line joining O and C).
  • the LFE (Low Frequency Enhancement) loudspeaker is intended to reinforce bass sounds to produce a striking effect during reproduction. Its location is not critical, since the information it carries has a frequency spectrum generally below 100 Hz, which is omnidirectional in nature. That is, the listener cannot determine where the sound comes from.
  • a drawback of audio systems based on the 5.1 format is that the user's sound sensation deteriorates rapidly when the user is not located at the optimal position with respect to the speakers.
  • the use of headphones allows, however, an optimal positioning of the user at all times, since the sound reproduction systems, being attached to the user's head, do not modify their relative position with respect to their head.
  • the human being is a volumetric sound receiver, that is, it processes the sound that reaches it through, for example, reflections created by the shoulders and torso, or diffractions created by the sound when surrounding the head.
  • Human hearing is by nature binaural, where the result of the entire sound reception process ends in only two channels: right ear and left ear.
  • the term "binaural" refers to the nature of human hearing, because people are able to capture all the spatial sound information through a single pair of ears.
  • when this phenomenology is not taken into account, so-called "intracranial sound" is usually produced, for example when listening to traditional stereo sound through headphones.
  • intracranial sound is the sensation that the sound sources are inside the user's skull, at a point located between the two headphones, so traditional stereo sound is not an advisable format when trying to represent three-dimensional sound spaces realistically.
  • the first of these consists in replacing the pair of point receivers normally used with volumetric receivers, such as dummies, so that the sound reaching them is processed naturally. In this way a binaural stereo recording is obtained in which all the phenomenology described above is already present.
  • the second is based on performing an auralization procedure. For this, the response of a given receiver (a dummy or a human being, for example) to an impulse signal coming from a given point in space (usually a broadband noise emitted from a given point around the user) is usually measured or modelled.
  • US Patent 2007213990 describes a method for transforming a traditional two-channel stereo signal into a binaural signal, focusing on the treatment that the input signal must undergo in preparation for being transformed into three-dimensional sound. Specifically, it describes how to divide the input signal into different frequency bands, auralize each sub-band, and finally recombine the sub-bands to form the two output channels in binaural format.
  • the present invention describes a new method for real-time audio auralization in 5.1 format. To achieve an optimal result, each channel is treated and auralized independently, so that it is possible to assign specific acoustic parameters to each of them in order to make the reproduction more realistic and spectacular.
  • the hybrid model described which combines the auralization of the FL, FR, SL and SR channels with the original C and LFE monophonic channels allows greater intelligibility of the dialogues, since there is no interference between the front channels and the C channel, as well as a superior immersion due to the constant unconscious referencing made by the brain between the C channel monophonic and the auralized channels.
  • the readjustment of the proportions of the different types of information makes it possible to optimize the content of the different channels from the outset to achieve an optimal result.
  • the reinforcement of the LFE channel makes it possible to recreate the sensations produced by the bass components in cinemas, balancing the reproduction system.
  • the term "auralize" refers to the processing of the different channels so that the user has the impression that they come from specific places in space, thus achieving optimized spectacularity and intelligibility.
  • channel refers to the signal of each of the speakers that make up the 5.1 sound format or the hybrid binaural sound format.
  • the FL, FR, C, SL, SR or LFE channels which are the input channels in 5.1 format
  • L and R channels which are the output channels in binaural format.
  • the letters “L” and “R” will be used to distinguish between the positions of the channels located to the left (left, in English) and right (right, in English) of the user.
  • the terms "frontal plane" and "rear plane" will also be used to refer to the position of the channels in front of the user or behind the user, as well as "right side plane" or "left side plane" to refer to the position of the channels to the sides of the user.
  • the term "source” refers to a signal that contains sounds from a single physical process, that is, the sources will be, in general, music, voice and effects.
  • hybrid binaural is also defined as a sound format that mixes auralized channels with non-auralized or monophonic channels. Specifically, the present invention mixes the auralized channels FL, FR, SL and SR with the non-auralized channels C and LFE.
  • a method of converting from sound format 5.1 to hybrid binaural comprises the following operations:
  • FL mainly contains music, and to a lesser extent voice and effects.
  • FR contains mainly music, and to a lesser extent voice and effects.
  • C contains mainly voice, and to a lesser extent music and effects.
  • SL contains mainly effects, and to a lesser extent music.
  • SR contains mainly effects, and to a lesser extent music.
  • LFE contains only bass.
  • FL: elevation from 0° to 30°; azimuth from -10° to -30°.
  • FR: elevation from 0° to 30°; azimuth from +10° to +30°.
  • SL: elevation from 175° to 195°; azimuth from -30° to -60°.
  • SR: elevation from 175° to 195°; azimuth from +30° to +60°.
  • auralizing a channel in a certain position means virtually locating that channel so that the reproduction of the resulting signals through headphones, one for the right channel and one for the left channel, gives the user the sensation that the sounds of that channel come from that particular position in space.
  • auralizing is a process in which a channel lacking spatial information, usually monophonic (as in this case), that is, anechoic or dry, is processed by a procedure called convolution with the impulse response (the response in time and frequency to a given acoustic stimulus from a certain point in space) of a particular listener.
  • to this end, the response of a certain receiver (a dummy or a human being, for example) to an impulse signal from a certain point in space (usually broadband noise emitted from a certain point around the user) is modelled or measured.
  • this impulse response of the user is later used to process a monophonic source (without spatial information) through a convolution process, thus achieving the effect of hearing said source located at the point from which the impulse was emitted.
  • the inventors have discovered that virtually placing the FL, FR, SL and SR channels within the angular ranges described above gives all users an optimally spectacular sensation.
  • the angular ranges of the front speakers (FL and FR) are kept narrow for two reasons. The first is to avoid the loss of intelligibility of the dialogue channel (C) caused by an excessively wide stereo image of the music, that is, the energy of the FL channel going almost completely to L and the energy of FR going almost completely to R. The second is to avoid a large amount of energy reaching the lateral planes, near the ears, where it would interfere with the localization of the rear plane channels (SL and SR).
  • the dialogue channel (C) is not processed in the processing operation of the signals of the FL, FR, SL and SR channels, since maintaining it as a source provides two great advantages to the final output of the procedure.
  • the first is a gain in intelligibility with respect to the input format, since by keeping this channel intact and auralizing those of the frontal (FL and FR) and rear (SL and SR) planes, the dialogues (C) stand out in the central position, reducing the listening fatigue involved in following them.
  • the second advantage lies in the fact that it constitutes an auditory reference point for the brain, since maintaining its intracranial nature makes its combination with the auralized channels ideal. In this way, the brain constantly compares the position of this channel with the auralized ones, making the user's auditory experience much more spectacular.
  • the LFE channel is also not processed in this operation of the procedure because of the non-directional nature of the frequencies it contains, that is, it gives the sensation of being heard from all positions. Because of this feature, the speakers intended for the reproduction of this channel can be placed practically anywhere in the enclosure.
  • the front (FL1, FR1) and rear (SL1, SR1) plane channels are processed independently using two impulse responses from different optimized enclosures.
  • the separate processing of the front and rear channels provides the advantage of using two different virtual enclosures, giving more depth only to the rear channels, which are the ones with the most spectacular effects. Excessive depth in the front channels, however, would make the intelligibility of the dialogues difficult.
  • the reverberation introduced in the FL1 and FR1 channels is within the range of 0.5 seconds to 1 second, and the reverberation introduced in the SL1 and SR1 channels is within the range of 1 second to 3.5 seconds.
  • the signals of the front plane, FL2 and FR2, and the signals of the rear plane, SL2 and SR2, are obtained as output.
  • the conversion procedure of sound format 5.1 to hybrid binaural comprises, prior to the final mixing operation, compressing the LFE channel signal, obtaining an LFE' signal.
  • Another preferred embodiment of the invention comprises, prior to the auralization operation, the operations of:
  • the mixing of the sources L music, R music, voice and front effects, rear effects L and rear effects R to obtain the channels is performed according to the following percentage ranges:
  • the LFE bass channel is already an independent component in itself, and therefore its information is not redundant in the other channels. For this reason it is not included in the optional initial separation and mixing operations.
  • this also extends to computer programs, in particular computer programs contained in a carrier, adapted to carry out the operations of the described procedure.
  • the program can be in the form of source code, object code or an intermediate code between source code and object code, such as a partially compiled form, or in any other form suitable for implementing the operations of the invention.
  • the carrier can be any device or entity capable of transporting the program.
  • the carrier can comprise a storage medium, such as a ROM, a CD ROM or any other magnetic storage medium, for example a floppy disk or a hard disk.
  • the carrier can be a transmission carrier, such as an electrical or optical signal that can be conveyed by electrical or optical means, by radio or by any other means.
  • the carrier can be an integrated circuit in which the program is stored, the circuit being adapted to carry out the operations of the procedure.
  • the carrier could be an ASIC, an FPGA, a DSP, a microprocessor or a microcontroller.
  • Figure 1. Shows the location of the physical speakers of a cinema in 5.1 sound format.
  • Figure 2. Shows an explanatory scheme of the elevation and azimuth angles.
  • Figure 3. Shows a general scheme of the operations of the procedure according to the present invention.
  • Figure 1 shows the position of the speakers of the channels in a movie theater in relation to the position in which the user must be located for an optimum sound experience.
  • the procedure is carried out by a computer that first, as shown in Figure 3, obtains from the DVD the signals of the original channels in 5.1 format (FL, FR, C, SL, SR, LFE).
  • the LFE channel is separated to be processed independently in parallel, undergoing only a compression that yields the LFE' signal.
  • a selector (S) is provided that allows the user to choose whether or not to apply the optional operations of extracting the sources from the original channels and remixing them in new proportions to enhance the spectacular nature of the film.
  • the sources L music, R music, voice and front effects, rear effects L and rear effects R
  • the sources are separated, for example using the source separation algorithm by independent component analysis 'FastICA', developed by HUT (Helsinki University of Technology), to re-mix them according to new optimized proportions.
  • once the dialogue channel (C) has been separated from the rest, the channels FL', FR', SL' and SR' are each auralized at an optimal geometric position to enhance the spectacular nature of the user's sound experience.
  • the listener has the characteristics of a standard user, based on the impulse responses of a KEMAR dummy.
  • FR': elevation 15°; azimuth 20°.
  • SL': elevation 180°; azimuth -40°.
  • SR': elevation 180°; azimuth 40°.
  • Figure 2 shows the reference for the elevation and azimuth angles.
  • the channels obtained in the previous operation, FL'2, FR'2, SL'2 and SR'2, are mixed with the LFE' and C channels to obtain only two signals in hybrid binaural format, corresponding to the L and R channels of the headphones.
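The enclosure modelling, LFE compression and final mixing operations listed above can be sketched in Python with NumPy. This is only an illustrative sketch under stated assumptions, not the patented implementation: the enclosure impulse response is replaced by synthetic exponentially decaying noise with a chosen reverberation time, the compressor is a simple static one, and the function names (`synthetic_room_ir`, `add_enclosure`, `compress`, `hybrid_mix`), threshold, ratio and sample rate are all hypothetical. The text specifies only the reverberation ranges (0.5 to 1 s front, 1 to 3.5 s rear) and that C and LFE' are mixed in without auralization.

```python
import numpy as np

def synthetic_room_ir(rt60_s, fs, seed=0):
    """Assumed stand-in for a measured enclosure response:
    noise whose envelope falls by 60 dB after rt60_s seconds."""
    n = int(rt60_s * fs)
    t = np.arange(n) / fs
    rng = np.random.default_rng(seed)
    ir = rng.standard_normal(n) * 10.0 ** (-3.0 * t / rt60_s)
    ir[0] = 1.0  # direct sound
    return ir / np.sqrt(np.sum(ir ** 2))  # unit energy

def add_enclosure(binaural, rt60_s, fs):
    """Convolve each ear signal of a (2, n) array with the enclosure response."""
    ir = synthetic_room_ir(rt60_s, fs)
    return np.stack([np.convolve(ear, ir) for ear in binaural])

def compress(x, threshold=0.5, ratio=4.0):
    """Static compressor (assumed parameters): the part of each sample
    magnitude above the threshold is divided by the ratio."""
    mag = np.abs(x)
    over = mag > threshold
    out = x.copy()
    out[over] = np.sign(x[over]) * (threshold + (mag[over] - threshold) / ratio)
    return out

def pad_to(x, n):
    """Zero-pad the last axis of x to length n."""
    return np.pad(x, [(0, 0)] * (x.ndim - 1) + [(0, n - x.shape[-1])])

def hybrid_mix(front, rear, c, lfe_compressed):
    """Sum the auralized planes and add the mono C and LFE' equally to L and R."""
    n = max(front.shape[-1], rear.shape[-1], c.shape[-1], lfe_compressed.shape[-1])
    mono = pad_to(c, n) + pad_to(lfe_compressed, n)
    return pad_to(front, n) + pad_to(rear, n) + mono  # shape (2, n): rows L, R

fs = 8000  # assumed sample rate, kept low so the example runs quickly
rng = np.random.default_rng(1)
# Placeholder (2, n) binaural signals standing in for FL1, FR1, SL1, SR1.
fl1, fr1, sl1, sr1 = (0.1 * rng.standard_normal((2, 800)) for _ in range(4))
c = 0.1 * rng.standard_normal(800)
lfe = rng.standard_normal(800)

# Convolution is linear, so reverberating FL1 + FR1 equals FL2 + FR2.
front = add_enclosure(fl1 + fr1, rt60_s=0.5, fs=fs)  # front range: 0.5 to 1 s
rear = add_enclosure(sl1 + sr1, rt60_s=1.0, fs=fs)   # rear range: 1 to 3.5 s
out = hybrid_mix(front, rear, c, compress(lfe))
```

Because C and LFE' are added identically to both output rows, they stay monophonic in the result, which is the hybrid aspect of the format.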

Abstract

Method of converting 5.1 sound format to hybrid binaural format, comprising obtaining the signals from the FL, FR, C, SL, SR and LFE channels in 5.1 format which it is desired to convert into hybrid binaural format; auralizing the FL, FR, SL and SR channels in the following positions: FL: elevation from 0° to 30°, azimuth from -10° to -30°; FR: elevation from 0° to 30°, azimuth from +10° to +30°; SL: elevation from 175° to 195°, azimuth from -30° to -60°; SR: elevation from 175° to 195°, azimuth from +30° to +60°, thus obtaining the signals FL1, FR1, SL1 and SR1; modelling the response from the enclosure on the basis of the signals, introducing a reverberation effect; and mixing the signals FL2, FR2, SL2 and SR2 obtained in the previous step with the original LFE and C signals to obtain the two left and right output signals.

Description

PROCEDURE FOR CONVERTING 5.1 SOUND FORMAT TO HYBRID BINAURAL
DESCRIPTION
OBJECT OF THE INVENTION
The main object of the present invention is a procedure for converting sound in 5.1 format, commonly used for digital recording and reproduction of cinematic content, into hybrid binaural format.
BACKGROUND OF THE INVENTION
Currently, the 5.1 format is the standard for domestic cinema sound reproduction. A 5.1 sound system comprises six audio channels in which music, voice, sound effects and other signals are mixed in different proportions. Each channel corresponds to a loudspeaker, and each loudspeaker must in turn be placed at a specific location relative to the user to achieve an optimal sound sensation.
The main loudspeakers (FL and FR in Figure 1) ideally form an equilateral triangle with the user's position (O). In addition, the straight lines joining each surround loudspeaker (SL and SR) to the user (O) form an angle of approximately 110° with the vertical axis (the straight line joining O and C). The LFE (Low Frequency Enhancement) loudspeaker is intended to reinforce bass sounds to produce a striking effect during reproduction. Its location is not critical, because the information it carries has a frequency spectrum generally below 100 Hz, which is omnidirectional in nature. That is, the listener cannot determine where the sound comes from.
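The loudspeaker geometry just described can be checked with a short Python sketch. It assumes, as an illustration only, that the layout is symmetric about the O-C axis and that all loudspeakers lie at the same distance from the user; under that assumption the equilateral triangle puts FL and FR at 30° either side of the axis. The names `speaker_position` and `LAYOUT_DEG` are hypothetical.

```python
import math

def speaker_position(angle_deg, distance=1.0):
    """Return (x, y) of a loudspeaker seen from the user O at the origin.
    The angle is measured from the O-C axis (y axis); positive is to the right."""
    a = math.radians(angle_deg)
    return (distance * math.sin(a), distance * math.cos(a))

# Assumed symmetric layout: fronts at +/-30 degrees, surrounds at about +/-110 degrees.
LAYOUT_DEG = {"C": 0.0, "FL": -30.0, "FR": 30.0, "SL": -110.0, "SR": 110.0}
positions = {name: speaker_position(deg) for name, deg in LAYOUT_DEG.items()}

# In an equilateral triangle the FL-FR side equals the user-to-speaker distance.
side_fl_fr = math.dist(positions["FL"], positions["FR"])
```

With a unit distance, `side_fl_fr` comes out equal to 1 up to rounding, and the surround loudspeakers end up behind the user (negative y).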
A drawback of audio systems based on the 5.1 format is that the user's sound sensation deteriorates rapidly when the user is not located at the optimal position with respect to the loudspeakers. The use of headphones, however, keeps the user optimally positioned at all times, because the sound reproduction devices are attached to the user's head and so do not change position relative to it.
However, the human being is a volumetric sound receiver, that is, a person processes the sound that arrives by way of, for example, reflections created by the shoulders and torso, or diffraction of the sound around the head. Human hearing is binaural by nature: the result of the entire sound reception process ends in just two channels, the right ear and the left ear. The term "binaural" refers to the nature of human hearing, because people are able to capture all spatial sound information through a single pair of ears.
When this phenomenology is not taken into account, so-called "intracranial sound" is usually produced, for example when listening to traditional stereo sound through headphones. Intracranial sound is the sensation that the sound sources are inside the user's skull, at a point located between the two headphones, so traditional stereo sound is not an advisable format when trying to represent three-dimensional sound spaces realistically.
There are fundamentally two ways to achieve binaural reproduction:
The first consists in replacing the pair of point receivers normally used with volumetric receivers, such as dummies, so that the sound reaching them is processed naturally. In this way a binaural stereo recording is obtained in which all the phenomenology described above is already present.
The second is based on performing an auralization procedure. For this, the response of a given receiver (a dummy or a human being, for example) to an impulse signal coming from a given point in space (usually a broadband noise emitted from a given point around the user) is usually measured or modelled. Patent US 2007213990 describes a method for transforming a traditional two-channel stereo signal into a binaural signal, focusing on the treatment that the input signal must undergo in preparation for being transformed into three-dimensional sound. Specifically, it describes how to divide the input signal into different frequency bands, auralize each sub-band, and finally recombine the sub-bands to form the two output channels in binaural format.
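Once an impulse response has been measured or modelled for each ear, auralization amounts to convolving a dry monophonic signal with that pair of responses. The Python sketch below illustrates the idea with NumPy. The three-sample impulse responses are made-up toy values rather than measured data, and the function name `auralize` is hypothetical.

```python
import numpy as np

def auralize(mono, ir_left, ir_right):
    """Convolve a mono signal with the left-ear and right-ear impulse responses.
    Both responses must have the same length so the outputs can be stacked."""
    left = np.convolve(mono, ir_left)
    right = np.convolve(mono, ir_right)
    return np.stack([left, right])  # shape (2, len(mono) + len(ir) - 1)

# Toy responses for a source on the listener's right (assumed values):
# the right ear hears the sound immediately, the left ear later and quieter.
ir_right = np.array([1.0, 0.0, 0.0])
ir_left = np.array([0.0, 0.0, 0.5])

mono = np.array([1.0, 2.0, 3.0])
binaural = auralize(mono, ir_left, ir_right)
# binaural[0] (left)  -> [0.0, 0.0, 0.5, 1.0, 1.5]
# binaural[1] (right) -> [1.0, 2.0, 3.0, 0.0, 0.0]
```

The delay and attenuation between the two rows are the kind of interaural differences that a real measured response would encode, together with the spectral effects of the head, shoulders and torso.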
DESCRIPTION OF THE INVENTION
The present invention describes a new procedure for the real-time auralization of audio in 5.1 format. To achieve an optimal result, each channel is treated and auralized independently, so that specific acoustic parameters can be assigned to each of them in order to make the reproduction more realistic and spectacular.
The most important advantages of the procedure of the invention can be summarized as follows:
Optimal reproduction is achieved in all cases because, with the headphones attached to the user, the relative position between the reproduction system and the user does not vary.
The hybrid model described, which combines the auralization of the FL, FR, SL and SR channels with the original monophonic C and LFE channels, allows greater intelligibility of the dialogues, since there is no interference between the front channels and the C channel, as well as superior immersion due to the constant unconscious referencing performed by the brain between the monophonic C channel and the auralized channels.
The readjustment of the proportions of the different types of information, by means of source separation and subsequent remixing, makes it possible to optimize the content of the different channels from the outset to achieve an optimal result.
The specific virtual placement of the FL and FR channels, together with the modelling of the specific enclosure, allows a perfect balance with the dialogue channel C, without interfering with its intelligibility and giving the frontal plane just the right depth.
The specific virtual placement of the SL and SR channels, together with the modelling of a different specific enclosure for the channels of the front and rear planes, provides a striking sensation of rear depth, giving the system differentiated planes of sound reproduction and thereby creating a highly immersive experience.
The reinforcement of the LFE channel makes it possible to recreate the sensations produced by the bass components in cinemas, balancing the reproduction system.
In this document, the term "auralize" refers to the processing of the different channels so that the user has the impression that they come from specific places in space, thus achieving optimized spectacularity and intelligibility.
Similarly, the term "channel" refers to the signal of each of the loudspeakers that make up the 5.1 sound format or the hybrid binaural sound format. Thus, reference will be made to the FL, FR, C, SL, SR and LFE channels, which are the input channels in 5.1 format, and to the L and R channels, which are the output channels in binaural format. The letters "L" and "R" will be used to distinguish between the positions of the channels located to the left and right of the user. The terms "frontal plane" and "rear plane" will also be used to refer to the position of the channels in front of or behind the user, as well as "right side plane" or "left side plane" to refer to the position of the channels to the sides of the user.
On the other hand, the term "source" refers to a signal that contains sounds from a single physical process, that is, the sources will be, in general, music, voice and effects.
The term "hybrid binaural" is also defined as a sound format that mixes auralized channels with non-auralized or monophonic channels. Specifically, the present invention mixes the auralized channels FL, FR, SL and SR with the non-auralized channels C and LFE.
According to one aspect of the present invention, a method for converting the 5.1 sound format to hybrid binaural is described, characterized in that it comprises the following operations:
1) Obtain the signals of the FL, FR, C, SL, SR and LFE channels of the 5.1 format that is to be converted to hybrid binaural format. The information in these signals is normally a mixture of several sources, where:
FL: contains mainly music, and to a lesser extent voice and effects.
FR: contains mainly music, and to a lesser extent voice and effects.
C: contains mainly voice, and to a lesser extent music and effects.
SL: contains mainly effects, and to a lesser extent music.
SR: contains mainly effects, and to a lesser extent music.
LFE: contains only low frequencies (bass).
2) Auralize the FL, FR, SL and SR channels at the following positions:
FL: elevation from 0° to 30°; azimuth from -10° to -30°.
FR: elevation from 0° to 30°; azimuth from +10° to +30°.
SL: elevation from 175° to 195°; azimuth from -30° to -60°.
SR: elevation from 175° to 195°; azimuth from +30° to +60°.
resulting in the signals FL1, FR1, SL1 and SR1.
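By way of non-limiting illustration, the angular ranges listed above can be held in a small lookup table. The following Python sketch is not part of the described method; the names `POSITION_RANGES` and `within_range` are hypothetical and serve only to check that a candidate position falls inside the range given for a channel.

```python
# Angular ranges in degrees, copied from the list above.
# Each channel maps to (min, max) elevation and (min, max) azimuth.
POSITION_RANGES = {
    "FL": {"elevation": (0, 30), "azimuth": (-30, -10)},
    "FR": {"elevation": (0, 30), "azimuth": (10, 30)},
    "SL": {"elevation": (175, 195), "azimuth": (-60, -30)},
    "SR": {"elevation": (175, 195), "azimuth": (30, 60)},
}


def within_range(channel, elevation, azimuth):
    """Return True if the position lies inside the range for the channel."""
    limits = POSITION_RANGES[channel]
    el_min, el_max = limits["elevation"]
    az_min, az_max = limits["azimuth"]
    return el_min <= elevation <= el_max and az_min <= azimuth <= az_max
```

For instance, `within_range("FL", 15, -20)` holds, whereas the same elevation with an azimuth of +20° does not, since positive azimuths are reserved for the right-hand channels.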
We say that "auralizing" a channel at a given position means virtually placing that channel so that playing the resulting signals, one for the right channel and one for the left channel, through headphones gives the user the sensation that the sounds of that channel come from that position in space.
In other words, auralizing is a process in which a channel lacking spatial information (usually monophonic, as in this case, that is, anechoic or dry) is processed by convolution with the impulse response of a given listener (the response in time and frequency to an acoustic stimulus coming from a given point in space).
However, owing to physical differences between users (size, distance between the ears, etc.), not all of them respond in the same way to the new channels FL1, FR1, SL1 and SR1.
To determine the response of each type of user, the response of a given receiver (for example, a dummy or a human being) to an impulse signal coming from a given point in space (usually broadband noise emitted from a given point around the user) is modeled or measured. This impulse response is later used to process a monophonic source (without spatial information) by convolution, thus achieving the effect of hearing that source as if it were located at the point from which the impulse was emitted.
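The convolution described above may be sketched, purely as an illustration, as follows. The sketch assumes that one impulse response per ear is available for the desired position; the function names `convolve` and `auralize` and the two-sample impulse responses are hypothetical toy values, not measured data.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def auralize(mono, ir_left, ir_right):
    """Return the (left, right) ear signals for one virtual position."""
    return convolve(mono, ir_left), convolve(mono, ir_right)


# Toy impulse responses for a source on the listener's left (assumed values):
# the right ear receives the sound one sample later and attenuated.
ir_left = [1.0, 0.0]
ir_right = [0.0, 0.5]
left, right = auralize([1.0, 2.0, 3.0], ir_left, ir_right)
```

With these toy values, the left-ear signal is the input unchanged and the right-ear signal is a delayed copy at half amplitude, which is the kind of inter-ear difference that a measured impulse response encodes in far greater detail.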
The inventors have found that virtually placing the FL, FR, SL and SR channels within the angular ranges described above gives all users an optimal sense of spectacle.
The angular ranges of the front channels (FL and FR) are kept narrow for two reasons. The first is to avoid a loss of intelligibility of the dialogue channel (C) caused by an excessive stereo image of the music, that is, the energy of the FL channel going almost entirely to L and the energy of FR going almost entirely to R. The second is to prevent a large amount of energy from reaching the lateral planes, near the ears, where it would interfere with the localization of the rear-plane channels (SL and SR).
The dialogue channel (C) is not processed in the operation that processes the signals of the FL, FR, SL and SR channels, because keeping it as a source provides two major advantages for the final output of the method.
The first is a gain in intelligibility over the input format: because this channel is kept intact while those of the front plane (FL and FR) and rear plane (SL and SR) are auralized, the dialogue (C) stands out in the central position, reducing the listening fatigue involved in following it.
The second advantage is that it provides an auditory reference point for the brain, since preserving its intracranial nature makes it ideal for combination with the auralized channels. The brain thus constantly compares the position of this channel with the positions of the auralized channels, making the user's listening experience far more spectacular.
The LFE channel is not processed in this operation either, owing to the non-directional nature of the frequencies it contains; that is, it gives the sensation of being heard from every position. This is why loudspeakers intended to reproduce this channel can be placed practically anywhere in the room.
3) Model independent room responses for the front and rear planes.
The front-plane channels (FL1, FR1) and the rear-plane channels (SL1, SR1) are processed independently using the impulse responses of two different optimized rooms. Processing the front and rear channels separately makes it possible to use two different virtual rooms, giving greater depth only to the rear channels, which carry the most spectacular effects. Excessive depth in the front channels, by contrast, would impair the intelligibility of the dialogue.
According to preferred embodiments of the present invention, the reverberation introduced in the FL1 and FR1 channels is within the range of 0.5 seconds to 1 second, and the reverberation introduced in the SL1 and SR1 channels is within the range of 1 second to 3.5 seconds.
Thus, after the operation of modeling the room response, the front-plane signals FL2 and FR2 and the rear-plane signals SL2 and SR2 are obtained as output.
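As a hedged illustration of how the two reverberation ranges differ, the sketch below builds the amplitude envelope of a synthetic room impulse response. It assumes that the reverberation time is read in the usual RT60 sense (the time taken for the level to fall by 60 dB), which the text does not state explicitly; the function name `decay_envelope` and the 1000 Hz sample rate are hypothetical. An actual room response would be measured or modeled rather than reduced to an envelope.

```python
def decay_envelope(rt60, sample_rate, length):
    """Amplitude envelope that has fallen by 60 dB after rt60 seconds."""
    return [10.0 ** (-3.0 * n / (sample_rate * rt60)) for n in range(length)]


SAMPLE_RATE = 1000  # toy value, chosen only to keep the lists short

# Front plane: short reverberation (0.5 s, lower end of the 0.5 s to 1 s range).
front_envelope = decay_envelope(0.5, SAMPLE_RATE, 501)

# Rear plane: longer reverberation (2 s, inside the 1 s to 3.5 s range).
rear_envelope = decay_envelope(2.0, SAMPLE_RATE, 2001)
```

The front envelope reaches one thousandth of its initial amplitude after 0.5 seconds, while the rear envelope takes 2 seconds to do so, which corresponds to the greater depth given to the rear channels.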
4) Mix the signals obtained in the previous operation with the original LFE and C signals to obtain the output signals of the left and right channels (L and R).
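The final mixing operation may be sketched as follows, by way of example only. The sketch assumes that each processed channel is available as a pair of left-ear and right-ear signals, and that C and LFE are added to both outputs with unity gain; the text does not specify the mixing gains, and the names `mix` and `hybrid_binaural_mix` are hypothetical.

```python
def mix(*signals):
    """Sum several signals sample by sample, padding short ones with zeros."""
    out = [0.0] * max(len(s) for s in signals)
    for s in signals:
        for i, x in enumerate(s):
            out[i] += x
    return out


def hybrid_binaural_mix(binaural_pairs, c, lfe):
    """Combine auralized (left, right) pairs with the unprocessed C and LFE."""
    lefts = [pair[0] for pair in binaural_pairs]
    rights = [pair[1] for pair in binaural_pairs]
    # Assumption: C and LFE feed both ears equally, since they are not auralized.
    return mix(*lefts, c, lfe), mix(*rights, c, lfe)


# Toy input: two processed channels, each as a (left, right) pair.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [0.25, 0.25])]
out_l, out_r = hybrid_binaural_mix(pairs, c=[0.25, 0.25], lfe=[0.125, 0.125])
```

In practice the sum would normally be followed by gain scaling to avoid clipping; that step is omitted here because the text gives no values for it.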
According to a preferred embodiment of the present invention, the method for converting the 5.1 sound format to hybrid binaural comprises, prior to the final mixing operation, compressing the signal of the LFE channel, thereby obtaining a signal LFE'.
Another preferred embodiment of the invention comprises, prior to the auralization operation, the operations of:
a) Separate the signals of the FL, FR, C, SL and SR channels into the sources that compose them: music L, music R, voice and front effects, rear effects L and rear effects R. The separation is performed using an independent component analysis algorithm. This analysis compares the different inputs (channels), which contain redundant information in different proportions. Based on the premise that several signals can be considered independent if they originate from different physical processes, the individual components, in this case voice, music and effects, can be isolated.
b) Mix the sources music L, music R, voice and front effects, rear effects L and rear effects R to obtain the signals that form the input to the subsequent channel auralization operation. This mixing operation reconstructs the FL, FR, C, SL and SR signals with the optimal proportions of the sources separated in the previous operation.
According to a preferred embodiment of the present invention, the sources music L, music R, voice and front effects, rear effects L and rear effects R are mixed to obtain the channels according to the following percentage ranges:
FL: 70-90% music L, 30-10% voice and front effects
FR: 70-90% music R, 30-10% voice and front effects
C: 70-90% voice and front effects, 30-10% music L and R
SL: 70-90% rear effects L, 30-10% music L
SR: 70-90% rear effects R, 30-10% music R
The purpose of these two optional operations is to ensure that each channel entering the auralization process contains the appropriate proportion of the different components, since the original 5.1 mix was optimized for playback through 6 physical loudspeakers, an arrangement entirely different from a pair of headphones. For headphone playback, the redundant information characteristic of quadraphonic systems such as 5.1 hinders the perception of spatial realism, which is why this readjustment is necessary.
The low-frequency channel LFE is already an independent component in itself, so its information is not redundant with the other channels. For this reason it is not included in the optional initial separation and mixing operations.
According to another aspect, the invention also extends to computer programs, in particular computer programs on or in a carrier, adapted to carry out the operations of the described method. The program may be in the form of source code, object code, or a code intermediate between source code and object code, such as a partially compiled form, or in any other form suitable for implementing the operations of the invention.
The carrier may be any entity or device capable of carrying the program. For example, the carrier may comprise a storage medium, such as a ROM, a CD-ROM or a magnetic storage medium, for example a floppy disk or a hard disk. In addition, the carrier may be a transmission carrier, such as an electrical or optical signal that can be conveyed by electrical or optical cable, by radio or by any other means.
Alternatively, the carrier may be an integrated circuit in which the program is stored, the circuit being adapted to perform the operations of the method. In particular, it could be an ASIC, an FPGA, a DSP, a microprocessor or a microcontroller.
DESCRIPTION OF THE DRAWINGS
To complement the description and to aid understanding of the features of the invention, in accordance with a preferred practical embodiment thereof, a set of drawings is attached as an integral part of this description, in which the following is shown by way of illustration and not limitation:
Figure 1.- Shows a view of the location of the physical loudspeakers of a cinema in a 5.1 sound format.
Figure 2.- Shows an explanatory diagram of the elevation angle (α) and the azimuth angle (β).
Figure 3.- Shows a general diagram of the operations of the method according to the present invention.
PREFERRED EMBODIMENT OF THE INVENTION
The starting point is the original sound of a film in 5.1 format that is to be converted to hybrid binaural, in this case recorded on a DVD. Figure 1 shows the positions of the channel loudspeakers in a cinema relative to the position where the user should be located for an optimal sound experience.
In this example, the method is carried out by a computer which first, as shown in Figure 3, obtains from the DVD the signals of the original channels in 5.1 format (FL, FR, C, SL, SR, LFE). The LFE channel is split off to be processed in parallel and independently, undergoing only a compression that yields the signal LFE'.
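The compression applied to the LFE channel is not detailed in the text. Assuming that it denotes dynamic range compression, a minimal sample-by-sample sketch could read as follows; the function name `compress`, the threshold of 0.5 and the ratio of 4 are hypothetical values chosen for illustration only.

```python
def compress(signal, threshold=0.5, ratio=4.0):
    """Static compressor: reduce the part of each sample above the threshold."""
    out = []
    for x in signal:
        magnitude = abs(x)
        if magnitude > threshold:
            # Only the excess over the threshold is divided by the ratio.
            magnitude = threshold + (magnitude - threshold) / ratio
        out.append(magnitude if x >= 0 else -magnitude)
    return out


lfe = [0.25, 1.0, -1.0]
lfe_compressed = compress(lfe)
```

Samples below the threshold pass unchanged, while full-scale samples are reduced to 0.625 in magnitude. A practical compressor would add attack and release smoothing, which this sketch leaves out.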
In this example, a selector (S) is provided that lets the user choose whether or not to apply the optional operations of extracting the sources from the original channels and remixing them in new proportions to heighten the film's spectacular effect. To do so, the sources (music L, music R, voice and front effects, rear effects L and rear effects R) are separated, for example using the 'FastICA' independent component analysis source separation algorithm developed at HUT (Helsinki University of Technology), and are then remixed in new optimized proportions. In this example we assume that the film is an action film, which implies a set of characteristic sounds such as explosions, gunshots, engine noise, etc. To achieve the greatest possible spectacle in this type of film, the following optimal mixing proportions have been determined:
FL': 80% music L + 20% voice and front effects
FR': 80% music R + 20% voice and front effects
C: 80% voice and front effects + 20% music L and R
SL': 80% rear effects L + 20% music L
SR': 80% rear effects R + 20% music R
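The proportions listed above can be illustrated with the following sketch, which takes already separated sources and rebuilds the five channels. It is an assumption of this illustration that "music L and R" in channel C means the average of the two music sources; the names `blend`, `remix` and the dictionary keys are hypothetical, and the single-sample sources are toy values.

```python
def blend(major, minor, major_share=0.8):
    """Mix two equal-length sources as major_share and (1 - major_share)."""
    minor_share = 1.0 - major_share
    return [major_share * a + minor_share * b for a, b in zip(major, minor)]


def remix(src, major_share=0.8):
    """Rebuild the five channels from the separated sources."""
    # Assumption: "music L and R" is taken as the average of both sources.
    music_lr = [(left + right) / 2.0
                for left, right in zip(src["music_L"], src["music_R"])]
    return {
        "FL'": blend(src["music_L"], src["voice_fx"], major_share),
        "FR'": blend(src["music_R"], src["voice_fx"], major_share),
        "C": blend(src["voice_fx"], music_lr, major_share),
        "SL'": blend(src["rear_fx_L"], src["music_L"], major_share),
        "SR'": blend(src["rear_fx_R"], src["music_R"], major_share),
    }


sources = {
    "music_L": [1.0], "music_R": [0.0], "voice_fx": [0.5],
    "rear_fx_L": [0.0], "rear_fx_R": [1.0],
}
channels = remix(sources)
```

Changing `major_share` to any value between 0.7 and 0.9 covers the percentage ranges given for the preferred embodiment.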
Once the sources have been mixed into the channels in this optimized way, the dialogue channel (C) is separated from the rest, and the channels FL', FR', SL' and SR' are each auralized at an optimal geometric position to heighten the spectacular effect of the user's sound experience. In this case, the listener is assumed to have the characteristics of a standard user, based on the impulse responses of a KEMAR dummy.
The optimal positions of the channels are given below, described by the elevation angle (α) and the azimuth angle (β) that they form with the listener:
FL': elevation 15°; azimuth -20°
FR': elevation 15°; azimuth 20°
SL': elevation 180°; azimuth -40°
SR': elevation 180°; azimuth 40°
Figure 2 shows the reference for the elevation and azimuth angles, α and β respectively. After the auralization operation, the signals FL'1, FR'1, SL'1 and SR'1 are obtained. Next, the signals FL'1 and FR'1 are processed with the impulse response of a room similar to a cinema, with a reverberation time (Tr) of approximately 0.5 seconds, and the signals SL'1 and SR'1 are processed with the impulse response of another room, similar to a different cinema, with a reverberation time of approximately 2 seconds.
Finally, the channels obtained in the previous operation, FL'2, FR'2, SL'2 and SR'2, are mixed with the LFE' and C channels to obtain just two signals in hybrid binaural format, corresponding to the L and R channels of a pair of headphones.

CLAIMS
1. Method for converting the 5.1 sound format to hybrid binaural, characterized in that it comprises the following operations:
obtaining the signals of the FL, FR, C, SL, SR and LFE channels of the 5.1 format that is to be converted to hybrid binaural format;
auralizing the FL, FR, SL and SR channels at the following positions:
FL: elevation from 0° to 30°; azimuth from -10° to -30°.
FR: elevation from 0° to 30°; azimuth from +10° to +30°.
SL: elevation from 175° to 195°; azimuth from -30° to -60°.
SR: elevation from 175° to 195°; azimuth from +30° to +60°,
resulting in the signals FL1, FR1, SL1 and SR1;
independently processing the front-plane signals (FL1 and FR1) and the rear-plane signals (SL1 and SR1), using the impulse responses of two different virtual rooms, each optimized for its plane, resulting in the signals FL2, FR2, SL2 and SR2;
mixing the signals FL2, FR2, SL2 and SR2 obtained in the previous operation with the original signals LFE and C to obtain the two output signals, left and right.
2. Method for converting the 5.1 sound format to hybrid binaural according to the preceding claim, characterized in that the impulse responses of the virtual rooms used to process the front and rear planes have reverberation times of between 0.5 s and 1 s for the former, and of between 1 s and 3.5 s for the latter.
3. Method for converting the 5.1 sound format to hybrid binaural according to any of the preceding claims, characterized in that it comprises, prior to the final mixing operation, a compression of the LFE channel.
4. Method for converting the 5.1 sound format to hybrid binaural according to any of the preceding claims, characterized in that, before the auralization operation, it comprises the operations of:
separating the signals of the FL, FR, C, SL and SR channels into the sources that compose them: music L, music R, voice and front effects, rear effects L and rear effects R;
remixing the estimated sources in proportions optimized for the subsequent processes, reconstructing the FL, FR, C, SL and SR channels.
5. Method for converting 5.1 sound format to hybrid binaural according to the preceding claim, characterized in that the operation of remixing the sources music L, music R, voice and front effects, rear effects L and rear effects R is performed according to the following percentage ranges:
FL: 70-90% music L, 30-10% voice and front effects
FR: 70-90% music R, 30-10% voice and front effects
C: 70-90% voice and front effects, 30-10% music L and R
SL: 70-90% rear effects L, 30-10% music L
SR: 70-90% rear effects R, 30-10% music R
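The remixing proportions of claim 5 can be illustrated with a small weighting table. This sketch assumes that the two percentages in each row are complementary (70% pairs with 30%, 90% with 10%) and that the "music L and R" share of channel C is split evenly between the two music sources; neither detail is stated in the claim. The source keys and the function name `remix` are hypothetical.

```python
def remix(sources, main=0.8):
    """Rebuild FL, FR, C, SL, SR from separated sources.

    sources: dict with keys "music_L", "music_R", "voice_fx",
             "rear_L", "rear_R", each a list of samples of equal length.
    main: dominant share per channel, restricted to the claimed 70-90%.
    """
    if not 0.7 <= main <= 0.9:
        raise ValueError("main share must lie within the claimed 70-90% range")
    minor = 1.0 - main
    weights = {
        "FL": {"music_L": main, "voice_fx": minor},
        "FR": {"music_R": main, "voice_fx": minor},
        # Even split of the music share between L and R is an assumption.
        "C": {"voice_fx": main, "music_L": minor / 2, "music_R": minor / 2},
        "SL": {"rear_L": main, "music_L": minor},
        "SR": {"rear_R": main, "music_R": minor},
    }
    n = len(sources["voice_fx"])
    return {
        channel: [sum(w * sources[name][i] for name, w in mix.items())
                  for i in range(n)]
        for channel, mix in weights.items()
    }
```

For example, with `main=0.75` the FL channel is 75% music L plus 25% voice and front effects, which falls inside the claimed ranges.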
6. Method for converting 5.1 sound format to hybrid binaural according to the preceding claim, characterized in that it is carried out by one device from the following list: an ASIC, an FPGA, a DSP, a microprocessor and a microcontroller.
7. Computer program comprising program instructions that cause a computer to carry out the operations of the method according to any of the preceding claims.
8. Computer program according to claim 7, characterized in that it is stored on storage media.
9. Computer program according to claim 7, characterized in that it is transmitted by means of a carrier signal.
PCT/ES2008/070246 2008-01-17 2008-12-30 Method of converting 5.1 sound format to hybrid binaural format WO2009090281A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ES200800112A ES2323563B1 (en) 2008-01-17 2008-01-17 SOUND FORMAT CONVERSION PROCEDURE 5.1. TO HYBRID BINAURAL.
ESP200800112 2008-01-17

Publications (1)

Publication Number Publication Date
WO2009090281A1 true WO2009090281A1 (en) 2009-07-23

Family

ID=40825163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/ES2008/070246 WO2009090281A1 (en) 2008-01-17 2008-12-30 Method of converting 5.1 sound format to hybrid binaural format

Country Status (2)

Country Link
ES (1) ES2323563B1 (en)
WO (1) WO2009090281A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5742689A (en) * 1996-01-04 1998-04-21 Virtual Listening Systems, Inc. Method and device for processing a multichannel signal for use with a headphone
US6002775A (en) * 1997-01-24 1999-12-14 Sony Corporation Method and apparatus for electronically embedding directional cues in two channels of sound
EP1816890A1 (en) * 2006-02-01 2007-08-08 Sony Corporation Audio reproducing system and method thereof
WO2007123788A2 (en) * 2006-04-03 2007-11-01 Srs Labs, Inc. Audio signal processing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Neural Networks, 2005. Proceedings. 2005 IEEE International Joint Conference on Montreal", vol. 2, QUE., CANADA., article CIARAMELLA A.: "BSS toolbox for delayed and convolved mixtures", pages: 1245 - 1250 *
TECHNOLOGIES FOR PRESENTATION OF SURROUND-SOUND IN HEADPHONES., 17 December 2007 (2007-12-17), Retrieved from the Internet <URL:http://www.headwize.com/tech/sshd_tech.htm> [retrieved on 20090323] *

Also Published As

Publication number Publication date
ES2323563B1 (en) 2010-04-27
ES2323563A1 (en) 2009-07-20

Similar Documents

Publication Publication Date Title
US10021507B2 (en) Arrangement and method for reproducing audio data of an acoustic scene
KR102182526B1 (en) Spatial audio rendering for beamforming loudspeaker array
JP4633870B2 (en) Audio signal processing method
AU2017279615B2 (en) Method and device for rendering acoustic signal, and computer-readable recording medium
US9769589B2 (en) Method of improving externalization of virtual surround sound
US20150110310A1 (en) Method for reproducing an acoustical sound field
TW201119420A (en) Virtual audio processing for loudspeaker or headphone playback
JP2004187300A (en) Directional electroacoustic transduction
JP5757945B2 (en) Loudspeaker system for reproducing multi-channel sound with improved sound image
CN103535052A (en) Apparatus and method for a complete audio signal
KR20190059642A (en) Apparatus for implementing multi-channel sound using open-ear headphone and method for the same
JP2019508964A (en) Method and system for providing virtual surround sound on headphones
ES2717330T3 (en) Apparatus and procedure for the processing of stereo signals for reproduction in automobiles, to achieve an individual three-dimensional sound by the front speakers
US6990210B2 (en) System for headphone-like rear channel speaker and the method of the same
JP4221746B2 (en) Headphone device
Cuevas-Rodríguez et al. The 3D Tune-In Toolkit–3D audio spatialiser, hearing loss and hearing aid simulations
ES2323563B1 (en) SOUND FORMAT CONVERSION PROCEDURE 5.1. TO HYBRID BINAURAL.
Klepko 5-channel microphone array with binaural-head for multichannel reproduction
US6983054B2 (en) Means for compensating rear sound effect
Paterson et al. Producing 3-D audio
Enomoto et al. 3-D sound reproduction system for immersive environments based on the boundary surface control principle
Tan Binaural recording methods with analysis on inter-aural time, level, and phase differences
WO2014084706A1 (en) Method for three-dimensional audio localisation in real time using a parametric mixer and pre-decomposition into frequency bands
Fodde Spatial Comparison of Full Sphere Panning Methods
EP4305851A1 (en) Set of headphones

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08870792

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08870792

Country of ref document: EP

Kind code of ref document: A1