WO2009090281A1 - Method for converting 5.1 sound format to hybrid binaural format - Google Patents

Method for converting 5.1 sound format to hybrid binaural format

Info

Publication number
WO2009090281A1
Authority
WO
WIPO (PCT)
Prior art keywords
effects
music
signals
channels
format
Prior art date
Application number
PCT/ES2008/070246
Other languages
English (en)
Spanish (es)
Inventor
Ivan Portas Arrondo
Original Assignee
Auralia Emotive Media Systems, S.L.
Priority date
Filing date
Publication date
Application filed by Auralia Emotive Media Systems, S.L.
Publication of WO2009090281A1

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S3/004 For headphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved

Definitions

  • the main object of the present invention is a method for converting sound in 5.1 format, commonly used for the digital recording and reproduction of cinematic content, into hybrid binaural format.
  • a sound system in 5.1 format represents the standard for the domestic sound reproduction of cinema.
  • a sound system in 5.1 format is composed of six audio channels where music, voice, sound effects, etc. are mixed in different proportions.
  • Each of the channels corresponds to a speaker, and in turn each of the speakers must be located in a specific location in relation to the user to achieve an optimal sound sensation.
  • the main speakers (FL and FR in Figure 1) ideally form an equilateral triangle with the user's position (O).
  • the lines joining the surround speakers (SL and SR) to the user (O) form an angle of approximately 110° with the vertical axis (the straight line joining O and C).
  • the LFE (Low Frequency Effects) loudspeaker reinforces bass sounds to give the reproduction more impact. Its location is not critical, since the signal it carries generally lies below 100 Hz, where sound is perceived as omnidirectional; that is, the listener cannot determine where the sound comes from.
  • a drawback of audio systems based on the 5.1 format is that the user's sound sensation deteriorates rapidly when the user is not in the optimal position with respect to the speakers.
  • headphones, however, keep the user optimally positioned at all times: because the transducers are attached to the user's head, their position relative to the head does not change.
  • the human being is a volumetric sound receiver; that is, a person processes the sound that arrives after, for example, reflections off the shoulders and torso, or diffraction around the head.
  • Human hearing is by nature binaural, where the result of the entire sound reception process ends in only two channels: right ear and left ear.
  • the term "binaural" refers to the nature of human hearing, because people are able to capture all the spatial sound information through a single pair of ears.
  • intracranial sound is usually produced, such as when listening to traditional stereo sound through headphones.
  • intracranial sound is the sensation that the sound sources are inside the user's skull, at a point between the two headphones, so traditional stereo sound is not an advisable format for representing three-dimensional sound spaces realistically.
  • the first of these consists in replacing the pair of point receivers normally used with volumetric receivers, such as dummies, so that the sound reaching them is processed naturally. This yields a binaural stereo recording that already contains all the phenomena described above.
  • the second is based on performing an auralization procedure. To do this, the response of a given receiver (a dummy or a human being, for example) to an impulse signal from a given point in space (usually a broadband noise emitted from a point around the user) is measured or modeled.
  • US Patent 2007213990 describes a method for transforming a traditional two-channel stereo signal into a binaural signal, focusing on how the input signal must be prepared for transformation into three-dimensional sound. Specifically, it describes dividing the input signal into frequency bands, auralizing each sub-band, and finally recombining the sub-bands to form the two output channels in binaural format.
  • the present invention describes a new method for real-time audio auralization in 5.1 format. To achieve an optimal result, each channel is treated and auralized independently, so that it is possible to assign specific acoustic parameters to each of them in order to make the reproduction more realistic and spectacular.
  • the hybrid model described, which combines the auralization of the FL, FR, SL and SR channels with the original monophonic C and LFE channels, makes the dialogue more intelligible, since there is no interference between the front channels and the C channel, and improves immersion, because the brain constantly and unconsciously uses the monophonic C channel as a reference for the auralized channels.
  • readjusting the proportions of the different types of information optimizes the content of the different channels from the outset to achieve an optimal result.
  • reinforcing the LFE channel recreates the sensations produced by bass components in cinemas, balancing the reproduction system.
  • the term "auralize" refers to processing the different channels so that the user has the impression that they come from specific places in space, thereby optimizing spectacularity and intelligibility.
  • channel refers to the signal of each of the speakers that make up the 5.1 sound format or the hybrid binaural sound format.
  • the FL, FR, C, SL, SR or LFE channels which are the input channels in 5.1 format
  • L and R channels which are the output channels in binaural format.
  • the letters “L” and “R” will be used to distinguish between the positions of the channels located to the left (left, in English) and right (right, in English) of the user.
  • “frontal plane” and “rear plane” will also be used to refer to the position of the channels in front of the user or behind the user, as well as “right side plane” or “left side plane” to refer to the position of the channels to the sides of the user.
  • the term "source” refers to a signal that contains sounds from a single physical process, that is, the sources will be, in general, music, voice and effects.
  • hybrid binaural is also defined as a sound format that mixes auralized channels with non-auralized or monophonic channels. Specifically, the present invention mixes the auralized channels FL, FR, SL and SR with the non-auralized channels C and LFE.
  • a method of converting from sound format 5.1 to hybrid binaural comprises the following operations:
  • FL mainly contains music, and to a lesser extent voice and effects.
  • FR contains mainly music, and to a lesser extent voice and effects.
  • C contains mainly voice, and to a lesser extent music and effects.
  • SL contains mainly effects, and to a lesser extent music.
  • SR contains mainly effects, and to a lesser extent music.
  • LFE contains only bass.
  • FL: elevation from 0° to 30°; azimuth from -10° to -30°.
  • FR: elevation from 0° to 30°; azimuth from +10° to +30°.
  • SL: elevation from 175° to 195°; azimuth from -30° to -60°.
  • SR: elevation from 175° to 195°; azimuth from +30° to +60°.
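The angular ranges can be captured in a small lookup table. The following is a minimal sketch rather than anything defined by the patent: the dictionary layout and the function name `position_in_range` are illustrative assumptions, and the FL range is the one stated in the abstract.

```python
# Angular ranges in degrees, as stated in the text and the abstract.
# The data layout and the function name are illustrative assumptions.
AURALIZATION_RANGES = {
    "FL": {"elevation": (0, 30), "azimuth": (-30, -10)},
    "FR": {"elevation": (0, 30), "azimuth": (10, 30)},
    "SL": {"elevation": (175, 195), "azimuth": (-60, -30)},
    "SR": {"elevation": (175, 195), "azimuth": (30, 60)},
}


def position_in_range(channel, elevation, azimuth):
    """Return True if (elevation, azimuth) lies inside the range for channel."""
    limits = AURALIZATION_RANGES[channel]
    el_lo, el_hi = limits["elevation"]
    az_lo, az_hi = limits["azimuth"]
    return el_lo <= elevation <= el_hi and az_lo <= azimuth <= az_hi
```

For instance, `position_in_range("FR", 15, 20)` and `position_in_range("SL", 180, -40)` both hold, while an FR azimuth of 45° falls outside the front range.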
  • auralizing a channel in a certain position means virtually locating that channel so that the resulting signals, one for the right channel and one for the left channel, when reproduced through headphones, give the user the sensation that the sounds of that channel come from that particular position in space.
  • auralization is a process in which a channel lacking spatial information (usually monophonic, as in this case; that is, anechoic or dry) is convolved with the impulse response of a particular listener (the response in time and frequency to a given acoustic stimulus from a certain point in space).
  • the response of a given receiver (a dummy or a human being, for example) to an impulse signal from a certain point in space (usually broadband noise emitted from a point around the user) is modeled or measured.
  • This impulse response is later used to process a monophonic source (without spatial information) by convolution, so that the source is heard as if located at the point from which the impulse was emitted.
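The convolution step described above can be sketched in a few lines. This is an illustration only: the function names and the toy impulse responses are hypothetical, and a practical implementation would use measured responses and an FFT-based convolution from a numerical library instead of the direct-form loop shown here.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two non-empty sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def auralize(mono, ir_left, ir_right):
    """Return the (left, right) ear signals for one monophonic channel."""
    return convolve(mono, ir_left), convolve(mono, ir_right)


# Toy impulse responses (made-up values, not measured data): a source on the
# listener's left reaches the left ear first and louder than the right ear.
ir_left = [1.0, 0.0, 0.0]
ir_right = [0.0, 0.0, 0.5]
left, right = auralize([1.0, -1.0], ir_left, ir_right)
```

With these toy responses the right-ear signal is a delayed, attenuated copy of the left-ear signal, which is the kind of interaural difference that a measured impulse-response pair encodes in much finer detail.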
  • the inventors have discovered that virtually placing the FL, FR, SL and SR channels within the angular ranges described above gives all users a feeling of optimal spectacularity.
  • the angular ranges of the front speakers (FL and FR) are kept narrow for two reasons: to avoid losing intelligibility of the dialogue channel (C) through an excessively wide stereo image of the music, in which the energy of FL goes almost entirely to L and the energy of FR almost entirely to R; and to avoid sending a large amount of energy to the lateral planes, near the ears, where it would interfere with the localization of the rear-plane channels (SL and SR).
  • the dialogue channel (C) is not processed in the operation that processes the signals of the FL, FR, SL and SR channels, since keeping it unprocessed provides two great advantages in the final output of the procedure.
  • the first is a gain in intelligibility with respect to the input format: keeping this channel intact while auralizing those of the frontal (FL and FR) and rear (SL and SR) planes makes the dialogue (C) stand out in the central position, reducing the listening fatigue of following it.
  • the second advantage lies in the fact that it constitutes an auditory reference point for the brain, since maintaining its intracranial nature makes its combination with the auralized channels ideal. In this way, the brain constantly compares the position of this channel with the auralized ones, making the user's auditory experience much more spectacular.
  • the LFE channel is likewise not processed in this operation, because the frequencies it contains are non-directional; that is, it seems to be heard from every position. This is why the speakers that reproduce this channel can be placed practically anywhere in the enclosure.
  • the front (FL1, FR1) and rear (SL1, SR1) plane channels are processed independently using two impulse responses from different optimized enclosures.
  • the separate processing of the front and rear channels provides the advantage of using two different virtual enclosures, giving more depth only to the rear channels, which are the ones with the most spectacular effects. Excessive depth in the front channels, however, would make the dialogue harder to understand.
  • the reverberation introduced in the FL1 and FR1 channels is within the range of 0.5 seconds to 1 second, and the reverberation introduced in the SL1 and SR1 channels is within the range of 1 second to 3.5 seconds.
  • the front-plane signals FL2 and FR2 and the rear-plane signals SL2 and SR2 are obtained as output.
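As a rough illustration of the enclosure-response step, the sketch below builds a synthetic impulse response whose envelope falls by 60 dB after a chosen reverberation time and convolves a signal with it. This is an assumption-laden stand-in: the text speaks of impulse responses from optimized enclosures, whereas the decaying-noise model, the function names and the sample rate used here are hypothetical.

```python
import random

FRONT_RT_RANGE = (0.5, 1.0)  # seconds, FL1 and FR1 (from the text)
REAR_RT_RANGE = (1.0, 3.5)   # seconds, SL1 and SR1 (from the text)


def decay_gain(t, rt):
    """Amplitude envelope that has fallen by 60 dB when t equals rt."""
    return 10.0 ** (-3.0 * t / rt)


def synthetic_room_ir(rt, sample_rate, seed=0):
    """Decaying noise burst used as a stand-in for a measured room response."""
    rng = random.Random(seed)
    n = int(round(rt * sample_rate))
    return [rng.uniform(-1.0, 1.0) * decay_gain(i / sample_rate, rt)
            for i in range(n)]


def add_reverb(signal, room_ir):
    """Convolve a signal with a room impulse response (direct form)."""
    out = [0.0] * (len(signal) + len(room_ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(room_ir):
            out[i + j] += s * h
    return out


# Hypothetical choice: a short front room and a longer rear room.
front_ir = synthetic_room_ir(FRONT_RT_RANGE[0], sample_rate=1000)
rear_ir = synthetic_room_ir(REAR_RT_RANGE[0], sample_rate=1000)
```

Using a longer response for the rear channels than for the front ones mirrors the stated design choice of giving more depth only to the rear plane.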
  • the conversion procedure of sound format 5.1 to hybrid binaural comprises, prior to the final mixing operation, compressing the LFE channel signal to obtain an LFE' signal.
  • Another preferred embodiment of the invention comprises, prior to the auralization operation, the operations of:
  • the mixing of the sources L music, R music, voice and front effects, rear effects L and rear effects R to obtain the channels is performed according to the following percentage ranges:
  • the LFE bass channel is already an independent component in itself, and therefore its information is not redundant in the other channels. For this reason it is not included in the optional initial separation and mixing operations.
  • this also extends to computer programs, in particular computer programs contained in a carrier, adapted to carry out the operations of the described procedure.
  • the program can be in the form of a source code, object code or an intermediate code between the source code and the object code, as a partially compiled form, or in any other suitable way to implement the operations of the invention.
  • the carrier can be any device or entity capable of transporting the program.
  • the carrier can comprise a storage medium, such as a ROM or a CD-ROM, or a magnetic storage medium, for example a floppy disk or a hard disk.
  • the carrier can be a transmissible carrier, such as an electrical or optical signal that can be conveyed by electrical, optical, radio or any other means.
  • the carrier can be an integrated circuit in which the program is stored, the circuit being adapted to carry out the operations of the procedure.
  • the carrier could be an ASIC, an FPGA, a DSP, a microprocessor or a microcontroller.
  • Figure 1: Shows the location of the physical speakers of a cinema in 5.1 sound format.
  • Figure 2: Shows an explanatory diagram of the elevation and azimuth angles.
  • Figure 3: Shows a general diagram of the operations of the process according to the present invention.
  • Figure 1 shows the position of the speakers of the channels in a movie theater in relation to the position in which the user must be located for an optimum sound experience.
  • the procedure is carried out by a computer that first, as shown in Figure 3, obtains from the DVD the signals of the original channels in 5.1 format (FL, FR, C, SL, SR, LFE).
  • the LFE channel is separated and processed independently in parallel, undergoing only compression, which yields the LFE' signal.
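The text does not specify how the LFE channel is compressed. As one possible reading, the sketch below applies a static hard-knee compressor to each sample. The function name, the threshold and the ratio are hypothetical, and for brevity the ratio acts on linear amplitude, whereas practical compressors usually work on levels in decibels and add attack and release smoothing.

```python
def compress(samples, threshold=0.5, ratio=4.0):
    """Static hard-knee compressor.

    The part of each sample's magnitude above threshold is divided by ratio;
    samples at or below the threshold pass through unchanged.
    """
    out = []
    for x in samples:
        magnitude = abs(x)
        if magnitude > threshold:
            magnitude = threshold + (magnitude - threshold) / ratio
        out.append(magnitude if x >= 0 else -magnitude)
    return out


# Hypothetical LFE samples: the quiet one is untouched, the loud ones are reduced.
lfe_prime = compress([0.2, 0.9, -1.0])
```

Reducing the peaks in this way leaves headroom to raise the overall level of the bass channel afterwards, which is one way to obtain the reinforcement of the LFE channel that the description mentions.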
  • a selector (S) lets the user choose whether to apply the optional operations of extracting the sources from the original channels and remixing them in new proportions to make the film more spectacular.
  • the sources L music, R music, voice and front effects, rear effects L and rear effects R
  • the sources are separated, for example using the independent-component-analysis source separation algorithm 'FastICA', developed at HUT (Helsinki University of Technology), and then remixed in new optimized proportions.
  • HUT (Helsinki University of Technology)
  • the dialogue channel (C) is separated from the rest, and the channels FL', FR', SL' and SR' are each auralized at an optimal geometric position to enhance the spectacular nature of the user's sound experience.
  • the listener has the characteristics of a standard user based on the impulse responses of a KEMAR dummy.
  • FR': elevation 15°; azimuth 20°
  • SL': elevation 180°; azimuth -40°
  • SR': elevation 180°; azimuth 40°
  • Figure 2 shows the reference for the elevation and azimuth angles.
  • the channels obtained in the previous operation, FL'2, FR'2, SL'2 and SR'2, are mixed with the LFE' and C channels to obtain just two signals in hybrid binaural format, corresponding to the L and R headphone channels.
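The final mixing operation can be sketched as a sum: each auralized channel contributes a left-ear and a right-ear signal, while the monophonic C and LFE' signals are added identically to both ears. The function name and the gain parameters are hypothetical, since the text does not give mixing levels.

```python
def mix_hybrid_binaural(auralized, center, lfe, center_gain=1.0, lfe_gain=1.0):
    """Mix auralized channels with the monophonic C and LFE' signals.

    auralized: list of (left, right) sample-list pairs, one pair per
               auralized channel (FL'2, FR'2, SL'2, SR'2).
    center, lfe: mono sample lists, added equally to both ears.
    Returns the (L, R) output sample lists.
    """
    lengths = [len(center), len(lfe)]
    lengths += [len(ear) for pair in auralized for ear in pair]
    length = max(lengths)
    left = [0.0] * length
    right = [0.0] * length
    for ear_left, ear_right in auralized:
        for i, value in enumerate(ear_left):
            left[i] += value
        for i, value in enumerate(ear_right):
            right[i] += value
    for mono, gain in ((center, center_gain), (lfe, lfe_gain)):
        for i, value in enumerate(mono):
            left[i] += gain * value
            right[i] += gain * value
    return left, right


# Toy input with a single auralized channel and short C and LFE' signals.
L_out, R_out = mix_hybrid_binaural(
    auralized=[([1.0, 0.0], [0.0, 0.5])],
    center=[0.25, 0.25],
    lfe=[0.5],
)
```

Because C and LFE' reach both ears identically, they keep the intracranial, centered character that the description relies on as a reference for the auralized channels.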

Abstract

The invention relates to a method for converting 5.1 sound format to hybrid binaural format, the method comprising: obtaining, from the FL, FR, C, SL, SR and LFE channels, the 5.1-format signals to be converted to hybrid binaural format; auralizing the FL, FR, SL and SR channels at the following positions: FL: elevation from 0° to 30°, azimuth from -10° to -30°; FR: elevation from 0° to 30°, azimuth from +10° to +30°; SL: elevation from 175° to 195°, azimuth from -30° to -60°; SR: elevation from 175° to 195°, azimuth from +30° to +60°, so as to obtain signals FL1, FR1, SL1 and SR1; modeling the response of the enclosure from these signals by introducing a reverberation effect; and mixing the signals FL2, FR2, SL2 and SR2 obtained in the previous operation with the original LFE and C signals to obtain the two output signals, left and right.
PCT/ES2008/070246 2008-01-17 2008-12-30 Method for converting 5.1 sound format to hybrid binaural format WO2009090281A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ES200800112A ES2323563B1 (es) 2008-01-17 2008-01-17 Method for converting 5.1 sound format to hybrid binaural.
ESP200800112 2008-01-17

Publications (1)

Publication Number Publication Date
WO2009090281A1 (fr) 2009-07-23

Family

ID=40825163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/ES2008/070246 WO2009090281A1 (fr) 2008-01-17 2008-12-30 Method for converting 5.1 sound format to hybrid binaural format

Country Status (2)

Country Link
ES (1) ES2323563B1 (fr)
WO (1) WO2009090281A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5742689A (en) * 1996-01-04 1998-04-21 Virtual Listening Systems, Inc. Method and device for processing a multichannel signal for use with a headphone
US6002775A (en) * 1997-01-24 1999-12-14 Sony Corporation Method and apparatus for electronically embedding directional cues in two channels of sound
EP1816890A1 (fr) * 2006-02-01 2007-08-08 Sony Corporation Audio reproduction system and corresponding method
WO2007123788A2 (fr) * 2006-04-03 2007-11-01 Srs Labs, Inc. Audio signal processing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Neural Networks, 2005. Proceedings. 2005 IEEE International Joint Conference on Montreal", vol. 2, QUE., CANADA., article CIARAMELLA A.: "BSS toolbox for delayed and convolved mixtures", pages: 1245 - 1250 *
TECHNOLOGIES FOR PRESENTATION OF SURROUND-SOUND IN HEADPHONES., 17 December 2007 (2007-12-17), Retrieved from the Internet <URL:http://www.headwize.com/tech/sshd_tech.htm> [retrieved on 20090323] *

Also Published As

Publication number Publication date
ES2323563A1 (es) 2009-07-20
ES2323563B1 (es) 2010-04-27

Similar Documents

Publication Publication Date Title
US10021507B2 (en) Arrangement and method for reproducing audio data of an acoustic scene
KR102182526B1 (ko) Spatial audio rendering for a beamforming loudspeaker array
ES2729308T3 (es) Apparatus and method for mapping a first and a second input channel to at least one output channel
JP4633870B2 (ja) Audio signal processing method
AU2017279615B2 (en) Method and device for rendering acoustic signal, and computer-readable recording medium
US9769589B2 (en) Method of improving externalization of virtual surround sound
US20150110310A1 (en) Method for reproducing an acoustical sound field
TW201119420A (en) Virtual audio processing for loudspeaker or headphone playback
JP2004187300A (ja) Directional electroacoustic conversion
Bates The composition and performance of spatial music
JP5757945B2 (ja) Loudspeaker system for reproducing multichannel sound with an improved sound image
CN103535052A (zh) Apparatus and method for a complete audio signal
KR20190059642A (ko) Apparatus and method for implementing multichannel sound using open-ear headphones
JP2019508964A (ja) Method and system for providing virtual surround sound on headphones
ES2717330T3 (es) Apparatus and method for processing stereo signals for reproduction in cars, to achieve individual three-dimensional sound through the front loudspeakers
US6990210B2 (en) System for headphone-like rear channel speaker and the method of the same
JP4221746B2 (ja) Headphone device
Cuevas-Rodríguez et al. The 3D Tune-In Toolkit–3D audio spatialiser, hearing loss and hearing aid simulations
ES2323563B1 (es) Method for converting 5.1 sound format to hybrid binaural.
Klepko 5-channel microphone array with binaural-head for multichannel reproduction
US6983054B2 (en) Means for compensating rear sound effect
Paterson et al. Producing 3-D audio
Enomoto et al. 3-D sound reproduction system for immersive environments based on the boundary surface control principle
Tan Binaural recording methods with analysis on inter-aural time, level, and phase differences
WO2014084706A1 (fr) Method for real-time three-dimensional audio positioning using a parametric mixer and pre-decomposition into frequency bands

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08870792

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08870792

Country of ref document: EP

Kind code of ref document: A1