[ASP] Speech, Audio, Sound Characteristics

August 16, 2021

Orientation

Speech / Audio / Sound Characteristics

Broadcasting / Communication / Entertainment Core Technology

Killer Application
Killer Application is defined as a feature, function, or application of a new technology or product that is presented as virtually indispensable or much superior to rival products. For example, the move from analog to digital communication made MMS and video streaming possible on your own phone. Applications like Netflix, Facebook, Instagram, and YouTube also generate much of today's internet traffic. You can find the real-time statistics here, just for fun. You will notice that most of these applications involve audio.
Various independent audio services
There are many independent audio services. In the past, radio and phone calls were the major services. Nowadays, commercialized applications such as sleep-pattern analysis apps and AI speakers are easily found around us.

Shortage of professional engineers in academia/industry

vs Video/Image Area
Compared with the video/image area, there are relatively few professional engineers.

Related to Hobby Activities/Art

Music
Music is a representative example. Most people have their own favorite genre of music and their own favorite musicians.
Movies, musicals, etc.

Representative biosignal

Voice-based health care systems
Welfare technology for the disabled

Video vs Audio

Data amount

4D : 2D (video : audio)
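To put the gap in data amount into concrete numbers, here is a rough sketch comparing uncompressed bit rates. The parameters (CD-quality 16-bit stereo audio at 44.1 kHz, and 1080p video with 24-bit color at 30 frames per second) are illustrative assumptions, not figures from the lecture.

```python
def audio_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM audio bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def video_bitrate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


audio = audio_bitrate(44_100, 16, 2)       # assumed: CD-quality stereo
video = video_bitrate(1920, 1080, 24, 30)  # assumed: 1080p, 24-bit color, 30 fps

print(f"audio: {audio / 1e6:.2f} Mbit/s")
print(f"video: {video / 1e6:.2f} Mbit/s")
print(f"ratio: about {video / audio:.0f}x")
```

This prints about 1.41 Mbit/s for audio versus about 1493 Mbit/s for video, a gap of roughly a thousandfold before any compression.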

Psychological sensitivity

A specific sound can provoke strong aversion
Our psychological state is sensitive to sound
For example, slow, creepy BGM (background music) creates a horrifying atmosphere. Conversely, music can just as easily create the opposite mood.

Required quality

In general, low picture quality is tolerable but low sound quality is not.

Multimedia independency

A topic to be discussed

Video vs Audio summary and conclusion

| Component | Video | Audio |
|:---:|:---:|:---:|
| Data amount | 4D | 2D |
| Psychological sensitivity | Low | High |
| Required quality | Low | High |
| Multimedia independency | Need to discuss | Need to discuss |

Conclusion: engineers and planners need to raise commercial value by bringing the two different areas together.

Applications

Speech synthesis

Generates a speech signal that carries linguistic information

First demonstrated at the 1939 New York World's Fair
Core technology for human-computer interaction (HCI)
| Before 2010 | After 2010 |
|:---:|:---:|
| Concatenate digital waveform segments and smooth the boundaries between them | Neural-network-based waveform generation |
Requirement: professional background knowledge in linguistics and acoustics
Current state of commercialization
- Announcement servers
- Audio newspapers
- DeepMind TTS demo
- Personalized speech synthesis applications
- Voice timbre estimation from body structure
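The pre-2010 approach described above (connecting waveform pieces and smoothing their boundaries) can be sketched as a linear crossfade between two segments. This is a simplified illustration assuming segments are plain lists of float samples; real concatenative synthesizers also select units and match pitch and energy, which this ignores. `crossfade_concat` is a hypothetical helper name.

```python
def crossfade_concat(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two waveform segments, smoothing the boundary with a linear crossfade.

    The last `overlap` samples of `a` fade out while the first `overlap`
    samples of `b` fade in, so there is no abrupt jump at the junction.
    """
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 0 and the shorter segment length")
    head = a[: len(a) - overlap]   # part of `a` played unchanged
    tail = b[overlap:]             # part of `b` played unchanged
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for `b`, rising toward 1
        mixed.append((1.0 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + tail


seg1 = [1.0] * 8
seg2 = [-1.0] * 8
out = crossfade_concat(seg1, seg2, overlap=4)
```

Here the output ramps from 1.0 down to -1.0 over the four overlapping samples instead of jumping, and the result is shorter than the plain concatenation by `overlap` samples.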

Music Synthesis

Synthesis of musical signals by electronic devices
Frequency modulation (FM) synthesis technology was invented by John Chowning
After the patent expired in 1995, many companies adopted FM synthesis, and the electronic instrument market naturally expanded
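The basic two-operator FM idea is that one sine wave (the modulator, frequency f_m) modulates the phase of another (the carrier, frequency f_c): y(t) = A sin(2π f_c t + I sin(2π f_m t)), where I is the modulation index. A minimal sketch follows; the function name, the 8 kHz sample rate, and the example frequencies are assumptions for illustration.

```python
import math


def fm_tone(carrier_hz: float, modulator_hz: float, index: float,
            duration_s: float, sample_rate: int = 8000,
            amplitude: float = 1.0) -> list[float]:
    """Two-operator FM: y[n] = A * sin(2*pi*fc*t + I * sin(2*pi*fm*t)), t = n / fs."""
    n_samples = int(duration_s * sample_rate)
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        # Carrier phase plus a sinusoidal phase deviation driven by the modulator.
        phase = 2 * math.pi * carrier_hz * t + index * math.sin(2 * math.pi * modulator_hz * t)
        out.append(amplitude * math.sin(phase))
    return out


tone = fm_tone(440.0, 220.0, index=2.0, duration_s=0.5)  # rich, bright tone
pure = fm_tone(440.0, 220.0, index=0.0, duration_s=0.5)  # index 0: plain sine
```

With I = 0 the result is a plain sine. Raising I spreads energy into more sidebands at f_c ± k·f_m, which is how a single modulator can produce rich timbres with very little computation.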

Sound Synthesis

Traditional method
To play a sample waveform when the matching event occurs. Its problems: simple repetition becomes boring, expressiveness is limited, and memory must be allocated for the samples.
Sound synthesis engine
To produce sound mathematically according to an object's material, velocity, impact, etc. The result is closer to reality and more memory efficient.
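One common way a sound synthesis engine can generate impacts is modal synthesis: the sound is a sum of exponentially decaying sinusoids, where the material determines the mode frequencies and decay rates, and the impact velocity scales the amplitude. The technique choice, the `MATERIALS` table, and its numbers are assumptions for illustration, not measured material data.

```python
import math

# Hypothetical mode tables: (frequency_hz, decay_per_second, relative_gain).
# The numbers are illustrative only, not measurements of real materials.
MATERIALS = {
    "wood":  [(220.0, 30.0, 1.0), (560.0, 45.0, 0.5), (1100.0, 60.0, 0.25)],
    "metal": [(440.0, 3.0, 1.0), (1210.0, 4.0, 0.6), (2370.0, 5.0, 0.4)],
}


def impact_sound(material: str, velocity: float, duration_s: float = 0.5,
                 sample_rate: int = 8000) -> list[float]:
    """Sum of decaying sinusoids; amplitude scales linearly with impact velocity."""
    modes = MATERIALS[material]
    n_samples = int(duration_s * sample_rate)
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        s = sum(g * math.exp(-d * t) * math.sin(2 * math.pi * f * t) for f, d, g in modes)
        out.append(velocity * s)
    return out


soft = impact_sound("wood", 0.2)
hard = impact_sound("wood", 1.0)
ring = impact_sound("metal", 1.0)
```

Only a handful of numbers per material are stored instead of recorded samples, which reflects the memory advantage mentioned above. The metal modes decay far more slowly, so the tail of `ring` keeps sounding after the wood hit has died out.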

Speech Recognition

Technology that enables a machine to understand the information in a speech signal
Basically, speech recognition plays a secondary role
Users demand high recognition accuracy from machines
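Recognition accuracy is commonly quantified with word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The lecture does not name a metric, so treat WER here as an assumed, typical choice; the function below is a minimal sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # `prev` is the previous DP row: edit distances from ref[:i-1] to each hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


wer = word_error_rate("turn on the light", "turn off the light")
```

One wrong word out of four gives a WER of 0.25. Note that WER can exceed 1.0 when the recognizer inserts many extra words.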

Speech recognition and Synthesis

Speaker Verification and Identification

Verification
To verify a person by extracting individual, unique information from the speech signal. It is also called a voice password or biosignal verification
Identification
To find a specific person in a mixed audio signal. It is used for voice IDs and voice tags, and also for searching scenes or building indexes in movies and videos
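A common way to frame both tasks is to turn each utterance into a fixed-length speaker embedding and compare embeddings by cosine similarity: verification is a 1:1 comparison against a threshold, identification is a 1:N search for the best match. This sketch assumes the embeddings already exist (the feature extractor is not shown), assumes the target speaker's segment has already been isolated from the mixture, and uses toy 3-dimensional vectors and a hypothetical threshold of 0.7.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def verify_speaker(enrolled: list[float], test: list[float], threshold: float = 0.7) -> bool:
    """Verification (1:1): accept the identity claim if similarity reaches the threshold."""
    if len(enrolled) != len(test):
        raise ValueError("embeddings must have the same dimension")
    return cosine_similarity(enrolled, test) >= threshold


def identify_speaker(enrolled: dict[str, list[float]], test: list[float]) -> str:
    """Identification (1:N): return the enrolled speaker closest to the test embedding."""
    return max(enrolled, key=lambda name: cosine_similarity(enrolled[name], test))


# Toy embeddings (hypothetical values).
speakers = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
sample = [0.85, 0.15, 0.35]
```

Here `sample` is accepted as Alice, rejected as Bob, and identified as Alice. In practice the threshold trades false accepts against false rejects and would be tuned on real data.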

Music information analysis

Content based music search
Musical characteristics, such as melody or rhythm, can serve as ingredients for searching music tailored to a user's needs. Query-by-humming and query-by-short-audio are also possible, and audio genre classification is becoming more and more accurate.
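Query-by-humming can be sketched by comparing melody contours. Converting notes to successive pitch intervals makes the match independent of the key the user hums in, and dynamic time warping (DTW) tolerates notes held too long or too short. This is one common approach, assumed here for illustration: the melodies are assumed to be already transcribed into MIDI pitch numbers (pitch tracking is not shown), and the catalog entries are toy examples.

```python
def to_intervals(midi_pitches: list[float]) -> list[float]:
    """Successive pitch differences in semitones; makes matching key-invariant."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]


def dtw_distance(x: list[float], y: list[float]) -> float:
    """Dynamic time warping distance with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: stretch x, stretch y, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def query_by_humming(hummed: list[float], catalog: dict[str, list[float]]) -> str:
    """Return the catalog title whose interval contour is closest to the hummed one."""
    q = to_intervals(hummed)
    return min(catalog, key=lambda title: dtw_distance(q, to_intervals(catalog[title])))


catalog = {
    "twinkle": [60, 60, 67, 67, 69, 69, 67],
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62],
}
# "Twinkle" hummed two semitones higher, with one note held an extra beat.
hummed = [62, 62, 69, 69, 69, 71, 71, 69]
```

Even though every hummed pitch differs from the catalog version and there is an extra note, the interval contour aligns perfectly under DTW, so `query_by_humming(hummed, catalog)` returns `"twinkle"`.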

Speech and Audio Coding

Technology that reduces the amount of data needed for storage or transmission.
It sits at the very core of the change from analog to digital signals that enabled digital wireless mobile communication and portable devices

Current research trend
The trend is toward processing all kinds of sound with a single integrated method. Multi-channel sound and high-quality audio at low bit rates are also research topics.
International standards organizations select standard codecs such as PCM, MP3, EVRC, AMR, AAC, and USAC, but there are also de facto standards
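As a small taste of coding, telephone-grade PCM uses logarithmic companding so that 8 bits per sample are enough for speech (at 8 kHz that is 64 kbit/s instead of 128 kbit/s for 16-bit linear PCM). The sketch below applies the continuous μ-law curve F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255 and then quantizes uniformly to 8 bits. It is an illustration of the idea, not a bit-exact G.711 implementation, which uses a segmented piecewise-linear approximation; the function names are hypothetical.

```python
import math

MU = 255  # mu value used by mu-law PCM (North America, Japan)


def mu_law_encode(x: float, bits: int = 8) -> int:
    """Compress a sample in [-1, 1] with the continuous mu-law curve, then quantize."""
    x = max(-1.0, min(1.0, x))                                       # clip to valid range
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)   # y in [-1, 1]
    levels = 2 ** bits
    return int(round((y + 1) / 2 * (levels - 1)))                    # code in 0 .. levels-1


def mu_law_decode(code: int, bits: int = 8) -> float:
    """Undo the quantization, then invert the mu-law curve."""
    levels = 2 ** bits
    y = code / (levels - 1) * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


quiet = mu_law_decode(mu_law_encode(0.001))
loud = mu_law_decode(mu_law_encode(0.9))
```

Quiet samples come back with an error far smaller than a uniform 8-bit quantizer's step, while loud samples get coarser steps. This matches how hearing is more sensitive to relative than absolute error, which is why the logarithmic curve saves bits.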

Others

3D Audio / Stereo sound
Object based audio broadcasting
Noise elimination
Signal restoration
Sound type classification
Music therapy
Music equipment

1FeS