[ASP] Speech, Audio, Sound Characteristics
Orientation
Speech / Audio / Sound Characteristics
Broadcasting / Communication / Entertainment Core Technology
- Killer Application
A killer application is a feature, function, or application of a new technology or product that is presented as virtually indispensable or far superior to rival products. For example, with the move from analog to digital communication, MMS and video streaming became possible on your own phone. Applications such as Netflix, Facebook, Instagram, and YouTube also generate internet traffic. You can find the real-time stats here for fun. From them you can see that most of these applications are related to audio.
- Various independent audio services
There are many independent audio services. In the past, radio and phone calls were the major services. Nowadays, commercial applications such as sleep pattern analysis apps and AI speakers are easy to find.
Shortage of professional engineers in academia/industry
- vs Video/Image Area
Compared with the video/image area, there are relatively few professional engineers.
Related to Hobby Activities/Art
- Music
Music is a representative example. Most people have their own favorite genre of music, and musicians are in a league of their own.
- Movie, Musical, etc.
Representative bio signal
- Voice based health care system
- Welfare technology for the disabled
Video vs Audio
Data amount
- 4 dimensions (video) : 2 dimensions (audio)
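To make the gap in data amount concrete, here is a minimal sketch that compares uncompressed bit rates. The formats used (1080p, 24-bit color, 30 fps video and 44.1 kHz, 16-bit stereo audio) are illustrative assumptions, not values taken from the lecture.

```python
def raw_video_bitrate(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


def raw_audio_bitrate(sample_rate, bits_per_sample, channels):
    """Uncompressed PCM audio bit rate in bits per second."""
    return sample_rate * bits_per_sample * channels


# Assumed formats: 1080p, 24-bit color, 30 fps vs. CD-quality stereo audio.
video_bps = raw_video_bitrate(1920, 1080, 24, 30)
audio_bps = raw_audio_bitrate(44_100, 16, 2)

print(f"video: {video_bps / 1e6:.1f} Mbit/s")
print(f"audio: {audio_bps / 1e6:.3f} Mbit/s")
print(f"ratio: {video_bps / audio_bps:.0f}x")
```

Under these assumptions the raw video stream is roughly three orders of magnitude larger than the raw audio stream.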
Psychological sensitivity
- Specific sounds can provoke aversion
- Psychological state is sensitive to sound
For example, slow and creepy BGM (background music) creates a frightening atmosphere. The reverse is also possible, like this.
Required quality
- In general, low picture quality is tolerable but low sound quality is not.
Multimedia independency
A topic to be discussed
Video vs Audio summary and conclusion
Component | Video | Audio |
---|---|---|
Data amount | 4D | 2D |
Psychological sensitivity | Low | High |
Required quality | Low | High |
Multimedia independency | Need to discuss | Need to discuss |
Conclusion:
Engineers and planners need to improve commercial value by collaborating across the two areas.
Applications
Speech synthesis
- Generates a speech signal that carries linguistic information
- First demonstrated at the New York World’s Fair in 1939
- Core technology for human-computer interaction (HCI)

| Before 2010 | After 2010 |
|:---:|:---:|
| Concatenate digital waveforms and apply boundary smoothing at the joins | Neural-network-based waveform generation |

- Requirement: professional background knowledge in linguistics and acoustics
- Current phase of commercialization
- Announcement server
- Audio newspaper
- DeepMind TTS demo
- Personalized speech synthesizing application
- Timbre (sound color) estimation from body structure
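The pre-2010 approach in the table above, joining stored waveforms and smoothing the boundary, can be sketched as follows. This is only an illustration: the sine segments stand in for recorded speech units, and a linear crossfade is one simple form of boundary smoothing chosen here as an assumption. The function names are hypothetical.

```python
import math


def sine_unit(freq_hz, n_samples, sample_rate=16_000):
    """Stand-in for a recorded speech unit: a plain sine segment."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def concatenate_with_crossfade(a, b, overlap):
    """Join two waveforms, linearly crossfading over `overlap` samples."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be in 1..min(len(a), len(b))")
    head = a[:-overlap]   # part of a that is kept untouched
    tail = b[overlap:]    # part of b that is kept untouched
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight ramps from unit a toward unit b
        blended.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + blended + tail


unit_a = sine_unit(200.0, 800)
unit_b = sine_unit(300.0, 800)
joined = concatenate_with_crossfade(unit_a, unit_b, overlap=160)
print(len(joined))  # 800 + 800 - 160 samples
```

Without the crossfade, the jump between the last sample of one unit and the first sample of the next is typically heard as a click.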
Music Synthesis
- Musical signal synthesis by electronic devices
- Frequency modulation (FM) synthesis technology was invented by John Chowning
- After the patent expired in 1995, many companies adopted FM synthesis, and the electronic instrument market naturally expanded
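As a rough illustration of FM synthesis, the sketch below generates a tone whose carrier phase is modulated by a second sine wave, `sin(2*pi*fc*t + index * sin(2*pi*fm*t))`. The function name `fm_tone` and the parameter values are assumptions made for this example.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, duration_s, sample_rate=44_100):
    """Return samples of sin(2*pi*fc*t + index * sin(2*pi*fm*t))."""
    n_samples = int(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        phase = 2 * math.pi * carrier_hz * t
        phase += index * math.sin(2 * math.pi * modulator_hz * t)
        samples.append(math.sin(phase))
    return samples


# Assumed example values: a 440 Hz carrier modulated at 220 Hz.
tone = fm_tone(carrier_hz=440.0, modulator_hz=220.0, index=2.0, duration_s=0.5)
print(len(tone))
```

A modulation index of 0 gives a pure sine wave. Raising the index adds sidebands around the carrier, which is how two oscillators can produce a rich timbre.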
Sound Synthesis
- Traditional method
Plays a sample waveform when the matching event occurs. Its problems are boredom from simple repetition, limited expressiveness, and memory consumption.
- Sound synthesis engine
Produces sound mathematically according to an object's material, velocity, impact, and so on. It is closer to reality and more memory efficient.
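One way to produce an impact sound mathematically is to sum a few exponentially decaying sinusoids. The sketch below is a minimal version of that idea, not a description of any particular engine. The mode tables (`METAL_LIKE`, `WOOD_LIKE`) are hypothetical, with illustrative rather than measured values.

```python
import math


def impact_sound(modes, impact_velocity, duration_s, sample_rate=44_100):
    """Sum of exponentially decaying sinusoids (a simple modal model).

    modes: list of (frequency_hz, decay_per_second, relative_gain) tuples
           that stand in for an object's material and shape.
    impact_velocity: scales the overall loudness of the hit.
    """
    n_samples = int(duration_s * sample_rate)
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        value = 0.0
        for freq_hz, decay, gain in modes:
            envelope = gain * math.exp(-decay * t)
            value += envelope * math.sin(2 * math.pi * freq_hz * t)
        out.append(impact_velocity * value)
    return out


# Hypothetical mode tables: values are illustrative, not measured.
METAL_LIKE = [(520.0, 4.0, 1.0), (1370.0, 6.0, 0.5), (2610.0, 9.0, 0.25)]
WOOD_LIKE = [(180.0, 40.0, 1.0), (410.0, 60.0, 0.4)]

soft_wood_hit = impact_sound(WOOD_LIKE, impact_velocity=0.3, duration_s=0.25)
hard_metal_hit = impact_sound(METAL_LIKE, impact_velocity=1.0, duration_s=0.25)
```

Only a handful of numbers per material are stored, and every hit can differ in velocity, which addresses both the memory and the repetition problems of sample playback.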
Speech Recognition
- Technology that enables a machine to understand the information in a speech signal
- Basically, speech recognition is second position
- Users demand high recognition ability from machines
Speech recognition and Synthesis
Speaker Verification and Identification
- Verification
To verify a person's identity by extracting the individual's unique information from the speech signal. It is also called a voice password or biosignal verification.
- Identification
To find a specific person in a mixed audio signal. It is used as a voice ID or voice tag. It is also applied to scene search and indexing in movies or videos.
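The difference between the two tasks can be sketched with toy feature vectors. This example assumes each speaker is already represented by a fixed-length vector. How that vector is extracted from speech is outside the sketch, and the vectors, the threshold, and the function names are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def verify(enrolled, probe, threshold=0.8):
    """One-to-one check: does the probe match the claimed speaker?"""
    return cosine_similarity(enrolled, probe) >= threshold


def identify(enrolled_by_name, probe):
    """One-to-many search: return the enrolled speaker closest to the probe."""
    return max(enrolled_by_name,
               key=lambda name: cosine_similarity(enrolled_by_name[name], probe))


# Toy "voiceprints"; real systems use much longer learned feature vectors.
enrolled = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
probe = [0.85, 0.15, 0.35]

print(verify(enrolled["alice"], probe))
print(verify(enrolled["bob"], probe))
print(identify(enrolled, probe))
```

Verification answers a yes/no question about a claimed identity, while identification searches over all enrolled speakers.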
Music information analysis
- Content based music search
Music characteristics such as melody or rhythm serve as the ingredients for finding music that fits the user's demand. Query-by-humming and query-by-short-audio are also possible, and audio genre classification is becoming more accurate.
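A very small query-by-humming sketch: each melody is reduced to its up/down/repeat contour (Parsons-style), so the query still matches when it is hummed in a different key, and candidates are ranked by edit distance. The two-song library and the hummed query are toy data, and this is only one simple matching scheme assumed for illustration.

```python
def contour(pitches):
    """Parsons-style contour: U (up), D (down), R (repeat) per note step."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution
        prev_row = row
    return prev_row[-1]


def search(query_pitches, library):
    """Return library titles sorted from best to worst contour match."""
    q = contour(query_pitches)
    return sorted(library,
                  key=lambda title: edit_distance(q, contour(library[title])))


# Toy library: opening notes as MIDI note numbers.
LIBRARY = {
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}

# Hummed a whole tone higher, with one wrong note near the end.
hummed = [62, 62, 69, 69, 71, 70, 69]
print(search(hummed, LIBRARY))
```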
Speech and Audio Coding
- Technology that reduces the amount of data for storage or transfer
The conversion from analog to digital signals is at the very core of the technology for digital wireless mobile communication and portable devices.
- Current research trend
All types of sound are processed with an integrated method. Multi-channel sound and high quality at low bit rates are also research topics.
- International standards bodies select the codec standards
PCM, MP3, EVRC, AMR, AAC, USAC, etc. However, there are also de facto standards.
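As a small illustration of how a coder trades bits for quality, the sketch below compares plain 8-bit uniform quantization with 8-bit quantization after mu-law companding, a classic technique from telephony PCM. It uses the continuous mu-law curve with mu = 255. The helper names are assumptions, and deployed codecs use a piecewise-linear approximation rather than this exact formula.

```python
import math

MU = 255  # companding constant commonly used with 8-bit mu-law


def quantize(x, bits):
    """Uniformly quantize x in [-1, 1] to 2**bits levels and map it back."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, int((x + 1.0) / step))  # integer code 0..levels-1
    return -1.0 + (index + 0.5) * step              # centre of that interval


def mu_compress(x):
    """Mu-law compression: expands small amplitudes before quantization."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def mu_law_codec(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_expand(quantize(mu_compress(x), bits))


quiet_sample = 0.01  # a low-amplitude input, where companding helps most
uniform_error = abs(quantize(quiet_sample, 8) - quiet_sample)
mu_law_error = abs(mu_law_codec(quiet_sample) - quiet_sample)
print(f"uniform 8-bit error: {uniform_error:.6f}")
print(f"mu-law 8-bit error:  {mu_law_error:.6f}")
```

With the same 8 bits per sample, companding spends more quantization levels on quiet signals, where the reconstruction error of uniform quantization is most noticeable relative to the signal.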
Others
- 3D Audio / Stereo sound
- Object based audio broadcasting
- Noise elimination
- Signal restoration
- Sound type classification
- Music therapy
- Music equipment