[ASP] Speech, Audio, Sound Characteristics

Orientation

Speech / Audio / Sound Characteristics

Broadcasting / Communication / Entertainment Core Technology

  • Killer Application
    A killer application is a feature, function, or application of a new technology or product that is presented as virtually indispensable or far superior to rival products. For example, with the move from analog to digital communication, MMS and video streaming became possible on your own phone. Applications such as Netflix, Facebook, Instagram, and YouTube also generate a large share of internet traffic. You can find the real-time statistics here for fun, and you will see that most of these applications involve audio.

  • Various independent audio services
    There are many independent audio services. In the past, radio and phone calls were the major services. Nowadays, commercial applications such as sleep-pattern analysis apps and AI speakers are easy to find.

Shortage of professional engineers in academia/industry

  • vs Video/Image Area
    Compared with the video/image area, there are few professional engineers.
  • Music
    Music is a representative example. Most people have their own preferred genre, and musicians are in a league of their own.
  • Movie, Musical, etc.

Representative bio signal

  • Voice based health care system
  • Welfare technology for the disabled

Video vs Audio

Data amount

  • Video : Audio = 4 dimensions : 2 dimensions
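
The gap in data amount can be made concrete with some raw data-rate arithmetic. The sketch below is only an illustration: the frame size, frame rate, and audio format are assumed values, not figures from the lecture. It reads the dimension count as width, height, and time plus the sample value for video, against time plus the sample value for audio, which is one plausible interpretation.

```python
def raw_video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Uncompressed video: every pixel of every frame is stored."""
    return width * height * bytes_per_pixel * fps


def raw_audio_bytes_per_second(sample_rate, bytes_per_sample, channels):
    """Uncompressed audio: one sample per channel per sampling instant."""
    return sample_rate * bytes_per_sample * channels


# Assumed formats: 1920x1080, 24-bit color, 30 fps vs. 44.1 kHz, 16-bit, stereo.
video_rate = raw_video_bytes_per_second(1920, 1080, 3, 30)
audio_rate = raw_audio_bytes_per_second(44100, 2, 2)
ratio = video_rate / audio_rate

print(f"video: {video_rate} B/s, audio: {audio_rate} B/s, ratio: {ratio:.0f}x")
```

Under these assumptions raw video needs roughly a thousand times more data per second than raw audio.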

Psychological sensitivity

  • Specific sounds can provoke repulsion
  • Psychological state is sensitive to sound
    For example, slow and creepy BGM (background music) creates a horrifying atmosphere, and the opposite effect is also possible.

Required quality

  • In general, low picture quality is tolerable but low sound quality is not.

Multimedia independency

  • A topic to be discussed

Video vs Audio summary and conclusion

| Component | Video | Audio |
|:---:|:---:|:---:|
| Data amount | 4D | 2D |
| Psychological sensitivity | Low | High |
| Required quality | Low | High |
| Multimedia independency | Need to discuss | Need to discuss |

Conclusion: Engineers and planners need to improve commercial value by collaborating across the two areas.

Applications

Speech synthesis

  • Generates a speech signal that carries linguistic information

[Figure: screen capture]

  • First demonstrated at the New York World’s Fair in 1939
  • A core technology for human-computer interaction (HCI)
    | Before 2010 | After 2010 |
    |:---:|:---:|
    | Connect digital waveforms and apply boundary smoothing | Neural-network-based waveform generation |
  • Requirement: professional background knowledge in linguistics and acoustics
  • Current phase of commercialization
    • Announcement server
    • Audio newspaper
    • DeepMind TTS demo
    • Personalized speech synthesizing application
    • Voice timbre estimation from body structure
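
The pre-2010 approach in the table above (connect stored waveform units, then smooth the boundary) can be sketched in a few lines. This is a minimal illustration that assumes a plain linear crossfade over a fixed number of samples; the function name `crossfade_concat` and the overlap length are hypothetical, and real concatenative synthesizers use more careful unit selection and pitch-synchronous smoothing.

```python
def crossfade_concat(unit_a, unit_b, overlap):
    """Join two waveform units, linearly crossfading `overlap` samples.

    The last `overlap` samples of unit_a are blended with the first
    `overlap` samples of unit_b so the boundary has no abrupt jump.
    """
    if overlap <= 0 or overlap > min(len(unit_a), len(unit_b)):
        raise ValueError("overlap must be between 1 and the shorter unit length")
    head = unit_a[:-overlap]          # part of unit_a left untouched
    tail = unit_b[overlap:]           # part of unit_b left untouched
    start = len(unit_a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)   # weight ramps from unit_a toward unit_b
        blended.append((1 - w) * unit_a[start + i] + w * unit_b[i])
    return head + blended + tail


# Two toy "units" with very different levels at the join.
joined = crossfade_concat([1.0] * 10, [0.0] * 10, overlap=4)
```

Without the crossfade the signal would jump from 1.0 to 0.0 in a single sample, which is heard as a click; with it, the level steps down gradually across the overlap.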

Music Synthesis

  • Synthesis of musical signals by electronic devices
  • Frequency modulation (FM) synthesis was invented by John Chowning
  • After the patent expired in 1995, many companies adopted FM synthesis, and the electronic instrument market expanded accordingly
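
As a rough illustration of FM synthesis, the basic two-oscillator form is y(t) = sin(2π·f_c·t + I·sin(2π·f_m·t)), where f_c is the carrier frequency, f_m the modulator frequency, and I the modulation index. The sketch below generates such a tone with the standard library only; the function name `fm_tone` and the parameter values are illustrative assumptions, not settings from any particular instrument.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, duration_s, sample_rate=44100):
    """Return samples of a simple two-oscillator FM tone in [-1, 1]."""
    n = round(duration_s * sample_rate)
    samples = []
    for k in range(n):
        t = k / sample_rate
        # The modulator wobbles the carrier's phase; a larger index adds more sidebands.
        phase = (2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * modulator_hz * t))
        samples.append(math.sin(phase))
    return samples


tone = fm_tone(carrier_hz=440.0, modulator_hz=220.0, index=2.0, duration_s=0.01)
plain = fm_tone(carrier_hz=440.0, modulator_hz=220.0, index=0.0, duration_s=0.01)
```

With an index of 0 the modulator has no effect and the result is a plain sine wave; raising the index enriches the spectrum while the code stays just as small, which is part of why the technique was attractive for electronic instruments.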

Sound Synthesis

  • Traditional method
    Plays a sample waveform when the matching event occurs. Its problems are boredom from simple repetition, limited expressiveness, and memory consumption.

  • Sound synthesis engine
    Produces sound mathematically from an object’s material, velocity, impact, and so on. It is closer to reality and more memory efficient.
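
One common way to produce an impact sound mathematically is to sum a few exponentially decaying sinusoids, one per vibration mode. The sketch below follows that idea under loud assumptions: the `MATERIALS` table, its mode frequencies, and its decay rates are made-up placeholder values, and `impact_sound` is a hypothetical helper rather than the API of any real sound engine.

```python
import math

# Hypothetical material presets: (mode frequencies in Hz, decay rate per second).
MATERIALS = {
    "metal": ([523.0, 1410.0, 2750.0], 4.0),   # rings for a long time
    "wood": ([220.0, 610.0, 1190.0], 30.0),    # dies away quickly
}


def impact_sound(material, velocity, duration_s=0.5, sample_rate=16000):
    """Synthesize an impact as a sum of damped sinusoids scaled by hit velocity."""
    freqs, decay = MATERIALS[material]
    n = round(duration_s * sample_rate)
    out = []
    for k in range(n):
        t = k / sample_rate
        envelope = velocity * math.exp(-decay * t)
        # Higher modes are given smaller weights (1, 1/2, 1/3, ...).
        modes = sum(math.sin(2 * math.pi * f * t) / (i + 1)
                    for i, f in enumerate(freqs))
        out.append(envelope * modes)
    return out


metal = impact_sound("metal", velocity=1.0)
wood = impact_sound("wood", velocity=1.0)
```

Only a handful of numbers per material are stored instead of a recorded waveform per event, and every hit can differ by velocity, which speaks to both the memory and the repetition problems of sample playback.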

Speech Recognition

  • Technology that enables a machine to understand the information in a speech signal
  • Basically, speech recognition is second position
  • Users demand high recognition ability from machines

Speech recognition and Synthesis

[Figure: screen capture]

Speaker Verification and Identification

  • Verification
    Verifies a person by extracting information unique to the individual from the speech signal. It is also called voice password or bio-signal verification.
  • Identification
    Searches for a specific person in a mixed audio signal. It is used as a voice ID or voice tag, and is also applied to scene search and indexing in movies and videos.
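
The difference between the two tasks can be sketched with a simple similarity score. The example assumes each speaker is already represented by a fixed-length feature vector (how those vectors are extracted from speech is outside this sketch), and it uses cosine similarity with an arbitrary threshold of 0.8; the vectors, names, and threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(enrolled, probe, threshold=0.8):
    """Verification (1:1): is the probe the claimed speaker?"""
    return cosine_similarity(enrolled, probe) >= threshold


def identify(enrolled_by_name, probe):
    """Identification (1:N): which enrolled speaker is closest to the probe?"""
    return max(enrolled_by_name,
               key=lambda name: cosine_similarity(enrolled_by_name[name], probe))


# Toy enrolled voiceprints and an incoming utterance (made-up numbers).
speakers = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
probe = [0.9, 0.1, 0.2]

print(verify(speakers["alice"], probe))  # claimed identity: alice
print(verify(speakers["bob"], probe))    # claimed identity: bob
print(identify(speakers, probe))
```

Verification answers a yes/no question against one claimed identity, while identification ranks all enrolled speakers and returns the best match.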

Music information analysis

  • Content based music search
    Musical characteristics such as melody and rhythm serve as the ingredients for searching music that fits the user’s demand. Query-by-humming and query-by-short-audio are also possible, and audio genre classification is becoming more accurate.
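
A toy version of query-by-humming shows how melody can act as a search key. The sketch assumes the hummed query and the catalog melodies have already been transcribed to MIDI note numbers, compares interval sequences so that humming in a different key still matches, and ranks by edit distance. This is one simple approach among many; the catalog contents and function names are illustrative.

```python
def intervals(notes):
    """Pitch differences between consecutive notes (invariant to transposition)."""
    return [b - a for a, b in zip(notes, notes[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def best_match(query_notes, catalog):
    """Return the catalog title whose melody contour is closest to the query."""
    q = intervals(query_notes)
    return min(catalog, key=lambda title: edit_distance(q, intervals(catalog[title])))


# Tiny hypothetical catalog of opening phrases as MIDI note numbers.
catalog = {
    "twinkle": [60, 60, 67, 67, 69, 69, 67],
    "ode": [64, 64, 65, 67, 67, 65, 64],
}
hummed = [65, 65, 72, 72, 74, 74, 72]  # the first melody, hummed a fourth higher
result = best_match(hummed, catalog)
```

Because only intervals are compared, the query matches even though none of its absolute pitches appear in the stored melody.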

Speech and Audio Coding

  • Technology that reduces the amount of data for storage or transmission.
    The conversion from analog to digital signals is the very core technology of digital wireless mobile communication and portable devices.

[Figure: screen capture]

  • Current research trend
    All kinds of sound are processed with an integrated method. Multi-channel sound and high quality at low bit rates are also research topics.
  • International standards bodies select standard codecs such as PCM, MP3, EVRC, AMR, AAC, and USAC, but there are also de facto standards.
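
To give a feel for how coding reduces data, here is a small sketch of μ-law companding, the logarithmic compression idea used in telephone-grade PCM. It implements the continuous μ-law curve with μ = 255 followed by 8-bit quantization; standardized codecs use a segmented approximation of this curve, so treat the code as an illustration of the principle, not a bit-exact codec.

```python
import math

MU = 255.0


def mu_law_encode(x, bits=8):
    """Compress x in [-1, 1] logarithmically and quantize to an integer code."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    levels = 2 ** bits - 1
    return round((y + 1) / 2 * levels)


def mu_law_decode(code, bits=8):
    """Map an integer code back to an approximate sample in [-1, 1]."""
    levels = 2 ** bits - 1
    y = code / levels * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


quiet = 0.01
restored = mu_law_decode(mu_law_encode(quiet))

# 8000 samples per second at 8 bits each, versus 16-bit linear PCM.
companded_bps = 8000 * 8
linear_bps = 8000 * 16
```

The logarithmic curve spends more of the 8-bit code range on quiet samples, so low-level speech survives the round trip with small error while the bit rate is half that of 16-bit linear PCM at the same sampling rate.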

Others

  • 3D Audio / Stereo sound
  • Object based audio broadcasting
  • Noise elimination
  • Signal restoration
  • Sound type classification
  • Music therapy
  • Music equipment
