Computer audition

Computer audition (CA) is general field of study of algorithms and systems for audio understanding by machine. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind.

Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge and general sound semantics for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation.

Like computer vision versus image processing, computer audition versus audio engineering deals with understanding of audio rather than processing. It also differs from problems of speech understanding by machine since it deals with general audio signals, such as natural sounds and musical recordings.

Applications of computer auditions are widely varying, and include search for sounds, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, emotion in audio and so on.

Computer Audition overlaps with the following disciplines:

The study of CA could be roughly divided into the following sub-problems:

Computer audition deals with audio signals that can be represented in a variety of fashions, from direct encoding of digital audio in two or more channels to symbolically represented synthesis instructions. Audio signals are usually represented in terms of analogue or digital recordings. Digital recordings are samples of acoustic waveform or parameters of audio compression algorithms. One of the unique properties of musical signals is that they often combine different types of representations, such as graphical scores and sequences of performance actions that are encoded as MIDI files.

...
Wikipedia