An audio search engine is a web-based search engine that crawls the web for audio content. The information it indexes can come from web pages, images, audio files, or other types of document. Several search techniques are available on these engines.
Text entered into a search bar by the user is compared against the search engine's database. Each matching result is accompanied by a brief description of the audio file and its characteristics, such as sampling frequency, bit rate, file type, duration, or coding type. The user is given the option of downloading the resulting files.
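The text-matching step described above can be illustrated with a minimal Python sketch. It assumes a small in-memory catalogue and a simple "every query term must appear" rule; the `AudioRecord` fields, the `search` function, and the sample entries are hypothetical and do not describe any particular engine.

```python
from dataclasses import dataclass


@dataclass
class AudioRecord:
    # Hypothetical metadata fields, mirroring the characteristics listed above.
    title: str
    description: str
    file_type: str
    sample_rate_hz: int
    bit_rate_kbps: int
    duration_s: float


def search(query, records):
    """Return the records whose title or description contains every query term."""
    terms = query.lower().split()
    hits = []
    for rec in records:
        text = f"{rec.title} {rec.description}".lower()
        if all(term in text for term in terms):
            hits.append(rec)
    return hits


# Illustrative catalogue; the entries are invented for this example.
CATALOG = [
    AudioRecord("Rain on a tin roof", "field recording of heavy rain", "wav", 44100, 1411, 62.0),
    AudioRecord("Morning birdsong", "field recording of forest birds", "mp3", 44100, 192, 120.5),
    AudioRecord("Piano scale", "C major scale on upright piano", "aiff", 48000, 1536, 15.0),
]

if __name__ == "__main__":
    for rec in search("field recording", CATALOG):
        print(rec.title, rec.file_type, rec.sample_rate_hz, rec.bit_rate_kbps, rec.duration_s)
```

A production engine would normally use an inverted index and relevance ranking rather than a linear scan; the scan is used here only to keep the example short.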
The Query by Example (QBE) system is a search algorithm that uses content-based image retrieval (CBIR). Keywords are generated from the analysed image and then used to search for audio files in the database. The results are displayed according to the user's preferences regarding file type (wav, mp3, aiff…) or other characteristics.
In audio search from audio, the user plays the audio of a song, either with a music player or by singing or humming into the computer microphone. A sound pattern, A, is then derived from the audio waveform, together with a frequency representation obtained from its Fourier transform. This pattern is matched against a pattern, B, corresponding to the waveform and transform of each sound file in the database. All audio files in the database whose patterns are similar to the query pattern are displayed as search results.
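The matching of pattern A against pattern B can be sketched as follows. This is a simplified illustration, not the method of any specific service. It assumes that each pattern is the magnitude spectrum of the whole signal, computed with a plain discrete Fourier transform, and that similarity is measured by cosine similarity. The helper names (`magnitude_spectrum`, `rank_matches`, `tone`) and the synthetic sine-wave "database" are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the discrete Fourier transform for bins 0..N/2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tone(freq_hz, sample_rate=800, n=128):
    """Synthetic sine wave standing in for a recorded sound."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def rank_matches(query, database):
    """Score every stored signal (pattern B) against the query (pattern A), best first."""
    pattern_a = magnitude_spectrum(query)
    scored = [
        (cosine_similarity(pattern_a, magnitude_spectrum(signal)), name)
        for name, signal in database.items()
    ]
    return sorted(scored, reverse=True)


# Illustrative database of two pure tones.
DATABASE = {"tone_100hz": tone(100), "tone_250hz": tone(250)}

# The query is a quieter copy of the 100 Hz tone, as if picked up by a microphone.
QUERY = [0.5 * x for x in tone(100)]

if __name__ == "__main__":
    for score, name in rank_matches(QUERY, DATABASE):
        print(f"{name}: {score:.3f}")
```

Because cosine similarity ignores overall scale, the quieter query still matches its source. Practical systems typically analyse short overlapping windows and use a fast Fourier transform, so that matching is robust to noise and to queries that cover only part of a recording.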
Search results may be skewed, and are therefore suspect, because large video hosting sites are given preferential treatment in the rankings.
Audio search has evolved slowly through several basic search formats, all of which exist today and all of which rely on keywords. The keywords for each search can be found in the title of the media, in any text attached to the media, and in linked web pages; they may also be defined by the authors and users of video hosting resources.
Some search engines can search recorded speech such as podcasts, though this can be difficult if there is background noise. A language typically has on the order of 40 phonemes, and there are about 400 across all spoken languages. Rather than applying a text search algorithm after speech-to-text processing is completed, some engines use a phonetic search algorithm to find results directly within the spoken word. Others work by processing the entire podcast and creating a text transcription, which is then searched as text.
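The phonetic approach can be illustrated with a short Python sketch. It assumes that a phoneme recogniser has already converted the recording into a sequence of phoneme symbols, and that the query is converted to phonemes through a pronunciation lexicon. The tiny `LEXICON`, the `STREAM` sequence, and the function names are hypothetical and exist only for this example.

```python
# Toy pronunciation lexicon using ARPAbet-style symbols (hypothetical and tiny).
LEXICON = {
    "audio": ["AO", "D", "IY", "OW"],
    "search": ["S", "ER", "CH"],
    "engine": ["EH", "N", "JH", "AH", "N"],
}


def to_phonemes(phrase):
    """Convert a text query into a flat list of phoneme symbols."""
    phones = []
    for word in phrase.lower().split():
        phones.extend(LEXICON[word])  # raises KeyError for words missing from the lexicon
    return phones


def find_phrase(phrase, phoneme_stream):
    """Return the start indices where the phrase's phonemes occur contiguously."""
    target = to_phonemes(phrase)
    n = len(target)
    return [
        i
        for i in range(len(phoneme_stream) - n + 1)
        if phoneme_stream[i:i + n] == target
    ]


# Assumed output of a phoneme recogniser for the spoken words "the audio search engine".
STREAM = ["DH", "AH", "AO", "D", "IY", "OW", "S", "ER", "CH", "EH", "N", "JH", "AH", "N"]

if __name__ == "__main__":
    print(find_phrase("search engine", STREAM))
```

Exact matching is used here for brevity. A real phonetic search would usually tolerate recognition errors, for example by scoring approximate matches between phoneme sequences instead of requiring identical symbols.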