Query understanding is the process of inferring the intent of a search engine user by extracting semantic meaning from the searcher’s keywords. Query understanding methods generally take place before the search engine retrieves and ranks results. It is related to natural language processing but specifically focused on the understanding of search queries. Query understanding is at the heart of technologies like Amazon Alexa, Apple’s Siri, Google Assistant, IBM’s Watson, and Microsoft’s Cortana.
Tokenization is the process of breaking up a text string into words or other meaningful elements called tokens. Typically, tokenization occurs at the word level. However, it is sometimes difficult to define what is meant by a "word". Often a tokenizer relies on simple heuristics, such as splitting the string on punctuation and whitespace characters. Tokenization is more challenging in languages without spaces between words, such as Chinese and Japanese. Tokenizing text in these languages requires the use of word segmentation algorithms.
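A minimal sketch of such a heuristic tokenizer is shown below. The `tokenize` function and its regular expression are illustrative assumptions, not a standard; production tokenizers handle many more cases (hyphenation, URLs, emoji, and word segmentation for languages such as Chinese and Japanese).

```python
import re

def tokenize(text):
    """Split a text string into word-level tokens using a simple heuristic:
    lowercase the input, then split on any run of characters that is not a
    letter, digit, underscore, or apostrophe (i.e., punctuation and whitespace)."""
    return [token for token in re.split(r"[^\w']+", text.lower()) if token]

print(tokenize("New York-based engineers' best-kept secret"))
# ['new', 'york', 'based', "engineers'", 'best', 'kept', 'secret']
```

Note how the hyphenated forms are split apart: even this simple example shows why defining a "word" is difficult, since a different application might prefer to keep "best-kept" as one token.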
Spelling correction is the process of automatically detecting and correcting spelling errors in search queries. Most spelling correction algorithms are based on a language model, which determines the a priori probability of an intended query, and an error model (typically based on a noisy channel model), which determines the probability of a particular misspelling, given an intended query.
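The sketch below illustrates the noisy channel idea in the spirit of Norvig’s well-known toy spelling corrector: the correction is the candidate word maximizing P(intended) × P(observed | intended). The toy corpus, the fixed error probabilities, and the `edits1` helper are illustrative assumptions, not a real system.

```python
from collections import Counter
import string

# Toy unigram language model built from a tiny corpus (illustrative data only).
CORPUS = "the quick brown fox jumps over the lazy dog and the fox runs".split()
COUNTS = Counter(CORPUS)
TOTAL = sum(COUNTS.values())

def p_lang(word):
    # Language model: a priori probability of an intended word.
    return COUNTS[word] / TOTAL

def edits1(word):
    # All strings one edit away: deletions, transpositions, substitutions, insertions.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def p_error(observed, intended):
    # Crude error model: strongly prefer an exact match, allow a fixed
    # penalty for a single-edit misspelling, and rule out everything else.
    if observed == intended:
        return 0.95
    return 0.05 if observed in edits1(intended) else 0.0

def correct(observed):
    # Noisy channel decoding: argmax over candidates of P(word) * P(observed | word).
    candidates = {observed} | edits1(observed)
    return max(candidates, key=lambda w: p_lang(w) * p_error(observed, w))

print(correct("teh"))  # -> 'the' under this toy corpus
```

A real corrector would estimate the language model from query logs and the error model from observed (misspelling, correction) pairs, rather than using the fixed constants assumed here.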