Syntax (programming languages)

In computer science, the syntax of a computer language is the set of rules that defines the combinations of symbols that are considered to be a correctly structured document or fragment in that language. This applies both to programming languages, where the document represents source code, and markup languages, where the document represents data. The syntax of a language defines its surface form. Text-based computer languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical). Documents that are syntactically invalid are said to have a syntax error.

Syntax – the form – is contrasted with semantics – the meaning. In processing computer languages, semantic processing generally comes after syntactic processing, but in some cases semantic processing is necessary for complete syntactic analysis, and these are done together or concurrently. In a compiler, the syntactic analysis comprises the frontend, while semantic analysis comprises the backend (and middle end, if this phase is distinguished).

Computer language syntax is generally distinguished into three levels:

Distinguishing in this way yields modularity, allowing each level to be described and processed separately, and often independently. First a lexer turns the linear sequence of characters into a linear sequence of tokens; this is known as "lexical analysis" or "lexing". Second the parser turns the linear sequence of tokens into a hierarchical syntax tree; this is known as "parsing" narrowly speaking. Thirdly the contextual analysis resolves names and checks types. This modularity is sometimes possible, but in many real-world languages an earlier step depends on a later step – for example, the lexer hack in C is because tokenization depends on context. Even in these cases, syntactical analysis is often seen as approximating this ideal model.

...
Wikipedia