*** Welcome to piglix ***

Zipper data structure


A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages. The zipper was described by Gérard Huet in 1997. It includes and generalizes the gap buffer technique sometimes used with arrays.

The zipper technique is general in the sense that it can be adapted to lists, trees, and other recursively defined data structures. Such modified data structures are usually referred to as "a tree with zipper" or "a list with zipper" to emphasize that the structure is conceptually a tree or list, while the zipper is a detail of the implementation.

A layman's explanation for a tree with zipper would be an ordinary computer filesystem with operations to go to parent (often cd ..), and the possibility to go downwards (cd subdirectory). The zipper is the pointer to the current path. Behind the scenes the zippers are efficient when making (functional) changes to a data structure, where a new, slightly changed, data structure is returned from an edit operation (instead of making a change in the current data structure).

Many common data structures in computer science can be expressed as the structure generated by a few primitive constructor operations or observer operations. These include the structure of finite lists, which can be generated by two operations:

A list such as [1, 2, 3] is therefore the declaration Cons(1, Cons(2, Cons(3, Empty))). It is possible to describe the location in such a list as the number of steps from the front of the list to the target location. More formally, a location in the list is the number of Cons operations required to reconstruct the whole list from that particular location. For example, in Cons(1, Cons(2, Cons( X, Cons(4, Empty)))) a Cons(2, L) and a Cons(1, L) operation would be required to reconstruct the list relative to position X otherwise known as Cons( X, Cons(4, Empty)). This recording together with the location is called a zipped representation of the list or a list-zipper.

To be clear, a location in the list is not just the number of Cons operations, but also all of the other information about them; in this case, the values that must be reconnected. Here, these may be conveniently represented in as a separate list in the order of application from the target location. Specifically, from the context of "3" in the list [1, 2, 3], a recording (commonly referred to as a 'path') could be represented as [2, 1] where Cons(2, L) is applied followed by (Cons 1, L) to reconstitute the original list starting from [X, 4].


...
Wikipedia

...