*** Welcome to piglix ***

Streaming algorithm


In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes (typically just one). These algorithms have limited memory available to them (much less than the input size) and also limited processing time per item.

These constraints may mean that an algorithm produces an approximate answer based on a summary or "sketch" of the data stream in memory.

Though streaming algorithms had already been studied by Munro and Paterson as early as 1980, as well as Philippe Flajolet and G. Nigel Martin in 1982/83, the field of streaming algorithms was first formalized and popularized in a 1996 paper by Noga Alon, Yossi Matias, and Mario Szegedy. For this paper, the authors later won the Gödel Prize in 2005 "for their foundational contribution to streaming algorithms." There has since been a large body of work centered around data streaming algorithms that spans a diverse spectrum of computer science fields such as theory, databases, networking, and natural language processing.

Semi-streaming algorithms were introduced in 2005 as an extension of streaming algorithms that allows for a constant or logarithmic number of passes over the dataset [1].

In the data stream model, some or all of the input data that are to be operated on are not available for random access from disk or memory, but rather arrive as one or more continuous data streams.

Streams can be denoted as an ordered sequence of points (or "updates") that must be accessed in order and can be read only once or a small number of times.

Much of the streaming literature is concerned with computing statistics on frequency distributions that are too large to be stored. For this class of problems, there is a vector (initialized to the zero vector ) that has updates presented to it in a stream. The goal of these algorithms is to compute functions of using considerably less space than it would take to represent precisely. There are two common models for updating such streams, called the "cash register" and "turnstile" models.


...
Wikipedia

...