Culturomics


Culturomics is a form of computational lexicology that studies human behavior and cultural trends through the quantitative analysis of digitized texts. Researchers data-mine large digital archives to investigate cultural phenomena reflected in language and word usage. The term is an American neologism first described in a 2010 Science article, "Quantitative Analysis of Culture Using Millions of Digitized Books", co-authored by Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden.

Michel and Aiden helped create the Google Labs project Google Ngram Viewer, which uses n-grams to analyze the Google Books digital library for cultural patterns in language use over time.
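
As a rough illustration of the idea, the sketch below counts how often a phrase occurs as an n-gram in each year and divides by that year's total n-gram count, which is the kind of relative frequency an n-gram time series plots. The toy corpus, the function names `ngrams` and `relative_frequency_by_year`, and the naive whitespace tokenization are assumptions made for this example; it is not the Google Ngram Viewer's actual pipeline.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency_by_year(corpus, phrase):
    """Map each year to the share of that year's n-grams equal to `phrase`.

    `corpus` is an iterable of (year, text) pairs. Tokenization is a naive
    lowercase whitespace split (an assumption made for this sketch).
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    # Skip years with no n-grams of this length to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    (1900, "the steam engine changed the railway"),
    (1900, "the steam engine was loud"),
    (2000, "the search engine changed the web"),
]

series = relative_frequency_by_year(toy_corpus, "steam engine")
```

Dividing by the yearly total matters because the number of digitized books differs greatly from year to year, so raw counts alone would mostly reflect corpus size.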

Because the Google Ngram data set is not an unbiased sample and does not include metadata, there are several pitfalls in using it to study language or the popularity of terms. Medical literature accounts for a large but shifting share of the corpus, and the corpus does not take into account how often a given text was printed or read.

In a study called Culturomics 2.0, Kalev H. Leetaru examined news archives, including print and broadcast media (television and radio transcripts), for words that imparted tone or "mood" as well as geographic data. The research was able to retroactively predict the 2011 Arab Spring and to estimate the final location of Osama bin Laden to within 124 miles (200 km).

A 2012 paper by Alexander M. Petersen and co-authors found a "dramatic shift in the birth rate and death rates of words": deaths have increased and births have slowed. The authors also identified a universal "tipping point" in the life cycle of new words: about 30 to 50 years after their origin, they either enter the long-term lexicon or fall into disuse.
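
One simple way to operationalize the "birth" and "death" of a word in a frequency time series is to take the first and last year in which its relative frequency reaches some threshold. The sketch below assumes that definition, with a hypothetical threshold and made-up data; it is not necessarily the procedure used by Petersen and co-authors.

```python
def birth_and_death(series, threshold):
    """Return (first_year, last_year) with frequency >= threshold.

    `series` maps year -> relative frequency. Returns None if the word
    never reaches the threshold. Both the definition and the threshold
    are assumptions made for this sketch.
    """
    active = sorted(year for year, freq in series.items() if freq >= threshold)
    if not active:
        return None
    return active[0], active[-1]


# Hypothetical frequencies, invented for illustration only.
toy_series = {1800: 0.0, 1850: 2e-7, 1900: 5e-6, 1950: 3e-6, 2000: 1e-8}
span = birth_and_death(toy_series, threshold=1e-7)
```

Counting how many words are born or die in each year under such a definition would then give yearly birth and death rates that can be compared over time.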

