The Global Database of Events, Language, and Tone (GDELT), created by Kalev Leetaru of Yahoo! and Georgetown University, along with Philip Schrodt and others, describes itself as "an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what's happening around the world, what its context is and who's involved, and how the world is feeling about it, every single day." Early explorations leading up to the creation of GDELT were described by co-creator Philip Schrodt in a conference paper in January 2011. The dataset is available on Google Cloud Platform.
GDELT includes data from 1979 to the present. The data is available as zip files in tab-separated value format using a CSV extension for easy import into Microsoft Excel or similar spreadsheet software. Data from 1979 to 2005 is available in the form of one zip file per year, with the file size gradually increasing from 14.3 MB in 1979 to 125.9 MB in 2005, reflecting the increase in the number of news media and the frequency and comprehensiveness of event recording. Data files from January 2006 to March 2013 are available at monthly granularity, with the zipped file size rising from 11 MB in January 2006 to 103.2 MB in March 2013. Data files from April 1, 2013 onward are available at daily granularity. The data file for each date is made available by 6 AM Eastern Standard Time the next day. As of June 2014, the size of the daily zipped file is about 5 to 12 MB. The data files use Conflict and Mediation Event Observations (CAMEO) coding for recording events.
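Because the files are tab-separated despite their CSV extension, a reader has to set the delimiter explicitly. The following is a minimal sketch in Python of reading such a zipped file using only the standard library. The function name `iter_rows` is hypothetical, and the sketch assumes that the archive members are UTF-8 text with no header row and no quoted fields; none of these details are stated above.

```python
import csv
import io
import zipfile


def iter_rows(zip_source):
    """Yield every record in a zip archive as a list of strings.

    zip_source may be a filesystem path or a binary file-like object.
    Assumption: members are UTF-8, tab-separated, and unquoted.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            with archive.open(name) as raw:
                # Decode the member as text without loading it all into memory.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                # The extension says CSV, but the fields are separated by tabs.
                reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
                for row in reader:
                    yield row
```

Streaming the rows rather than loading a whole file keeps memory use low for the larger yearly and monthly archives.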
In a blog post for Foreign Policy, co-creator Kalev Leetaru used GDELT data to investigate whether the Arab Spring sparked protests worldwide. As a measure of protest intensity, he took the ratio of the number of protest-related events to the total number of events recorded, and then studied how that ratio changed over time. Political scientist and data science and forecasting expert Jay Ulfelder critiqued the post on his personal blog, saying that Leetaru's normalization method may not have adequately accounted for changes in the nature and composition of media coverage.
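The ratio measure described above can be illustrated with a short Python sketch. It is not Leetaru's actual code. It assumes that protest-related events are those with CAMEO root code 14, and that each event has already been reduced to a (period, root code) pair; the function name `protest_intensity` is hypothetical.

```python
from collections import Counter

# Assumption: CAMEO root code "14" marks protest events.
PROTEST_ROOT_CODE = "14"


def protest_intensity(events):
    """Return {period: protest events / all events} for each period.

    events is an iterable of (period, cameo_root_code) pairs.
    """
    totals = Counter()
    protests = Counter()
    for period, root_code in events:
        totals[period] += 1
        if root_code == PROTEST_ROOT_CODE:
            protests[period] += 1
    # Every period in totals has at least one event, so no division by zero.
    return {period: protests[period] / totals[period] for period in totals}
```

Dividing by the total event count corrects for growth in overall coverage volume, but not for shifts in which kinds of events the media report, which is the kind of limitation the critique points to.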