Ink drawings of a water pump, sewing machine table, accordion camera and surveying equipment.

Projects — Papers Past newspaper open data

Wondering what you can do with digitised newspaper data? Have a look at the examples on this page for inspiration. The examples use a variety of newspaper data from different sources.

Creating specialized corpora from digitised historical newspaper archives

When confronted with huge collections of digitised material, it is very hard for an individual researcher or small team of researchers with interests in a specialised topic to apply contemporary text mining and other computational methods.

Joshua Black has developed an approach to help solve this problem. The core idea is that the text mining methods used to gain insight into a specialised topic can also be used to generate increasingly focused corpora.

Creating specialized corpora from digitized historical newspaper archives: An iterative bootstrapping approach — Joshua's academic paper

New Publication: Creating Specialised Corpora from Digitized Historical Newspaper Archives — Joshua's blog about his academic paper
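
As a rough illustration of the core idea, here is a minimal Python sketch in which one simple method (keyword matching plus counting distinctive words) both narrows the corpus and learns new terms for the next round. This is not Joshua's published method: the function names, the scoring rule, and the rule that each round requires one more matching term are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and return the set of distinct alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def refine_corpus(documents, seed_terms, rounds=2, terms_per_round=2, min_score=2):
    """Hypothetical helper: narrow `documents` over several rounds.

    Round n keeps documents that contain at least n distinct topic terms
    (an assumed rule), then adds words that are distinctive of the kept
    documents to the term list used in the next round.
    """
    terms = {t.lower() for t in seed_terms}
    corpus = list(documents)
    for round_number in range(1, rounds + 1):
        kept, dropped = [], []
        for doc in corpus:
            matches = len(terms & tokenize(doc))
            (kept if matches >= round_number else dropped).append(doc)
        if not kept:
            break  # nothing passed this round, so keep the previous corpus
        # Document frequency of each word inside and outside the kept set.
        in_counts = Counter(w for doc in kept for w in tokenize(doc))
        out_counts = Counter(w for doc in dropped for w in tokenize(doc))
        # Score longer words by how many more kept than dropped documents use them.
        scores = {
            w: n - out_counts[w]
            for w, n in in_counts.items()
            if w not in terms and len(w) > 3
        }
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        terms.update(w for w in ranked[:terms_per_round] if scores[w] >= min_score)
        corpus = kept
    return corpus, terms
```

Calling `refine_corpus(articles, {"philosophy"})` on a list of article texts returns the narrowed list together with the expanded term set. A real project would likely swap the keyword step for topic models or a trained classifier and check each round by hand.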

Newspaper Navigator

Newspaper Navigator is a project that Ben Lee created during his time as an Innovator-in-Residence at the Library of Congress. The first stage of Newspaper Navigator was to extract content such as photographs, illustrations, cartoons, and news topics from the Chronicling America newspaper scans and corresponding OCR using emerging machine learning techniques.

The project has successfully pulled together millions of images from 1789 to 1963 and made them searchable as a discrete set.

Oceanic Exchanges

Oceanic Exchanges is an international project using newspaper data from six countries (Finland, Germany, Mexico, the Netherlands, the United Kingdom, and the United States) to examine patterns of information moving across national and linguistic boundaries.

Oceanic Exchanges

Viral Texts

Ryan Cordell, Associate Professor of English at Northeastern University in the United States, runs the Viral Texts project, which uses data, visualisations, and text to explore how news articles, short stories, and poems spread through nineteenth-century newspapers.

Viral Texts

Kumara Times

Former National Library Digitisation Advisor Greig Roulston used data from Papers Past in two ways: first to build a timeline of Louis Louisch’s life from articles in the Kumara Times, and second to analyse the paper’s advertisements using animation and AdBlock.

Mining the Kumara Times for Gold, with machines (25 mins, YouTube) — Greig’s presentation on the Kumara Times at the 2017 National Digital Forum

Examining the WWI Papers Past corpus

Programmer and artist Douglas Bagnall examined the reporting around World War I, using data from newspapers on Papers Past published between 1913 and 1922, to see if the use of particular terms could be mapped over time. The data Douglas used had been converted into JSON and contained the digitised text and a limited amount of metadata.

“War reports in all CAPITALS” — A blog by Emerson Vandy, Services Manager for Papers Past, that provides some context and history for Douglas’ work.
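
To give a feel for mapping a term over time, here is a small Python sketch that counts how often a word appears per 1,000 words in each year. The JSON layout is an assumption (a list of objects with a `date` in `YYYY-MM-DD` form and a `text` field), the sample records are invented, and `term_rate_by_year` is a hypothetical helper rather than anything from Douglas' own code.

```python
import json
import re
from collections import Counter


def term_rate_by_year(records, term):
    """Return {year: occurrences of `term` per 1,000 words}, in year order."""
    term = term.lower()
    hits = Counter()   # occurrences of the term per year
    words = Counter()  # total words per year, used to normalise
    for record in records:
        year = int(record["date"][:4])  # assumes ISO-style dates
        tokens = re.findall(r"[a-z']+", record["text"].lower())
        hits[year] += tokens.count(term)
        words[year] += len(tokens)
    return {y: 1000 * hits[y] / words[y] for y in sorted(words) if words[y]}


# Invented sample records in the assumed layout, for illustration only.
sample_json = """
[
  {"date": "1914-08-05", "text": "War declared. The war news reached Wellington today."},
  {"date": "1914-08-06", "text": "Local markets were quiet."},
  {"date": "1919-06-30", "text": "Peace signed at Versailles."}
]
"""

records = json.loads(sample_json)
rates = term_rate_by_year(records, "war")
for year, rate in rates.items():
    print(year, round(rate, 1))
```

Dividing by the total word count matters because the number of digitised pages varies from year to year, so raw counts alone can mislead. OCR errors will also hide some occurrences of a term.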

QueryPic

Historian and hacker Tim Sherratt used newspaper data from both Trove and Papers Past to build QueryPic — a tool that graphs the results of keyword searches in newspapers over time.

  • QueryPicNZ — Tim explains how he developed QueryPicNZ using the DigitalNZ API

  • A tale of two islands — blog by former National Library staff member Gordon Paynter about QueryPic

The Battle Times

After gathering up a band of rogues to build a prototype at the National Digital Forum 2013 hackathon, Greig Roulston started to flesh out what a card game might look like if it used Papers Past articles to ‘roll’ the cards (via the DigitalNZ API). Unfortunately, the project was never finished.

Cards against the Library — read about Greig's (abandoned) plans of world domination.

Other newspaper open datasets

Get in touch

Get in touch if you know of any other examples that you think we should include or if you've created something you'd like us to showcase here.

We'd also love to hear how you've found using the data, what's gone well, what hasn't worked or what might make things easier.

Email us — paperspast@natlib.govt.nz


Feature image at top of page: Image created by Greig Roulston from pictures from the pilot dataset.