Projects — Papers Past newspaper open data | National Library of New Zealand

Wondering what you can do with digitised newspaper data? Have a look at the examples on this page for inspiration. The examples use a variety of newspaper data from different sources.

Creating specialized corpora from digitised historical newspaper archives

When confronted with huge collections of digitised material, it is very hard for an individual researcher or small team of researchers with interests in a specialised topic to apply contemporary text mining and other computational methods.

Joshua Black has developed an approach to help solve this problem. The core idea is that the text mining methods used to gain insight into a specialised topic can also be used to generate increasingly focused corpora.

Creating specialized corpora from digitized historical newspaper archives: An iterative bootstrapping approach — Joshua's academic paper

New Publication: Creating Specialised Corpora from Digitized Historical Newspaper Archives — Joshua's blog about his academic paper
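The idea of using text mining output to narrow a corpus step by step can be illustrated with a small loop. The sketch below is not the method from Joshua's paper: it swaps in a simple term-frequency profile as the text mining step, and the function names, stopword list, sample articles, and parameters (`keep`, `profile_size`) are all assumptions made for illustration. Each round builds a profile of the most common terms in the current corpus, scores every article against it, and keeps only the best-matching share.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "at", "for", "on", "was", "is"}

# Hypothetical sample articles standing in for OCR text.
ARTICLES = [
    "Gold was found at the Kumara diggings and miners rushed to the gold field",
    "The gold miners at the diggings held a meeting",
    "Gold watch lost on the road, reward offered",
    "Shipping news: the steamer arrived in port",
    "Miners report rich gold at the new diggings",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_profile(corpus, size):
    """Return the `size` terms that appear in the most articles (ties broken alphabetically)."""
    counts = Counter(t for text in corpus for t in set(tokenize(text)))
    return set(sorted(counts, key=lambda t: (-counts[t], t))[:size])


def score(text, profile):
    """Share of an article's tokens that belong to the profile."""
    tokens = tokenize(text)
    return sum(t in profile for t in tokens) / len(tokens) if tokens else 0.0


def focus_corpus(articles, seed_terms, rounds=2, keep=0.75, profile_size=5):
    """Start from a seed-term search, then repeatedly keep the best-matching articles."""
    seeds = set(seed_terms)
    corpus = [a for a in articles if seeds & set(tokenize(a))]
    history = [len(corpus)]  # corpus size after each step
    for _ in range(rounds):
        if len(corpus) <= 1:
            break
        profile = term_profile(corpus, profile_size) | seeds
        ranked = sorted(corpus, key=lambda a: score(a, profile), reverse=True)
        corpus = ranked[: max(1, int(len(ranked) * keep))]
        history.append(len(corpus))
    return corpus, history


if __name__ == "__main__":
    focused, sizes = focus_corpus(ARTICLES, ["gold"])
    print(sizes)
    for article in focused:
        print("-", article)
```

In a real project the profile step would more plausibly be a topic model or a trained classifier, and a researcher would inspect the results of each round before deciding whether to continue.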

Newspaper Navigator

Newspaper Navigator is a project that Ben Lee developed during his time as an Innovator-in-Residence at the Library of Congress. The first stage of Newspaper Navigator was to extract content such as photographs, illustrations, cartoons, and news topics from the Chronicling America newspaper scans and corresponding OCR using emerging machine learning techniques.

The project has successfully pulled together millions of images from 1789 to 1963 and made them searchable as a discrete set.

Newspaper Navigator — information about the project, including a link to pre-packaged datasets.
The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America — research article about Newspaper Navigator.

Oceanic Exchanges

Oceanic Exchanges is an international project using newspaper data from six countries (Finland, Germany, Mexico, the Netherlands, the United Kingdom, and the United States) to examine patterns of information moving across national and linguistic boundaries.

Oceanic Exchanges

Viral Texts

Ryan Cordell, Associate Professor of English at Northeastern University in the United States, runs the Viral Texts project, which uses data, visualisations, and text to explore how news articles, short stories, and poems spread through nineteenth-century newspapers.

Viral Texts

Kumara Times

Former National Library Digitisation Advisor Greig Roulston used data from Papers Past in two ways: first to build a timeline of Louis Louisch’s life from articles in the Kumara Times, and second to analyse the newspaper's advertisements using animation and AdBlock.

Mining the Kumara Times for Gold, with machines (25 mins, YouTube) — Greig’s presentation on the Kumara Times at the 2017 National Digital Forum.

Examining the WWI Papers Past corpus

Programmer and artist Douglas Bagnall examined the reporting around World War I, using data from newspapers on Papers Past published between 1913 and 1922, to see if the use of particular terms could be mapped over time. The data Douglas used had been converted into JSON and contained the digitised text and a limited amount of metadata.

article: {type: “War reports in all CAPITALS” — A blog by Emerson Vandy, Services Manager for Papers Past, that provides some context and history for Douglas’ work.
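Mapping the use of a term over time from JSON records can be sketched in a few lines. The example below is only an illustration, not Douglas's code: the field names `date` and `text`, the sample records, and the function name are assumptions, since the actual structure of the JSON he used is not described here.

```python
import json
import re
from collections import defaultdict

# Hypothetical records; the real data's field names and layout may differ.
SAMPLE_JSON = """[
  {"date": "1914-08-05", "text": "War declared. The Empire is at war."},
  {"date": "1914-09-12", "text": "News from the front."},
  {"date": "1918-11-12", "text": "Armistice signed; the war is over."}
]"""


def term_counts_by_year(records, term):
    """Count whole-word, case-insensitive occurrences of `term` per publication year."""
    term = term.lower()
    counts = defaultdict(int)
    for record in records:
        year = int(record["date"][:4])  # assumes ISO dates such as 1914-08-05
        words = re.findall(r"[a-z]+", record["text"].lower())
        counts[year] += words.count(term)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    records = json.loads(SAMPLE_JSON)
    print(term_counts_by_year(records, "war"))  # {1914: 2, 1918: 1}
```

Raw counts like these rise and fall with the amount of text digitised for each year, so a fuller analysis would probably divide by the total number of words per year before comparing periods.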

QueryPic

Historian and hacker Tim Sherratt used newspaper data from both Trove and Papers Past to build QueryPic — a tool that graphs the results of keyword searches in newspapers over time.

QueryPicNZ — Tim explains how he developed QueryPicNZ using the DigitalNZ API
A tale of two islands — blog by former National Library staff member Gordon Paynter about QueryPic

The Battle Times

After gathering a band of rogues to build a prototype at the National Digital Forum 2013 hackathon, Greig Roulston started to flesh out what a card game might look like if Papers Past articles (retrieved via the DigitalNZ API) were used to ‘roll’ the cards. Unfortunately, the project was never finished.

Cards against the Library — read about Greig's (abandoned) plans of world domination.

Other newspaper open datasets

Atlas of Digitised Newspapers and Metadata — an open-access guide to 10 newspaper databases worldwide.
Chronicling America — API and bulk data from the Chronicling America: Historic American Newspapers website.
Data Foundry — data collections from the National Library of Scotland.
Historical Newspapers open data — data from the Bibliothèque nationale du Luxembourg (National Library of Luxembourg).
Newspapers as data: A collections as data project by University of Arizona Libraries — a programme designed to introduce students to data literacy and computational analysis using digitized historical newspapers from Arizona.
Trove Bulk Download — Trove’s two sample bulk downloads of digitised data.

Get in touch

Get in touch if you know of any other examples that you think we should include or if you've created something you'd like us to showcase here.

We'd also love to hear how you've found using the data, what's gone well, what hasn't worked or what might make things easier.

Email us — paperspast@natlib.govt.nz
