21:3 (2006:09) 21st Conference: All the News That’s Fit to Digitize: Creating Colorado’s Historic Newspaper Collection
August 31, 2006 at 11:52 am | Posted in Conference Reports, Vision Sessions
All the News That’s Fit to Digitize: Creating Colorado’s Historic Newspaper Collection
Brenda Bailey-Hainer, Director of Networking & Resource Sharing, Colorado State Library
Reported by Gaele Gillespie
Brenda Bailey-Hainer gave an eye-opening presentation about Colorado’s Historic Newspaper Collection (CHNC). In less than one hour, Ms. Bailey-Hainer not only clearly explained the process behind the creation of Colorado’s Historic Newspaper Collection and emphasized its underlying value, but she did it in a way that would inspire almost any listener to champion similar projects in their own state.
The project has been a partnership between the non-profit Collaborative Digitization Program (CDP), the Colorado State Library (CSL), and the Colorado Historical Society (CHS). The initial grants that funded the project, awarded by the Collaborative Digitization Program, are now over. The Colorado State Library will assume long-term management of the project, and the Colorado Historical Society permits use of the microfilm collection. Ms. Bailey-Hainer’s discussion was organized in four parts: (1) overview of the project, (2) database accessibility & user reactions, (3) funding model, and (4) future plans.
OVERVIEW OF THE PROJECT
By Fall 2006, Colorado’s Historic Newspaper Collection (CHNC) will contain more than 110 newspapers from across the state, dating between 1859 and 1923 and representing four languages (English, German, Spanish, and Swedish); cover 65 counties; and have more than 500,000 digitized pages. As of April/May 2006, there are 91 newspapers dating between 1859 and 1923, representing 49 cities, 36 counties, and 315,000 digitized pages. Although the long-term goal is to be as comprehensive as possible between the years 1859 and 1928, the decision was made to begin with the earliest newspapers, since those published prior to 1923 are in the public domain, and then pursue those published after 1923 once they have copyright clearance. The project chose Olive software, which includes a robust search engine.
Getting the newspapers’ content into Colorado’s Historic Newspaper Collection, hereafter referred to as The Collection, involves the following process. Once the microfilm is available from the Colorado State Historical Society or other state historical societies, the master negative is pulled from the archives, and a duplicate is made and shipped to Israel, the location of the Olive software provider, where the processing takes place. The Olive software scans the film at 300 dpi and performs a “distillation” on the images to obtain the best possible image. The images are put into an XML format and burned on CDs rather than being sent by FTP. The Olive software is also used to provide the searching interface and database structure.
DATABASE ACCESSIBILITY & USER REACTIONS
The project staff wanted to provide users with very precise search results, and there were several challenges in making the collection accessible to the public. Formatting used for older newspapers was non-standard, so it was difficult for the Olive software to find matches on search terms. In addition, there were numerous title changes for each newspaper, and over time cities changed names or ceased to exist, and county boundaries changed. Archaic or historic language that was used at the time a newspaper was published may be unknown or unrecognizable to modern readers. The solution was to implement keyword searching only, with no subject headings added at the present time. The sheer size of the database added to the challenges, as some words are not searchable at all or result in too many hits, for example, Kit Carson.
Colorado institutions provide access to The Collection in several ways. The Collection is cataloged and displayed in the OPAC. There is a link to The Collection on the library web site as well as a link from the “Databases by Subject” list. The Collection is included in the list of regional history magazines and newspapers. There are two links that project staff would like to provide but have not been able to yet: an up-front link to The Collection on the web page for each participating library and a way to link directly to the newspapers themselves. The problem of how to get to individual newspapers is particularly thorny, so until that can be provided, the link currently defaults to the regional map as described below.
Ms. Bailey-Hainer conducted several searches to demonstrate access and use of the database. At the point of log-in, the system automatically asks the user what type of computer access they are using, and it offers a different interface if a lower-speed connection makes one necessary. Cookies are enabled to retain settings. Users can access newspapers by region through a map display. Clicking on a region within the map results in a list of newspapers for that region. From a selected list of titles, the user can search a single newspaper, a group of newspapers, or all 91 newspapers. A searcher can look at an individual article or look at the article in context as part of the full page. The in-context view shows any hand-written notes, etc., that were part of the printed newspaper, a feature that researchers and historians really appreciate.
A keyword search, for example, for the town of Wray, results in a list of four newspapers and cites the beginning and ending date range for each newspaper. Clicking on any of the four newspapers displays Date / Name of Paper / Headline / Number of Words in Heading. Clicking on “Headline” brings up the article with “Wray” highlighted. Clicking on the “Reader” icon pulls up a full page in context with the article highlighted. A search by word combination, for example, “Wray + Sewer,” results in a 1921 article with tabs across the top. Choosing the “Browse All” tab displays all 91 newspapers with Title / Town / County information. Select “Title” and a drop-down calendar displays. Choose “Date” and an entire newspaper displays. The user can then jump from page to page or skip several pages within that newspaper. A “Feature Topics” tab was created by hand to include Colorado-specific topics, for example, the Sand Creek Massacre.
The Collection has been well received, and use is steadily increasing. The first year The Collection went public, there were 1.3 million views. For the current year through April 2006, there have been 1.1 million views. It is a “sticky” site, which means a viewer stays for 35-40 minutes per use. A user survey, co-created by Utah and Virginia, was mounted at The Collection, and the results show most users are doing family history or are history researchers. Fifty-two percent of those taking the survey are over 60 years of age; the next-highest age range is 40-60 years old. Most users live in Colorado, but many are out-of-state users.
FUNDING MODEL & FUTURE PLANS
Originally, the project was funded by two grants awarded to the project partners, the Collaborative Digitization Program (CDP), the Colorado State Library (CSL), and the Colorado Historical Society (CHS). The first was a Library Services & Technology Act grant of $120,000 (LSTA is the only federal legislation that funds libraries exclusively). The second was an Institute of Museum & Library Services grant of $249,232 (IMLS is an independent federal grant-making agency). This start-up money paid for “basics” such as the Olive software, the hardware and server, and one terabyte of storage. Currently the project is funded primarily by donors and contributors. The project has received more than $325,000 from libraries, museums, friends’ groups, city governments, and foundations. There are still 22 million pages available for future digitization.
There are several collection development policies that have grown out of the project and will shape its future. As mentioned, the original grant money is gone and the project is currently driven by donations. Fundraising efforts will target counties with no newspapers online. Project staff are still hunting for the earliest available published issue of some historical Colorado newspapers, which may be held in another state’s collection. A continuing focus will be to obtain and process issues of newspapers published prior to 1923. Issues published after 1923 are no longer in the public domain and pose copyright concerns, which will make getting access to them more time consuming and labor intensive. The copyright concerns are many. Digital processing of the newspapers is done from microfilm, and project staff need to work from the negatives. The Colorado Historical Society paid for some of the original microfilming, but processing the earliest reels is very complicated. Project staff must identify and work with both the format owners, i.e., the owners of the microfilm negative of the newspaper, and the content owners. For issues published after 1923, the content owners are the newspaper owner and/or their heirs, writers, photographers, and newswire services.
The Colorado Historic Newspaper Collection project has several long-term goals:
- Add more newspapers monthly.
- Add more featured groups of articles.
- Add a history of each newspaper.
- Add a Colorado timeline.
- Research the feasibility of adding keywords and/or subject headings to newspaper articles. Adding these access points would be very time consuming and needs to be carefully researched and weighed against the other goals.
- Upgrade to OAI harvestable version of the Olive software.
- Make the collection Z39.50 compatible.
In closing, Brenda Bailey-Hainer said the Colorado Historic Newspaper Collection is a great public relations project for a campus or company. Such a successful and valuable project, especially one that is also popular with users, is an incentive to those who work with historic newspapers to obtain a grant and do a similar project. Please visit the site and search The Collection at www.ColoradoHistoricNewspapers.org.