Introduction
Started in 1997, the Dutch census digitization has been conducted over a period of seven years. A large number of people, institutes, and companies contributed to the realization of this project, which can be divided into three phases (see below). The census digitization project is still 'under construction'. During 2005, access to the information on the website will be improved. In addition, the data check and correction process and the documentation of the tables will be completed by the end of 2005.
Layout and availability of data
Because of the contributors' different approaches, the different sorts of material (tables, introductions, attachments and instructions, lists, graphs, individual basic files), and emerging ideas about and possibilities for digital accessibility, not all information might have the same layout or availability.
HGIN project
Alongside the Life Courses in Context project, the Historisch-Geografisch Informatiesysteem voor Nederland (HGIN, Historical Geographic Information System for the Netherlands) is being developed. This system will enable the visualisation of data on maps. For this purpose, maps will be digitized including all district boundaries (and, for rural areas, hamlets).
|
First project phase - Dutch Census Data Digitization 1795-1971 project
The digitization of the Dutch censuses started in 1997. It included the scanning of about 200 tapes (with more than 42,000 census document pages) and the data entry of the 1899 census (about 10,000 document pages). The results of this first project phase were published both on CD-ROM and on the internet. The data entry was conducted by Statistics Netherlands and IVA data entry services (presently known as Cendris).
|
Second project phase - Dutch Census Data publication project
The second project ran from January 2002 to March 2004. Preparations were made for the digital publication of the censuses of 1795, 1829, 1839, 1849, 1859, 1930, 1960 and 1971. Checking and correcting the imported data was an essential element of this project.
1960 and 1971 census data
The original individual-level data files of the 1960 and 1971 censuses were also digitally documented. Access to these files will be provided through remote execution so as to guarantee personal privacy. For more information about the data analysis of these files, you may contact Data Archiving and Networked Services (DANS).
|
Third project phase - Life Courses in Context project
Finally, in the third project the censuses of 1869, 1879, 1889, 1909, 1919/20, 1947 and 1956 were imported (about 20,000 pages), partly by NIWI and partly by data-entry centres (Service Profs and P2D). The textual parts of the censuses were converted through optical character recognition (OCR), as were the figures for 1947 and 1956. The project was conducted between March 2003 and November 2004.
|
Digitized censuses and pages within Life Courses in Context project
More than 20,000 pages, covering nine census years, had to be digitized. About half of these pages belonged to the 1889 census. NIWI handled a large part of the data entry for the smaller censuses. About 10% of the material, including texts and small tables, was scanned and processed with OCR. The text pages (e.g. introductions) were published in PDF format.
Year | Pages | Carried out by
1869 | 888 | NIWI
1879 | 2268 | NIWI
1889 | 10097 | Service Profs
1909 | 3572 | NIWI/Service Profs
1919 | 175 | NIWI
1920 | 1903 | Service Profs
1930 | 70 | NIWI
1947 | 1430 | Belfast/P2D
1956 | 345 | Belfast/P2D
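The page counts can be cross-checked with a few lines of Python. This is only an illustrative sketch: the figures are copied from the table above, and the variable names are made up for the example.

```python
# Page counts per census year, copied from the table above.
pages = {
    1869: 888,
    1879: 2268,
    1889: 10097,
    1909: 3572,
    1919: 175,
    1920: 1903,
    1930: 70,
    1947: 1430,
    1956: 345,
}

total_pages = sum(pages.values())
share_1889 = pages[1889] / total_pages

print(total_pages)                 # 20748
print(round(share_1889 * 100, 1))  # 48.7 (percent of all pages)
```

The total is indeed just over 20,000 pages, and the 1889 census accounts for slightly less than half of them.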
|
Guidelines for data entry
The data entry for the last phase of the census digitization was carried out in the spreadsheet program Excel. Stencil-plates (templates) were created that closely resembled the original tables. In general, one spreadsheet is created for each table. The framework of the tables is mostly the same for each province. Entering the same data twice, where they were published in more than one table, has been avoided as much as possible.
Table components
A table consists of rows and columns. However, there are also other components that do not fit in this structure:
- Table titles: are already included in the Excel stencil-plate.
- Subheadings, such as the name of a province or occupational group: are entered once in a column and apply until, for example, the name changes. If the subheadings are repeated, they are noted in a shortened version. This is indicated in the instructions for each table.
- French or English descriptions: are not entered. The English translation of the texts will take place later.
- (Foot)notes: there are several possibilities, which are indicated in the instructions for each table.
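Because a subheading is entered only once and applies to the rows below it, anyone processing the spreadsheets later has to repeat it downwards. The following is a minimal sketch of that step, not part of the project's own tooling; the function name `fill_down` and the sample province column are assumptions made for illustration.

```python
def fill_down(values):
    """Repeat the last non-empty subheading for the rows below it."""
    filled = []
    current = None
    for value in values:
        # A non-empty cell starts a new subheading block.
        if value not in (None, ""):
            current = value
        filled.append(current)
    return filled


# Hypothetical subheading column: each province name is entered once.
province_column = ["Groningen", "", "", "Friesland", "", "Drenthe"]
print(fill_down(province_column))
# ['Groningen', 'Groningen', 'Groningen', 'Friesland', 'Friesland', 'Drenthe']
```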
Table contents
The real table consists of the actual content (figures):
- Table headings: are already included in the Excel stencil-plate.
- 'Precolumn': mostly contains a hierarchical structure. Each table comes with an instruction about this hierarchical classification. In principle the precolumn is entered, but in some cases the precolumn is copied from another table and reused for tables yet to be entered (e.g. classification in municipalities).
- Empty cells: in the printed tables these are indicated in different ways. Examples: a blank, a line (-), or a quotation mark ("). Empty cells are left blank in the spreadsheets.
- Unreadable information: some numbers are unreadable. If this is caused by damaged pages, NIWI looks for more legible copies of the book in other libraries. The few figures that remain unreadable are noted in the comment field (with # or as agreed upon).
- Metadata: to the right of each table, metadata are included which refer to the census type, census year, table number, page numbers, province, and image numbers.
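The rule for empty cells can be sketched as a small normalization step. This is an assumption-laden illustration rather than the procedure actually used: the name `normalize_cell` and the set `EMPTY_MARKERS` are hypothetical, and the sketch assumes that every other cell holds a plain whole number without separators.

```python
# Markers that the printed tables use for an empty cell:
# a blank, a line (-), or a quotation mark (").
EMPTY_MARKERS = {"", "-", '"'}


def normalize_cell(printed):
    """Return None for an empty cell, otherwise the figure as an integer."""
    text = printed.strip()
    if text in EMPTY_MARKERS:
        return None
    return int(text)


print(normalize_cell(" - "))   # None
print(normalize_cell("345"))   # 345
```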
Total values
Totals, subtotals or other values (such as percentages) in the columns are computed using an Excel formula. The data entry workers check whether the computed numbers correspond with the printed numbers. In case of deviations, there are three possible reasons:
- A typing error, which has to be corrected immediately.
- A counting error in the publication. In this case the (correct) printed number is added to the number concerned in an Excel comment.
- One or more unreadable numbers in the publication. In this case the (correct) printed number is added to the number concerned in an Excel comment. If an unreadable number can be derived from the printed or computed totals, this is done immediately and the unreadable number is replaced.
Totals, subtotals or other values (such as percentages) in rows are mostly entered by hand. These totals are used to check the data entry afterwards. In some cases Excel macros are created for direct calculation:
- Merged cells: in some tables data from two or more cells are merged. In that case one total is used for a number of cells. In the printed tables these merged cells are mostly indicated by a brace (}). In the spreadsheets the value of the merged cells is entered in the top or leftmost cell. The cells are marked by a brace or in a comment.
- Special features: some tables end with a page of special features. An example is a list of municipalities in which the subject of the table did not occur (municipalities without blind people or unemployed persons). This information is omitted (blank).
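The check described above can be sketched in Python. This is a hypothetical illustration, not the Excel formulas used in the project: the function name `check_column_total` and the returned dictionary layout are assumptions, and unreadable figures are represented here by `None`. The sketch cannot tell a typing error from a counting error in the publication; that decision still requires comparing the entry with the printed page.

```python
def check_column_total(values, printed_total):
    """Compare the computed column total with the printed total.

    `values` holds the entered figures; None stands for an unreadable figure.
    """
    unreadable = [i for i, v in enumerate(values) if v is None]
    computed = sum(v for v in values if v is not None)

    if not unreadable:
        if computed == printed_total:
            return {"status": "ok", "computed": computed}
        # Either a typing error or a counting error in the publication.
        return {"status": "deviation", "computed": computed,
                "printed": printed_total}

    if len(unreadable) == 1:
        # A single unreadable figure can be derived from the printed total,
        # assuming the printed total itself is correct.
        return {"status": "reconstructed", "index": unreadable[0],
                "value": printed_total - computed}

    # Several unreadable figures cannot be recovered from one total.
    return {"status": "unreadable", "indices": unreadable}


print(check_column_total([10, 20, 30], 60))
# {'status': 'ok', 'computed': 60}
print(check_column_total([10, None, 30], 60))
# {'status': 'reconstructed', 'index': 1, 'value': 20}
```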
|