Procedure 20-10

Document Digitization

Adoption

Date: 2022-03-17
Instance of approval: University Secretary-General

Originating/Responsible Department : University Archives

1.0 Purpose

1.1 The purpose of this procedure is to provide direction when undertaking a scanning project or establishing a scanning standard for a service, faculty or research unit. It is intended to ensure that scanned information is trustworthy, reliable and recognized as the authentic copy at the University, or that it serves as a means of protecting vital information.

2.0 Application

2.1 This procedure applies to all University information assets, regardless of medium, created or collected in the course of University business, that comply with section 2.2. This procedure does not apply to born-digital information.

2.2 Information assets are suitable for digitization if:

  • They are frequently used to support day-to-day operations or decision-making;
  • They are essential to the provision of services;
  • They are needed in order to action files via use of a workflow;
  • They are needed by multiple users;
  • The users are geographically dispersed;
  • The users require immediate access;
  • The original format makes it difficult to access the information asset (e.g., large map);
  • The information asset is fragile. Scanning will allow access to a copy and will thus protect the fragile original from damage that can result from handling.

Information assets that are NOT suitable for digitization include the following:

  • Draft or preliminary versions;
  • Printed copies of born-digital information assets; and
  • Information that has intrinsic value, or that has been identified as archival and is to be transferred to the Information and Archives Management Service (IAMS) in its original format. Always confirm with IAMS whether the information has intrinsic value before proceeding with digitization.
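
The screening in section 2.2 can be sketched as a short function. This is only an illustration: the `AssetProfile` fields and the `screen_for_digitization` name are hypothetical, and the sketch assumes that meeting any one criterion is enough, which the procedure does not state explicitly. It does not replace the confirmation with IAMS.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AssetProfile:
    """Hypothetical summary of one information asset (field names are illustrative)."""
    # Criteria from section 2.2 that favour digitization
    frequently_used: bool = False
    essential_to_services: bool = False
    needed_for_workflow: bool = False
    multiple_users: bool = False
    users_dispersed: bool = False
    immediate_access_needed: bool = False
    hard_to_access_format: bool = False
    fragile: bool = False
    # Exclusions from section 2.2
    is_draft: bool = False
    is_print_of_born_digital: bool = False
    intrinsic_or_archival_value: bool = False


def screen_for_digitization(asset: AssetProfile) -> Tuple[bool, str]:
    """Return (suitable, reason); exclusions are checked before criteria."""
    if asset.is_draft:
        return False, "draft or preliminary version"
    if asset.is_print_of_born_digital:
        return False, "printed copy of a born-digital asset"
    if asset.intrinsic_or_archival_value:
        return False, "intrinsic or archival value: confirm with IAMS"
    criteria = [
        asset.frequently_used,
        asset.essential_to_services,
        asset.needed_for_workflow,
        asset.multiple_users,
        asset.users_dispersed,
        asset.immediate_access_needed,
        asset.hard_to_access_format,
        asset.fragile,
    ]
    # Assumption: meeting any single criterion is sufficient to proceed.
    if any(criteria):
        return True, "meets at least one section 2.2 criterion"
    return False, "no section 2.2 criterion met"
```

Checking exclusions first means a fragile draft, for example, is still screened out.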

3.0 Compliance Framework

3.1 Use of this procedure will ensure digitization activities comply with the following laws and standards:

Evidence Act, R.S.O. 1990, c. E.23

Electronic records as documentary evidence CAN/CGSB-72.34-2017

Copyright Act (R.S.C., 1985, c. C-42)

Freedom of Information and Protection of Privacy Act, R.S.O. 1990, c. F.31

3.2 Use of this procedure will ensure digitization activities comply with the following University of Ottawa policies and procedures:

Policy 23 - Information Management

Procedure 20-4 - Disposition of Information

Policy 90 - Access to Information and Protection of Privacy

Policy 117 - Information Classification and Handling

Policy 119 - Accessibility

3.3 If digitization is completed by an external service provider, the provider must comply with all policies, procedures and standards listed in this procedure. This requirement must be expressly written into any contract or agreement with a third party conducting digitization.

4.0 Responsibility

4.1 All Faculties, Departments, Services, and research units: Follow digitization procedures. Consult with IAMS on scanning and preserving source information assets that are found in a unique format (e.g., bound volumes, oversize material, maps and drawings) or that have particular archival value or interest. Verify the retention periods of source information assets before beginning, and check output format standards as per Appendix C.

4.2 Information and Archives Management Service (IAMS): Develop and maintain procedures, provide advice and guidance to units undertaking digitization projects, approve destruction of original source information assets, provide advice regarding the preservation of information assets that may be of historical interest.

5.0 Procedure

The digitization process includes the steps outlined in the following section. It should be noted that the workflow may vary depending on the nature and scope of the project.

Step 1: Physical Preparation of Source Material

Step 2: Information Classification

Step 3: Prepare Scanning Log

Step 4: Digitization

Step 5: Quantity and Quality Control

Step 6: Disposition and Ongoing Management

5.1 Step 1: Physical Preparation of Source Material

Carefully prepare the information assets for digitization to reduce the risk of deterioration and poor-quality digitization. Physically prepare the material by unfolding pages, placing the pages in the right order, and removing bindings, duplicates, staples, paper clips and notes (post-its, memos, etc.). Check whether the information asset contains features that may affect the scan, such as irregular formatting or highlighting. If the removed notes contain information, include them in the scanning.

5.2 Step 2: Information Classification

5.2.1 Ensure the material is identified and organized according to the Classification and Retention Schedule.

5.2.2 Determine the order in which you will conduct the digitization (e.g., alphabetical, chronological, by class code, oversize items or notes before the main document). Prepare the final location where the scanned copies will be saved by creating the file structure according to the Classification and Retention Schedule and University Information Management best practices.
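
Preparing the destination in 5.2.2 could be scripted along the following lines. This is a minimal sketch: the `prepare_destination` name is hypothetical, and any class codes passed to it are placeholders, since the actual codes come from the Classification and Retention Schedule.

```python
from pathlib import Path
from typing import Iterable, List


def prepare_destination(root: Path, class_codes: Iterable[str]) -> List[Path]:
    """Create one folder per class code under root; return the folders in sorted order."""
    folders = []
    for code in sorted(set(class_codes)):  # drop duplicates, keep a stable order
        folder = root / code
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run on an existing tree
        folders.append(folder)
    return folders
```

Creating the folders before scanning begins means each digital copy can be saved directly to its final location during Step 4.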

5.3 Step 3: Prepare Scanning Log

5.3.1 A scanning log, which includes all the necessary information, documented in one place, should be kept for each digitization initiative. The log should document the specific choices made for information in the current project. 

 5.3.2 Scanning logs should be maintained during the entire digitization process, with a view to tracking the work being done and keeping a record of the technicians doing the work. Logs should contain sufficient information to provide evidence of the authenticity and reliability of the digitized information assets. See Appendix A for the scanning log template (all information in the template is the basic information required for each digitization initiative).

5.3.3 All information in the template must be tracked, but additional information could be included in the scanning log:

  • Requirements and procedures for preparation of the information assets, output formats, technical specifications, error tolerance, etc. See Appendix C for format standards;
  • For outsourced digitization, all documentation exchanged with the vendor, including contracts, agreements, progress reports, monthly volume and costs, invoices, and error reporting;
  • Chain of custody logs for the transfer of both source information assets and digitized copies;
  • A record of any re-scanning that was conducted because errors were found in the quality/quantity control checks.
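
A scanning log kept electronically might look like the sketch below. The columns follow the Appendix A template, but the `ScanLogEntry` field names and the CSV layout are assumptions made for illustration and are not prescribed by this procedure.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class ScanLogEntry:
    description: str            # class code or heading title
    creation_dates: str         # dates of creation of the materials
    scanned_on: str             # date or date range of scanning (YYYY/MM/DD)
    scanned_by: str             # technician initials
    checked_on: str = ""        # date of the quantity and quality checks
    checked_by: str = ""        # initials of the person who ran the checks
    errors_corrected: str = ""  # note any re-scanning that was needed
    destroyed_on: str = ""      # left blank until the original is destroyed
    destroyed_by: str = ""


def write_log(entries: List[ScanLogEntry]) -> str:
    """Serialize log entries to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ScanLogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()
```

The check and destruction fields default to empty so that an entry can be created at scanning time and completed at later stages.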

 

5.4 Step 4: Digitization

Scan source information assets (Xerox WorkCentres/photocopiers can be used) and ensure scanned copies are created in the appropriate format standard (see Appendix C), with OCR (Optical Character Recognition; see Appendix D: Definitions) applied to information containing text. Xerox WorkCentres/photocopiers can apply OCR through the “searchable” feature, but they detect only one language at a time. If your information asset is bilingual, apply OCR to the scanned copy using separate software such as Adobe Pro. Name the digital copy and save it to the appropriate location. Complete the relevant portions of the scanning log at every stage.

5.5 Step 5: Quantity and Quality Control

At each stage of the digitization process, it is essential to evaluate the results. Two facets are involved: quantity control and quality control.

5.5.1 Quantity control: Verify that each page has been digitized. Count the pages before digitization, either manually or by relying on automatic functions in the digitization software that mark and count the pages, and then compare the result to the number of pages digitized.
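
The comparison in 5.5.1 amounts to a simple count check. The sketch below assumes both counts are already known; the `quantity_check` name and the shape of the returned record are illustrative only.

```python
from typing import Dict, Union


def quantity_check(source_pages: int, scanned_pages: int) -> Dict[str, Union[int, bool]]:
    """Compare the page count taken before digitization with the pages actually scanned."""
    difference = scanned_pages - source_pages
    return {
        "source_pages": source_pages,
        "scanned_pages": scanned_pages,
        "difference": difference,    # negative: pages missing; positive: extra or duplicate pages
        "passed": difference == 0,
    }
```

A non-zero difference in either direction fails the check, since extra pages can indicate duplicates or a misfeed.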

5.5.2 Quality control: It is essential to preserve the integrity of the information asset’s content and to ensure that the text is clearly legible, that no sentence or word has been cut off, that OCR (Optical Character Recognition) has been applied, and that the resolution and contrast are sufficient. Confirm that the scanned copy is an accurate reproduction of the source information asset and that its resolution is appropriate for the type of information asset and how it will be used. The quality of the reproduction (for example, brightness, contrast, colours) must be checked to avoid losing detail in shaded areas and to avoid recording images that are unclear or incorrectly oriented or cropped. Visual checks of each digitized information asset must be performed to ensure and record the integrity of the reproduction. Ensure that all scans are legible, with no skew, no noise or speckle, no missing parts or pages, minimal colour dropout (colour removal on the scanned image to increase OCR capability), and good contrast between text and background.

5.5.3 Complete the relevant portion of the scanning log once these checks have been completed.

5.5.4 Repeat digitization (see section 5.4, Step 4) if any errors are detected during the quality and quantity control checks.
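
The decision in 5.5.4 can be expressed against the Appendix B checklist. In this sketch the `Answer` values mirror the checklist columns; treating "To be revised" as a failed check is an assumption, and the function names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Answer(Enum):
    YES = "Yes"
    NO = "No"
    TO_BE_REVISED = "To be revised"


def failed_checks(checklist: Dict[str, Answer]) -> List[str]:
    """Return the checklist questions that were not answered 'Yes'."""
    # Assumption: 'To be revised' counts as a failure until the item is rechecked.
    return [question for question, answer in checklist.items() if answer is not Answer.YES]


def needs_rescan(checklist: Dict[str, Answer]) -> bool:
    """Digitization is repeated if any quantity or quality check failed."""
    return bool(failed_checks(checklist))
```

Returning the failed questions, rather than only a yes/no result, gives the technician something concrete to record in the scanning log.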

5.6 Step 6: Disposition and Ongoing Management

5.6.1 Digitized information assets must be treated the same as their originals, based on the University’s Classification and Retention Schedule, as well as any procedures or legislation to which the source information asset was subject. For example, if access to the original was restricted to a certain unit or group of people, the scanned copy should be restricted in the same way.

5.6.2 Source information assets that have been digitized may be destroyed once 12 months have passed since their digitization, provided the digitization process has been documented and validated. If there are doubts about the integrity (quality and quantity) of the digitization, the source information asset must not be destroyed and must be re-digitized.

5.6.3 Only the highest-level director (or the highest-level manager if there is no director) can authorize the destruction of source information assets in accordance with Procedure 20-4. Update the scanning log as described in section 5.3 (Prepare Scanning Log) of this procedure.
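
The conditions in 5.6.2 and 5.6.3 can be combined into one eligibility check. This is a sketch under stated assumptions: the function names are hypothetical, and the handling of a 29 February digitization date (waiting until 1 March of the following year) is a conservative choice not specified in the procedure.

```python
from datetime import date


def earliest_destruction_date(digitized_on: date) -> date:
    """Return the first date on which 12 months have elapsed since digitization."""
    try:
        return digitized_on.replace(year=digitized_on.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year.
        # Assumption: wait until 1 March rather than destroy a day early.
        return date(digitized_on.year + 1, 3, 1)


def may_destroy(digitized_on: date, today: date,
                documented: bool, validated: bool, authorized: bool) -> bool:
    """All conditions must hold: waiting period, documentation, validation, authorization."""
    if not (documented and validated and authorized):
        return False
    return today >= earliest_destruction_date(digitized_on)
```

For example, an asset digitized on 2021-12-03 would become eligible on 2022-12-03 at the earliest, and only if the other three conditions are met.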

EXCEPTION

13. No exceptions to this procedure can be made without the written approval of the Director, Information and Archives Management.

 

Appendix A: Scanning log template

* All information in the template is required for each digitization initiative.

| Description of materials (class codes or heading titles) | Dates of creation of materials | Scanned by | Quality and quantity checks (see Appendix B) and any errors corrected | Original destroyed (if applicable) |
| --- | --- | --- | --- | --- |
| | Date (YYYY/MM/DD), Initials | Date (YYYY/MM/DD), Initials | Date (YYYY/MM/DD), Initials | Date (YYYY/MM/DD), Initials |
| Example A: Applications for admission | 2009-2010, S.S. | 2021/12/01-2021/12/03, L.M. | 2021/12/03, S.S. | 2021/12/03, S.S. |

 

 

Appendix B: Quantity and Quality Control checks

 

| Quantity Control | Yes | No | To be revised | Comments |
| --- | --- | --- | --- | --- |
| Does the scanned copy have the same number of pages/images as the source information asset? | | | | |
| Is the digitized copy of audio/visual material of the same length as the source information asset? | | | | |

| Quality Control | Yes | No | To be revised | Comments |
| --- | --- | --- | --- | --- |
| Are all text, images, and sound legible/audible? | | | | |
| Are all details complete, with no skew, speckles, blur, or noise present? | | | | |
| Is the colour contrast appropriate? | | | | |
| Do the resulting scanned copies have the same information as the source information asset? | | | | |

 

Appendix C: Format Standards [1]

| Material | File format | Bit depth | Resolution | Colour profile |
| --- | --- | --- | --- | --- |
| Paper | PDF/A (with OCR) | 8 bit greyscale, 24 bit colour | Minimum 300 ppi, recommended 400 ppi | |
| Microfilm/microfiche | TIFF or PDF/A (with OCR) | 8 bit greyscale | Varies based on original size of document | |
| Maps and plans | TIFF, GeoTIFF | 8 bit greyscale, 24 bit colour | Minimum 300 ppi, recommended 600 ppi | |
| Photographs | TIFF | 8 bit greyscale, 24 bit colour | Minimum 300 ppi, recommended 400 ppi, 600 ppi for fragile | |
| Negatives/Slides | TIFF | 8 bit greyscale, 24 bit colour | Minimum 4000 pixels on longest edge, recommended 4000-6000 pixels on longest edge | RGB |

| Moving images (video) | Video/Film: Stream | Film: Image + Audio |
| --- | --- | --- |
| File format | Stream: Uncompressed/MOV, Uncompressed/AVI, JPEG2000/MXF OP1a; Audio: LPCM | Image: DPX; Audio: LPCM/BWF |
| Bit depth | Stream: 8 bit; Audio: 16 bit | Image: 10 bit; Audio: 16 bit |
| Resolution | Stream: match original | Image: 4K-4,096 ppi (35mm); 2K-2,048 ppi (16mm) |
| Sample rate | Stream: variable, 30-50 Mbps; Audio: 48 kHz | Audio: 48 kHz |
| Colour | 4Y,2Cb,2Cr (4:2:2) | RGB |

| Audio | |
| --- | --- |
| File format | BWF (Broadcast Wave) with LPCM encoding (Linear Pulse Code Modulated Audio) |
| Sample rate and bit depth | 96 kHz/24 bits |
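
The resolution standards for paper, maps and plans, and photographs lend themselves to an automated check. In the sketch below the minimum and recommended values are copied from this appendix, while the `PPI_STANDARDS` table and the `resolution_status` function are hypothetical names. The 600 ppi recommendation for fragile photographs and the pixel-based standard for negatives and slides are not modelled.

```python
from typing import Dict, Tuple

# (minimum ppi, recommended ppi) as listed in Appendix C
PPI_STANDARDS: Dict[str, Tuple[int, int]] = {
    "paper": (300, 400),
    "maps and plans": (300, 600),
    "photographs": (300, 400),  # fragile photographs: 600 ppi, not modelled here
}


def resolution_status(material: str, ppi: int) -> str:
    """Classify a scan resolution against the Appendix C standard for its material."""
    minimum, recommended = PPI_STANDARDS[material]  # raises KeyError for unlisted materials
    if ppi < minimum:
        return "below minimum"
    if ppi < recommended:
        return "meets minimum"
    return "meets recommended"
```

A scan reported as "below minimum" would need to be repeated at a higher resolution before it passes quality control.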

 

Appendix D: Definitions

Digitization: the conversion of text, pictures, or sound into a digital form that can be processed by a computer.

Source information asset: information in the original format in which it was created, regardless of medium.

Information asset: information that is generated or managed by the University and has value to the University.

Integrity: the reliability of information content, processes and systems as to their completeness, accuracy, consistency and authenticity.

Disposition: the final retention action carried out on an information asset. This may include destruction, deletion, secure destruction or deletion, or transfer for archival review or to a third party.

Scanned/digitized copy: information that has been converted from a source information asset to a digital format that can be processed by a computer.

Reliability: Reliability refers to the degree to which the quality of information content, processes and systems can be depended upon to be trustworthy, complete, accurate and authentic.

Authenticity: An authentic information asset is one that can be proven to be what it purports to be; to have been created or sent by the person purported to have created or sent it; to have been created or sent at the time purported.

OCR (Optical Character Recognition): the identification of printed characters using photoelectric devices and computer software. Xerox WorkCentres/photocopiers can apply OCR through the “searchable” feature, but they detect only one language at a time. If your information asset is bilingual, apply OCR to the scanned copy using separate software such as Adobe Pro.

 

[1]Compiled from National Heritage Digitization Strategy “Digitization Best Practices and Recommendations”, April 2019, https://cnhds.files.wordpress.com/2019/05/nhds-digitization-best-practices-and-recommendations-2019.pdf as well as Library and Archives Canada, “Guidelines on File Formats for Transferring Information Resources of Enduring Value”, 2015-02-05 https://www.bac-lac.gc.ca/eng/services/government-information-resources/guidelines/Documents/file-formats-irev.pdf or https://www.bac-lac.gc.ca/eng/services/government-information-resources/guidelines/Pages/guidelines-file-formats-transferring-information-resources-enduring-value.aspx