Photos of Larryblakeley
(Contact Info: larry at larryblakeley dot com)
Capturing and storing critical business documents is the first and most important step in document management. The question then is how to capture documents.
The Art of Capturing Images
Scanning is the first step in document management and is possibly the most crucial. The process has been around for a long time and the technology is not new. Many people have a scanner at home or the office so there is a great deal of familiarity with the concept of low-volume scanning.
This familiarity can lead to the perception that high-volume scanning is a commodity service that can be bought from almost anywhere.
However, there is a world of difference between small-scale, ad hoc scanning and scanning documents commercially for inclusion in a mission-critical enterprise knowledge management system.
Generally, the imaging solution consists of hardware and software.
Document Imaging Workflow
Steps to Document Imaging
The integrity of the scanning process starts with document preparation. Document preparation can involve as much as 40 percent of the total labor in scanning.
Staples, paper clips and other document binders must be removed to create the loose sheets that will be suitable for high-speed scanning.
Blank separator sheets must be placed between the documents.
Check the documents to make sure the pages are not upside-down, back-to-front, or in landscape format when they should be portrait. All pages need to be arranged in the same orientation.
Determine the Right Resolution
For black-and-white scanning of plain text on a standard A4 page, 200 dpi is adequate and is the preferred resolution. However, where drawings and diagrams are present, a higher resolution (normally 300 dpi) is typically needed to avoid distorting important components.
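As a rough illustration of what the step from 200 dpi to 300 dpi costs, the sketch below computes the pixel dimensions of an A4 page at both resolutions. The function name and the use of the standard A4 size (210 x 297 mm) are assumptions made for this example, not part of any scanner software.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # standard A4 page size in millimetres


def pixel_dimensions(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for a page scanned at the given dpi."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


for dpi in (200, 300):
    w, h = pixel_dimensions(*A4_MM, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels ({w * h:,} total)")
```

Because the pixel count grows with the square of the resolution, a 300 dpi scan carries about 2.25 times as many pixels as a 200 dpi scan of the same page.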
Conservative estimates hold that over a three-year period, labor represents as much as 70 percent of the total ownership costs to prepare, scan, index and output documents, and perform QA. Approximately 80 percent of this labor is evenly divided between document preparation and post-processing activities. From this, it's apparent that document scanners which reduce document preparation and eliminate rescans have the ability to generate substantial reductions in ownership costs over the life of the equipment. At the same time, it's important to note that capital and maintenance costs contribute as little as ten percent to the true cost of scanning.
This strongly suggests that the purchase price of equipment is much less significant than the scanner's ability to contribute to process and productivity improvements over time.
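The percentages above can be combined into shares of total ownership cost. The following is a minimal sketch of that arithmetic, using the upper-bound figures quoted in the text (70 percent labor, 80 percent of labor split evenly between preparation and post-processing); the function and key names are hypothetical.

```python
def labor_cost_breakdown(labor_share=0.70, prep_post_share_of_labor=0.80):
    """Split total ownership cost into shares, using the figures quoted above."""
    prep_and_post = labor_share * prep_post_share_of_labor
    return {
        "document_preparation": prep_and_post / 2,  # evenly divided
        "post_processing": prep_and_post / 2,
        "other_labor": labor_share - prep_and_post,
        "non_labor": 1 - labor_share,
    }


for name, share in labor_cost_breakdown().items():
    print(f"{name}: {share:.0%} of total ownership cost")
```

Under these assumptions, document preparation alone accounts for roughly 28 percent of total ownership cost, which is consistent with the earlier figure of preparation being as much as 40 percent of scanning labor.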
Monitor the Scan Quality
Quality checks of the scanned image should be done during the scanning process itself. While a standard setting is applied to all documents, this setting may not be optimal for some documents. Special attention should be given to items such as ink stamps, pencil writing, dot-matrix printed data, highlighted text and grey background with black text.
Document Indexing, Search, and Retrieval System
An example of a low cost, effective software program: Informatik Docudex
Runs on Windows 95, 98, ME, NT4, 2000, XP (MS Access license not required)
Powerful MS Access and ODBC databases
Versatile, flexible, yet user-friendly
Up to 30 customizable indexing fields
Easy-to-use and powerful search functions
Dozens of graphic file formats, including multipage TIFF, DCX, PCX, GIF, JPEG, PNG, BMP, Targa
Industry standard TIFF compression: Group 3, Group 4, LZW, etc
Multipage TIFF and DCX files
Versatile multi-page scanning
Efficient graphics handling and display
Miniature/thumbnail display of pages
Page insert, replace, delete, append, move functions
Drag-and-Drop page repositioning tool
Size-to-Fit, Fit-to-Width, actual size
Multiple images on the screen
Page navigation (next, prior, first, last)
Image rotation, inversion
Anti-alias (scale-to-gray) for enhanced image rendition
Graphics file format conversions
Indexing efficiency tools: shortcuts, default entries, and export functions
Optional file conversion to PDF
Developed with LEADTOOLS
Evaluating Volume Requirements
Document scanner manufacturers design equipment to meet a variety of page volumes, feature sets and price requirements. A common scanner market segmentation, introduced in 1999 by organizations such as InfoTrends Research Group, Inc. (Boston, MA), categorizes scanners by price and speed ranges.
The segments are High-Volume Production, Mid-Volume Production, Low-Volume Production, Departmental and Workgroup.
This guide will focus on production scanners.
High-Volume Production scanners achieve speeds of 60 pages per minute (120 images per minute) or greater. Some reach speeds of 200 pages per minute. These speed demons typically handle 10,000-30,000 pages or more in a single workday.
Mid-Volume Production scanners have rated speeds of 42-85 pages per minute, and are designed to scan daily volumes of 5,000-10,000 pages.
Low-Volume Production scanners typically handle 36-50 pages per minute, and are designed for volumes of 500-4,000 documents per day.
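To relate the rated speeds above to the daily volumes, it can help to compute how many pages a scanner would feed in a shift if it ran continuously at its rated speed. The sketch below assumes an 8-hour workday and introduces a hypothetical duty_factor parameter for the fraction of the shift the scanner is actually feeding paper; neither value comes from the segmentation itself.

```python
def pages_per_shift(pages_per_minute, hours=8.0, duty_factor=1.0):
    """Pages fed in one shift at the rated speed.

    hours and duty_factor are illustrative assumptions: duty_factor is the
    fraction of the shift during which the scanner is actually feeding paper.
    """
    return int(pages_per_minute * 60 * hours * duty_factor)


# Theoretical ceiling for a 60 ppm scanner running non-stop for 8 hours.
print(pages_per_shift(60))

# The same scanner feeding paper only half of the time.
print(pages_per_shift(60, duty_factor=0.5))
```

A 60 ppm scanner running non-stop would feed 28,800 pages in such a shift, close to the top of the 10,000-30,000 range quoted for High-Volume Production. The daily volumes quoted for the other segments sit well below what their rated speeds would allow in continuous use.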
Duplex and Simplex Scanners
Duplex scanners capture both sides of a page in a single pass, to increase productivity and ensure that both sides are scanned together. Simplex scanners scan one side of a page at a time. To scan a double-sided page, each side must be scanned separately, then matched together electronically. This creates the potential for quality control problems, especially in batch environments. A good rule of thumb to consider: if your daily scan volumes contain 30 percent or more of two-sided documents, then it becomes more economical to consider a duplex scanner.
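One way to see where the 30 percent rule of thumb comes from is to count feeder passes. The sketch below is only an illustration under a simple assumption: a simplex scanner needs one extra pass for every two-sided sheet, while a duplex scanner needs one pass per sheet. The function name is hypothetical.

```python
def simplex_passes(total_sheets, two_sided_fraction):
    """Feeder passes a simplex scanner needs: one per sheet, plus a second
    pass for each two-sided sheet."""
    two_sided = round(total_sheets * two_sided_fraction)
    return total_sheets + two_sided


sheets = 1000
passes = simplex_passes(sheets, 0.30)
print(f"duplex passes:  {sheets}")
print(f"simplex passes: {passes} ({passes / sheets:.0%} of duplex)")
```

At the 30 percent threshold the simplex scanner handles 30 percent more passes than a duplex scanner, before counting the effort of matching front and back images.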
Color and Bitonal Scanners
In the past, color scanning options have generally been considered expensive and particularly lacking in production level throughput. In just the past two years, color production scanners have been introduced featuring an attractive combination of price, image quality, and speed. The industry is beginning to offer more models of production color scanners with varying price and feature sets. Businesses are rapidly adopting color production scanning for several significant reasons.
Most color scanners also offer bitonal and grayscale output options. The result is that companies now have more options when choosing the most appropriate scanner for their application. Production color scanning is now just as valid a business choice as bitonal or grayscale scanning has been for years.
Scanning Speed Versus Throughput
Scanning speeds are typically quoted in either pages per minute (ppm) or images per minute (ipm), which are interchangeable on a simplex scanner. On a duplex scanner, however, the images-per-minute figure can be as much as double the pages-per-minute figure. Be wary of estimating actual scanner throughput based solely on transport speed, which is typically expressed in inches per second, as this does not account for the gap between documents. Scanning speeds are often published using the landscape scanning mode, which increases the apparent speed rating because of the shorter page length (8.5 inches for landscape mode versus 11 inches for portrait mode). Because many of the most popular capture software applications can automatically rotate images 90 degrees for viewing, scanning in landscape mode can improve scanning speed by more than 20 percent.
Compared with scanning speed, throughput is more difficult to measure, since it is affected by a wide variety of variables including paper type, size and quality; document preparation time; paper handling and feeder characteristics; scanner speed at a given resolution; the inter-document gap; and the involvement of the operator.
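The relationship between transport speed, page length and the inter-document gap can be sketched as follows. The transport speed of 20 inches per second and the 1-inch gap are hypothetical values chosen only for illustration, and the function covers the mechanical feed rate alone, not the operator and preparation factors listed above.

```python
def pages_per_minute(transport_ips, page_length_in, gap_in=0.0):
    """Nominal feed rate from transport speed (inches per second), the page
    length in the feed direction, and the gap between consecutive pages."""
    return 60.0 * transport_ips / (page_length_in + gap_in)


SPEED_IPS = 20.0  # hypothetical transport speed
GAP_IN = 1.0      # hypothetical inter-document gap

portrait = pages_per_minute(SPEED_IPS, 11.0, GAP_IN)
landscape = pages_per_minute(SPEED_IPS, 8.5, GAP_IN)

print(f"portrait:  {portrait:.1f} ppm")
print(f"landscape: {landscape:.1f} ppm ({landscape / portrait - 1:.0%} faster)")
print(f"duplex images per minute (portrait): {2 * portrait:.1f}")
```

With no gap at all, the landscape advantage is simply the ratio of the page lengths, 11 / 8.5. Any gap between documents reduces both figures and narrows the difference, which is why transport speed alone overstates throughput.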
Most bitonal scanners used in document imaging applications operate at 200, 240, 300 or 400 dots per inch (dpi), with lower resolutions providing faster scanning speeds. For many applications, 200 dpi provides acceptable resolution while optimizing throughput. However, an important distinction must be made between capture resolution and output resolution. For example, some scanners use image processing algorithms (scaling) to output images at 300 dpi from images optically captured at 200 dpi, in order to claim a higher resolution specification. In many of these cases, the higher output resolution (300+ dpi) will produce an inferior image compared with an image captured and output at the native optical resolution (200 dpi). Often, a scanner with high-quality optics and sensors will produce sharper, clearer output at a lower capture resolution than a scanner that relies on software to boost its resolution values. That's why comparing scanners based upon dpi resolution alone can be misleading.
Color scanning captures 24 times more information per pixel than bitonal scanning (24 bits versus 1 bit). Elements such as graphics, stamps, logos, signatures, and highlights are often more readable in color than in bitonal, even at a lower dpi. Color scanning resolutions of 100 and 150 dpi are considered highly comparable to bitonal resolutions of 200 or 300 dpi in both quality and image size. This is possible at a lower resolution because of the bit depth: sixteen million colors can be represented in the color image versus only two in the black-and-white image.
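The bit-depth figures above follow from simple arithmetic, sketched below for a letter-size page. The function names are hypothetical, and the sizes are raw, uncompressed pixel data only. Compressed file sizes depend on the codec used and are not modeled here.

```python
def distinct_colors(bits_per_pixel):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bits_per_pixel


def raw_bits(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bits of a scanned page."""
    return round(width_in * dpi) * round(height_in * dpi) * bits_per_pixel


print(distinct_colors(1))    # bitonal: 2
print(distinct_colors(24))   # 24-bit color: 16777216

bitonal_200 = raw_bits(8.5, 11, 200, 1)
color_200 = raw_bits(8.5, 11, 200, 24)
color_100 = raw_bits(8.5, 11, 100, 24)

print(color_200 // bitonal_200)  # same dpi: 24 times the raw data
print(color_100 // bitonal_200)  # half the dpi: a quarter of the pixels
```

Halving the resolution cuts the pixel count to a quarter, so lowering the dpi offsets much of the extra data that the greater bit depth introduces.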
Since the scanner is a single element of document capture, scanning software and related interfaces must have the capability to sustain required throughput rates. The Small Computer System Interface (SCSI) has become the de facto standard for scanner system integration. SCSI works with standard ISIS and TWAIN drivers to facilitate integration with host application software. SCSI adapters are relatively inexpensive. Video interfaces are generally found in older scanner models. A video interface requires a relatively expensive host-compatible card to process the images from the scanner. With the scanner's SCSI interface, compressed images are buffered to overcome bandwidth constraints. It's important that sufficient memory is configured in the scanner to prevent potential throughput bottlenecks. Otherwise, there could be a noticeable drop in speed during a large, batch scanning job.
LizardTech's Express Server http://www.lizardtech.com/solutions/exp/exp_fnb.php
The Express Server product is aimed at customers who need to distribute imagery to remote users via the Internet, an intranet or a secure intranet. Customers who need to deliver time-sensitive imagery, for example in national security and homeland defense applications, as well as customers with large archives of high-quality raster images, will find the Express Server invaluable to their workflows.
DjVu Image Format:
Document Express http://www.lizardtech.com/solutions/doc/
DocumentExpress with DjVu is a suite of applications for creating and manipulating highly compressed representations of scanned color documents, in an open format called DjVu. Typical compression ratios are between 300:1 and 1000:1, which brings new life and usefulness to color paper documents. Imagine this: a typical full-color, letter-size or A4-size DjVu file at 300 or 400 dots per inch takes a mere 50 KB, that is, about the size of a typical Web page.
DjVu document images are the smallest in the industry, up to 1,000 times smaller than TIFF files, and anywhere from 10 to 100 times smaller than JPEGs or PDFs depending on how these JPEGs or PDFs were created.
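The quoted compression ratios can be sanity-checked against the 50 KB figure with a short calculation. The sketch below assumes a letter-size page (8.5 x 11 inches) stored as uncompressed 24-bit RGB and reads "50 KB" as 50 x 1024 bytes; both are assumptions made for this example, and the function name is hypothetical.

```python
def raw_rgb_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 24-bit RGB scan of the page."""
    return round(width_in * dpi) * round(height_in * dpi) * 3


DJVU_BYTES = 50 * 1024  # assumption: "50 KB" read as 50 * 1024 bytes

for dpi in (300, 400):
    raw = raw_rgb_bytes(8.5, 11, dpi)
    ratio = raw / DJVU_BYTES
    print(f"{dpi} dpi: raw {raw:,} bytes, ratio about {ratio:.0f}:1")
```

Under these assumptions, both resolutions give ratios that fall inside the quoted 300:1 to 1000:1 range.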
At the heart of DocumentExpress with DjVu, and key to its technical superiority, is the groundbreaking DjVu technology. DjVu was developed in the late 1990s by a team of world-class computer scientists at AT&T Labs http://www.research.att.com/, including Yann LeCun http://www.cs.nyu.edu/~yann/, Léon Bottou http://leon.bottou.org/, Patrick Haffner http://www.research.att.com/~haffner/, Paul Howard, Bill Riemers, Yoshua Bengio http://www.iro.umontreal.ca/~bengioy, and many others.