scanning

Differences

This shows you the differences between two versions of the page.

scanning [2015/12/01 20:35]
sbw
scanning [2015/12/01 20:41] (current)
sbw
Line 4:
 ===== 1. Remove the magazines from the springback binder =====

-The magazines are all stored in springback binders, one year (usually 12 issues) per binder. Springback binders are basically like a big bulldog clip. By bending back both halves of the cover, the folder with the magazines inside can be carefully removed.{{ :processed_7004.jpg?nolink&100|}}
+The magazines are all stored in springback binders, one year (usually 12 issues) per binder. Springback binders are basically like a big bulldog clip. By bending back both halves of the cover, the folder with the magazines inside can be carefully removed.

-{{:processed_7005.jpg?nolink&200|}}{{:processed_7006.jpg?nolink&200|}}
+{{ :processed_7004.jpg?nolink&100|}}{{:processed_7005.jpg?nolink&200|}}{{:processed_7006.jpg?nolink&200|}}

 ===== 2. Remove staples from the magazine =====
Line 19:

 I use the following settings:
-scan to PDF
-resolution: 300 x 300dpi
-paper size: A4 (if you leave it on Auto, sometimes the machine will get it wrong)
-Black and white text (don't use photo settings, as you will get speckles; don't use greyscale)
-If you have a density setting, somewhere in the middle is probably best - too high and you will start to see speckles, too low and the text will be a bit faint.
-If you have a contrast setting, higher is generally better.
+  * scan to PDF
+  * resolution: 300 x 300dpi
+  * paper size: A4 (if you leave it on Auto, sometimes the machine will get it wrong)
+  * Black and white text (don't use photo settings, as you will get speckles; don't use greyscale)
+  * If you have a density setting, somewhere in the middle is probably best - too high and you will start to see speckles, too low and the text will be a bit faint.
+  * If you have a contrast setting, higher is generally better.

 High end office scanners normally default to PDF these days. Normally you have about 60 seconds after you have scanned the previous page to scan the next page, in order to keep it as one document. I find that it takes about 15 seconds/page once you get going.

-It may be possible to use the automatic document feeder for more recent magazines. The main risk with using the automatic document feeder is that if something goes wrong, it may damage the pages. However, most of the magazines (anything prior to 1984) are quarto size, not A4, and these ones will need to be scanned manually.
+It may be possible to use the automatic document feeder for more recent magazines. The main risk with using the automatic document feeder is that if something goes wrong, it may damage the pages. However, most of the magazines (anything prior to 1984) are quarto size, not A4, which may make automatic scanning more difficult.

 Before you start scanning in earnest, check that the quality of the scan is as high as possible. Spend some time trying various settings to see which works best. Any areas that are not text/pictures should ideally come out white in the scan. Any speckles or colouring reduce the ability to subsequently OCR the documents, as the below examples show.

-1. Black and white – text/line art setting
-
+**1. Black and white – text/line art setting**
+{{:bw_line_art.png?nolink|}}

-Result of OCR
+**Result of OCR**

   the lap of luxury.
Line 41:
   But we were all looking forward to it!

-2. Black and white – text only setting
-
-Result of OCR
+**2. Black and white – text only setting**
+{{:bw_text.png?nolink|}}
+**Result of OCR**

   the lap of luxury.
Line 51:


-{{:processed_0688.jpg?nolink&200|}}{{:processed_0689.jpg?nolink&200|}} The margins are not even, so take this into account when you are scanning, and shift the pages one way or the other, otherwise you may lose text off the sides. This is primarily an issue for older magazines which were printed on quarto paper, which is wider than A4. Newer ones are A4 and the text should fit on an A4 page regardless.
+{{:processed_0688.jpg?nolink&150 }}{{:processed_0689.jpg?nolink&150 }} The margins are not even, so take this into account when you are scanning, and shift the pages one way or the other, otherwise you may lose text off the sides. This is primarily an issue for older magazines which were printed on quarto paper, which is wider than A4. Newer ones are A4 and the text should fit on an A4 page regardless.

 I suggest scanning the various cover sections (front cover, inside front cover, back cover, inside back cover) separately. This makes it cheaper and easier to OCR. For the cover (and any advertisement pages), I use a low density setting, as otherwise it comes out very dark. Again, some trial and error may be needed to get good results.
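
The numbers quoted on this page (300 dpi, A4 paper size, about 15 seconds per page) can be turned into a quick planning aid. The following is only a rough sketch and not part of the scanning procedure itself: the function names are made up for illustration, the A4 dimensions are the standard 210 x 297 mm, and the page width and page count in the example are hypothetical placeholders that you would replace with your own measurements.

```python
MM_PER_INCH = 25.4
A4_MM = (210.0, 297.0)  # standard A4 width x height in millimetres


def mm_to_pixels(mm, dpi=300):
    """Convert a length in millimetres to whole pixels at the given resolution."""
    return round(mm / MM_PER_INCH * dpi)


def a4_scan_size(dpi=300):
    """Pixel dimensions (width, height) of an A4 scan area at the given resolution."""
    return tuple(mm_to_pixels(side, dpi) for side in A4_MM)


def width_overhang_mm(page_width_mm):
    """How far a page sticks out beyond the A4 scan width (0.0 if it fits)."""
    return max(0.0, page_width_mm - A4_MM[0])


def scan_minutes(pages, seconds_per_page=15):
    """Rough manual scanning time, using the 15 seconds/page figure as default."""
    return pages * seconds_per_page / 60


if __name__ == "__main__":
    measured_width_mm = 216.0  # hypothetical: measure your own magazine
    pages_in_issue = 40        # hypothetical page count

    print("A4 at 300 dpi:", a4_scan_size(), "pixels")
    print("Overhang beyond A4 width:", width_overhang_mm(measured_width_mm), "mm")
    print("Estimated scanning time:", scan_minutes(pages_in_issue), "minutes")
```

If the overhang is greater than zero, the page is wider than the scan area, so it has to be shifted towards whichever side has the narrower margin.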
scanning.txt · Last modified: 2015/12/01 20:41 by sbw