danb35
Posts: 27
Joined: Thu Feb 07, 2013 6:00 pm

Scanner controller

Tue Dec 16, 2014 8:09 pm

I've got two older Epson all-in-one scanner/printers, and neither one prints very well any more. That doesn't bother me much, though, because I now have a nice color laser printer for all my printing needs. However, it means the "copy" function of those printers doesn't work any more, and that's a useful thing to have. The scanner part still works fine, but it never worked too well with the network. I wanted to be able to scan, unattended, to a network share, or better yet, to an owncloud folder.

The following is what I've managed so far. My goal is to add a couple of pushbuttons, and maybe a few blinkenlights to indicate status, so that I or my computer-phobic wife can just drop some papers in the document feeder, press a button, and let the system do its thing. I don't have buttons or lights connected yet; I need to run things from the command line. I plan to update this thread with how I add those in the future; for now, this thread serves as a record for myself of what I did, and will hopefully help others who may want to do the same thing.

Starting from a current raspbian image, I needed to install several packages:

Code: Select all

sudo apt-get install libtiff-tools sane-utils imagemagick davfs2 cups tesseract-ocr tesseract-ocr-eng
The latter two packages will let you do text recognition on your scans, but it's slow. More on that later.
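After the install, it can be worth confirming that the commands used later in this post actually ended up on the PATH. This is only a minimal sketch: missing_tools is a name made up for the example, and the tool list in the comment is simply the commands that appear in this post.

```shell
#!/bin/bash
# Print the name of every command given as an argument that is NOT on the PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: check the commands used in this post
#   missing_tools scanimage convert tiffcp tiff2pdf tesseract lp
```

If the function prints nothing, everything was found; any name it prints points to a package that still needs installing.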

Next, the script. This runs from the command line, scans from the document feeder on my scanner, and converts the output to PDF. It's heavily based on (i.e., copied wholesale from, with modifications to make it work for me) the script at https://gist.github.com/anonymous/311548 . You'll notice that the OCR parts are commented out. This script is saved as scan2pdf-noocr.sh in the pi user's home directory:

Code: Select all

#!/bin/bash

SOURCE=""

if [ $# -gt 1 ]
then

outname=$2
pbreak=$1

# Reject the sequence list if it contains anything other than digits and commas
if echo "$pbreak" | grep -Eq "[^0-9,]"
then
echo "Check Sequence List !"
exit 1
fi
else

pbreak=99
outname=$1
SOURCE="--batch-count=1"

fi

startdir=$(pwd)
outdir=/mnt/owncloud/Scans
tmpdir=scan-$RANDOM
lang=eng
mode=Gray

cd /tmp
mkdir $tmpdir
cd $tmpdir
echo "################## Scanning ###################"
scanimage -x 212.8 -y 279.4 --batch=out%02d.tif --format=tiff --mode $mode --resolution 300 -l 3 --source Automatic\ Document\ Feeder

start=1
cnt=1
sc=$(echo "$pbreak" | cut -d"," -f1-99 --output-delimiter=" " | wc -w)
for pb in $(echo "$pbreak" | cut -d "," -f1-99 --output-delimiter=" ")
do
ende=$(expr $start + $pb - 1)
pnr=0
i=1
echo "############ Page-Sequence ($cnt), Pages: $pb, Start: $start, End: $ende ############"
tpages=""
for page in $(ls out*.tif); do
pnr=$(expr $pnr + 1)
if [ $pnr -ge $start -a $pnr -le $ende ]
then
echo "... Converting"
# increase contrast and reduce colordepth 
convert $page -level 15%,85% -depth 2 "b$page" 
# echo "... OCRing"
tpages="$tpages b$page"
i=$(expr $i + 1)
echo -n " "
# tesseract $page $page -l $lang
if [ $sc -gt 1 ]
then
cnts=`printf %02d $cnt`
# cat $page.txt >> $outname.$cnts.txt
else
: # no-op; keeps the else branch valid while the OCR line below is commented out
# cat $page.txt >> $outname.txt
fi

fi
done

echo "... Converting to PDF"
# Use tiffcp to combine the output tiffs into a single multi-page tiff
tiffcp $tpages output.tif
#Convert the tiff to PDF
if [ $sc -gt 1 ]
then
cnts=`printf %02d $cnt`
tiff2pdf -z -p letter output.tif > $outdir/$outname.$cnts.pdf
# mv $outname.$cnts.txt $outdir
else
tiff2pdf -z -p letter output.tif > $outdir/$outname.pdf
# mv $outname.txt $outdir
fi

start=$(expr $start + $pb)
cnt=$(expr $cnt + 1)

done

cd ..
echo "################ Cleaning Up ################"
rm -rf $tmpdir
cd $startdir
It's invoked from the command line with

Code: Select all

./scan2pdf-noocr.sh numpages filename
numpages is the max number of pages to put in the PDF (use 99, for example, and it will scan up to 99 pages, or stop when the ADF is empty), and filename is the filename to save without the extension--it will save as filename.pdf. The first argument can also be a comma-separated list of page counts such as 3,2, in which case each group of pages is saved as its own numbered file (filename.01.pdf, filename.02.pdf, and so on). You'll notice that the script has a variable near the beginning for the output directory (outdir); you can change this to be wherever you want, but I have it set to a directory under Owncloud, which I have running on my main server.
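The script parses its first argument as a comma-separated list of page counts, one per output PDF; a single number like 99 is just the one-group case. Here is a small standalone sketch of that bookkeeping (the start, ende, and cnt variables in the script). The function name page_ranges is invented for the illustration and isn't part of the script.

```shell
#!/bin/bash
# Print one line per output PDF: sequence number, first page, last page.
# Mirrors the start/ende/cnt arithmetic in the scan script; page_ranges is a
# name invented for this sketch.
page_ranges() {
    local list="$1" start=1 cnt=1 pb end
    # Same validation idea as the script: only digits and commas are allowed
    case "$list" in
        ''|*[!0-9,]*) echo "Check Sequence List !" >&2; return 1 ;;
    esac
    for pb in $(echo "$list" | tr ',' ' '); do
        end=$((start + pb - 1))
        printf '%02d %d %d\n' "$cnt" "$start" "$end"
        start=$((start + pb))
        cnt=$((cnt + 1))
    done
}

page_ranges 3,2
# 01 1 3
# 02 4 5
```

So "./scan2pdf-noocr.sh 3,2 letters" would put pages 1-3 in letters.01.pdf and pages 4-5 in letters.02.pdf, which is handy for feeding a stack of separate documents in one go.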

Now, OCR is a handy thing, but as I mentioned, it's slow on the Pi. Using my scanner over the network to scan a 10-page document in grayscale using this script took 4 minutes, 40 seconds. Adding OCR brought the total time to 48 minutes, 46 seconds. I don't see that it's worth the time, at least for routine scans.
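To put those two timings on a per-page footing, here is the arithmetic as a quick shell sketch. to_seconds is a made-up helper, and the division is integer division, so the results are rounded down.

```shell
#!/bin/bash
# Convert "minutes seconds" to seconds (to_seconds is a made-up helper).
to_seconds() { echo $(( $1 * 60 + $2 )); }

pages=10
plain=$(to_seconds 4 40)       # scan + PDF conversion only: 280 s
with_ocr=$(to_seconds 48 46)   # same job with tesseract enabled: 2926 s

echo "without OCR: $(( plain / pages )) s/page"
echo "OCR overhead: $(( (with_ocr - plain) / pages )) s/page"
# without OCR: 28 s/page
# OCR overhead: 264 s/page
```

In other words, OCR adds well over four minutes to every page on this hardware, roughly ten times the cost of the scan and conversion themselves.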

I'm running owncloud on my own server, using a self-signed SSL certificate. To get raspbian to accept that certificate without squawking, I followed the instructions at https://www.brightbox.com/blog/2014/03/ ... tu-debian/ to add my CA certificate to the raspbian installation. It's pretty easy--just add any certificate or certificates you want to /usr/local/share/ca-certificates/, and then run sudo update-ca-certificates. To test, I ran 'curl https://www.myserver.com', and noted that it completed without SSL errors.
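One detail that can trip this up: as far as I know, update-ca-certificates only picks up files in that directory whose names end in .crt, and they need to be PEM-encoded. Below is a small sketch of the copy step. install_ca and myCA.pem are hypothetical names, and the destination directory is a parameter only so the function can be tried without root.

```shell
#!/bin/bash
# Copy a PEM certificate into the local CA directory under a .crt name.
# install_ca is a hypothetical helper; pass a different directory as the
# second argument to try it out without touching the system.
install_ca() {
    local src="$1" dest_dir="${2:-/usr/local/share/ca-certificates}"
    local name
    name=$(basename "$src")
    name="${name%.*}.crt"          # e.g. myCA.pem -> myCA.crt
    cp "$src" "$dest_dir/$name" && echo "$dest_dir/$name"
}

# On the Pi (needs root), the equivalent by hand would be:
#   sudo cp myCA.pem /usr/local/share/ca-certificates/myCA.crt
#   sudo update-ca-certificates
#   curl -sS -o /dev/null https://www.myserver.com && echo "certificate accepted"
```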

Now I can set up the mount of the owncloud share. I created an owncloud user specifically for the raspberry pi, which I creatively called pi. I then created a folder in that user's account which I called Scans, and shared that folder with myself and my wife. Next, I added this line to /etc/fstab:

Code: Select all

https://www.myserver.com/owncloud/remote.php/webdav /mnt/owncloud davfs file_mode=666,dir_mode=777
I also added this line to /etc/davfs2/secrets:

Code: Select all

https://www.myserver.com/owncloud/remote.php/webdav pi password
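Since the secrets file holds a plaintext password, davfs2 is picky about its permissions: as far as I can tell it refuses to use the file unless only the owner can read and write it (mode 600). A quick way to check, as a sketch; secrets_mode_ok is a name made up for this example, and stat -c is the GNU form found on raspbian.

```shell
#!/bin/bash
# Succeed only if the given file has mode 600 (owner read/write, nobody else).
# secrets_mode_ok is a name made up for this sketch; stat -c is GNU stat.
secrets_mode_ok() {
    [ "$(stat -c %a "$1")" = "600" ]
}

# On the Pi, for davfs2's system-wide secrets file:
#   SECRETS=/etc/davfs2/secrets
#   sudo chmod 600 "$SECRETS"
#   sudo stat -c %a "$SECRETS"     # should print 600
```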
Now I'm ready to mount the owncloud share:

Code: Select all

sudo mount /mnt/owncloud
And now, when I run "./scan2pdf-noocr.sh 99 mydocument" on the pi, it scans whatever's in the ADF over the network, converts it to a PDF, and stores it in the shared owncloud folder. That's a cleaner way to go, IMO, than plugging a flash drive into the scanner, scanning the document, and then taking the flash drive to my computer (I hate sneakernet if I can avoid it).
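One thing to watch: the script only writes to /mnt/owncloud/Scans at the very end, so if the share isn't mounted you find out after the whole stack has been scanned (the write either fails or lands on the SD card). A guard along these lines could catch that up front. It's a sketch, not something the script currently does; is_mounted is a hypothetical wrapper around the mountpoint tool from util-linux.

```shell
#!/bin/bash
# Return success only if the directory is an active mount point.
# (mountpoint ships with util-linux; is_mounted is a hypothetical wrapper.)
is_mounted() {
    mountpoint -q "$1"
}

# Possible guard near the top of the scan script:
#   is_mounted /mnt/owncloud || { echo "owncloud share not mounted" >&2; exit 1; }
```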

From this point, printing a copy of what you've scanned is trivially easy. Set up CUPS to work with your printer using one of the many existing guides on the subject. Then just do "lp filename.pdf" and the file will print to your printer.
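Put together, a "copy" operation is just the scan followed by lp. Here is a rough sketch of a wrapper, assuming the script lives in the pi user's home directory and saves to /mnt/owncloud/Scans as above. scan_and_print, SCAN_CMD, PRINT_CMD, and OUTDIR are names invented for this example; the overrides are there only so the function can be tried without a scanner or printer attached.

```shell
#!/bin/bash
# Scan a document to PDF, then send the PDF to the default CUPS printer.
# scan_and_print, SCAN_CMD, PRINT_CMD and OUTDIR are names invented for this
# sketch; the defaults match the paths used in this post.
scan_and_print() {
    local name="$1"
    local pdf="${OUTDIR:-/mnt/owncloud/Scans}/$name.pdf"
    "${SCAN_CMD:-$HOME/scan2pdf-noocr.sh}" 99 "$name" || return 1
    # Only print if the scan actually produced a non-empty PDF
    [ -s "$pdf" ] || { echo "no PDF at $pdf" >&2; return 1; }
    "${PRINT_CMD:-lp}" "$pdf"
}

# Example:
#   scan_and_print mydocument
```

The check before printing matters because the scan script cleans up after itself even when the ADF was empty, in which case there's nothing sensible to send to the printer.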

Please note that SANE isn't very consistent with scanner options or what they're called. For my Epson Workforce 600, the source has to be specified as "Automatic Document Feeder"; "ADF" wouldn't do. For some other scanners, "ADF" is the way to go. This may take some experimentation with your particular hardware.
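Rather than guessing, you can ask scanimage which source names your scanner accepts: its --help output lists the device-specific options. The sketch below pulls the allowed --source values out of that text. list_sources is a made-up helper, and the exact layout of the help line is an assumption on my part, so compare it against what your scanner actually prints.

```shell
#!/bin/bash
# Read scanimage help text on stdin and print each allowed --source value on
# its own line. Assumes the option line looks like
#   --source Flatbed|Automatic Document Feeder [Flatbed]
# which is an assumption about the help layout; check it against your output.
list_sources() {
    sed -n 's/^[[:space:]]*--source \(.*\) \[.*\]$/\1/p' | tr '|' '\n'
}

# On the Pi, with the scanner reachable:
#   scanimage -L                      # list detected devices
#   scanimage --help | list_sources   # source names for the default device
```

Whatever name comes out for the document feeder is what goes after --source in the scanimage line of the script.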

ankitgoenka
Posts: 7
Joined: Tue Sep 29, 2015 3:23 pm

Re: Scanner controller

Sat Sep 17, 2016 6:50 am

Lovely tutorial.

Is this similar to https://www.youtube.com/watch?v=obNpPtNltdU ?

I understand that the video I am sharing is for scan to email.
I believe your project is similar, but it saves to a shared folder!
Please confirm.
