pi9
Posts: 44
Joined: Wed Jun 10, 2015 3:32 pm

OCR to read square type fonts

Fri Jan 19, 2018 3:12 am

Hi
I am trying to capture images with a small webcam and then OCR them. I use fswebcam and Tesseract. The problem is that the font has to be almost perfect for the OCR to read it; otherwise I get a blank file. Is there a different OCR program I can try? I am looking for something that can read fonts similar to "Driver Gothic Font".

Any suggestions?
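Nobody in the thread brings it up, but when Tesseract returns a blank file for a webcam frame, a common first step is to clean the image up before OCR by converting it to greyscale and thresholding it to pure black and white. Below is a minimal, stdlib-only Python sketch of Otsu's method, one standard way to choose that threshold automatically. It assumes you already have the frame as a flat list of 8-bit grey values; `otsu_threshold` and `binarize` are hypothetical helper names, not part of fswebcam or Tesseract.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Pick a global threshold (0-255) for 8-bit grey values using Otsu's method."""
    if not pixels:
        raise ValueError("no pixels")
    histogram = [0] * 256
    for p in pixels:
        histogram[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of those pixels' grey levels
    best_t, best_between = 0, -1.0

    for t in range(256):
        weight_bg += histogram[t]
        if weight_bg == 0:
            continue  # no dark pixels yet, nothing to split
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is now in the dark class
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Unnormalised between-class variance; Otsu picks the t that maximises it.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map each grey value to black (0) if it is <= threshold, else white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]
```

How you get the pixels is left open. One option, assuming Pillow is installed, is `list(Image.open(path).convert("L").getdata())`, then saving the binarised result as an image for Tesseract. Because Otsu picks the cutoff that best separates the two brightness populations (ink and paper), it adapts to exposure changes between frames, unlike a fixed cutoff; uneven lighting within a single frame may still need cropping or better illumination.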

scruss
Posts: 1821
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: OCR to read square type fonts

Fri Jan 19, 2018 4:41 am

Have you trained Tesseract for that font? That is time-consuming, but it helps accuracy.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

DougieLawson
Posts: 34124
Joined: Sun Jun 16, 2013 11:19 pm
Location: Basingstoke, UK

Re: OCR to read square type fonts

Fri Jan 19, 2018 7:37 am

Do you have control of the printing?
If you do, use OCR-B, which is optimised for machine reading.
Microprocessor, Raspberry Pi & Arduino Hacker
Mainframe database troubleshooter
MQTT Evangelist
Twitter: @DougieLawson

2012-18: 1B*5, 2B*2, B+, A+, Z, ZW, 3Bs*3, 3B+

Any DMs sent on Twitter will be answered next month.

pi9
Posts: 44
Joined: Wed Jun 10, 2015 3:32 pm

Re: OCR to read square type fonts

Sat Jan 20, 2018 2:31 am

Hi,
I have no control over the printing. Is OCR-B Pi software? I looked for an alternative to Tesseract, but it seems to be the only one for the Pi.

How do I train tesseract for that font? It doesn't matter if it is time consuming if it will give me a reliable result.

DougieLawson
Posts: 34124
Joined: Sun Jun 16, 2013 11:19 pm
Location: Basingstoke, UK

Re: OCR to read square type fonts

Sat Jan 20, 2018 8:42 am

OCR-B (as Google would have told you) is the original machine-readable font. The banks were using it for document processing more than 35 years ago.

scruss
Posts: 1821
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: OCR to read square type fonts

Sat Jan 20, 2018 3:50 pm

https://github.com/tesseract-ocr/tesser ... -Tesseract (for version 3) is the one you want. Version 4 looks impossibly hard on a Raspberry Pi.
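Tesseract 3 training works from "box files": plain text with one line per glyph, `<symbol> <left> <bottom> <right> <top> <page>`, in pixel coordinates with the origin at the image's bottom-left corner, which you normally correct by hand before training. A small script can sanity-check those files and show which characters of the font still lack samples. The sketch below is a hypothetical helper (`parse_box_file`, `missing_symbols` are made-up names, not part of Tesseract's tools) and assumes the 3.x box format just described.

```python
from typing import Iterable, List, NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text: str) -> List[Box]:
    """Parse Tesseract 3.x box-file text into Box records, rejecting malformed lines."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        # Split from the right so the symbol field is whatever precedes the 5 numbers.
        parts = line.rsplit(maxsplit=5)
        if len(parts) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(parts)}")
        symbol = parts[0]
        left, bottom, right, top, page = (int(v) for v in parts[1:])
        # A box with no width or height is almost certainly an editing mistake.
        if left >= right or bottom >= top:
            raise ValueError(f"line {line_no}: degenerate box for {symbol!r}")
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes


def missing_symbols(boxes: Iterable[Box], wanted: str) -> List[str]:
    """Characters in `wanted` with no box yet, i.e. still needing training samples."""
    have = {b.symbol for b in boxes}
    return sorted(set(wanted) - have)
```

For example, `missing_symbols(parse_box_file(open("eng.driver.exp0.box").read()), "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")` would list the glyphs you still need to photograph; the file name there is only illustrative.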

Driver Gothic looks like a spin on DIN 1451 Mittelschrift. It's not expressly an OCR font, but is very simple.

OCR-B, as Dougie said, is a font designed for computer recognition of text. It was the European standard. Old Amstrad CPC users like me are fond of it because that's what the Amstrad manual was set in:
[Image: OCR-B sample]
Fun fact: although OCR-B is an international standard and used in your passport, you can no longer get the standard drawings. Someone at ECMA threw theirs out, figuring that someone at ANSI would keep theirs. Unfortunately, someone at ANSI had just done the same …

OCR-A is one of the other early OCR fonts. It was designed for the US standards committee. It's not very pretty, but you know it when you see it:
[Image: OCR-A sample, captioned "OCR-A, but I think that letter p is wrong"]
Older still are the MICR fonts, still used on the bottom of cheques and on some countries' mail. They're notable for having a pattern of blobs that, when passed through a magnetic reader, produces a distinct pattern for each character. They're typically only digits, but that's all that was needed when they were introduced in the late 1950s. The two standard fonts are E-13B and CMC-7, and both are pretty ugly. You can still buy expensive, specially modified laser printers that use iron-loaded toner to print cheques that old MICR machines can read.

I'm so old that I remember when cool, futuristic designs used faux-OCR/MICR fonts to look hip and happening. They look pretty awful now, but they graced so many computer manuals and sci-fi novel covers¹ back in the day:
[Image: the terrible old fonts Westminster and Data 70]
¹: also, cringeworthily, the late Canadian futurist/cultural theorist Marshall McLuhan's grave

DougieLawson
Posts: 34124
Joined: Sun Jun 16, 2013 11:19 pm
Location: Basingstoke, UK

Re: OCR to read square type fonts

Sat Jan 20, 2018 5:04 pm

E-13B MICR is a short character set with just the digits (no alphabetics) and a few special field-marker characters. It was printed in ink containing enough iron oxide to make it magnetisable. It could be read at 144,000 documents an hour with an IBM 3890 reader/sorter. The reader would magnetise the ink with the first head, then read it with the second.
https://en.wikipedia.org/wiki/IBM_3890
http://www-01.ibm.com/common/ssi/ShowDo ... index.html (the last-ever model of the 3890, which they stopped using in 2013)

The last time I operated one of those was in December 1981, when NatWest Bank processed in excess of 5,000,000 documents per day in the run-up to Xmas (they'd be lucky to see 5,000,000 cheques in a year now that credit and debit cards rule).
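To put those 3890 figures in perspective, here is the arithmetic as a tiny Python sketch. It assumes a reader running flat out at the rated 144,000 documents an hour with no jams, reloads or rejects, and treats 5,000,000 as the exact daily volume; both are simplifications, and the function names are illustrative only.

```python
import math

RATED_DOCS_PER_HOUR = 144_000  # IBM 3890 rate quoted in the post
PEAK_DOCS_PER_DAY = 5_000_000  # NatWest pre-Christmas volume quoted in the post


def docs_per_second(docs_per_hour: int) -> float:
    """Convert an hourly rate to documents per second."""
    return docs_per_hour / 3600


def reader_hours(docs: int, docs_per_hour: int) -> float:
    """Machine-hours needed to read `docs` documents at the given rate."""
    return docs / docs_per_hour


def readers_needed(docs: int, docs_per_hour: int, hours_per_reader: float) -> int:
    """Minimum whole number of readers if each runs `hours_per_reader` hours (no downtime assumed)."""
    return math.ceil(reader_hours(docs, docs_per_hour) / hours_per_reader)


if __name__ == "__main__":
    print(f"{docs_per_second(RATED_DOCS_PER_HOUR):.0f} documents per second")
    print(f"{reader_hours(PEAK_DOCS_PER_DAY, RATED_DOCS_PER_HOUR):.1f} reader-hours per day")
    print(f"{readers_needed(PEAK_DOCS_PER_DAY, RATED_DOCS_PER_HOUR, 24)} readers running round the clock")
```

On those assumptions that is 40 documents a second and about 34.7 reader-hours, so even a single 3890 running all day could not have cleared that peak on its own.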

pi9
Posts: 44
Joined: Wed Jun 10, 2015 3:32 pm

Re: OCR to read square type fonts

Thu Jan 25, 2018 2:59 am

Thanks for the links. That's quite a bit of information to read :)
Thankfully I don't need it to read more than a couple of words.
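Since only a couple of known words are needed, one cheap trick (not mentioned in the thread) is to accept whatever Tesseract returns and snap it to the nearest word from a fixed list, so a misread like "ST0P" still comes out as "STOP". The sketch below uses only the standard library's `difflib.get_close_matches`; `snap_to_vocabulary` is a hypothetical helper name, and the 0.6 cutoff is simply difflib's default, which you would tune against real captures.

```python
import difflib
from typing import Optional, Sequence


def snap_to_vocabulary(ocr_text: str, vocabulary: Sequence[str],
                       cutoff: float = 0.6) -> Optional[str]:
    """Return the upper-cased vocabulary word closest to the OCR output, or None if none is close enough."""
    # Drop whitespace, newlines and punctuation that OCR tends to add around a word.
    cleaned = "".join(ch for ch in ocr_text.upper() if ch.isalnum())
    candidates = [word.upper() for word in vocabulary]
    matches = difflib.get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

This handles one word per call; for a short phrase you could split Tesseract's output on whitespace and snap each piece. Returning `None` rather than a forced guess lets the caller retake the photo when the read is too poor.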
