Community mailing list archives

community@mail.odoo.com

Re: OCR of incoming invoices

by
Torvald Baade Bringsvor, Torvald B. Bringsvor
- 06/19/2015 05:08:13
Hi Zoltan

Good point you're making there. Yes, about 80% of the invoices are received as PDF attachments to emails, and I would guess that 80% of those PDFs turn out to be text-based PDFs when you look at them. But recognizing the structure of a PDF, even when it contains text, is no easy job. I guess what I should have written was data extraction in general, preferably including OCR.

Another problem is that while most suppliers, at least the European ones, send nice semi-standardized PDFs, we also get some invoices from China, for example, that can be an Excel sheet, perhaps even listing information from several orders at once (collective invoicing). But I guess we have to leave some work for the accountants...

-Torvald



Torvald Baade Bringsvor
Bringsvor Consulting AS - Odoo (formerly OpenERP) implementation partner


2015-06-19 10:59 GMT+02:00 Zoltan Gabor <zgabor@rdslink.ro>:

Hi Torvald,

My question would be: do you really need OCR for that? OCR stands for Optical Character Recognition, which is used to transform (usually scanned) images into text.
Now, from what you say, the information is already extracted (in the current system) from PDFs, which to me means that these PDFs are provided in digital format, not on paper to be scanned later and transformed by OCR software. Furthermore, if you would ideally like to extract the information from mail, that clearly means to me that you just need information extraction from a digital (but non-image) format. This could be achieved with the right libraries (through some integration approach, or probably directly in Odoo). But you have to make sure that the information is extractable (meaning the PDF embeds the invoice as text, not as an image); you can test this easily by searching for some text you know is present in the document. Note that invoicing systems usually produce extractable PDFs. I did something similar with the integration approach, and it is a workable way.
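The extractability test described above can be sketched in Python. This is a minimal sketch that assumes the pypdf library (`pip install pypdf`) for text extraction; the helper names and the phrases you search for are hypothetical. An image-only (scanned) PDF normally has no text layer, so it yields empty strings and fails the check.

```python
import re


def _normalise(s):
    # Collapse whitespace (including line breaks) and ignore case, so a
    # phrase split across lines in the PDF still matches.
    return re.sub(r"\s+", " ", s).strip().casefold()


def has_extractable_text(page_texts, known_phrases):
    """Return True if the extracted text contains every known phrase.

    page_texts: list of strings, one per page, from a PDF text extractor.
    known_phrases: text you are sure appears on the invoice (hypothetical).
    """
    text = _normalise(" ".join(page_texts))
    if not text:
        return False  # no text layer at all: most likely a scanned image
    return all(_normalise(p) in text for p in known_phrases)


def extract_page_texts(path):
    """Extract text per page with pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # imported lazily so the check works without pypdf

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

Typical use would be `has_extractable_text(extract_page_texts("invoice.pdf"), ["Invoice"])`; a `False` result is the signal that the document needs OCR instead.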
If, however, you have to deal with images, you can still go with the integration approach, though things will be much more complicated, or you can choose specialised software like Ephesoft (never tried it, only aware of its existence), which also has a free community edition, plus an integration with Odoo, of course.

Regards,
Zoltan

Zoltan Gabor
IT Consultant
Mobile: +40 741 224622
E-Mail: zgabor@rdslink.ro

On 19.06.2015 10:42, Torvald Baade Bringsvor wrote:
Hello Community,

Is there anybody out there who does OCR of incoming invoices? Our customer has this functionality in their current solution: the supplier name and address and the invoice number are recognized from incoming PDFs.
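A rough sketch of pulling the invoice number and the supplier out of already-extracted text, assuming the PDF has a text layer. The label patterns (English "Invoice No"/"Invoice number"/"Invoice #" and Norwegian "Fakturanr"/"Fakturanummer") and the lookup against a dict of known supplier names are illustrative assumptions, not how the customer's current system works. The address is left out because its layout varies too much for a simple pattern.

```python
import re

# Hypothetical label patterns; real invoices vary per supplier and language.
INVOICE_NO_RE = re.compile(
    r"(?:invoice\s*(?:no\.?|number|#)|faktura\s*(?:nr\.?|nummer))"
    r"\s*[:.]?\s*([A-Z0-9][A-Z0-9\-/]*)",
    re.IGNORECASE,
)


def find_invoice_number(text):
    """Return the first value that follows an invoice-number label, or None."""
    m = INVOICE_NO_RE.search(text)
    return m.group(1) if m else None


def find_supplier(text, known_suppliers):
    """Return the id of the first known supplier whose name occurs in text.

    known_suppliers: dict of supplier name -> id, e.g. exported from the
    partner records (hypothetical data source).
    """
    lowered = text.casefold()
    # Try longer names first so "ACME Nordic AS" wins over plain "ACME".
    for name in sorted(known_suppliers, key=len, reverse=True):
        if name.casefold() in lowered:
            return known_suppliers[name]
    return None
```

Matching against suppliers you already have on record tends to be more reliable than guessing the supplier from the page layout, since the name only has to appear somewhere in the text.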

Ideally I'd like this integrated when an invoice arrives by email...
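For the email side, Python's standard `email` package can pull PDF attachments out of a raw incoming message. This is a minimal sketch; `pdf_attachments` is a hypothetical helper name, and also treating parts whose filename ends in `.pdf` as PDFs (even when labelled `application/octet-stream`) is an assumption about how some senders' mail clients tag attachments.

```python
from email import policy
from email.parser import BytesParser


def pdf_attachments(raw_message):
    """Yield (filename, payload_bytes) for each PDF attached to a raw email."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    # iter_attachments() skips the main text/html body parts of the message.
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        is_pdf = (
            part.get_content_type() == "application/pdf"
            or filename.lower().endswith(".pdf")
        )
        if is_pdf:
            yield filename, part.get_payload(decode=True)
```

Each yielded payload could then be passed to a PDF text extractor before a draft supplier invoice is created.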

-Torvald


Torvald Baade Bringsvor
Bringsvor Consulting AS - Odoo (formerly OpenERP) implementation partner

_______________________________________________
Mailing-List: https://www.odoo.com/groups/community-59
Post to: mailto:community@mail.odoo.com
Unsubscribe: https://www.odoo.com/groups?unsubscribe
