Free Online OCR: web app delivers editable text from scanned images or PDFs

Do you have PDF documents or images (e.g. JPG, PNG, TIFF) that were created using a scanner, and that you wish you could convert to editable text? Free Online OCR is a web service that performs optical character recognition (OCR) on your scanned images and/or image-based PDF documents, generating “normal” text that can subsequently be edited or used in other applications.

This free service works as follows: you upload your PDF files to the website, choose the output format (e.g. Word or RTF), and then download the editable file once processing is complete. Note that unlike most commercial desktop-based OCR engines, Free Online OCR does not provide any post-processing editing tools once the OCR process is complete.

In most cases, though, you should not expect quick, single-click conversions; depending on your source, you may have to wait a long time for your document to process, and there’s no guarantee that it will do so successfully. Moreover, the service only accepts PDF files of up to 20 megs in size, so you may need to split your source into several pieces. The OCR process is highly dependent on the quality of your source, so you may need to manually improve the source images (e.g. sharpen them), as well as perform a lot of manual post-processing editing and patching up.


Here’s a quick guide on how to use this service:

- File size limit: 20 megs. If your source is larger (which is very likely for scanned PDF ebooks), you can split it into appropriately sized pieces with a program like PDFSam (which will let you define output sizes in megs).

- The quality of your source: has a lot to do with the quality of the output you get. If need be, you can export your PDF to individual JPEGs using the free PDF-XChange Viewer (a portable version is also available), then sharpen each image with an image editing app (e.g. Jpeg Enhancer and PhotoScape, two free tools that can do this). Note: this can be quite labour intensive, and is more appropriate for short documents.

- Processing times: can be quite long. My advice: start with a few pages (3-4) to see what the output is like, which will spare you a long and potentially fruitless wait, and will give you an idea of whether you should proceed or whether you should try to enhance the source as mentioned above.

- Output quality: once your file is OCR’d, you can download it via the provided link. You will almost certainly have to do a lot of manual fixing, though.
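
To make the splitting step a bit more concrete, here is a minimal sketch in Python of how one might plan the pieces so that each stays under the 20 meg limit. It assumes you already know the approximate size of each page in bytes (say, from the exported JPEGs). The names plan_chunks and MAX_UPLOAD_BYTES are made up for illustration and are not part of PDFSam or Free Online OCR. Real PDFs share fonts and other resources between pages, so treat the result as a rough estimate rather than a guarantee.

```python
from typing import List

# Assumption: "20 megs" is treated as 20 * 2**20 bytes; the service may count differently.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def plan_chunks(page_sizes: List[int], limit: int = MAX_UPLOAD_BYTES) -> List[List[int]]:
    """Group consecutive page indices into chunks whose summed size stays within limit."""
    chunks: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(page_sizes):
        if size > limit:
            # A single page that is too large cannot be fixed by splitting.
            raise ValueError(f"page {index} alone exceeds the upload limit")
        if current and current_size + size > limit:
            # Adding this page would overflow the chunk, so start a new one.
            chunks.append(current)
            current = []
            current_size = 0
        current.append(index)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

For example, four pages of roughly 8 megs each would be planned as two uploads of two pages each. The actual splitting would still be done in a tool such as PDFSam.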

Here’s a list of PROs and CONs in relation to this service:

PROs:

  • Does a reasonable job: OCR quality is good enough, but not necessarily outstanding.
  • Many formats supported: PDFs or images (JPG, PNG, BMP, GIF, or TIFFs) as input, DOC, RTF, TXT, or searchable PDFs as output.
  • Preserves layout and some formatting: including font size, bold and italic, and bulleted lists.
  • Will process ‘low quality’ images: i.e. the typical 300 dpi images of most documents.
  • Simple, easy to use interface: clean and straightforward.
  • Maximum upload size: is reasonable enough. At 20 megs you can upload most short documents in their entirety. For longer ebooks you will have to split your source as mentioned above. (And yes, you will find ‘Upload size’ in the CONs section below as well; I have no qualms contradicting myself!)
  • Confidential: their FAQ states that your documents are automatically deleted once they are processed. (Note: I am merely reporting this and am not responsible for its veracity.)

CONs: (or wish list, if you like)

  • Processing time can be long: I shudder to think about what the wait would be like if the service became very popular.
  • Will make you wait a long time before … failing: which makes me wonder why I can’t see whatever it did manage to process, so that I can at least figure out what the state of my document is.
  • Upload size is 20 megs: I know that it’s quite reasonable as far as uploading documents to web services goes. But scanned documents tend to be very large almost by definition. Consider this a wish list item: I wish the upload size limit were larger.
  • No post-processing editor: which is to be expected for a web service, of course. But compared with most professional (read: for-pay) desktop OCR apps this is a considerable disadvantage.
  • No option to email your document back to you: which means you cannot upload your document and move on to other things; you will have to keep your browser window open and wait.

The verdict: overall, an excellent OCR option. As a web service it is better suited to small documents than to full-length ebooks, which is probably what most people will want to use it for anyway.

As a web service it has three disadvantages: (1) some people may not feel comfortable uploading sensitive documents online, despite the promise of confidentiality, (2) the restriction on max upload size with respect to large documents, and (3) the fact that there are no post-processing tools.

That being said, good free OCR programs and services are few and far between; in fact, this service did a much better job than the previously mentioned FreeOCR.net in my tests. As such, this service is an excellent, welcome addition to the repertoire of available free OCR tools.

Compatibility: any OS running a modern browser.

Go to the Free Online OCR home page.


 
 
 
Samer Kurdi

Has been reviewing software since 2006 when he started Freewaregenius.com
  • Rob

    I spent most of the morning playing with the (few) OCR web-services, only to find this pop up in my RSS reader just when I’ve finished!

    The one that worked best for us was onlineocr.net.

    It happily accepted A4@600dpi, which others refused (often without even a useful error message), and it has a larger list of languages than the others (we needed Estonian); getting the language right is important for good OCR. Results were also good, although YMMV.

    Obvious cons: some features (e.g. PDF/ZIP input) require registration and payment (a small trial available)

    /not-a-shill!

    • Samer

      @ Rob: thanks for the info. It looks like it might be a good service, with one caveat: the 15-images-per-hour limit for the free service would render it more or less crippleware in my book.
      That said I am thinking about a comparative review for free OCR services, and will look into this service further at that point.

  • Rob

    I can see that.

    I guess it all depends on what your use case is. For us it was occasional office use to speed up retyping a few pages of something we only have in dead tree form. As such, the 15 pages-an-hour limit is a non-issue for us, whereas the Estonian language coverage was a requirement. Unfortunately, free-online-ocr only does English (I just threw some Spanish at it, and it failed horribly.)

    An OCR comparative review might be worthwhile — I certainly found your pdf-to-word review helpful in the past.

    /still-not-a-shill-honest!

  • Mark

    Free Online OCR may have some cons but at least it’s a free service and uses authentic OCR technology. I read that onlineocr.net took their OCR from OmniPage 15, and onlineocr.com uses ABBYY. How could I trust a service which took someone else’s technology? I would never send my documents to those people.

  • Rob

    Mark, you’re just being ridiculous. People license other people’s technology all the time; in fact you have done this dozens of times yourself simply by owning a PC. This does not make you any less trustworthy.

    Likewise, I neither know nor care where Free Online OCR’s OCR technology came from (or indeed anyone else’s), but I do know where their web-server platform came from. And for this they certainly did take “someone else’s technology” (in this case Microsoft’s.) I trust them just as much now as I did 2 minutes ago before looking up this factoid.

    In the tech world, no product or service is built entirely using in-house technology. We are all standing on the shoulders of giants. Yes, even f-o-o.com

  • Joseph

    Hi guys, I don’t wanna be a judge but I think that Mark is closer to the truth. Nowhere on ONLINEOCR.NET’s site is it mentioned that it legally uses Omnipage OCR technology. Also, the domain ONLINEOCR.NET is registered through ‘Domains by Proxy, Inc’, which hides the information of the registrant, and one of the four other domains hosted on the same server is ONLINEOCR.RU. So Rob, do you think that ONLINEOCR.NET is paying expensive license fees to the owners of Omnipage?

  • Rob

    re: Domains by Proxy

    no big deal, to be honest. And besides, the level of verification for whois data by most registrars is non-existent anyway. All that Domains by Proxy does is make this explicit.

    re: back-end tech

    We only have Mark’s unverified assertions about what each site is using on the back-end. I’ve tried googling for the same information, but to no avail. As Wikipedia would say, “[citation needed]”.

    re: licensing tech

    I have absolutely no idea what licensing agreements are in place between any of these services and their tech providers. And nor do you. And nor does Mark. Similarly, I have absolutely no idea whether f-o-o.com or any of these other services is using a legit copy of Windows or not. And nor do you. And nor does Mark. See also: their anti-virus app, backup system, monitoring service, remote admin utilities, etc.