How to convert a scanned document into editable text

Turn a paper document into an editable text document on your PC

We’ve all done it. You’re looking for an important document on your computer but it’s nowhere to be seen. Luckily, you’ve found a printout. Simple solution, then: whack it in the scanner and it’s back on your hard drive. Problem solved? Not exactly. What if you want to edit it? Your scanned document can be opened in image-editing software, but that’s as good as it gets. You won’t be able to open it in a word processor and make changes unless you’ve first run it through Optical Character Recognition (OCR) software, which converts scanned images into fully editable documents in just a few steps. Better still, you can do this with a free program. Read on to discover how.

1. Install SimpleOCR

Visit the SimpleOCR website. Find the Download SimpleOCR (Europe) link and click it to save the 9.3MB setup file to your computer. Once it’s downloaded, double-click the InstSocr.exe file, then click Run and then Continue. Follow the instructions in the installation wizard, making sure Run SimpleOCR is ticked when you click Finish so that the program launches for the first time.

2. Make a choice

A box will pop up, prompting you to select which type of text you want to convert into an editable file. You have a choice between handwritten and machine text. Only the machine text (typed or printed copy) option is free to use, although you can try the handwriting function for up to 14 days after installing the program. In this guide we’re looking at printed text, so select the Machine Print option. Select English (U.K) from the Language menu, and click Select.

3. Set up the scanner

Click the Add Page button at the top of the window. This lets you import a previously scanned document or scan a new one. If you’re scanning a document, click Scanner Options. If your document is black text on a white background, make sure that Line Art is selected under Colors; otherwise, Gray is the one to go for. Check that the resolution is set to 300, set ‘TWAIN driver’ to Main and select A4 under ‘Paper Size’. Finally, double-check that the application is linked to your scanner by clicking Select Scanner and making sure your device is selected in the list. Then it’s just a case of saving the settings and starting your scan by clicking OK in the Source window.
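If you’re curious what those settings mean in practice, the short Python sketch below works out how many pixels an A4 page (210 x 297 mm) produces at a resolution of 300 dots per inch, and roughly how much raw data Line Art and Gray generate. It assumes Line Art stores 1 bit per pixel (pure black or white) and Gray stores 8 bits per pixel (256 shades); your scanner may differ, and the figures are uncompressed sizes, so the files you actually save will usually be smaller. The function names are made up for illustration and have nothing to do with SimpleOCR itself.

```python
# Rough arithmetic for the step 3 scanner settings (illustrative only).
# Assumptions: A4 = 210 x 297 mm, Line Art = 1 bit/pixel, Gray = 8 bits/pixel.
MM_PER_INCH = 25.4


def scan_dimensions(width_mm, height_mm, dpi):
    """Return the (width, height) in pixels of a page scanned at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Raw image size before any compression, rounded up to whole bytes."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


if __name__ == "__main__":
    w, h = scan_dimensions(210, 297, 300)
    line_art = uncompressed_bytes(w, h, 1)  # Line Art: black or white only
    gray = uncompressed_bytes(w, h, 8)      # Gray: assumed 256 shades
    print(f"{w} x {h} pixels")
    print(f"Line Art: {line_art / 2**20:.1f} MiB, Gray: {gray / 2**20:.1f} MiB")
```

At 300dpi an A4 page comes out at about 2480 x 3508 pixels, which is why a Line Art scan is roughly an eighth of the size of a Gray one before compression.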

4. Convert the text

The scanned text will appear in the main window. To convert it into editable text, all you need to do is hit the cunningly titled Convert to text button. The screen splits in two: the top pane shows the original scan and the bottom pane shows the converted text. If you’re faced with a window full of gobbledegook, it might be best to start again from scratch. Select Delete Page from the Process menu, then repeat step 3, trying a different option under Colors.

5. Make corrections

If SimpleOCR has done its job, all you’ll need to do is correct whatever transcription mistakes it has made. The program will already have highlighted any suspect words in yellow and flagged up spelling mistakes in red. Click on a suspect word and you’ll be presented with a list of alternatives. You can choose the correct spelling from the list or type in the right word yourself. Navigate through the word list by holding down [Ctrl] and using the left or right arrow keys. You can also leave a word as it is by clicking the Decide Later button.

6. Save the document

Work your way through the document; words turn black as you edit them or confirm that they’re correct. Once everything is sorted, you can save your new document in Microsoft Word (.doc) format or as plain text (.txt). You can also add further pages to your document by hitting Add Page and repeating the process. New pages are automatically added to the end of the document.