Simple Image Recognition in Python (CAPTCHA Recognition)

Original post, published 2017-09-25 20:48:09

This content is licensed under the CC 4.0 BY-SA license.

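This post presents a simple Python program that uses OCR to convert the text in an image into a string and saves it to a file. The program relies on two libraries, pytesseract and PIL, and the post also covers the environment setup they need.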

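# Python 2 script: OCR an image with Tesseract and save the text to a file.
import pytesseract
from PIL import Image
import sys

# Python 2 only: switch the default encoding from ASCII to UTF-8 so that
# implicit str/unicode conversions do not fail on non-ASCII OCR output.
reload(sys)
sys.setdefaultencoding('utf-8')

class GetDate(object):
    def Get(self):
        # Open the source image and run it through the Tesseract engine.
        image = Image.open(u"C:\\a.png")
        text = pytesseract.image_to_string(image)
        return text
    def Save(self):
        text = self.Get()
        print text
        # Write the recognized text; the with block closes the file.
        with open(u"C:\\1.txt", "w") as f:
            f.write(text)

g = GetDate()
g.Save()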
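The script above targets Python 2. On Python 3, `reload` is no longer a builtin and `sys.setdefaultencoding` does not exist, so those two lines cannot be carried over. A minimal sketch of a Python 3 equivalent follows, assuming the same pytesseract and Pillow packages are installed. The function names `image_to_text` and `save_text` are illustrative and not from the original post, and the encoding is passed explicitly when the file is opened.

```python
def image_to_text(image_path):
    """Run OCR on one image file and return the recognized text."""
    # Third-party imports are kept local so that save_text below can be
    # used even on a machine without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def save_text(text, out_path):
    """Write text to out_path as UTF-8 (replaces the setdefaultencoding hack)."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
```

With these two helpers, `save_text(image_to_text(r"C:\a.png"), r"C:\1.txt")` would roughly mirror what `g.Save()` does, minus the `print`.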

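A few packages must be installed before the script will run.
1. pytesseract
Install it directly with pip: pip.exe install pytesseract
2. Pillow
Again, pip takes care of it: pip.exe install Pillow
3. The tesseract-ocr recognition engine
This one has to be downloaded from the web. After installing it, add its install directory to the PATH environment variable: C:\Program Files (x86)\Tesseract-OCR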
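If OCR calls fail after installation, the usual cause is that the tesseract executable cannot be found on PATH. The sketch below (Python 3, standard library only) checks for it. `find_tesseract` and `configure_pytesseract` are hypothetical helper names, and the default directory is simply the one quoted above. As an alternative to editing PATH, pytesseract lets you set `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.

```python
import os
import shutil

# Install directory quoted in the text; adjust it if Tesseract lives elsewhere.
DEFAULT_DIR = r"C:\Program Files (x86)\Tesseract-OCR"


def find_tesseract(extra_dirs=(DEFAULT_DIR,)):
    """Return the full path of the tesseract executable, or None."""
    # First look on the current PATH.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise search only the directories passed in.
    search_path = os.pathsep.join(extra_dirs)
    return shutil.which("tesseract", path=search_path)


def configure_pytesseract(exe_path):
    """Point pytesseract at an explicit executable instead of relying on PATH."""
    import pytesseract  # imported here so find_tesseract works without it

    pytesseract.pytesseract.tesseract_cmd = exe_path
```

If `find_tesseract()` returns `None`, the engine is either not installed or not in any of the searched directories. Otherwise the returned path can be handed to `configure_pytesseract`.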

That's all there is to it. As for why the script imports sys and runs the lines right after it, see the next blog post.