preprocess.py 3.4 KB

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485
  1. """
  2. 图像预处理,提高OCR识别率
  3. 1. 图像光照均匀性分析,获取图像光照分布map;
  4. 2. 图像边缘分析、得到图像的边缘分布模型,得到总体的边缘度量权值w1;
  5. 3. 图像模糊度计算,得到图像模糊权值w2.
  6. 4. 对图像进行局部分块处理,利用图像对应分块的map特征、w1、w2的权值,得到图像每个分块的局部二值结果。
  7. 5. 对整幅图像局部二值化结果进行空白填充处理防止字符断裂,对分割的游离点进行分析剔除异常噪点
  8. 6. 直线剔除。
  9. Scaling To The Right Size
  10. Ensure that the images are scaled to the right size which usually is of at least 300 DPI (Dots Per Inch). Keeping DPI
  11. lower than 200 will give unclear and incomprehensible results while keeping the DPI above 600 will unnecessarily
  12. increase the size of the output file without improving the quality of the file. Thus, a DPI of 300 works best for this
  13. purpose.
  14. Increase Contrast
  15. Low contrast can result in poor OCR. Increase the contrast and density before carrying out the OCR process. This can be
  16. done in the scanning software itself or in any other image processing software. Increasing the contrast between the
  17. text/image and its background brings out more clarity in the output.
  18. Binarize Image
  19. This step converts a multicolored image (RGB) to a black and white image. There are several algorithms to convert a
  20. color image to a monochrome image, ranging from simple thresholding to more sophisticated zonal analysis.
  21. Remove Noise and Scanning Artefacts
  22. Noise can drastically reduce the overall quality of the OCR process. It can be present in the background or foreground
  23. and can result from poor scanning or the poor original quality of the data.
  24. Deskew
  25. This may also be referred to as rotation. This means de-skewing the image to bring it in the right format and right
  26. shape. The text should appear horizontal and not tilted in any angle. If the image is skewed to any side, deskew it by
  27. rotating it clockwise or anti clockwise direction.
  28. Layout Analysis (or Zone Analysis)
  29. In order to detect words correctly, it is important to first recognize the zones or the layout (which are also the areas
  30. of interest). This step detects the paragraphs, tables, columns, captions of the images etc. If the software misses out
  31. on any zone or layout, words might be cut in half or not detected at all.
  32. """
  33. import cv2
  34. import numpy as np
  35. from segment.image_operation import utils
  36. # 读取图片,生成预处理的图像
  37. def preprocess(picture, scale, dilate, blur, show=False):
  38. # 预处理图像
  39. img = utils.read_img(picture)
  40. # rescale the image
  41. if scale != 0:
  42. img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
  43. # Convert to gray
  44. img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  45. # Apply dilation and erosion to remove some noise
  46. if dilate != 0:
  47. kernel = np.ones((dilate, dilate), np.uint8)
  48. img = cv2.dilate(img, kernel, iterations=1)
  49. img = cv2.erode(img, kernel, iterations=1)
  50. # Apply blur to smooth out the edges
  51. if blur != 0:
  52. img = cv2.GaussianBlur(img, (blur, blur), 0)
  53. # Apply threshold to get image with only b&w (binarization)
  54. img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
  55. if show:
  56. # cv2.namedWindow('image', cv2.WINDOW_NORMAL)
  57. cv2.imshow('image', img)
  58. cv2.waitKey(0)
  59. cv2.destroyAllWindows()
  60. return img