no-time99 d931143ae4 get word 4 years ago
Readme.md d931143ae4 get word 4 years ago
config.py d931143ae4 get word 4 years ago
correct.py d931143ae4 get word 4 years ago
filepath2text.py d931143ae4 get word 4 years ago
img2text.py d931143ae4 get word 4 years ago
pdf2text.py d931143ae4 get word 4 years ago
ppt2text.py d931143ae4 get word 4 years ago
requirements.txt d931143ae4 get word 4 years ago
server.py d931143ae4 get word 4 years ago
toTxt.py d931143ae4 get word 4 years ago
utils.py d931143ae4 get word 4 years ago
word2html.py d931143ae4 get word 4 years ago
word2text.py d931143ae4 get word 4 years ago

Readme.md

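Content Extraction from English Exam Papers in Word Format

This program extracts the content of Word documents: it generates an HTML file from the Word document, then cleans the HTML and returns the text.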
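The HTML-cleaning step can be sketched as follows. This is a minimal illustration, not the code in this repository: it uses Python's standard-library `html.parser` instead of beautifulsoup4 so that it runs without extra dependencies, and the names `TextExtractor` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text runs, skipping <style> and <script> content."""

    SKIP_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._chunks = []      # one entry per non-empty text run

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace left over from the Word export.
            text = " ".join(data.split())
            if text:
                self._chunks.append(text)

    def text(self):
        # Each text run between tags becomes its own line.
        return "\n".join(self._chunks)


def html_to_text(html):
    """Hypothetical helper: return the cleaned text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<p>1. Choose the   best answer.</p><p>A. apple &amp; pear</p>"
    "</body></html>"
)
print(html_to_text(sample))
```

Because every text run between tags is emitted on its own line, inline markup such as bold or italic spans would split a sentence across lines; a real cleaner would likely need to treat block and inline tags differently.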

Requirements

  • Python 3.6
  • Microsoft Office 2010 or later
  • word_bin
  • MathType
  • bottle
  • requests
  • beautifulsoup4

Project Structure

OCR
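|   server.py	# service startup script
|   filepath2text # the route_filename function is the entry point for extracting Word content: it converts the document to an HTML file and cleans the HTML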
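The README does not show how route_filename works internally. One plausible shape for an entry point like this is a dispatch on the file extension; the sketch below assumes that design. The converter names are taken from the module names in the file listing, while the extension mapping and the function `pick_converter` are hypothetical.

```python
from pathlib import Path

# Assumed mapping from file extension to converter module name.
# The module names appear in the repository; the mapping itself is a guess.
CONVERTERS = {
    ".doc": "word2text",
    ".docx": "word2text",
    ".pdf": "pdf2text",
    ".ppt": "ppt2text",
    ".pptx": "ppt2text",
    ".png": "img2text",
    ".jpg": "img2text",
}


def pick_converter(file_path):
    """Return the name of the converter assumed to handle file_path."""
    suffix = Path(file_path).suffix.lower()
    try:
        return CONVERTERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}") from None


print(pick_converter("exam/unit1.DOCX"))
```

Keeping the mapping in a dictionary means a new format can be supported by adding one entry, without touching the lookup logic.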

Run

Online service
python server.py