yea you're probably right. i skimmed some tesseract/ocrmypdf docs earlier and didn't see anything too hopeful. if i were more ML-pilled i'd take up the mantle myself. what i'm envisioning rn: it takes an image of a page, recognizes the footnotes, and crops the image, and then OCR comes in after
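
rough sketch of the crop step, to make the idea concrete. big caveat: this is not an ML detector, it's a dumb whitespace heuristic i'm swapping in as a stand-in. it assumes the page is a grayscale grid (0 = ink, 255 = paper) and that the footnotes are whatever sits below the last tall blank band in the lower half of the page. the names `find_footnote_split`, `crop_footnotes`, `min_gap`, and `search_from` are all made up for illustration.

```python
from typing import List, Optional, Tuple

# A page is a list of pixel rows; each pixel is grayscale (0 = ink, 255 = paper).
Page = List[List[int]]


def row_has_ink(row: List[int], ink_threshold: int = 128) -> bool:
    """True if any pixel in the row is darker than the threshold."""
    return any(pixel < ink_threshold for pixel in row)


def find_footnote_split(
    page: Page, min_gap: int = 3, search_from: float = 0.5
) -> Optional[int]:
    """Return the row index where the footnote block starts, or None.

    Assumption (heuristic, not a trained model): footnotes begin right after
    the last run of at least `min_gap` blank rows found below the
    `search_from` fraction of the page height.
    """
    start = int(len(page) * search_from)
    split = None
    gap = 0
    for y in range(start, len(page)):
        if row_has_ink(page[y]):
            if gap >= min_gap:
                split = y  # ink resumes after a tall enough blank band
            gap = 0
        else:
            gap += 1
    return split


def crop_footnotes(page: Page) -> Tuple[Page, Page]:
    """Split a page into (body, footnotes); footnotes is empty if none found."""
    split = find_footnote_split(page)
    if split is None:
        return page, []
    return page[:split], page[split:]
```

OCR would then run only on the second piece. a real version would need an actual layout model for the "recognize footnotes" part, since a blank band also shows up before block quotes, captions, etc.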