PDF Curator is a tool that converts any PDF file into a structured JSON format. Using OCR, layout detection, and image captioning, it produces a JSON file that captures key document elements, including the text of each page, the coordinates of text blocks, chapter headings and chapter lists, the coordinates and descriptions of non-text elements, and a list of non-text elements.
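
To make the kind of output described above more concrete, here is a minimal Python sketch of how such a document could be modeled and serialized. It is not PDF Curator's actual schema: every class name, field name, and the `(x0, y0, x1, y1)` bounding-box convention are assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


# NOTE: all names below are hypothetical; the real output schema may differ.
@dataclass
class TextBlock:
    text: str
    bbox: List[float]  # assumed convention: [x0, y0, x1, y1] in page units


@dataclass
class NonTextElement:
    kind: str          # e.g. "figure" or "table"
    bbox: List[float]
    description: str   # caption produced by an image captioning step


@dataclass
class Page:
    number: int
    text: str
    text_blocks: List[TextBlock] = field(default_factory=list)
    non_text_elements: List[NonTextElement] = field(default_factory=list)


@dataclass
class Document:
    chapters: List[str]
    pages: List[Page]

    def non_text_index(self) -> List[dict]:
        """Flat list of non-text elements across all pages."""
        return [
            {"page": page.number, "kind": element.kind}
            for page in self.pages
            for element in page.non_text_elements
        ]


example = Document(
    chapters=["Introduction"],
    pages=[
        Page(
            number=1,
            text="Introduction\nThis report describes ...",
            text_blocks=[TextBlock("Introduction", [72.0, 90.0, 540.0, 120.0])],
            non_text_elements=[
                NonTextElement("figure", [72.0, 300.0, 540.0, 500.0], "A bar chart.")
            ],
        )
    ],
)

# Serialize the nested dataclasses to a JSON string.
payload = json.dumps(asdict(example), indent=2)
```

Keeping per-page text, block coordinates, and non-text elements in separate fields lets a consumer read only the parts it needs, for example the chapter list alone.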
PDF Curator