For an internal speech-recording tool of the Digital Education group, we needed to know when each word of a text was spoken. To time-stamp the words, we built a tool that first recognizes the text on the page using OCR and then tracks the speaker's finger as it moves along the text line by line while the speaker reads it aloud.
Debug view
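To make the time-stamping idea concrete, here is a minimal sketch of the matching step only. It is not the tool's actual implementation. It assumes the OCR stage has already produced one bounding box per word and the finger tracker has produced a series of timed fingertip positions, both in image pixel coordinates with y growing downward. The names `WordBox`, `FingerSample`, `timestamp_words`, and the `margin` parameter are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical OCR result: one word and its bounding box in pixels."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class FingerSample:
    """Hypothetical tracker output: fingertip position at time t (seconds)."""
    t: float
    x: float
    y: float


def timestamp_words(words, samples, margin=10.0):
    """Map each word index to the first time the fingertip reached it.

    The fingertip usually sits slightly below the line being read, so a
    word counts as reached when the fingertip is horizontally inside its
    box and vertically between the box top and `margin` pixels below the
    box bottom (an assumed tolerance).
    """
    stamps = {}
    for s in samples:
        for i, w in enumerate(words):
            if i in stamps:
                continue  # keep only the first time a word is reached
            if w.left <= s.x <= w.right and w.top <= s.y <= w.bottom + margin:
                stamps[i] = s.t
                break
    return stamps


# Made-up example data: two words on one line, four fingertip samples.
words = [
    WordBox("Hello", 0, 0, 50, 20),
    WordBox("world", 60, 0, 110, 20),
]
samples = [
    FingerSample(0.0, 10, 25),
    FingerSample(0.4, 40, 25),
    FingerSample(0.9, 70, 26),
    FingerSample(1.3, 100, 24),
]
result = timestamp_words(words, samples)
print(result)  # {0: 0.0, 1: 0.9}
```

Taking the first time the fingertip enters a word gives an approximate onset for that word. A real system would also have to cope with OCR errors, tracking jitter, and the finger running ahead of or behind the voice, none of which this sketch addresses.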