I'm trying to get the bounding rect of each word in an HTML document.
For example,
<html><body>Lorem ipsum dolor</body></html>
I want the output as [x, y, width, height] - word
[ 8, 8, 44.671875, 19 ] - Lorem
[ 56.5, 8, 43.125, 19 ] - ipsum
[ 103.4, 8, 35.02, 19 ] - dolor
I'm using the Chrome DevTools Protocol (CDP) to capture a DOMSnapshot, which gives the bounding rect for each line as a whole rather than for individual words. (my-source-code)
[ 8, 8, 130.46875, 19 ] Lorem ipsum dolor
If I wrap every word in the HTML in a span tag, Chromium returns the desired per-word rects, but this solution seems hacky. Is there a better way to do this?
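For reference, the span-wrapping workaround looks roughly like the sketch below. This is a minimal illustration, not my actual code: `wrapWords` and `escapeHtml` are hypothetical helper names, the `word` class is arbitrary, and the sketch assumes the input is plain text with no nested markup (real pages would need each text node handled separately).

```javascript
// Hypothetical helper: escape the characters that would otherwise be
// interpreted as markup once the text is placed back into the page.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: wrap every whitespace-delimited word in its own
// <span>, so that the layout engine reports one box per word.
// Assumption: `text` is plain text, not HTML.
function wrapWords(text) {
  return text
    .split(/(\s+)/) // the capture group keeps the whitespace runs as tokens
    .map((token) =>
      token === '' || /^\s+$/.test(token)
        ? token // leave whitespace untouched so spacing is preserved
        : `<span class="word">${escapeHtml(token)}</span>`
    )
    .join('');
}

console.log(wrapWords('Lorem ipsum dolor'));
// <span class="word">Lorem</span> <span class="word">ipsum</span> <span class="word">dolor</span>
```

The snapshot then contains one layout box per span, which is what yields the per-word rects, at the cost of modifying the DOM before measuring it.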
Note:
- The text content can have styles and fonts applied to it, so precomputing a width for each character is not an option.
- I can render the page to a PDF using CDP and iterate over the words with Foxit or a similar library, but I'd prefer to do everything in Node.js.