Short answer: we can't at the moment, although it's in the draft for 5.2 (see the comment).
The video element only decodes the video and audio streams. For subtitles it only supports <track>, which forces you to extract the subtitles into a separate file (VTT), or to use a manual approach that hooks onto currentTime, which opens the door to the more common SRT files, JSON, etc.
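As a rough illustration of the manual approach, here is a minimal sketch that parses SRT text into cues and looks up the cue matching a given playback time. All names in it (Cue, parseSrt, activeCueText, attachSubtitles) are hypothetical, and it assumes well-formed SRT with three-digit milliseconds; fetching the file and styling the overlay are left out.

```typescript
// One subtitle cue: start/end in seconds plus the text to display.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Convert an SRT timestamp such as "00:01:02,500" to seconds.
function srtTimeToSeconds(stamp: string): number {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!m) throw new Error(`Bad SRT timestamp: ${stamp}`);
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Split SRT text into blocks separated by blank lines and read each block's
// timing line ("start --> end") plus the text lines that follow it.
function parseSrt(srt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = srt.replace(/\r/g, "").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // not a cue block, skip it
    const [from, to] = lines[timingIndex].split("-->");
    // Keep only the first token after "-->" so trailing position hints are ignored.
    const endStamp = to.trim().split(/\s+/)[0];
    cues.push({
      start: srtTimeToSeconds(from),
      end: srtTimeToSeconds(endStamp),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

// Return the text of the cue covering `time`, or "" when no cue is active.
function activeCueText(cues: Cue[], time: number): string {
  const cue = cues.find((c) => time >= c.start && time < c.end);
  return cue ? cue.text : "";
}

// Minimal structural types so the sketch does not depend on DOM typings;
// a real <video> element and an overlay element should satisfy them.
interface TimedMedia {
  currentTime: number;
  addEventListener(type: string, listener: () => void): void;
}
interface TextTarget {
  textContent: string | null;
}

// Update the overlay text whenever the media reports a new playback position.
function attachSubtitles(video: TimedMedia, overlay: TextTarget, cues: Cue[]): void {
  video.addEventListener("timeupdate", () => {
    overlay.textContent = activeCueText(cues, video.currentTime);
  });
}
```

In a page you would call attachSubtitles with the video element, an element positioned over it, and the result of parseSrt on the downloaded subtitle text. Note that timeupdate only fires a few times per second, so cue changes can lag slightly; polling currentTime from requestAnimationFrame is one way to tighten that if it matters.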
Another option is to burn the subtitles into a separate video file and let the user toggle between the two. You may have to use Media Source Extensions to properly sync them.
And although it's theoretically possible to parse the file manually, dynamically and on the fly, there are numerous challenges you need to take into account, such as buffering, syncing, bandwidth overhead, performance overhead and so forth. In the end, not worth it IMO.