Can we store metadata on every word, let the user modify the word, and still keep the metadata?

Question

I’m new to Lexical and I’m developing an annotation tool for speech-to-text.

I have used Draft.js, where I use entities and decorators to store and manage metadata on each word. Every single word is an entity, and each entity has timestamps as metadata.

example: I have a cat

" I ", " have ", " a ", " cat " are entities.

The entity data for cat looks like this:

offset: 12,
length: 5,
data: {
 original_word: 'cat',
 start: 4.2,
 end: 4.6,
}

So when the user selects a word and changes cat into kitten, I can still get the start and end timestamps.

Do you think I can develop something similar with Lexical?

It’s okay for the metadata to break when the user deletes multiple words at a time. It doesn’t need to be perfect, because I need to migrate from Draft.js to some other editor library sooner or later anyway.
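To make the data model concrete, here is a minimal sketch of it in plain TypeScript, independent of any editor library. The names `WordTiming`, `WordEntity`, `buildEntities`, and `replaceWord` are hypothetical, and the timings for every word except cat are made-up sample values.

```typescript
// Hypothetical shapes, named for illustration only.
interface WordTiming { word: string; start: number; end: number; }

interface WordEntity {
  offset: number;
  length: number;
  data: { original_word: string; start: number; end: number };
}

// Pad each word with one space on each side (as in " cat ") and record
// where it lands in the concatenated text.
function buildEntities(words: WordTiming[]): { text: string; entities: WordEntity[] } {
  let text = "";
  const entities: WordEntity[] = [];
  for (const w of words) {
    const padded = ` ${w.word} `;
    entities.push({
      offset: text.length,
      length: padded.length,
      data: { original_word: w.word, start: w.start, end: w.end },
    });
    text += padded;
  }
  return { text, entities };
}

// Replacing the visible word changes length and shifts later offsets,
// but the entity's data (original word and timestamps) is left untouched.
function replaceWord(entities: WordEntity[], index: number, newWord: string): WordEntity[] {
  const delta = newWord.length + 2 - entities[index].length;
  return entities.map((e, i) => {
    if (i < index) return e;
    if (i === index) return { ...e, length: e.length + delta };
    return { ...e, offset: e.offset + delta };
  });
}

// Sample data: only the timestamps for "cat" come from the example above.
const sample: WordTiming[] = [
  { word: "I", start: 3.0, end: 3.2 },
  { word: "have", start: 3.3, end: 3.7 },
  { word: "a", start: 3.9, end: 4.1 },
  { word: "cat", start: 4.2, end: 4.6 },
];
const built = buildEntities(sample);
const edited = replaceWord(built.entities, 3, "kitten");
```

After the edit, `edited[3].data` still holds `original_word: 'cat'` together with the original start and end, which is the behavior I want to reproduce.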

Thanks in advance.

zurfyx · Answer 1 · 2022-05-18T09:07:41.047

Approaches:

Monitor Lexical updates
An (entity) node that extends LexicalTextNode
A wrapper node on top of LexicalTextNode (like Comments)

Monitor Lexical updates

If I understand your problem correctly, that's the preferred approach.

editor.registerUpdateListener(({editorState, prevEditorState, dirtyLeaves, dirtyElements}) => {
  // You can diff the editor state and each of the individual nodes that
  // were modified (dirtyLeaves and dirtyElements hold their node keys)
});

Note that a TextNode may contain multiple words.

Tradeoffs:
(+) It's neat. It works well with the existing Lexical nodes and doesn't modify the rendered DOM.
(-) Diffing nodes can be complicated.
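To give a feel for the diffing involved, here is a minimal sketch of one simple strategy. It works on plain word arrays rather than Lexical nodes, so splitting the previous and current editor text into words is assumed to happen elsewhere. `TimedWord`, `MaybeTimedWord`, and `carryTimestamps` are hypothetical names, not part of Lexical. The idea: keep the unchanged prefix and suffix as they are, and if the changed middle has the same number of words before and after, carry timestamps over by position; otherwise drop them for that region.

```typescript
interface TimedWord { text: string; start: number; end: number; }
type MaybeTimedWord = { text: string; start?: number; end?: number };

function carryTimestamps(prev: TimedWord[], nextWords: string[]): MaybeTimedWord[] {
  // Length of the unchanged prefix.
  let head = 0;
  while (head < prev.length && head < nextWords.length && prev[head].text === nextWords[head]) {
    head++;
  }
  // Length of the unchanged suffix, never overlapping the prefix.
  let tail = 0;
  while (
    tail < prev.length - head &&
    tail < nextWords.length - head &&
    prev[prev.length - 1 - tail].text === nextWords[nextWords.length - 1 - tail]
  ) {
    tail++;
  }

  const prevMid = prev.slice(head, prev.length - tail);
  const nextMidLength = nextWords.length - head - tail;

  return nextWords.map((text, i) => {
    if (i < head) return { ...prev[i], text };
    if (i >= nextWords.length - tail) {
      const fromEnd = nextWords.length - 1 - i;
      return { ...prev[prev.length - 1 - fromEnd], text };
    }
    // Changed region: a one-for-one replacement keeps timing by position.
    if (prevMid.length === nextMidLength) {
      const source = prevMid[i - head];
      return { text, start: source.start, end: source.end };
    }
    // Word counts differ (insert, split, or merge): timing is dropped.
    return { text };
  });
}

// Sample data: only the timestamps for "cat" come from the question.
const before: TimedWord[] = [
  { text: "I", start: 3.0, end: 3.2 },
  { text: "have", start: 3.3, end: 3.7 },
  { text: "a", start: 3.9, end: 4.1 },
  { text: "cat", start: 4.2, end: 4.6 },
];
const renamed = carryTimestamps(before, ["I", "have", "a", "kitten"]);
const inserted = carryTimestamps(before, ["I", "have", "a", "small", "kitten"]);
const deleted = carryTimestamps(before, ["I", "a", "cat"]);
```

In this sketch, `renamed[3]` keeps the timing of cat, the two new words in `inserted` lose theirs, and `deleted` keeps the timing of every surviving word. A real implementation would likely need a proper diff for edits in several places at once, which is where the complexity comes from.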

An (entity) node that extends LexicalTextNode

Just like MentionNode you can have an Entity node that has your own data.

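class MarkedNode extends TextNode {
  __timestamps: {...};
  // A real custom node also needs static getType() and static clone().
  // Writes go through getWritable() so the node is cloned before mutation.
  setTimestamps(timestamps) { this.getWritable().__timestamps = timestamps; }
  getTimestamps() { return this.__timestamps; }
}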

Tradeoffs:
(+) It's easy to replace each word with a MarkedNode.
(-) Replacing TextNodes with your own custom nodes kills all the optimizations around TextNodes that are currently part of the reconciler (merging adjacent text nodes). Having many individual spans can also cause accessibility issues, make external tools like Grammarly misbehave, and reduce the effectiveness of other plugins that rely on TextNodes.

A wrapper node on top of LexicalTextNode (like Comments)

Just like the previous approach, but instead of replacing TextNode you create your own wrapper (Element) node on top. Given that diffing based on characters can be complicated and slow, that is the approach we went with for the CommentPlugin.

Tradeoffs:
(+) It's easy to add custom wrapper nodes.
(-) It can still impact the overall app performance negatively. Overall, it's better than the previous approach because you don't override TextNode, but it shares most of the downsides.

What about TextNode marks?

First of all, a TextNode can contain multiple words. A mark on a TextNode can potentially refer to 1+ words.

Besides that, allowing arbitrary metadata on TextNodes means that it's hard to optimize later. For example, when merging nodes we look at the properties they have and only merge them when they share all properties (mode, details, style etc.). To preserve this optimization we would have to establish rules to determine whether two TextNodes can indeed be merged.
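The effect on merging can be illustrated with a simplified model. This is not Lexical's reconciler or its real TextNode; `SimpleTextNode`, `canMerge`, and `mergeAdjacent` are hypothetical names, and the `meta` field stands in for arbitrary per-node metadata. Adjacent nodes merge only when every property matches, so per-word metadata keeps the nodes apart.

```typescript
// Simplified stand-in for a text node; NOT Lexical's real TextNode.
interface SimpleTextNode {
  text: string;
  mode: string;
  style: string;
  meta?: Record<string, number>; // hypothetical arbitrary metadata
}

// Two metadata bags match when they have the same keys with equal values.
function sameMeta(a?: Record<string, number>, b?: Record<string, number>): boolean {
  const am = a ?? {};
  const bm = b ?? {};
  const aKeys = Object.keys(am);
  return aKeys.length === Object.keys(bm).length && aKeys.every((k) => am[k] === bm[k]);
}

// Merge rule: every property, including metadata, must be shared.
function canMerge(a: SimpleTextNode, b: SimpleTextNode): boolean {
  return a.mode === b.mode && a.style === b.style && sameMeta(a.meta, b.meta);
}

// Collapse runs of mergeable neighbors into single nodes.
function mergeAdjacent(nodes: SimpleTextNode[]): SimpleTextNode[] {
  const out: SimpleTextNode[] = [];
  for (const node of nodes) {
    const last = out[out.length - 1];
    if (last !== undefined && canMerge(last, node)) {
      out[out.length - 1] = { ...last, text: last.text + node.text };
    } else {
      out.push(node);
    }
  }
  return out;
}

// Sample data: the first two nodes carry no metadata, the last two do.
const merged = mergeAdjacent([
  { text: "I ", mode: "normal", style: "" },
  { text: "have ", mode: "normal", style: "" },
  { text: "a ", mode: "normal", style: "", meta: { start: 3.9, end: 4.1 } },
  { text: "cat", mode: "normal", style: "", meta: { start: 4.2, end: 4.6 } },
]);
```

Here the two plain nodes collapse into one, while the two nodes carrying different timestamps stay separate. With metadata on every word, no merging would happen at all under this rule.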

Thanks so much. I'll definitely experiment with these. – Ko Ohhashi, May 20 '22 at 04:18
