
I am faced with an interesting problem. I receive an array from an ASR server in a format like this: [{start: 0, end: 0.4, word: "Hello"}, {start: 0.4, end: 0.6, word: "my"}, {start: 0.6, end: 1, word: "name"}]. The array is then mapped to an array of strings containing only the words, e.g. ["Hello", "my", "name"]. Finally, that array is turned into a string using .join(" "), resulting in a string like this: "Hello my name".
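For reference, the flattening step described above could be sketched roughly as follows. The `TimedWord` interface and the `toTranscript` name are made up for illustration; only the `start`, `end` and `word` fields come from the example data.

```typescript
// Shape of one ASR entry, taken from the example above.
// The interface name is hypothetical.
interface TimedWord {
  start: number;
  end: number;
  word: string;
}

// Flatten the timed entries into the plain string shown in the TextInput.
function toTranscript(words: TimedWord[]): string {
  return words.map((w) => w.word).join(" ");
}

const sample: TimedWord[] = [
  { start: 0, end: 0.4, word: "Hello" },
  { start: 0.4, end: 0.6, word: "my" },
  { start: 0.6, end: 1, word: "name" },
];

console.log(toTranscript(sample)); // "Hello my name"
```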

The issue is trying to edit these values and maintain the original array's format. I am using a React Native TextInput to allow editing of the string value.

Let's say a user would like to change

"Hello my name"

to

"Hello my name is"

or

"Bellow my name"

In the first case the array would need to be adjusted to [{start: 0, end: 0.4, word: "Hello"}, {start: 0.4, end: 0.6, word: "my"}, {start: 0.6, end: 1, word: "name is"}] and in the second instance the array would have to be adjusted to [{start: 0, end: 0.4, word: "Bellow"}, {start: 0.4, end: 0.6, word: "my"}, {start: 0.6, end: 1, word: "name"}].

I am at a total loss as to how I should approach this. I have been at it for about two weeks now and still have no idea. My latest attempt was to map through the original array, comparing each item's word value to the edited string's word at the same index. If they matched, I would add the item to the final array. If not, I would check whether the value in the edited string matches one of the changes that I got from the .difference() function in Lodash. I might be explaining it poorly, so have a look at the code.

  const [value, setValue] = useState('');

  const findChanges = () => {

    let finalWords = [];
    // Each of the elements in the currentWords array has three properties: start, end, word
    let currentWords = data.words;
    if (!Array.isArray(currentWords)) currentWords = JSON.parse(data.words);
    // Extract only word strings from currentWords
    let currentArr = currentWords.map(x => x.word);
    let newArr = value.split(' ');
    let currentChangeEndsAt = undefined;

    let diff = difference(newArr, currentArr);

    const addToFinal = (text: string, startIndex: number, endIndex: number) => {
      if (startIndex === endIndex) {
        return finalWords.push({ ...currentWords[startIndex], word: text });
      }
      let start = currentWords[startIndex].start;
      let end = currentWords[endIndex].end;
      return finalWords.push({ start, end, word: text });
    };

    currentArr.forEach((word, index) => {
      // If index is smaller then we have already handled this word, skip
      if (currentChangeEndsAt > index) {
        return;
      }
      // If currentArr item at index matches newArr item at same index, add word to final
      if (word === newArr[index]) {
        return addToFinal(word, index, index);
      }
      // If currentArr item at index is one of the differences
      if (diff.indexOf(word) !== -1) {
        // Determine recursively at which index the change ends
        const changeEndsAt = (searchIndex): number => {
          // If next word in currentArr matches the word in newArr at searchIndex return searchIndex
          // Else add 1 to searchIndex and recursively call the func again
          if (currentArr[index + 1] === newArr[searchIndex]) {
            return searchIndex;
          } else if (searchIndex === newArr.length - 1) {
            return searchIndex;
          } else {
            return changeEndsAt(searchIndex + 1);
          }
        };
        // Get substring of the change
        const endsAt = changeEndsAt(index + 1);
        const changeString = newArr.slice(index, endsAt).join(' ');
        const endIndex =
          endsAt > currentWords.length - 1 ? currentWords.length - 1 : endsAt;
        return addToFinal(changeString, index, endIndex);
      }
    });
    return finalWords;
  };

As you can see, it is quite a mess, and you can probably tell that it won't work for a few reasons. One of them is that if a word was added to the beginning of the string, the function would just break down. I have thought of implementing an onChangeText handler and storing the current cursor position of the TextInput in state using the onSelectionChange callback, but I haven't been able to figure out how I would go from there. If anyone has any pointers on how I might approach this, or even where to start, that would help tremendously. I apologise for the long-winded question and vague title; I am open to suggestions on improving both. If you have a better idea for either, please comment below. Thanks in advance.
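One possible starting point, offered only as a sketch: instead of tracking the cursor, compare the old and new word lists, strip the words they share at the beginning and at the end, and treat whatever is left in the middle as the single edited span. Everything here is an assumption rather than an established solution: the names `TimedWord`, `splitWords` and `applyEdit` are hypothetical, the code assumes one contiguous edit per call, it attaches purely inserted words to the preceding entry (an arbitrary tie-break), and it lets an entry that loses all of its words hand its time span to a neighbour.

```typescript
interface TimedWord {
  start: number;
  end: number;
  word: string;
}

// Split on any whitespace and drop empty tokens.
const splitWords = (text: string): string[] =>
  text.split(/\s+/).filter((t) => t.length > 0);

// Hypothetical helper: rebuild the timed array after one contiguous edit.
function applyEdit(original: TimedWord[], edited: string): TimedWord[] {
  if (original.length === 0) return [];

  // Flatten entries into tokens, remembering which entry owns each token.
  const oldTokens: { text: string; owner: number }[] = [];
  original.forEach((entry, owner) => {
    for (const text of splitWords(entry.word)) oldTokens.push({ text, owner });
  });
  const newTokens = splitWords(edited);

  // Count the tokens shared at the start and at the end.
  const limit = Math.min(oldTokens.length, newTokens.length);
  let prefix = 0;
  while (prefix < limit && oldTokens[prefix].text === newTokens[prefix]) {
    prefix++;
  }
  let suffix = 0;
  while (
    suffix < limit - prefix &&
    oldTokens[oldTokens.length - 1 - suffix].text ===
      newTokens[newTokens.length - 1 - suffix]
  ) {
    suffix++;
  }
  const oldEnd = oldTokens.length - suffix; // end of changed span (old)
  const newEnd = newTokens.length - suffix; // end of changed span (new)

  // Entry that receives the changed words: the first replaced entry, or,
  // for a pure insertion, the entry just before the insertion point.
  const anchor = prefix < oldEnd ? prefix : Math.max(prefix - 1, 0);
  const target = oldTokens.length > 0 ? oldTokens[anchor].owner : 0;

  // Distribute words in order: unchanged prefix, changed span, unchanged suffix.
  const wordsPerEntry: string[][] = original.map(() => []);
  for (let i = 0; i < prefix; i++) {
    wordsPerEntry[oldTokens[i].owner].push(oldTokens[i].text);
  }
  for (let i = prefix; i < newEnd; i++) {
    wordsPerEntry[target].push(newTokens[i]);
  }
  for (let i = oldEnd; i < oldTokens.length; i++) {
    wordsPerEntry[oldTokens[i].owner].push(oldTokens[i].text);
  }

  // Entries left without words give their time span to a neighbour.
  const result: TimedWord[] = [];
  let pendingStart: number | undefined;
  for (let i = 0; i < original.length; i++) {
    const entry = original[i];
    const words = wordsPerEntry[i];
    if (words.length > 0) {
      const start = pendingStart ?? entry.start;
      result.push({ start, end: entry.end, word: words.join(" ") });
      pendingStart = undefined;
    } else if (result.length > 0) {
      result[result.length - 1].end = entry.end;
    } else if (pendingStart === undefined) {
      pendingStart = entry.start;
    }
  }
  return result;
}

const asrWords: TimedWord[] = [
  { start: 0, end: 0.4, word: "Hello" },
  { start: 0.4, end: 0.6, word: "my" },
  { start: 0.6, end: 1, word: "name" },
];

console.log(applyEdit(asrWords, "Hello my name is")); // last entry becomes "name is"
console.log(applyEdit(asrWords, "Hello")); // a single entry spanning 0 to 1
```

Two separate edits made before the function runs would be treated as one large changed span, so calling it on every onChangeText event would probably give better results than calling it once at the end.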

He1nr1ch
  • What output would you expect if the sentence is modified to simply "Hello"? – Gershom Maes Jun 28 '21 at 19:58
  • Hi @Gershy, that is also where it gets complicated. The expected output would then be `[{start: 0, end: 1, word: "Hello"}]`. As the recording is of fixed length the start and end times will have to be adjusted to accommodate the text. – He1nr1ch Jun 29 '21 at 06:37

1 Answer


If I understand correctly you can simply split the string into components; the 1st component is combined with { start: 0, end: 0.4 }, the 2nd component is combined with { start: 0.4, end: 0.6 }, and all components beyond the second are joined by spaces and combined with { start: 0.6, end: 1 }.

let inp = document.querySelector('input');
let code = document.querySelector('code');

let fn = () => {
  
  let input = inp.value;
  let [ cmp1='', cmp2='', ...cmps ] = input.split(' ');
  let output = [
    { start: 0, end: 0.4, word: cmp1 },
    { start: 0.4, end: 0.6, word: cmp2 },
    { start: 0.6, end: 1, word: cmps.join(' ') }
  ];
  code.textContent = ''
    + '[\n'
    + output.map(item => '  ' + JSON.stringify(item)).join(',\n')
    + '\n]'
  
};
inp.addEventListener('input', fn);
fn();
code { white-space: pre; }
<p>Edit this text:</p>
<input value="Hello my name"/><br/>
<code></code>
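The snippet above hard-codes three time slots. A generalisation to any number of entries might look roughly like the sketch below, where the `TimedWord` interface and the `assignByPosition` name are hypothetical. It keeps the same positional rule: word i goes to slot i, and everything from the last slot onwards is joined into the last slot.

```typescript
interface TimedWord {
  start: number;
  end: number;
  word: string;
}

// Reuse the timing of each existing entry and fill in words by position.
function assignByPosition(slots: TimedWord[], edited: string): TimedWord[] {
  const tokens = edited.split(/\s+/).filter((t) => t.length > 0);
  return slots.map((slot, i) => {
    const isLast = i === slots.length - 1;
    // The last slot takes all remaining words; missing words become "".
    const word = isLast ? tokens.slice(i).join(" ") : (tokens[i] ?? "");
    return { start: slot.start, end: slot.end, word };
  });
}

const slots: TimedWord[] = [
  { start: 0, end: 0.4, word: "Hello" },
  { start: 0.4, end: 0.6, word: "my" },
  { start: 0.6, end: 1, word: "name" },
];

console.log(assignByPosition(slots, "Hello my name is").map((w) => w.word));
// [ 'Hello', 'my', 'name is' ]
```

Because the rule is purely positional, a word inserted anywhere before the last slot shifts every later word into a different slot.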

I'm not sure I fully understood your question. If this isn't what you wanted, please give an example of an input that produces an incorrect output, along with the output you would expect in that scenario.

Gershom Maes
  • This approach is interesting but not robust (yet). Since the ASR server could completely mishear the audio, the array could be the wrong size compared to the user's expected output. Ultimately I need a way to figure out which words the user edited so that I can align their changes with the output timing values. So in a case like that, for argument's sake, let's say the expected output was `"He yelled something so lame"` (I know it doesn't phonetically make sense); the edit would be `[{start: 0, end: 0.4, word: "He yelled"}, {... word: "something"}, {... word: "lame"}]` – He1nr1ch Jun 29 '21 at 06:45
  • The user could edit at any index, and the edit could be of any length. So where the original array contained only a single word at index x, the user could add 4 more words, and the function would have to add this edit only to the applicable array item. Hope that makes sense. – He1nr1ch Jun 29 '21 at 06:48
  • Hmmm, I don't think what you're looking for is well-defined. If the user edits "Hello my name" to "Hello my nice name", did "my" become "my nice", or did "name" become "nice name"? The answer is, we can never know; it isn't well-defined (AI could take a guess, but it would never be more than a guess). Why not change the UI so that there are 3 input boxes inline with each other? That way the user can indicate which term they're changing. – Gershom Maes Jun 29 '21 at 14:06
  • See, that is my problem: from a UX perspective that is impossible. – He1nr1ch Jun 29 '21 at 16:20
  • Well it unfortunately sounds like your problem has no solution :) Can you explain more about your overall app? What is the purpose of this data with `start` and `end` properties? Perhaps at a higher level things can be consolidated better. – Gershom Maes Jun 29 '21 at 17:13