I have a list of word forms produced from a text. This list includes proper names (e.g. John, Mary, Edinburgh). In another field I have a list of proper names. I want to get a list of all word forms without the proper names.

I actually need allWordForms MINUS properNames

Arrays may be used like sets. But we only have the set operations Union and Intersect.

The script so far

on mouseUp
   put field "allWordForms" into tWordForms
   split tWordForms by return
   -- tWordForms is now an array

   put field "properNames" into tProperNames
   split tProperNames by return
   -- tProperNames is now an array

   -- .....
   repeat  
   -- .....
   -- .....
   end repeat

   combine tWordForms using return

   put tWordForms into field "wordFormsWithoutProperNames"

end mouseUp

What should the repeat loop look like?

And here is an example.

The field "allWordForms" contains

Mary
and
John
came
yesterday
They
want
to
move
from
Maryland
to
Petersbourough

The field "properNames" contains

John
Mary
Maryland
Peter
Petersbourough

The desired result is to have a copy of the list allWordForms with the proper names removed.

and
came
yesterday
They
want
to
move
from
to
PeeHaa
z--

2 Answers

Here's a possible solution:

on mouseUp
   put field "allWordForms" into tWordForms
   put field "properNames" into tProperNames

   # replace proper names
   repeat for each line tProperName in tProperNames
      replace tProperName with empty in tWordForms
   end repeat

   # remove blank lines
   replace LF & LF with LF in tWordForms   

   put tWordForms into field "wordFormsWithoutProperNames"
end mouseUp

Another solution, taking your extra info into account:

on mouseUp
   put field "allWordForms" & LF into tWordForms
   put field "properNames" into tProperNames

   repeat for each line tProperName in tProperNames
      replace tProperName & LF with empty in tWordForms
   end repeat

   put tWordForms into field "wordFormsWithoutProperNames"
end mouseUp
splash21
  • I added an example to the question. Your solution produces the following strings (including an empty one at the beginning): `and came yesterday They want to move from land to sbourough`. The words `Maryland` and `Petersbourough` are chopped, which is not the intended result. I want to make sure that it works on whole word forms. This is why I asked for the repeat loop and suggested using arrays. Arrays support the set operations `Union` and `Intersect`. I am looking for the LiveCode syntax to implement `tAllWordForms minus tProperNames`. – z-- May 05 '13 at 13:25
  • There's no built-in array function to achieve your goal, but the second example works as required with your data. – splash21 May 05 '13 at 17:35
  • The second solution works fine. I have tested it on Linux. Since it uses LF, does it work on Windows as well? – z-- May 06 '13 at 04:26
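
For reference, here is a minimal sketch of the kind of repeat loop the question asks for. It is not taken from either answer. It assumes a lookup array keyed by proper name; the variable names `tIsProperName` and `tResult` are illustrative, and setting `the caseSensitive` to true is an assumption that names should match exactly, including case. Because whole lines are compared, `Maryland` is not chopped by `Mary`, and duplicates such as `to` keep their position. The loop walks line chunks instead of searching for a literal LF.

```livecode
on mouseUp
   -- Assumption: names must match exactly, including case.
   set the caseSensitive to true

   -- Build a lookup array whose keys are the proper names.
   put empty into tIsProperName
   repeat for each line tName in field "properNames"
      if tName is not empty then put true into tIsProperName[tName]
   end repeat

   -- Copy every word form that is not a key of the lookup array.
   put empty into tResult
   repeat for each line tWord in field "allWordForms"
      if tIsProperName[tWord] is not true then
         put tWord & return after tResult
      end if
   end repeat
   delete char -1 of tResult -- drop the trailing return

   put tResult into field "wordFormsWithoutProperNames"
end mouseUp
```

With the example data from the question, this should leave `and`, `came`, `yesterday`, `They`, `want`, `to`, `move`, `from`, `to`, one per line. Lines with trailing spaces would not match a name, so the input may need trimming first.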

You can use the filter command in its "filter container without pattern" form, which removes every line that matches the pattern as a whole:

put field "allWordForms" into tResult
repeat for each line aLine in field "properNames"
   filter tResult without aLine
end repeat
put tResult into field "wordFormsWithoutProperNames"
hliljegren