
The object structure in my current program is organized so that a Doc object contains a list of Mention objects, and each Mention object contains a list of Word objects. Words are identified by their position in the Doc's text and also store some other information (their text, their WordNet sense, ...).

During processing (through user interaction, etc.), Word objects inside a Mention can be accessed and their values modified (for example, updating a Word's sense). User interaction with each Mention is a requirement.

The problem is that several Mentions belonging to the same Doc may share some of the same Words (after all, all words are in the Doc). So when such a Word is updated, how should I update the corresponding Word contained in the other Mentions? In other words, these Words are at exactly the same location in the text and should be updated together, but they are stored separately in each Mention. How should one update propagate to the others?
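To make the problem concrete, here is a minimal Java sketch of the setup described above. Java stands in for the thread's C#-style code, and the field names `pos1`, `pos2`, `text` and `sense` are hypothetical, because the question does not say what the two position numbers mean. Two Mentions each hold their *own* Word object for the same position, so updating one copy leaves the other stale.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Word: two position numbers, the text, and a sense id.
// What pos1/pos2 mean is an assumption; the question only says Words
// are identified by their position in the Doc's text.
class Word {
    final int pos1;
    final int pos2;
    final String text;
    int sense;

    Word(int pos1, int pos2, String text, int sense) {
        this.pos1 = pos1;
        this.pos2 = pos2;
        this.text = text;
        this.sense = sense;
    }

    @Override
    public String toString() {
        return "(" + pos1 + ", " + pos2 + ", \"" + text + "\", " + sense + ")";
    }
}

// Each Mention keeps its OWN Word objects (the setup in the question).
class Mention {
    final List<Word> words = new ArrayList<>();
}

class CopiesDiverge {
    public static void main(String[] args) {
        Mention m1 = new Mention();
        Mention m2 = new Mention();
        // Two separate objects describing the same physical word.
        m1.words.add(new Word(2, 3, "age", 1));
        m2.words.add(new Word(2, 3, "age", 1));

        m1.words.get(0).sense = 3;           // update through the first Mention

        System.out.println(m1.words.get(0)); // (2, 3, "age", 3)
        System.out.println(m2.words.get(0)); // (2, 3, "age", 1): stale copy
    }
}
```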

One approach I used: when a Word inside a Mention is modified, I retrieve all Mentions (through a stored Doc reference) and update the corresponding Word in every Mention that contains it. This requires a loop with Equals checks on every update, which is quite a lot of processing.
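The first approach can be sketched roughly as follows, again in Java as a stand-in for the original C#. This is a minimal sketch that assumes two Words count as "the same word" when their position numbers match. The class shapes and the `updateSense` method are hypothetical. It shows why every update costs time proportional to the total number of Words in all Mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical Word; pos1/pos2 stand in for the position numbers.
class Word {
    final int pos1;
    final int pos2;
    final String text;
    int sense;

    Word(int pos1, int pos2, String text, int sense) {
        this.pos1 = pos1;
        this.pos2 = pos2;
        this.text = text;
        this.sense = sense;
    }

    // "Same word" means "same position"; the mutable sense is ignored.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Word)) return false;
        Word other = (Word) o;
        return pos1 == other.pos1 && pos2 == other.pos2;
    }

    @Override
    public int hashCode() {
        return Objects.hash(pos1, pos2);
    }
}

// Each Mention still owns its own copies of the Words.
class Mention {
    final List<Word> words = new ArrayList<>();
}

class Doc {
    final List<Mention> mentions = new ArrayList<>();

    // Approach 1: visit every Word of every Mention and copy the new sense
    // into each one equal to the changed Word. Each update therefore costs
    // time proportional to the total number of Words in all Mentions.
    void updateSense(Word changed, int newSense) {
        for (Mention m : mentions) {
            for (Word w : m.words) {
                if (w.equals(changed)) {
                    w.sense = newSense;
                }
            }
        }
    }
}
```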

The second approach I thought of is not to store separate Word lists in Mentions. Only a single list of Words is stored in Doc, and each Mention stores a list of the indices of the Words that belong to it. When updating a Word, I call an update function through the Doc reference to update the Doc's list. However, the problem lies in the function that returns the whole list of Words for a Mention. I have to build a new list of Words, using the indices to pick the actual Words from the Doc's list. This is needed because any Word in that Mention may have been modified by some other Mention(s) shortly before. Alternatively, I could check whether each Word was updated and copy the update. But that still requires a loop over all Words in the Mention, so it still seems awkward (every retrieval of the list is a long operation).
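A rough Java sketch of the second approach follows, assuming Word is an ordinary class (a reference type). The names `Doc.updateSense` and `Mention.getWords` are hypothetical. Rebuilding the list is one lookup per index. Because the list holds references to the Doc's own Word objects rather than copies, it reflects the latest updates without any copy step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Word; pos1/pos2 stand in for the position numbers.
class Word {
    final int pos1;
    final int pos2;
    final String text;
    int sense;

    Word(int pos1, int pos2, String text, int sense) {
        this.pos1 = pos1;
        this.pos2 = pos2;
        this.text = text;
        this.sense = sense;
    }
}

// The Doc owns the only list of Word objects.
class Doc {
    final List<Word> words = new ArrayList<>();

    void updateSense(int index, int newSense) {
        words.get(index).sense = newSense;
    }
}

// A Mention stores only indices into the Doc's list.
class Mention {
    private final Doc doc;
    private final List<Integer> wordIndices;

    Mention(Doc doc, List<Integer> wordIndices) {
        this.doc = doc;
        this.wordIndices = new ArrayList<>(wordIndices);
    }

    // Builds a new list on every call: one lookup per index (O(k) for k
    // indices). The list holds references to the Doc's own Word objects,
    // so it always shows the latest updates without copying anything.
    List<Word> getWords() {
        List<Word> result = new ArrayList<>(wordIndices.size());
        for (int i : wordIndices) {
            result.add(doc.words.get(i));
        }
        return result;
    }
}
```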

What I want to ask is whether there is a better solution to this update problem. Any help is much appreciated :) If necessary, I will add part of my code here.

Rohith
ramcrys
  • Why do you have multiple instances of a `Word` object for the same physical word in the text? You should only have one that is shared among all `Mention` instances that need to use it. – Daniel Hilgarth Sep 17 '13 at 11:13
  • Yeah, what you refer to is implemented in my second approach. It still has the problem when I want to retrieve a "possibly" updated version of the words list for a mention. – ramcrys Sep 17 '13 at 11:15
  • No, it is not the same. My point is that each `Mention` has a list of `Word`s - and not their indices - and can simply update that `Word` instance. Since other `Mention`s that contain the same word have the exact same `Word` instance, they will always automatically operate on the latest version. No manual updating is required. – Daniel Hilgarth Sep 17 '13 at 11:17
  • How is that possible? It seems you are misunderstanding something here. For example: initially Word w1 = (2,3,"age",1) and w2 = (2,3,"age",1) are supposedly the same word in the Doc. Then you update w1: w1 = (2,3,"age",3). How can w2 be updated? It belongs to another list, in another Mention object, and its fourth member is still 1. The problem here is that each Mention stores its own list of words separately. Initially they are the same, but what if they are changed? – ramcrys Sep 17 '13 at 11:19
  • Do you want the fourth member of `w2` to stay 1, even after the update? Or should it be updated to 3? – Daniel Hilgarth Sep 17 '13 at 11:46
  • Of course I want it to update to 3. That's the whole point of the two attempts described in my original post :) I think it can't be done automatically. It's similar to Object o1 = o2 = someObject; then we change o1 and o2 is not affected. But then again, when applying this to lists, things might take too much time, so it feels a bit awkward and I think there should be some other way to optimize? – ramcrys Sep 17 '13 at 11:47
  • Please see my answer. As I said, simply share the `Word` instances. – Daniel Hilgarth Sep 17 '13 at 11:51
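The `o1 = o2 = someObject` intuition in the comments holds only for *reassigning* `o1`. *Mutating* the shared object through `o1` is visible through `o2`. Here is a minimal Java sketch of that distinction. It assumes Word is a class (a reference type), and the two-field Word is hypothetical.

```java
// Minimal hypothetical Word with just a text and a mutable sense.
class Word {
    final String text;
    int sense;

    Word(String text, int sense) {
        this.text = text;
        this.sense = sense;
    }
}

class ReferenceDemo {
    public static void main(String[] args) {
        Word someWord = new Word("age", 1);
        Word o1, o2;
        o1 = o2 = someWord;            // both variables refer to ONE object

        o1.sense = 3;                  // mutate that object through o1
        System.out.println(o2.sense);  // 3: o2 sees the change

        o1 = new Word("age", 5);       // make o1 refer to a DIFFERENT object
        System.out.println(o2.sense);  // still 3: o2 is not affected
    }
}
```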

1 Answer


As I already said in my comments, don't create multiple Word instances for the same word in the document. So, regarding your comment: there would never be a w1 and a w2 for a single physical word in the document. There would only be w.

Example:

var w = new Word(2, 3, "age", 1);

var mention1 = new Mention(w);
var mention2 = new Mention(w);

mention1.UpdateWord(); // sets the fourth property of w to 3

mention2.PrintWord(); // prints (2, 3, "age", 3)

This works, because both Mention instances work on the same Word instance.
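Here is a runnable version of the example above, sketched in Java as a close analogue of the C#-style snippet. The method names `updateWord(index, newSense)` and `describeWords()` are hypothetical stand-ins for `UpdateWord()` and `PrintWord()`. The sketch assumes Word is a class, so each Mention stores a reference to the single shared instance rather than a copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Word: two position numbers, the text, and a sense id.
class Word {
    final int pos1;
    final int pos2;
    final String text;
    int sense;

    Word(int pos1, int pos2, String text, int sense) {
        this.pos1 = pos1;
        this.pos2 = pos2;
        this.text = text;
        this.sense = sense;
    }

    @Override
    public String toString() {
        return "(" + pos1 + ", " + pos2 + ", \"" + text + "\", " + sense + ")";
    }
}

class Mention {
    // References to shared Word objects; nothing is copied.
    private final List<Word> words;

    Mention(Word... words) {
        this.words = new ArrayList<>(Arrays.asList(words));
    }

    void updateWord(int index, int newSense) {
        words.get(index).sense = newSense;
    }

    String describeWords() {
        return words.toString();
    }
}

class SharedWordDemo {
    public static void main(String[] args) {
        Word w = new Word(2, 3, "age", 1);
        Mention mention1 = new Mention(w);
        Mention mention2 = new Mention(w);

        mention1.updateWord(0, 3);                   // sets w.sense to 3
        System.out.println(mention2.describeWords()); // [(2, 3, "age", 3)]
    }
}
```

No propagation loop is needed: there is only one object per physical word, so there is nothing to keep in sync.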

Daniel Hilgarth
  • But the problem is that I receive my input from another program. In particular, the Mentions and their corresponding words are stored in a text file: Mention1 w1_2_3_"age" w2_8_5_after"... something like that. I have to parse this text to create the Mentions. So after your answer I think I need an initial phase that finds all the words first. Only then can I create each Mention using its specific words. I think I really got a bit confused earlier :( – ramcrys Sep 17 '13 at 12:01
  • @ramcrys You can use a dictionary to store the `Word` instance for a word from that and use that stored instance at the next occurrence of the same word in the file. If you don't understand that, please paste such a file at http://pastebin.com so I can show you some code that actually works for your case. – Daniel Hilgarth Sep 17 '13 at 12:05
  • Sorry, I was really confused when doing it earlier. Now that you have pointed it out, I can fix it. Thanks a lot for your help. – ramcrys Sep 17 '13 at 12:16
  • @ramcrys: Nice. Good luck doing that :-) – Daniel Hilgarth Sep 17 '13 at 12:23
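The dictionary idea from the comments can be sketched in Java roughly as follows. The sketch makes several assumptions:

- The real file format is only partially shown above, so each line is assumed to have the hypothetical simplified form `MentionName pos1_pos2_text pos1_pos2_text ...`.
- A word's identity is assumed to be its two position numbers.
- `MentionParser` and `parseLine` are illustrative names.

While parsing, the map returns the already-created Word for a position, so every Mention that mentions that position shares one instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Word; sense starts at 0 as a placeholder for "not set yet".
class Word {
    final int pos1;
    final int pos2;
    final String text;
    int sense;

    Word(int pos1, int pos2, String text) {
        this.pos1 = pos1;
        this.pos2 = pos2;
        this.text = text;
        this.sense = 0;
    }
}

class Mention {
    final String name;
    final List<Word> words = new ArrayList<>();

    Mention(String name) {
        this.name = name;
    }
}

class MentionParser {
    // One Word instance per position, shared by every Mention that uses it.
    private final Map<String, Word> wordsByPosition = new HashMap<>();

    // Parses one line in the ASSUMED format "Name p1_p2_text p1_p2_text ...".
    Mention parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        Mention mention = new Mention(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            String[] parts = tokens[i].split("_", 3); // limit 3 keeps '_' in text
            int pos1 = Integer.parseInt(parts[0]);
            int pos2 = Integer.parseInt(parts[1]);
            String text = parts[2];
            // Reuse the stored instance if this position was already seen.
            Word word = wordsByPosition.computeIfAbsent(
                    pos1 + ":" + pos2, key -> new Word(pos1, pos2, text));
            mention.words.add(word);
        }
        return mention;
    }
}
```

After parsing, changing a Word's sense through any Mention is visible through every other Mention that contains the same position, with no extra loop.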