
I have been trying to develop a graph structure that will link entities according to co-mentioned features between them, e.g. 2 places are linked if co-mentioned in an article.

I have managed to do so, but I am having trouble iteratively adding new information to an edge while keeping what is already there.

My approach (since I haven't found anything related anywhere) is to append existing information to a list, append the new link in the list and assign that list to the appropriate feature.

    temp = []
    if G.has_edge(i[z],i[j]):
        temp.append(G[i[z]][i[j]]['article'])
        temp.append(url[index])
        G[i[z]][i[j]]['article'] = temp
    else:
        print "Create edge!"
        G.add_edge(i[z],i[j], article=url)
    del temp[:]

As you can see above, since there are many links to populate, I defined a dedicated list (temp) and loaded into it the old contents of the link's attribute, called article. If the link does not exist, I create it and add as its first value the url that "brought" the 2 places together.

My problem is that, although I empty the list each time so that it is empty when a new pair comes in, when I look at a link's urls I get something like this:

{'article': [[...], u'http://www.huffingtonpost.co.uk/.../']

It seems like I am keeping only the last link, since I delete the temporary list's contents each time, but I cannot find a better way to do this without declaring a bunch of unnecessary temp lists.

Any ideas?

Thank you for your time.

Swan87

2 Answers

1

TL/DR summary: change your entire snippet to

    if G.has_edge(i[z],i[j]):
        G[i[z]][i[j]]['article'].append(url[index])
    else:
        G.add_edge(i[z],i[j], article=[url])

Here's what's going on:

When you create the edge the first time you use

G.add_edge(i[z],i[j], article=url)

So it's a string. But later when you do

G[i[z]][i[j]]['article'] = temp

you've defined temp to be a list whose first element is G[i[z]][i[j]]['article']. So G[i[z]][i[j]]['article'] is now a list with two elements, the first of which is the old value for G[i[z]][i[j]]['article'] (a string) and the second of which is the new url (also a string).

Your problem comes at the later steps:

From then on, it's exactly the same thing. G[i[z]][i[j]]['article'] is again a list with two elements, the first of which is its old value (a list) and the second is the new url (a string). So you've got a nested list.

Let's trace through with three urls: 'a', 'b', and 'c', and I'll use E to abbreviate G[i[z]][i[j]]['article']. First time through, you get E='a'. Second time through you get E=['a', 'b']. Third time through it gives E=[['a','b'],'c']. So it always makes E[0] the former value of E, and E[1] the new url.
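The trace can be reproduced without a graph at all. The sketch below is only an illustration: a plain dict named `edge` stands in for the edge's attribute dict, and `add_url_buggy` is a hypothetical name for the question's logic (with a fresh temp each time and without the del temp[:] step).

```python
# Plain dict standing in for the edge attribute dict G[i[z]][i[j]] (assumption).
edge = {}

def add_url_buggy(edge, url):
    """Mimic the question's logic: put the old value and the new url in a fresh list."""
    if 'article' in edge:
        temp = []
        temp.append(edge['article'])  # the old value goes in as ONE element
        temp.append(url)
        edge['article'] = temp
    else:
        edge['article'] = url  # the first value is a bare string, not a list

history = []
for url in ['a', 'b', 'c']:
    add_url_buggy(edge, url)
    history.append(edge['article'])

print(history)  # ['a', ['a', 'b'], [['a', 'b'], 'c']]
```

Each pass wraps the previous value one level deeper, which is where the nesting comes from.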

Two choices:

1) You can handle the creation of temp differently depending on whether you've got a string or a list. This is the bad choice.

2) Better: Make it a list the whole time through and then don't even deal with temp. Try creating the edge as (...,article = [url]) and then just use G[i[z]][i[j]]['article'].append(url) instead of defining temp.

So your code would be

    if G.has_edge(i[z],i[j]):
        G[i[z]][i[j]]['article'].append(url[index])
    else:
        G.add_edge(i[z],i[j], article=[url])
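To see the "list from the start" pattern in isolation, here is a minimal sketch that needs no graph library. It assumes an undirected graph and uses a plain dict keyed by the pair of places in place of G; `add_mention`, the place names, and the urls are all made up for illustration.

```python
def add_mention(edges, a, b, url):
    """Record that places a and b were co-mentioned in the article at url."""
    key = frozenset((a, b))  # unordered pair, so (a, b) and (b, a) share one edge
    if key in edges:
        edges[key]['article'].append(url)   # edge exists: grow the list in place
    else:
        edges[key] = {'article': [url]}     # new edge: start with a one-item list

edges = {}
add_mention(edges, 'Paris', 'Rome', 'a')
add_mention(edges, 'Rome', 'Paris', 'b')   # same pair, other order
add_mention(edges, 'Paris', 'Oslo', 'c')

print(edges[frozenset(('Paris', 'Rome'))])  # {'article': ['a', 'b']}
```

Because the attribute is a list from the first insertion onward, every later append has the same shape and no temporary list is needed.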

A separate thing that could also cause you problems is the call

del temp[:]

This should cause behavior different from what I think you're describing, so I suspect your actual code differs a bit from what you posted. When you set G[i[z]][i[j]]['article'] = temp and then do del temp[:], both names refer to one and the same list, so del temp[:] also empties G[i[z]][i[j]]['article']. Consider the following:

temp = []
temp.append(1)
print(temp)  # [1]
L = temp     # L and temp are now two names for the same list
print(L)     # [1]
del temp[:]  # empties that one shared list in place
print(L)     # []
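The same effect can be shown with an attribute dict, along with one way to avoid it. This is only a sketch: the plain dict `edge` is a stand-in for the edge's attributes, and copying with list(temp) is one option among several, not something the question's code does.

```python
edge = {}  # stand-in for the edge attribute dict (assumption)

temp = ['a', 'b']
edge['article'] = temp        # the edge now holds the SAME list object as temp
del temp[:]                   # empties that shared list in place
aliased = edge['article']     # the edge's list is empty too

temp = ['a', 'b']
edge['article'] = list(temp)  # shallow copy: the edge gets its own list
del temp[:]                   # only temp is emptied this time
copied = edge['article']

print(aliased, copied)  # [] ['a', 'b']
```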
Joel
  • Thanks for the answer. I am going to try your suggested way now and let you know but my gut says that the first problem I need to solve is the del temp[:]. I thought it was permitted to use a list as a wildcard and add info to the graph links. – Swan87 Apr 17 '15 at 10:36
  • If you use what I've suggested, temp won't even be needed. – Joel Apr 17 '15 at 11:09
0

I think all your previous URLs are in your new list. They are in the [...].

You must use extend instead of append when you get the existing list from the edge.

temp = []
temp.append([1, 2, 3])
temp.append(4)
print(temp)

You will get:

[[1, 2, 3], 4]

But if you do:

temp = []
temp.extend([1, 2, 3])
temp.append(4)
print(temp)

You get:

[1, 2, 3, 4]
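One caveat: extend iterates over its argument, so passing it a single string adds one character at a time. A short sketch with a made-up value 'url':

```python
by_extend = ['x']
by_extend.extend('url')      # a string is iterable, so each character is added

by_append = ['x']
by_append.append('url')      # the string is added as one element

by_wrapped = ['x']
by_wrapped.extend(['url'])   # wrapping it in a list gives the same result as append

print(by_extend)   # ['x', 'u', 'r', 'l']
print(by_append)   # ['x', 'url']
print(by_wrapped)  # ['x', 'url']
```

So extend is the right call only when the thing being added is already a list of urls.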
TheWalkingCube
  • I do not think that is the case since when I try to get all urls of a specific edge I am getting: [[...], u'http://www.huffingtonpost.co.uk/.../'] again. If I use extend when adding a URL as you proposed it will return ['h','t','t','p'...] instead of ['http://www...'] – Swan87 Apr 16 '15 at 14:34
  • What if you do G[i[z]][i[j]]['article'].append(url[index]) instead of creating a temporary list ? – TheWalkingCube Apr 16 '15 at 14:37
  • I tried that just to make sure but as I suspected : AttributeError: 'unicode' object has no attribute 'append'. – Swan87 Apr 16 '15 at 16:16
  • The reason for this `AttributeError` is that the first time through it's a string, while the second time through it's a list. It should be a consistent data type throughout. Make it a list when you first create it. – Joel Apr 17 '15 at 00:16