I am exploring a data structure in which an element expands to sub-elements, and these eventually resolve to a final element. However, I only want to store the top two levels.

Example: let's say I start with New York, which breaks into the counties Bronx, Kings, New York, Queens, and Richmond, and these eventually resolve to USA.

I am not sure this is a good example, so here is a clearer explanation of the problem:

A (expands to) B,C,D -> B (expands to) K,L,M -> K resolves to Z 
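Here is a minimal sketch of one way to model this. It assumes the expansions are already available as a plain Python dict. The sample data and the names `expansions`, `lookup`, `top_two_levels`, and `final_elements` are hypothetical and do not come from the question's code. The sketch reads "store the top two levels" as keeping the start element's children and grandchildren. It finds the final element separately, by following expansions until nothing expands further.

```python
# Hypothetical sample data mirroring "A -> B,C,D -> B -> K,L,M -> K -> Z".
# Each key maps to the elements it expands to; an empty list means "final".
expansions = {
    "A": ["B", "C", "D"],
    "B": ["K", "L", "M"],
    "C": ["Z"],
    "D": ["Z"],
    "K": ["Z"],
    "L": ["Z"],
    "M": ["Z"],
    "Z": [],
}


def lookup(name):
    """Return what `name` expands to (stand-in for the database query)."""
    return expansions.get(name, [])


def top_two_levels(start, expand):
    """Keep only the start element's children and their children."""
    return {child: list(expand(child)) for child in expand(start)}


def final_elements(start, expand):
    """Follow expansions until elements no longer expand; return those."""
    finals, seen, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:  # skip elements reached by more than one path
            continue
        seen.add(node)
        children = expand(node)
        if children:
            stack.extend(children)
        else:
            finals.add(node)
    return finals


tree = {"A": top_two_levels("A", lookup)}
print(tree)                         # {'A': {'B': ['K', 'L', 'M'], 'C': ['Z'], 'D': ['Z']}}
print(final_elements("A", lookup))  # {'Z'}
```

Keeping the stored levels and the final-element search separate means only two levels are ever stored, while the search can still go as deep as the data requires.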

I initially wrote it as a series of for loops and then switched to recursion. With recursion I am losing some of the elements that get expanded, and because of that I don't drill down into each expanded element. I have included both the recursive and the non-recursive versions. I am looking for advice on building this data structure and on the best way to do it.

I run a database query for every element in the expanded version, which returns a list of items, and I continue until it resolves to a single element. Without recursion I don't lose anything: I drill all the way down to the final element that the others resolve to. With recursion that is not the case. I am also new to Python, so hopefully this is not a bad question to ask on a site like this.

returnCategoryQuery is a method that returns a list of items by running the database query.
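The body of returnCategoryQuery is not shown. Here is a minimal sketch of what such a helper could look like. It uses the standard library's sqlite3 with an in-memory database and a few made-up rows as a stand-in for the MySQL tables. The name `return_category_query` and the data are hypothetical.

Unlike the question's version, this helper takes the title rather than a full SQL string. It passes the title as a bound parameter instead of concatenating it into the SQL, so a title containing an apostrophe cannot break the query. With MySQLdb the placeholder is `%s` instead of sqlite3's `?`.

```python
import sqlite3

# In-memory stand-in for the MediaWiki tables used in the question.
# Only the columns the query touches are created; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                       page_namespace INTEGER, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    INSERT INTO page VALUES (1, 14, 'New_York_City'), (2, 14, 'Queens');
    INSERT INTO categorylinks VALUES (1, 'New_York'), (2, 'New_York_City');
""")

CATEGORY_QUERY = (
    "SELECT cl_to FROM categorylinks cl "
    "LEFT JOIN page p ON cl.cl_from = p.page_id "
    "WHERE p.page_namespace = 14 AND p.page_title = ?"
)


def return_category_query(title):
    """Return the list of categories that the category `title` links to."""
    # The title is a bound parameter, so quotes in it cannot alter the SQL.
    rows = conn.execute(CATEGORY_QUERY, (title,)).fetchall()
    return [row[0] for row in rows]


print(return_category_query("Queens"))         # ['New_York_City']
print(return_category_query("No_such_title"))  # []
```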

Without recursion

import traceback

from MySQLdb import cursors

# db, idTitleDictionary and returnCategoryQuery are defined elsewhere

# Dictionary to save each initial category with the rest of its cl_to values
baseCategoryTree = {}

# Query: get all the categories a category is linked to
categoryQuery = ("select cl_to from categorylinks cl left join page p "
                 "on cl.cl_from = p.page_id where p.page_namespace=14 "
                 "and p.page_title ='")
cursor = db.cursor(cursors.SSDictCursor)

for key, value in idTitleDictionary.items():
    for startCategory in value[0]:
        try:
            baseCategoryTree[startCategory] = []
            print(categoryQuery + startCategory + "'")
            cursor.execute(categoryQuery + startCategory + "'")
            # Read all rows from the server-side cursor into a plain list
            categoryResults = [row['cl_to'] for row in cursor.fetchall()]
            # Hand-unrolled expansion: one nested loop per level, so the
            # depth is fixed at six levels below startCategory
            for subCategoryResult in categoryResults:
                print(" - ".join([startCategory, subCategoryResult]))
                for item in returnCategoryQuery(categoryQuery + subCategoryResult + "'"):
                    print(" - ".join([startCategory, subCategoryResult, item]))
                    for subItem in returnCategoryQuery(categoryQuery + item + "'"):
                        print(" - ".join([startCategory, subCategoryResult, item, subItem]))
                        for subOfSubItem in returnCategoryQuery(categoryQuery + subItem + "'"):
                            print(" - ".join([startCategory, subCategoryResult, item, subItem, subOfSubItem]))
                            for sub_1_subOfSubItem in returnCategoryQuery(categoryQuery + subOfSubItem + "'"):
                                print(" - ".join([startCategory, subCategoryResult, item, subItem, subOfSubItem, sub_1_subOfSubItem]))
                                for sub_2_subOfSubItem in returnCategoryQuery(categoryQuery + sub_1_subOfSubItem + "'"):
                                    print(" - ".join([startCategory, subCategoryResult, item, subItem, subOfSubItem, sub_1_subOfSubItem, sub_2_subOfSubItem]))
        except Exception:
            traceback.print_exc()
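The nested loops above fix the depth at six levels below startCategory. Here is a minimal sketch of an iterative, breadth-first alternative with no fixed nesting. It assumes an `expand(name)` callable that returns the next level, for example `lambda name: returnCategoryQuery(categoryQuery + name + "'")`. The function `expand_all`, its `max_depth` parameter, and the sample data are hypothetical. Each element is expanded only once, which also protects against cycles. Unlike the loops, an element reachable along two routes is reported only on the first route found.

```python
from collections import deque


def expand_all(start, expand, max_depth=None):
    """Breadth-first expansion of `start`, with no fixed nesting depth.

    `expand(name)` returns the list of elements `name` expands to.
    `max_depth` (optional) limits how many levels below `start` are
    explored. Returns every discovered path as a list of names.
    """
    paths = []
    seen = {start}            # category graphs can contain cycles
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if max_depth is not None and len(path) > max_depth:
            continue          # this path is already max_depth levels deep
        for child in expand(path[-1]):
            if child in seen:
                continue
            seen.add(child)
            new_path = path + [child]
            paths.append(new_path)
            queue.append(new_path)
    return paths


# Hypothetical sample data; Z -> A closes a cycle.
sample = {"A": ["B", "C"], "B": ["K"], "K": ["Z"], "Z": ["A"]}
for path in expand_all("A", lambda name: sample.get(name, []), max_depth=6):
    print(" - ".join(path))
```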

With recursion

import traceback

from MySQLdb import cursors

# db, idTitleDictionary and returnCategoryQuery are defined elsewhere

def crawlSubCategory(subCategoryList):
    level = 1
    expandedList = []
    for eachCategory in subCategoryList:
        level = level + 1
        print("Level  " + str(level) + " " + eachCategory)
        #crawlSubCategory(returnCategoryQuery(categoryQuery + eachCategory + "'"))
        for subOfEachCategory in returnCategoryQuery(categoryQuery + eachCategory + "'"):
            level = level + 1
            print("Level  " + str(level) + " " + subOfEachCategory)
            expandedList.append(crawlSubCategory(returnCategoryQuery(categoryQuery + subOfEachCategory + "'")))
    return expandedList


# Dictionary to save each initial category with the rest of its cl_to values
baseCategoryTree = {}

# Query: get all the categories a category is linked to
categoryQuery = ("select cl_to from categorylinks cl left join page p "
                 "on cl.cl_from = p.page_id where p.page_namespace=14 "
                 "and p.page_title ='")
cursor = db.cursor(cursors.SSDictCursor)

for key, value in idTitleDictionary.items():
    for startCategory in value[0]:
        categoryResults = []
        try:
            baseCategoryTree[startCategory] = []
            print(categoryQuery + startCategory + "'")
            cursor.execute(categoryQuery + startCategory + "'")
            # Read all rows from the server-side cursor into a plain list
            categoryResults = [row['cl_to'] for row in cursor.fetchall()]
            #crawlSubCategory(categoryResults)
        except Exception:
            traceback.print_exc()
        #baseCategoryTree[startCategory].append(categoryResults)
        baseCategoryTree[startCategory].append(crawlSubCategory(categoryResults))
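crawlSubCategory above has two problems:

- Each call appends only the value returned by the deeper recursive call, never `eachCategory` or `subOfEachCategory` themselves. Every name is therefore dropped, and the result is nested empty lists.
- `level` is a local variable that restarts at 1 on every call and grows with the number of siblings rather than with depth.

Here is a minimal sketch of a corrected recursion, assuming a hypothetical `expand(name)` callable such as `lambda name: returnCategoryQuery(categoryQuery + name + "'")`. The function name `crawl_sub_category`, the `(name, children)` tuple shape, and the `seen` set are choices made for this sketch, not part of the original. It stores each name together with its subtree, passes the depth down as an argument, and skips names it has already expanded.

```python
def crawl_sub_category(categories, expand, level=1, seen=None):
    """Recursively expand `categories`, keeping every name.

    Returns a list of (name, children) pairs; `children` has the same
    shape, and is [] for a name that does not expand (or that was
    already expanded elsewhere in the traversal).
    """
    if seen is None:
        seen = set()
    expanded = []
    for name in categories:
        print("Level %d %s" % (level, name))
        if name in seen:  # reached again by another route, or a cycle
            expanded.append((name, []))
            continue
        seen.add(name)
        # Keep the name together with its subtree so nothing is dropped;
        # the depth is passed down instead of being counted per sibling.
        children = crawl_sub_category(expand(name), expand, level + 1, seen)
        expanded.append((name, children))
    return expanded


# Hypothetical sample data standing in for the database.
sample = {"A": ["B", "C"], "B": ["K"], "C": ["K"], "K": ["Z"]}
tree = crawl_sub_category(["A"], lambda name: sample.get(name, []))
print(tree)
```

In the question's loop, a line like `baseCategoryTree[startCategory] = crawl_sub_category(categoryResults, lambda n: returnCategoryQuery(categoryQuery + n + "'"))` would keep the whole expansion. Very deep category chains can still exceed Python's default recursion limit; an explicit queue or stack avoids that.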
add-semi-colons
  • @agf I was wondering what you edited? – add-semi-colons Nov 07 '11 at 02:33
  • This is an odd question. What does "resolve to" mean? For instance, the USA contains New York, and New York contains Bronx, Queens, and Richmond. But that is three levels. – Michael Dillon Nov 07 '11 at 04:03
  • @MichaelDillon sorry about that. By "resolve to" I mean that it's the root node. I am starting at the bottom of the tree. – add-semi-colons Nov 07 '11 at 04:07
  • @agf interesting, thanks. I do research on online communities, and you just gave me another idea to explore. – add-semi-colons Nov 07 '11 at 05:17
  • @agf A user has given an answer, but it is completely irrelevant. I was trying to put a bounty on the question, but I can't since there is an answer that has nothing to do with this question. I was wondering if you could remove the answer, since you are an expert. Also, from what I have seen, once a question has an answer, experts tend not to answer it, so there is a good chance I won't get a good answer. Thanks. – add-semi-colons Nov 08 '11 at 16:53
  • Just ignore the answer and wait until two days have passed since you asked the question; then you can set a bounty. However, unless you add sample data (rip out the SQL and just work from an example categoryResults list?) so people can actually test your code, you're not likely to get a good answer either way. – agf Nov 08 '11 at 19:08
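Following agf's suggestion, here is a minimal sketch of a stub that lets the question's code run without a database. It keeps the same calling convention (a full query string built as `categoryQuery + title + "'"`), recovers the title, and looks it up in a dict. The sample links in `SAMPLE_LINKS` are made up for illustration and are not real Wikipedia data.

```python
# Same prefix string as categoryQuery in the question.
categoryQuery = ("select cl_to from categorylinks cl left join page p "
                 "on cl.cl_from = p.page_id where p.page_namespace=14 "
                 "and p.page_title ='")

# Made-up sample links: each title maps to the categories it links to.
SAMPLE_LINKS = {
    "Queens": ["New_York_City"],
    "New_York_City": ["New_York"],
    "New_York": ["United_States"],
    "United_States": [],
}


def returnCategoryQuery(query):
    """Stub with the question's calling convention: a full SQL string in.

    Recovers the title between the query prefix and the closing quote,
    then looks it up in SAMPLE_LINKS instead of querying a database.
    """
    if not (query.startswith(categoryQuery) and query.endswith("'")):
        raise ValueError("unexpected query: " + query)
    title = query[len(categoryQuery):-1]
    return list(SAMPLE_LINKS.get(title, []))


print(returnCategoryQuery(categoryQuery + "Queens" + "'"))  # ['New_York_City']
```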

1 Answer

Are you trying to look up "Queens" and learn that it is in the USA? Have you tried encoding your tree in XML, using lxml.etree to find an element, and then using getpath to return the path in XPath format?

This would mean adding a fourth, top level to your tree, namely World. You would then search for Queens and learn that the path to Queens is World/USA/NewYork/Queens. The answer to your question would always be the second item in the XPath.
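The approach above relies on lxml's getpath. Here is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree. That module has no getpath, so the path is rebuilt from a child-to-parent map. The XML document and the helper `path_to` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative tree with World added as the extra top level.
XML = """
<World>
  <USA>
    <NewYork>
      <Bronx/><Kings/><Queens/><Richmond/>
    </NewYork>
  </USA>
</World>
"""


def path_to(root, tag):
    """Return the slash-separated path from root to the first `tag` element."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    node = root if root.tag == tag else root.find(".//" + tag)
    if node is None:
        return None
    parts = []
    while node is not None:
        parts.append(node.tag)
        node = parent.get(node)
    return "/".join(reversed(parts))


root = ET.fromstring(XML.strip())
path = path_to(root, "Queens")
print(path)                # World/USA/NewYork/Queens
print(path.split("/")[1])  # USA, the second item of the path
```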

Of course you could always just build a tree from the XML and use a tree search algorithm.
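Here is a minimal sketch of that alternative without XML. The tree is stored as nested Python dicts, and a recursive depth-first search returns the path to a target. The `TREE` data and `find_path` are illustrative assumptions.

```python
# Illustrative nested-dict tree: each key maps to a dict of its children.
TREE = {
    "World": {
        "USA": {
            "NewYork": {"Bronx": {}, "Kings": {}, "Queens": {}, "Richmond": {}},
        },
    },
}


def find_path(tree, target, path=()):
    """Depth-first search; return the names from the top down to target."""
    for name, children in tree.items():
        here = path + (name,)
        if name == target:
            return list(here)
        found = find_path(children, target, here)
        if found:
            return found
    return None  # target is not in this subtree


print(find_path(TREE, "Queens"))  # ['World', 'USA', 'NewYork', 'Queens']
```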

Michael Dillon
  • I actually discover the tree as I traverse it. To give you a better idea, go to this link and, at the bottom of the page, click the first category: http://en.wikipedia.org/wiki/New_York_City. What I am trying to do is drill into each section until I get to a place where there is nothing more to explore. – add-semi-colons Nov 07 '11 at 04:16
  • Can I still put a bounty on this question? I am not sure I can, since you have provided an answer. If that is the case, could you take this answer off, since it's not really the answer? Sorry about that. Maybe I was not clear in the question, and that's why you gave this answer. – add-semi-colons Nov 07 '11 at 19:47
  • Hi Michael, it's a kind request: could you remove the answer so I can set a bounty on this question? The answer you provided doesn't really match the criteria of the question. But thanks again for trying to solve this issue. – add-semi-colons Nov 08 '11 at 17:08
  • Sorry, but I don't understand why you want me to delete my answer. If SO is not allowing you to set a bounty, it is because you don't have enough points yet to offer a bounty. You would do better by simplifying your question, and especially your code, and then asking it again, but this time write a better question title and a clearer explanation. http://stackoverflow.com/privileges/set-bounties When did your score go past 75? – Michael Dillon Nov 09 '11 at 01:58