
I want to know how to parse the response data (like the sample below) of a GWT app.

Sample data:

//OK[41,40,0,2,39,38,37,0,36,0,4,1,4,35,19,1,3,34,19,18,1,17,1,-17,33,0,710,1,0,4,0,28,12,11,0,32,31,8,7,30,19,18,1,17,1,2,16,29,0,700,1,0,4,1,28,12,11,0,27,26,8,7,25,19,18,1,17,1,-8,24,0,500,1,150,23,1,22,12,11,0,22,21,8,7,20,19,18,1,17,1,1,16,15,0,410,1,150,14,1,13,12,11,0,10,9,8,7,4,3,1,6,5,0,4,0,0,0,3,2,1,["gov.egov.erule.regs.shared.action.LoadDocumentDetailResult/3665673162","gov.egov.erule.regs.shared.models.DocumentDetailModel/1210760895","java.util.ArrayList/3821976829","","FDA-2010-P-0532","gov.egov.erule.regs.shared.models.DocketType/1323825229","gov.egov.erule.regs.shared.models.MetadataValueModel/1270413309","gov.egov.erule.regs.shared.models.MetadataModel/1441296737","Document

Subtype","doc_sub_type","SUPPORTING & RELATED MATERIALS","1","doc_type","dk_subType_v","Used to further define the type of document","gov.egov.erule.regs.shared.models.MetadataModel$UiControlType/4187881057","com.extjs.gxt.ui.client.data.RpcMap/3441186752","value","java.lang.String/2004016611","CP-Citizen Petition (Supporting & Related Materials)","Status","doc_status","doc_status_v","The current status of the document","Posted","Received Date","receive_date","doc_primary_dates","The date the agency received or created the document","October 04 2010, at 12:00 AM Eastern Daylight Time ","Date Posted","fr_publish_date","Date the document is posted to Regulations.gov","November 10 2010, at 12:00 AM Eastern Standard Time ","pdf","[Ljava.lang.String;/2600011424","FDA","FDA-2010-P-0532-0005","gov.egov.erule.regs.shared.models.DocumentType/2460330259","0900006480b68632","Attachment 4 - \"Information Regarding Cigarettes with Characterizing Flavors Form 3734\" - [BBK Tobacco & Foods, LLP, (Levin Ginsburg Attorneys at Law) - Citizen Petition] "],0,7]

Can you tell me how to deserialize the data?

I want to scrape some information from it, e.g. "Status" (in this case, it should be "Posted").

Thanks a lot.

Jahan Zinedine
redice

1 Answer


Something like

import re
# Raw string, so that \w reaches the regex engine unchanged
re.search(r'The current status of the document","(\w+)', your_text).group(1)
>>> 'Posted'

or with json

import json
json.loads('{"a":1,"b":2}')
>>> {u'a': 1, u'b': 2}
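
As a rough sketch of how the json route might apply to a response like the one in the question: strip the `//OK` prefix, parse the rest as a JSON array, and separate the string table from the numeric tokens. The layout assumed here (tokens, then the string table, then two trailing integers for flags and protocol version) is inferred from the sample, and the function names and the shortened sample string are made up for illustration. GWT emits a JavaScript array literal, so this only works when the payload also happens to be valid JSON.

```python
import json


def split_gwt_response(text):
    """Split a GWT-RPC response into (tokens, string_table, flags, version).

    Assumed layout (inferred from the sample in the question):
    //OK[token, ..., token, [string table], flags, version]
    """
    text = text.strip()
    if not text.startswith("//OK"):
        raise ValueError("not a successful GWT-RPC response")
    payload = json.loads(text[len("//OK"):])
    # The last three elements are taken to be string table, flags, version.
    *tokens, string_table, flags, version = payload
    if not isinstance(string_table, list):
        raise ValueError("unexpected payload layout")
    return tokens, string_table, flags, version


def string_at(string_table, index):
    """Resolve a string reference: 1-based index, 0 stands for null."""
    return string_table[index - 1] if index > 0 else None


# Hypothetical, heavily shortened response in the same shape as the sample.
sample = '//OK[3,0,2,1,["pkg.Model/123","Status","Posted"],0,7]'
tokens, strings, flags, version = split_gwt_response(sample)
print(strings)                # the string table
print(string_at(strings, 3))  # the string that a token value of 3 refers to
```

This only gets you the string table and the raw tokens. Not every number is a string reference (values such as 710 or 150 in the sample look like plain integers), and as far as I know the client reads the tokens from the end of the array backwards, so mapping "Status" to its value reliably still requires knowing the field order of each serialized class.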
user
  • It's not a good solution. I also see the following response data: "Status","doc_status","PUBLIC SUBMISSIONS","1","doc_status_v","The current status of the document","gov.egov.erule.regs.shared.models.MetadataModel$UiControlType/4187881057","com.extjs.gxt.ui.client.data.RpcMap/3441186752","value","Posted". And I need to scrape many fields from it. – redice Jun 24 '11 at 10:36
  • You need to spell out the problem better. The comment you left on the answer shows a different format. I gave you an example of how to use the re module. You will have to adapt it to your text pattern. – user Jun 24 '11 at 10:38
  • Sorry, this problem can't be solved with a regex, because the data structure is not regular. I need to scrape a lot of information from it, so I want to know how to deserialize it. Thank you all the same. – redice Jun 24 '11 at 10:44
  • Yes, the data format is JSON. But it is serialized by a GWT server-side app, and I want to know how to deserialize it correctly. Here is an article (http://www.gdssecurity.com/l/b/2010/05/06/fuzzing-gwt-rpc-requests/) about how to parse the request data, but it does not cover the response data. – redice Jun 24 '11 at 10:50
  • Thank you. Maybe you didn't understand what I meant. I always use the re and json (or simplejson) modules. Maybe I should remove the "Python" tag. In fact, this problem has nothing to do with Python. – redice Jun 24 '11 at 11:01
  • I have found a similar question here: http://stackoverflow.com/questions/5712831/gwt-rpc-deserialization-rpc-string/6466103 – redice Jun 24 '11 at 11:04