1

I am using Splunk to analyse SFGov open data (data.sfgov.org), which is a Socrata system.

I am able to download the json data and analyse stuff offline. I am now implementing automated indexing of updates to the datasets (everyday).

I am trying to figure out which Socrata API fields to actually use to get the new records since my last poll.

I know I can use the $where URL option to filter again the :created_at and :updated_at parameters, but is there a rowID or last index or something like that? I will maintain local state on the splunk side on the last fetched row for e.g.

Like if the last row I got last night was 18104, then for tonite's check, I will ask for rows posted > 18104.

Thanks in advance! I am using python for the automation.

------ added 11/02/2016 ---

Currently I am manually testing trying this type of GET (tested it using hurl.it)

https://data.sfgov.org/resource/nwsr-z4mh.json?$where=:created_at between '2016-10-23T18:00:00' and '2016-11-03T00:00:00'&$order=:created_at DESC&$select=:*, *

  • this uses the :created_at system field in the Socrata dataset.
  • It is returning the records created between those timestamps. That seems to be working.

So if I were to put this into python I need to simply save the previous fetch date-time and do a 'between and and hope to get the latest created records.

I prefer a way to refer to a ROW#, but I don't know how to use the id":"row-8aiu.d5x4~8rdi" parameter yet.

1 Answers1

2

It looks like you're doing the right thing already. You'd just want to save the latest :created_at or :updated_at and use that in your $where for your following query.

You can't do a $where=:updated_at > :row-... because row IDs are identifiers, not datetimes.

chrismetcalf
  • 1,556
  • 1
  • 8
  • 7
  • Thanks Chris. I also just validated the created_at approach with SFGov, so it's good to see you take the same view. – Subbu Vincent Nov 03 '16 at 20:35
  • Also one question @chrismetcalf: Is there a definition of row IDs somewhere in socrata docs? I understand I cannot match them against dates, but it will help to know if rowIDs are a random string or if there is a structure to them that can used for filtering criteria. – Subbu Vincent Nov 03 '16 at 20:42
  • For e.g. I am able to extract a row alone by using the row ID in this URL [link] (https://data.sfgov.org/resource/nwsr-z4mh/row-8aiu.d5x4~8rdi.json) but I want understand its literal structure better if there is one at all. – Subbu Vincent Nov 03 '16 at 20:46
  • Row IDs are randomly generated strings, so they don't denote any row ordering. – chrismetcalf Nov 04 '16 at 01:37
  • Got this @chrismetcalf. – Subbu Vincent Nov 09 '16 at 18:36
  • I've been testing with :created_at. https://data.sfgov.org/resource/nwsr-z4mh.json?$where=:created_at%20%3E%20%272016-12-04T18:00:00%27&$order=date%20DESC&$select=:*,* I am seeing an odd response. For a 2016-12-04 created_at, I am seeing records with dates of 2010! Is this normal with Socrata data coming out of SFGov? {":created_at":"2016-12-06T18:47:14.349Z",":id":"row-g7nk.fxxr.d9ww",":updated_at":"2016-12-06T18:47:14.349Z",":version":"rv-nvbt~vu6z.ydz6","date":"2010-04-09T00:00:00.000","desiredoutcome":"Discuss the Mayor’s initiatives in the Mid-Market St. area and – Subbu Vincent Dec 06 '16 at 23:41