0

We use Google Spreadsheets to collect research data and allow users to directly enter data into spreadsheets that have been pragmatically generated. This has been working fairly well until a user enters a blank line in between data rows! They may do this for readability or they may have deleted a row, anyway...

Google's documentation is clear on this:

https://developers.google.com/google-apps/spreadsheets/#retrieving_a_list-based_feed

The list feed contains all rows after the first row up to the first blank row.

So the problem is that I have 'harvester' scripts that rip through these spreadsheets, collecting up data for archival / local databasing. These scripts use the ListFeed, so they stop when reach a blank row and miss data!

The documentation suggests:

If expected data isn't appearing in a feed, check the worksheet manually to see whether there's an unexpected blank row in the middle of the data.

Manually! Gasp, I have hundreds of sheets :) Do you have suggestions for mitigating this situation other than yelling at users whenever I see this happen! Thank you

daryl
  • 1,190
  • 10
  • 19

1 Answers1

1

This is the only way that I think we can even get close with the spreadsheet API. This is NOT complete code, it's within a function I wrote but you get the drift... it's in C#:

Working with example:

--row 1 = header row
--row 2 = data
--row 3 = data
--row 4 = totally blank
--row 5 = data
--row 6-100 = totally blank

In English:

  1. Get the worksheet's ListFeed.Entries.Count. ListFeeds ignore header row so in this example count would be "2".

  2. Get the worksheet's CellFeed in order to cycle through the cells. CellFeeds DO include the header row as row 1, so in the example, from the perspective of a CellFeed, the first blank row must be row 4 (header=1, then 2 data rows, then first blank line which terminates the ListFeed set), therefore we should begin looking through cells at row 5 and beyond for any cell that is NOT empty:



    foreach (WorksheetEntry entry in wsFeed.Entries)
    {
    //Get the worksheet CellFeed:
    CellQuery cellQuery = new CellQuery(entry.CellFeedLink);
    CellFeed cellFeed = service.Query(cellQuery);

    //Get the worksheet ListFeed to compare with the CellFeed:
    AtomLink listFeedLink = entry.Links.FindService(
                                         GDataSpreadsheetsNameTable.ListRel, null
                                     );
    ListQuery listQuery = new ListQuery(listFeedLink.HRef.ToString());
    //need to have service object already created for this... see API docs
    ListFeed listFeed = service.Query(listQuery);

     //Now to check if there is data after the ListFeed
     //set which would indicate a blank line in the data set (not allowed)
        foreach (CellEntry cell in cellFeed.Entries)
        {
             //start looking in cells in the row after what would be the first blank row
             if (cell.Row > listFeed.Entries.Count + 2)
             {
                 if (cell.Value != "")
                 {
                     MessageBox.Show("ERROR:  There appears to be a blank row +
                                     in the middle of the data set in worksheet: " +
                                     entry.Title.Text + ".  Completely blank rows " +
                                     "are not allowed in between data rows.  Each row " +
                                     "within the data set must have at least one " +
                                     "value in at least one cell.  There CAN and " +
                                     "should be blank rows after the data set at " +
                                     "the bottom of the worksheet.");
                     return false;
                 }

             }
        }
    }

user2312341
  • 440
  • 4
  • 9