
I am writing a C# program that uses Microsoft Scientific Data-Set to read NetCDF files.

using System;
using System.IO;
using sds = Microsoft.Research.Science.Data;
using Microsoft.Research.Science.Data.Imperative;


namespace NetCDFConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Gets dataset from file.
            var dataset = sds.DataSet.Open("E:\\Temp\\test.nc?openMode=readOnly");

            // Get the starting DateTime from the meta data.                        
            string dt = (string)dataset.Metadata["START_DATE"];

            //load dataset into array
            Single[,,] dataValues = dataset.GetData<float[,,]>("ACPR"); 

            //Get DateTime from Metadata fields.
            DateTime dt2 = DateTime.ParseExact(dt, "yyyy-MM-dd_HH:mm:ss", null);

            // Latitude grid ranges from = 0 to 215; East Cape is ~ 125-144
            for (int iLatitude = 137; iLatitude < 138; iLatitude++)
            {
                //Longitude ranges from 0 to 165; East Cape is ~ 125-150
                for (int iLongitude = 133; iLongitude < 134; iLongitude++) 
                {
                    //There is normally 85 hours worth of data in a file. But not always... 
                    for (int iTime = 0; iTime < 65; iTime++)
                    {
                        // Get each data point 
                        float? thisValue = dataValues[iTime,iLatitude,iLongitude]; 

                        //Burp it out to the Console. Increment the datetime while im at it. 
                        Console.WriteLine(dt.ToString() + ',' + dt2.ToString() + ',' + iTime.ToString() + ',' + dt2.AddHours(iTime) );
                    }                 
                }
            }

            Console.ReadLine();          

        }           
    }
} 

The files contain predicted rainfall data over a map grid (X,Y). Each grid reference should have 85 hours' worth of data.

E:\temp>sds list test.nc
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
[1] Times of type SByte (Time:85) (DateStrLen:19)

But occasionally they might have fewer (say 60-70 hours). When that happens, my C# program fails when importing the data.

var dataset = sds.DataSet.Open("test.nc?openMode=readOnly");
Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR");

I can reproduce the error with the command line.

Here I can successfully extract hours 60-65 for grid XY: 125,130. The last value I have in this file is Time=69.

E:\temp>sds data test.nc ACPR[60:65,125:125,130:130]
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
                Name = ACPR
         description = ACCUMULATED TOTAL GRID SCALE PRECIPITATION
         MemoryOrder = XY
         coordinates = XLONG XLAT XTIME
             stagger =
           FieldType = 104
               units = mm

[60,125,130]  13.4926
[61,125,130] 15.24556
[62,125,130]  16.3638
[63,125,130] 17.39618
[64,125,130] 20.00507
[65,125,130] 23.57192

If I try to read past hour 69, I get the following error.

E:\temp>sds data test.nc ACPR[60:70,125:125,130:130]
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
                Name = ACPR
         description = ACCUMULATED TOTAL GRID SCALE PRECIPITATION
         MemoryOrder = XY
         coordinates = XLONG XLAT XTIME
             stagger =
           FieldType = 104
               units = mm

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at nc_get_vara_float(Int32 , Int32 , UInt64* , UInt64* , Single* )
   at NetCDFInterop.NetCDF.nc_get_vara_float(Int32 ncid, Int32 varid, IntPtr[] start, IntPtr[] count, Single[] data)
   at Microsoft.Research.Science.Data.NetCDF4.NetCdfVariable`1.ReadData(Int32[] origin, Int32[] shape)
   at sdsutil.Program.PrintData(Variable v, String range, String format)
   at sdsutil.Program.DoData(String uri, String[] args)
   at sdsutil.Program.Main(String[] args)

E:\temp>

If the file contains the full 85 hours I can request Time 0-100 and it still gives me the 85 values without error.

I am convinced that the issue is NULL/missing data. Is there some way to specify, when importing the data, that I only want values where the variable is not null? Or can I use some sort of try/catch?

Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR"); // ...but only where it's not blank, thanks.

Edit: I am beginning to suspect that the file isn't formed correctly. In the SDS viewer, the metadata for a good file vs. a bad file looks like this:

Good file

Bad file

Yet the command line shows the meta data as being the same for both.

E:\temp>sds good.nc
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
[1] Times of type SByte (Time:85) (DateStrLen:19)

E:\temp>sds bad.nc
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
[1] Times of type SByte (Time:85) (DateStrLen:19)

E:\temp>
Sir Swears-a-lot
  • I don't know if I can use generic C# functions to handle NULLS or if I need to use something specifically from SDS. – Sir Swears-a-lot Jan 03 '18 at 20:28
  • From the doc: "When you call Scientific DataSet methods in strongly - typed languages such as C#, the Scientific DataSet library does not coerce data types. The data type in a dataset and the type of data that you specify as a type parameter to the GetData method must match exactly." In `Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR");` is this Single, float, double etc.? What does `var dataValues = dataset.GetData("ACPR");` do when you try different types? Do you have the dataset viewer installed, and if so can you add `dataset.View();` – Mark Schultheiss Jan 10 '18 at 17:18
  • Note my "From the doc" comment makes assumption on the library in use here, a link to the source of that may be useful. – Mark Schultheiss Jan 10 '18 at 17:31
  • I have SDS 1.3 installed, which includes the viewer. Sorry, what do you mean by "can you add dataset.View();"? Should I add that as a line in the code after var dataset =... ? Re: datatypes: according to the metadata, both in the viewer and via the command line, the datatype of the ACPR variable is Single. – Sir Swears-a-lot Jan 10 '18 at 20:09
  • I've updated my post. I am beginning to suspect the files aren't formed correctly. Even though I can extract data from them, I think SDS is getting grumpy because the meta data doesn't match the file contents. – Sir Swears-a-lot Jan 10 '18 at 20:22
  • I tried adding dataset.View(); on line 16 and got a warning saying: 'DataSet' does not contain a definition for 'View' and no extension method 'View' accepting... etc etc – Sir Swears-a-lot Jan 10 '18 at 20:30
  • https://sds.codeplex.com/wikipage?title=DataSet%20Viewer&referringTitle=Documentation has a viewer. I saw that `.View` in some sample code, no idea if it is in the version you have. Near the end of here: https://sds.codeplex.com/SourceControl/latest#Main/src/Core/Core/DataSet.cs is `IsSupported` which is where I base my "type" comments on, I note that "float" is not in the method. Also I saw an updated project that might be of use on github https://github.com/predictionmachines/sdslite – Mark Schultheiss Jan 11 '18 at 16:19

2 Answers


Peter,

Since the error occurs in ReadData(Int32[] origin, Int32[] shape), as you pointed out, I see two possible solutions.

Before delving into a solution, you need to decide whether missing data can be treated as 0.0 or must be treated as missing. If missing has to be distinguishable from 0.0 and null is unacceptable, it could be encoded as -1.0. Proposing -1.0 for missing data assumes that a negative rainfall value is impossible.

If the result, dataValues, contains nulls, potentially all you need to do is replace float with float? in the line:

float thisValue = dataValues[iTime,iLatitude,iLongitude];

to be:

float? thisValue = dataValues[iTime,iLatitude,iLongitude]; 

If float? gets you home free, then this was a happy solution. (You still need to decide how to handle null values.)
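One caveat is worth sketching here, assuming GetData returns an ordinary float[,,] as in the question. Elements of a float array are value types and can never be null, so assigning one to a float? always yields a value. Missing entries would have to appear as a sentinel or NaN instead. The array below is a made-up stand-in, not data from the file.

```csharp
using System;

class NullableDemo
{
    static void Main()
    {
        // Hypothetical stand-in for the array returned by GetData; left at its defaults.
        float[,,] dataValues = new float[2, 1, 1];

        // Converting float to float? wraps the value; it cannot produce null.
        float? thisValue = dataValues[1, 0, 0];

        Console.WriteLine(thisValue.HasValue); // prints "True"
        Console.WriteLine(thisValue.Value);    // prints "0"
    }
}
```

So a null check on thisValue would never fire; the check has to compare against whatever value marks "missing".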

Otherwise possible solution 1)

After the call to Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR"); make sure that the size of the time dimension of the array, dataValues, is 85. Potentially GetData(..) does not populate all 85 fields, especially if the first row of data contains fewer than 85 fields. Then, if need be, manually replace the nulls with 0's or -1.0's.

Then when you retrieve the data, you handle nulls, 0's or -1.0 appropriately:

float? thisValue = dataValues[iTime,iLatitude,iLongitude];
// determine what to do with a null/0.0/-1.0 as a thisValue[..] value, 
// .. potentially continue with the next iteration
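The sentinel idea can be sketched as follows. This is only an illustration and applies only once GetData has actually returned an array. The names MissingData, Missing and PadTime are hypothetical helpers, not part of SDS, and the 3-hour input is invented.

```csharp
using System;

static class MissingData
{
    // Sentinel for "no data"; assumes rainfall is never negative.
    public const float Missing = -1.0f;

    // Copies 'source' into an array with at least 'expectedHours' time steps,
    // filling the time steps that 'source' lacks with the sentinel.
    public static float[,,] PadTime(float[,,] source, int expectedHours)
    {
        int hours = source.GetLength(0);
        int rows = source.GetLength(1);
        int cols = source.GetLength(2);
        var padded = new float[Math.Max(hours, expectedHours), rows, cols];

        for (int t = 0; t < padded.GetLength(0); t++)
            for (int y = 0; y < rows; y++)
                for (int x = 0; x < cols; x++)
                    padded[t, y, x] = t < hours ? source[t, y, x] : Missing;

        return padded;
    }
}

class PaddingDemo
{
    static void Main()
    {
        // Hypothetical short file: 3 hours on a 1x1 grid instead of 5.
        var shortData = new float[3, 1, 1];
        shortData[0, 0, 0] = 1.5f;
        shortData[1, 0, 0] = 2.0f;
        shortData[2, 0, 0] = 2.5f;

        float[,,] data = MissingData.PadTime(shortData, 5);
        for (int t = 0; t < data.GetLength(0); t++)
        {
            float value = data[t, 0, 0];
            if (value == MissingData.Missing) continue; // skip missing hours
            Console.WriteLine(t + "," + value);
        }
    }
}
```

Comparing against the sentinel with == is safe here because -1.0f is stored exactly and is never the result of a calculation.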

Possible solution 2)

If you own the GetData(..) method called in Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR"); then make sure that GetData(..) itself does the work of providing all 85 values, with missing values given as nulls / 0's / -1.0's. Then when you retrieve the data, handle nulls, 0's or -1.0's appropriately.

Cheers,

Avi

AviFarah
  • Thanks Avi. I think I get the idea. I will work on it today. I'm not quite sure what you mean by "if you own GetData(..)" It's a function in the SDS libraries i am using, I didn't write it myself i'm just making use of it. – Sir Swears-a-lot Jan 07 '18 at 20:35
  • You are correct: rainfall can't be negative. I like your suggestion and am happy to use -1 and handle these later, but I'm still not sure how to do this. I have updated my code example; I had cut it down for simplicity but unfortunately it had typos. – Sir Swears-a-lot Jan 07 '18 at 23:00
  • I tried your suggestion: float? but the error occurs before that. (Line 20 not line 35). It fails at: Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR"); I couldn't figure out how to use a "?" in that context. I can change the Single to float if that helps. – Sir Swears-a-lot Jan 07 '18 at 23:20
  • Tried variations of Single/float, float?, float?[,,] dataValues etc but Visual studio doesn't seem to like that. – Sir Swears-a-lot Jan 07 '18 at 23:31

I recommend you try this since you don't know the data type it's trying to return:

Object[,,] dataValues = dataset.GetData<object[,,]>("ACPR");

Then you can check if you have a valid float in the loop.

// Only parse entries that actually hold a value.
if (dataValues[iTime, iLatitude, iLongitude] != null)
{
    float floatValue;
    if (Single.TryParse(dataValues[iTime, iLatitude, iLongitude].ToString(), out floatValue))
    {
        Console.WriteLine(dt.ToString() + ',' + dt2.ToString() + ',' + iTime.ToString() + ',' + dt2.AddHours(iTime));
    }
}
Ctznkane525
  • Thanks for your reply. I just tried that but VS doesn't seem to like it. I get 2 warning/errs on the line with TryParse: 1. "cannot convert from object to string", and 2. "Argument must be passed with the out keyword". Sorry I'm an absolute novice when it comes to C#. – Sir Swears-a-lot Jan 08 '18 at 03:19
  • It fails again on Object[,,] dataValues = dataset.GetData<object[,,]>("ACPR"); but it now throws a different error: Message=Requested variable does not exist in the data set – Sir Swears-a-lot Jan 09 '18 at 00:10
  • that's probably an underlying/unrelated error...my next recommendation...download the project (https://github.com/predictionmachines/SDSlite/tree/master/ScientificDataSet)...compile with that version...debug through it :-( – Ctznkane525 Jan 09 '18 at 10:40
  • I saw that there was a newer version of SDS but struggled with the installation. I will try again. Thanks. – Sir Swears-a-lot Jan 09 '18 at 20:42
  • you are not installing...in this case you would...go to the page https://github.com/predictionmachines/SDSlite there will be a green button top right of the grid...click download zip...extract it locally...you'll open the solution in Visual Studio and compile it...then...you'll add a reference from the other project to your compiled DLL...then you'll be able to step through the code the same way you can step through your own code – Ctznkane525 Jan 09 '18 at 20:48
  • it's bombing here by the way: `private static int FindVariable(DataSet dataset, Func<Variable, bool> predicate) { var found = dataset.Where(predicate).ToArray(); if (found.Length == 0) throw new InvalidOperationException("Requested variable does not exist in the data set"); else if (found.Length > 1) throw new InvalidOperationException("Cannot unambiguously identify a variable in the data set"); return found[0].ID; }` – Ctznkane525 Jan 09 '18 at 20:54
  • it cannot find a variable named "ACPR" based on what I see...as you are calling this function `public static D GetData<D>(this DataSet dataset, string variableName)`...which calls `GetData<D>(dataset, FindVariable(dataset, variable => variable.Name == variableName && variable.TypeOfData == dataType && variable.Rank == rank))`...which calls FindVariable where it's bombing. I hope that is the answer to the problem...it cannot find a variable of data type object BECAUSE ACPR isn't an object...it's a float...you'll want to go back to float and debug – Ctznkane525 Jan 09 '18 at 21:00
  • I've downloaded, extracted and compiled SDSLite. Working on the next puzzle... adding the reference. I've also reverted to Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR"). – Sir Swears-a-lot Jan 10 '18 at 20:43
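
The lookup described in the comments above can be sketched without SDS at all. This is a simplified illustration of the quoted predicate, which requires name, element type and rank to match together. VarInfo and Lookup are hypothetical stand-ins, not SDS types.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for an SDS variable: only the fields the lookup compares.
class VarInfo
{
    public string Name;
    public Type TypeOfData;
    public int Rank;
}

static class Lookup
{
    // Counts variables whose name, element type and rank all match the requested array type.
    public static int Count<TArray>(VarInfo[] vars, string name)
    {
        Type elementType = typeof(TArray).GetElementType();
        int rank = typeof(TArray).GetArrayRank();
        return vars.Count(v => v.Name == name && v.TypeOfData == elementType && v.Rank == rank);
    }
}

class LookupDemo
{
    static void Main()
    {
        var vars = new[] { new VarInfo { Name = "ACPR", TypeOfData = typeof(float), Rank = 3 } };

        Console.WriteLine(Lookup.Count<float[,,]>(vars, "ACPR"));  // 1: float and Single are the same type
        Console.WriteLine(Lookup.Count<object[,,]>(vars, "ACPR")); // 0: element type differs
        Console.WriteLine(Lookup.Count<float[,]>(vars, "ACPR"));   // 0: rank differs
    }
}
```

A count of 0 corresponds to the "Requested variable does not exist in the data set" message. Since float is just the C# alias for System.Single, switching between Single[,,] and float[,,] changes nothing.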