2

I'm in the process of analysing a dataset which looks like the following

#Latitude   Longitude   Depth [m]   Bathy depth [m] CaCO3 [%] ...
-78 -177    0   693 1
-78 -173    0   573 2
.
.

The plan is to be able to read the data into a vector and then break it down into different groups of "Bathy depth" (Bathymetric depth). It also needs to be simultaneously grouped into different ocean basins. For instance, all of the data points within the North Atlantic, that is also between a BathyDepth of 500-1500m, 1000-2000m, 1500-2500m... should be in its own group (maybe a vector, or another object). The idea is to be able to output these into different text files.

I have attempted this in a somewhat stuttered manner. You can see this below

#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <string>
#include <GeographicLib/Geodesic.hpp> //Library which allows for distance calculations

using namespace std;
using namespace GeographicLib;

//Define each basin spatially
//North Atlantic
double NAtlat1 = 1,  NAtlong1 = 1, NAtlat2 = 2, NAtlong2=2; //Incorrect               values, to be set later
//South Atlantic
double SAtlat1 = 1,  SAtlong1 = 1, SAtlat2 = 2, SAtlong2=2;
//North Pacific and the all others...

struct Point
{
        //structure Sample code/label--Lat--Long--SedimentDepth[m]--BathymetricDepth[m]--CaCO3[%]...
        string dummy;
        double latitude, longitude, rockDepth, bathyDepth, CaCO3, fCaCO3, bSilica, Quartz, CO3i, CO3c, DCO3;
        string dummy2;
        //Use Overload>> operator
        friend istream& operator>>(istream& inputFile, Point& p);
};

//Overload operator
istream& operator>>(istream& inputFile, Point& p)
{
        string row_text;
        //Read line from input, store in row_text
        getline(inputFile, row_text);
        //Extract line, store in ss row_stream
        istringstream row_stream(row_text);
        //Read-in data into each variable 
        row_stream >> p.dummy >> p.latitude >> p.longitude >> p.rockDepth >> p.bathyDepth >> p.CaCO3 >> p.fCaCO3 >> p.bSilica >> p.Quartz >> p.CO3i >> p.CO3c >> p.DCO3 >> p.dummy2;
        return inputFile;
}

int main ()
{
        //Geodesic class.
        const Geodesic& geod = Geodesic::WGS84();

        //Input file
    ifstream inputFile("Data.csv");
    //Point-type vector to store all data
    vector<Point> database;

    //bathDepth515 = depths between 500m and 1500m
    vector<Point> bathyDepth515;
    vector<Point> bathyDepth1020; //Create the rest too
    Point p;
    if (inputFile)
    {
            while(inputFile >> p)
            {
                    database.push_back(p);
            }

            inputFile.close();
    }
    else
    {
            cout <<"Unable to open file";
    }

    for(int i = 0; i < database.size(); ++i)
    {
            //Group data in database in sets of bathyDepth
            if(database[i].bathyDepth >= 500 && database[i].bathyDepth < 1500)
            {
                    //Find and fill particular bathDepths
                    bathyDepth515.push_back(database[i]);
            }
            if(database[i].bathyDepth >= 1000 && database[i].bathyDepth < 2000)
            {
                    bathyDepth1020.push_back(database[i]);
            }
            //...Further conditional statements based on Bathymetric depth, could easily include a spatial condition too...

    //Calculate distance between point i and all other points within 500-1500m depth window. Do the same with all other windows.
    for(int i = 0; i < bathyDepth515.size(); ++i)
    {
            for(int j = 0; j < bathyDepth515.size(); ++j)
            {
                    double s12;
                    if(i != j)
                    {
                            geod.Inverse(bathyDepth515[i].latitude, bathyDepth515[i].longitude, bathyDepth515[j].latitude, bathyDepth515[j].longitude, s12);
                    }
            }
    }
    return 0;
}

Problem 1: I think it is clear to see some of the methodology is not object oriented. For instance, there may be a better way to assign each data Point to a specific ocean basin, instead of manually putting these in at the beginning of my programme, just as I have done when grouping the data by depth. I started to create a basin class, with methods to retrieve the location, and definitions of each basin' lat/long, but did not get very far in finding an intuitive way of doing this. I'd like for someone to give me an idea of how better to build this. My attempt at building a (very fragile) class is as follows

class Basin
{
    public:
            Basin();
            Basin(double latit1, double longit1, double latit2, double longit2);
            double getLatitude();
            ...
    private:
            double NAt, SAt, NPac, SPac; //All basins
            double latitude1, longitude1, latitude2, longitude2; //         Boundaries defined by two latitude markers, and two longitude markers.
};

class NAt: public Basin{...}
//...Build class definitions...

Problem 2: My second concern is the method in which I have created vectors for the different depth windows. This could become very cumbersome if I have to change the way in which I have split the depths, or add more depths. I wouldn't like to have to change almost all parts of my programme just to accommodate how I decide to slide my depth windows. I'd appreciate someone to give me a few ideas on how to go about this. The only solution I could think of to Problem 2 is as follows

//Vector of vectors to store each rows' entries separately. Not sure if this is needed as my `Point` `Struct` already allows for each of the entries in a row to be accessed. 
    vector<vector<Point> > database;
    //Create database vectors for different bathDepth windows, eventhough their names will be the same, once push_back, it doesn't matter
    for(int i = 1*500; i<=8*500; i+=500)
    {
        vector<Point> bathDepthi;
        //Possibly push these back into database. These can then be accessed and populated using if statements.
    }
    //Create vectors individually, creating vector of vectors would be more elegant as I won't have to create more if one were to decide to change the range, I'd just have to change the for loop.

I don't know how much my attempts at this help, but I thought I might give a better insight into what I'm trying to get at. Apologies for such a long post.

EDIT-DESIGN SWITCH FROM std::vector TO std::map

Based on the input from a posted answer by heke, I have tried the following, but I am not sure if this is what the said user meant. I have chosen to use conditional statements to determine whether a point is within a particular basin. If my attempt is correct, I'm still unsure of how to proceed if this is correct. More specifically, I'm not sure how to store to access the partitioned vectors to, say, write them to separate text files (i.e. .txt files for a different bathymetric depths). Should I store the partition iterator into a vector, if so, of what type? The iterator is declared with auto, which confuses me as to how to declare the type of vector to hold this iterator.

My attempt:

std::map<std::string, std::vector<Point> > seamap;
seamap.insert( std::pair<std::string, std::vector<Point> > ("Nat", vector<Point>{}) );
seamap.insert( std::pair<std::string, std::vector<Point> > ("Sat", vector<Point>{}) ); //Repeat for all ocean basins

Point p;

while (inputFile >> p && !inputFile.eof() )
{
    //Check if Southern Ocean
    if (p.latitude > Slat2)
    {
        //Check if Atlantic, Pacific, Indian...
        if (p.longitude >= NAtlong1 && p.longitude < SAtlong2 && p.latitude > SPLIT)
        {
            seamap["Nat"].push_back(p);
        } // Repeat for different basins
    }
    else
    {
        seamap["South"].push_back(p);
    }
}

//Partition basins by depth
for ( std::map<std::string, std::vector<Point> >::iterator it2 = seamap.begin(); it2 != seamap.end(); it2++ )
{
    for ( int i = 500; i<=4500; i+=500 )
            {
                auto itp = std::partition( it2->second.begin(), it2->second.end(), [&i](const auto &a) {return a.bathyDepth < i;} );
            }
}
PaleoPhys
  • 109
  • 8
  • Problem 3 `for(int i = 1*500; i<=8*500; ++(i*500))` - completely opaque magic numbers. –  Aug 28 '18 at 23:04
  • I did that intentionally to remind me of the split between different bathydepths :) – PaleoPhys Aug 28 '18 at 23:07
  • 1
    You are on right track, all that is missing is a some kind of data-structure to hold those iterators returned by std::partition. You need this because A) the LOWER_ITERATOR used in next iteration of that inner loop is the same one returned by std::partition on previous iteration. B) You also need these iterators when you are accessing the data and writing it on the file. For example if you want to write a file with bathys between 1500 and 2500 in Nat, all you have to do is use iterators on nat vector returned by 3rd and 5th iteration of inner loop... – heke Sep 02 '18 at 22:20
  • 1
    Your outer loop is going through these vectors in alphabetical order by their keys (map's data is in alphabetical order.) So maybe saving the iterators to another map with similar keys, but with vectors of vector::iterators as data would work? This way your data writing to files is trivial, just search the correct data from first map using the key of the sea you want to work on, then search iterators from second map, using the same key and select two of those iterators depending on depths you want to write and use them on that data from first map. – heke Sep 02 '18 at 22:29

1 Answers1

2

Problem 1

Check this out: Determining if a set of points are inside or outside a square

You can arrange the point by their basin while reading the data file. You just need a data structure where to but them. std::map might be a good candidate. (use the name of the ocean as the key and vector of points as the value)

Problem 2

Your depth window limits seem to grow with 500m steps, and the ranges seem to overlap. Maybe using std::partition for those vectors inside that map would work?

some pseudo:

  1. partition the vector with <500 as the predicate, using the whole range.
  2. store the iterator returned by step 1 and use it as your lower while still using the end iterator as upper.
  3. now partition with <1000 as the predicate.
  4. same as 2-3. but with < 1500 as predicate and iterator returned by <1000 as the lower.
  5. repeat until all the depths are covered.
  6. do the same for all the sea-vectors stored in that map.

The iterators that you stored can be used to get a certain range from that particular vector.

i.e use the iterator returned by the <500 partition as a lower and iterator returned by <1500 partition and you get a range that has depths between 500 and 1500 in that particular vector.

Now you should have a map full of vectors that you can access by name of the ocean and the range of depth, with intervals of 500m.

edit: some clarification

std::map<std::string, std::vector<point>> seamap;
seamap.insert("Nat", std::vector<point>{}); // do this for every sea.
//read point data from file

while(FILESTREAM_NOT_EMPTY)
{
    //construct a point object here.

    // add points to correct vector

    if (THIS_PARTICULAR_POINT_IS_IN_NAT)
           seamap["Nat"].push_back(point);
    else if (THIS_PARTICULAR_POINT_IS_IN_SAT)
           seamap["Sat"].push_back(point);
    //and so on...
}

after this do the partition thingy on every vector in that seamap. (helper : How to use range-based for() loop with std::map?)

for (int i =500; i<4500;i+500)
{
      auto iter_for_next = std::partition(LOWER_ITERATOR,
                                          UPPER_ITERATOR, 
                                          [&i]( const auto &a){return a < i;});

//here save the iterator for later use in some kind of structure.
}

Hope this helps!

heke
  • 96
  • 4
  • I really appreciate the reply! I've had a look into your link for a solution to Problem 1, but I'm still unsure of how to start. Using `std:map` may be a great idea, but I'm also unsure how to use that. Could you give a few lines of code that would fit into what you're offering as solutions to the problems? (I'm not an experienced programmer) – PaleoPhys Aug 29 '18 at 11:04
  • Great, I'll try this soon. – PaleoPhys Aug 30 '18 at 12:37
  • (1)I'm confused as the vectors are part of a map. Do I need to put the vectors within the maps as the `LOWER_ITERATOR` and the `UPPER_ITERATOR`s? (2)Also, would I store these within their own vectors (maybe using `std::partition_copy`?), or have I got the wrong idea? – PaleoPhys Sep 01 '18 at 16:35
  • I've added an "EDIT" section in my post, which follows my last comment – PaleoPhys Sep 02 '18 at 11:48