3

The function add_ints correctly adds two integer columns

A,B
2,3
5,7
9,11

in a CSV file.

Why does the function add_strings not correctly concatenate two string columns

L,R
"a","b"
"c","d"
"e","f"

into a third column

L,R,C
"a","b","ab"
"c","d","cd"
"e","f","ef"

when starting from a similar CSV file?

using Deedle;
using System.IO;

namespace NS
{
    class TwoColumnOps
    {
        static void Main(string[] args)
        {
            string root = "path/to";
            add_ints(root);
            add_strings(root);
        }
        static void add_ints(string root)
        {
            Deedle.Frame<int, string> df = Frame.ReadCsv(Path.Combine(root, "data_ints.csv"));

            Series<int, int> a = df.GetColumn<int>("A");
            Series<int, int> b = df.GetColumn<int>("B");

            Series<int, int> c = a + b;
            df.AddColumn("C", c);
            df.Print();
        }
        static void add_strings(string root)
        {
            Deedle.Frame<int, string> df = Frame.ReadCsv(Path.Combine(root, "data_strings.csv"));

            Series<int, string> a = df.GetColumn<string>("L");
            Series<int, string> b = df.GetColumn<string>("R");

            // Series<int, string> c = a + b;
            // Series<int, string> c = $"{a} and {b}";
            Series<int, string> c = string.Concat(a, b);

            df.AddColumn("C", c);
            df.Print();
        }
    }
}

The error for all three styles of concatenation is:

Error   CS0029  Cannot implicitly convert type 'string' to 'Deedle.Series<int, string>' 
Vrokipal
  • the int version works as explained in the documentation by saying "If a series contains numeric values (typically double) then we can perform various statistical operations and calculations with the series." You cannot do this with non-numeric values. See documentation: https://bluemountaincapital.github.io/Deedle/csharpseries.html#Statistics-and-calculations – kd345205 Oct 18 '19 at 15:37

6 Answers

5

The reason + works on series of numbers, while neither + nor string.Concat works on series of strings, is that the series type defines an overloaded + operator only for numeric series. Sadly, nothing equivalent is defined for strings.
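To see where the CS0029 message comes from, here is a minimal sketch that does not use Deedle at all. `FakeSeries` is a hypothetical stand-in for `Series<int, string>`, introduced only for illustration. With two non-string arguments, overload resolution picks `string.Concat(object, object)`, which calls `ToString()` on each argument and returns one plain string. String interpolation behaves the same way.

```csharp
using System;

// Hypothetical stand-in for Deedle's Series<int, string>; NOT the real type.
public sealed class FakeSeries
{
    private readonly string[] values;

    public FakeSeries(params string[] values) { this.values = values; }

    public override string ToString() => "series [" + string.Join("; ", values) + "]";
}

public static class ConcatDemo
{
    // Resolves to string.Concat(object, object): the result is a string, not a series.
    public static string ViaConcat(FakeSeries a, FakeSeries b) => string.Concat(a, b);

    // Interpolation also calls ToString() on each hole and yields a single string.
    public static string ViaInterpolation(FakeSeries a, FakeSeries b) => $"{a} and {b}";

    public static void Main()
    {
        var a = new FakeSeries("a", "c", "e");
        var b = new FakeSeries("b", "d", "f");

        Console.WriteLine(ViaConcat(a, b));
        Console.WriteLine(ViaInterpolation(a, b));
    }
}
```

Because both expressions have type `string`, assigning either one to a series variable fails at compile time, which matches the error in the question.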

For non-numeric series, the easiest option is to use ZipInner to align the two series. This gives you a series of tuples. You can then use Select to transform the values element by element:

var df = Frame.ReadCsv("/some/test/file.csv");
var s1 = df.GetColumn<string>("first");
var s2 = df.GetColumn<string>("second");
var added = s1.ZipInner(s2).Select(t => t.Value.Item1 + t.Value.Item2);
df.AddColumn("added", added);
Tomas Petricek
3

As of Deedle 2.1.0, + is overloaded for string concatenation between a Series and a scalar, between two Series, and between a Series and a Frame. Frame.strConcat works on a Frame of string values. See https://github.com/fslaborg/Deedle/pull/483

Your code should work now:

Deedle.Frame<int, string> df = Frame.ReadCsv(Path.Combine(root, "data_strings.csv"));

Series<int, string> a = df.GetColumn<string>("L");
Series<int, string> b = df.GetColumn<string>("R");
Series<int, string> c = a + b;
zyzhu
1

Third time's the charm, hopefully. See the screenshot for matching output. I don't love the iterative approach, but the result is correct. I looked for built-in methods or extensions that would do this, but found none. On the bright side, this opens the door to any per-row mutation you want (scaling, concatenation, and so on) when building a new column. I hope this helps.

static void add_strings(string root)
{
    Deedle.Frame<int, string> df = Frame.ReadCsv("data_strings.csv");

    Series<int, string> a = df.GetColumn<string>("L");
    Series<int, string> b = df.GetColumn<string>("R");

    RowSeries<int, string> rs = df.Rows;

    SeriesBuilder<int, string> c = new SeriesBuilder<int, string>();
    for (int i = 0; i < rs.KeyCount; i++)
    {
        c.Add(i, a[i] + b[i]);
    }

    df.AddColumn("C", c);
    df.Print();
}

[Screenshot: matching output]

kd345205
  • Essentially, Deedle didn't overload the + operator on Series<'K, string>. Overloading + would avoid the for loop. Since .NET already supports + on strings, we should add a + operator to Deedle's series. I will open an issue there. – zyzhu Oct 18 '19 at 15:49
0

I apologize for providing multiple answers; I'm still new to chipping in with answers. FWIW, in light of the new comment from zyzhu about adding a new overload, I thought I would offer one more solution to get you by. I think overloading the '+' operator for strings will be a fine addition.

I also think there is more to be gained by creating a mutator method that takes a delegate, so that the user can define the mutation. It's possible that the user wants more than simple mutations and needs actual calculations or other changes. Consider this extension method and its examples, and please excuse the lack of error checking or support for anything other than primitive types...

using System;
using System.Linq;
using Deedle;

public static class FrameMutator
{
    /// <summary>
    /// For a frame of type Frame{TRow, TCol}, mutate each row's values of type TVal
    /// and return the results as a series that can be added as a new column.
    /// </summary>
    /// <typeparam name="TRow">Row key type</typeparam>
    /// <typeparam name="TVal">Value type</typeparam>
    /// <typeparam name="TCol">Column key type</typeparam>
    /// <param name="myFrame">Frame whose rows are mutated</param>
    /// <param name="mutatorMethod">Delegate for the transformation</param>
    /// <returns>Series{TRow, TVal} with one mutated value per row</returns>
    public static Series<TRow, TVal> Mutate<TRow,TVal,TCol>(this Frame<TRow, TCol> myFrame, Func<TVal[], TVal> mutatorMethod)
    {
        SeriesBuilder<TRow, TVal> result = new SeriesBuilder<TRow, TVal>();
        foreach (TRow key in myFrame.Rows.Keys)
        {
            TVal colResult = mutatorMethod(myFrame.Rows[key].GetValues<TVal>().ToArray());
            result.Add(key, colResult);
        }

        return result.ToSeries();
    }
}

This extension can be used as follows...

static void add_ints(string root)
{
    Deedle.Frame<int, string> df = Frame.ReadCsv("data_ints.csv");

    Series<int, int> a = df.GetColumn<int>("A");
    Series<int, int> b = df.GetColumn<int>("B");

    // creates a column with the average of the row (not so useful with int)
    Series<int, int> avgCol = df.Mutate<int, int, string>(avgMutator);
    Series<int, int> c = a + b;

    df.AddColumn("C", c);
    df.AddColumn("D", avgCol);
    df.Print();
}

static void add_strings(string root)
{
    Deedle.Frame<int, string> df = Frame.ReadCsv("data_strings.csv");

    Series<int, string> a = df.GetColumn<string>("L");
    Series<int, string> b = df.GetColumn<string>("R");

    // creates a column of concatenated values
    Series<int, string> concatCol = df.Mutate<int, string, string>(ConcatMutator);
    // creates a column of concatenated and UPPER values
    Series<int, string> upperCol = df.Mutate<int, string, string>(ToUpperMutator);

    df.AddColumn("C", concatCol);
    df.AddColumn("D", upperCol);

    df.Print();
}

private static string ConcatMutator(string[] inputs) => string.Concat(inputs);

private static string ToUpperMutator(string[] inputs)
{
    IEnumerable<string> uppers = inputs.Select(e => e.ToUpper());
    return string.Concat(uppers);
}

private static int avgMutator(int[] inputs) => (int)Math.Round(inputs.Average(), 0);
kd345205
-1

I've never used Deedle, but your data is two string columns. Both columns consist of string data, not numbers, so it seems this line:

Deedle.Frame<int, string> df = Frame.ReadCsv(Path.Combine(root, "data_strings.csv"));

should be:

Deedle.Frame<string, string> df = Frame.ReadCsv(Path.Combine(root, "data_strings.csv"));

Looking at the documentation here: https://bluemountaincapital.github.io/Deedle/csharpframe.html they say that Deedle infers the data types and in all of their examples they are just using 'var' rather than an explicit type. Try just using:

var df = Frame.ReadCsv(Path.Combine(root, "data_strings.csv"));

And then you can debug and see what df looks like with a debugger. Good luck!

kd345205
  • The data types are correct in the question. I avoided `var` for clarity. The first type parameter `int` is the row index. The second is the column type: `string` in the `add_strings` function. – Vrokipal Oct 18 '19 at 13:44
  • The type parameters of `Frame` are types of row and column indices, so `int` and `string` is right. – Tomas Petricek Oct 18 '19 at 21:46
-1

Sorry for the confusion in the first answer. It seems there is no good way to add series together. I tried the "Merge" method and it threw errors. I recreated this locally and, while it seems kind of hacky, this works...

static void add_strings(string root)
{
    Deedle.Frame<int, string> df = Frame.ReadCsv("data_strings.csv");

    Series<int, string> a = df.GetColumn<string>("L");
    Series<int, string> b = df.GetColumn<string>("R");

    // Series<int, string> c = a + b;
    // Series<int, string> c = $"{a} and {b}";
    int rowCount = a.ValueCount + b.ValueCount;
    int[] keys = Enumerable.Range(0, rowCount).ToArray();
    Series<int, string> c = new Series<int, string>(keys, a.Values.Concat(b.Values));

    df.AddColumn("C", c);
    df.Print();
}


kd345205
  • I'm not sure what you're trying to do. You are aware that `rowCount` and `keys` have size 6, not 3; is that right? Anyway, this leads to the following Series in `c`: {series [ 0 => a; 1 => c; 2 => e; 3 => b; 4 => d; 5 => f]}. The concatenation should instead be {series [ 0 => ab; 1 => cd; 2 => ef]}. – Vrokipal Oct 18 '19 at 14:54
  • I read your a+b to be that you wanted to add the series together to create one big series in memory (concat or merge). Now I understand that you want to create a new series containing the concatenation of the first 2 series, row by row. I'll see what I can do. Hopefully I understand your goal now at least. – kd345205 Oct 18 '19 at 15:05
  • I see. I updated the question with an example of the output. – Vrokipal Oct 18 '19 at 15:25
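
The distinction raised in the comments above, appending one series after the other versus combining them row by row, can be sketched with plain arrays and LINQ. This is only an illustration that assumes nothing about Deedle; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class AppendVersusRowWise
{
    // Appends b after a: the result has a.Length + b.Length elements.
    public static string[] Append(string[] a, string[] b) => a.Concat(b).ToArray();

    // Combines a and b position by position: one output element per input row.
    public static string[] RowWise(string[] a, string[] b) =>
        a.Zip(b, (x, y) => x + y).ToArray();

    public static void Main()
    {
        string[] left = { "a", "c", "e" };
        string[] right = { "b", "d", "f" };

        Console.WriteLine(string.Join(", ", Append(left, right)));  // a, c, e, b, d, f
        Console.WriteLine(string.Join(", ", RowWise(left, right))); // ab, cd, ef
    }
}
```

The question asks for the second behavior: three rows in, three concatenated values out.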