0

I need to do a full outer join on 2 datatables dinamically, I don't know what columns are in the datatables but they will be the same in each table, I also only have the name of the column I need to do the join on in a variable. Is there a way of doing this?

What I need to do is join 2 datatables in a C# script. I'm using a Dataflow in an SSIS to get data from a couple of files, and at the end I need to compare the 2 final sets of data. I need to to this on whatever 2 datatables as long as they have the same columns, so I can't finish the process in an SSIS as I need to specify the columns.

The GetData() I just use it in case I need to compare 2 tables but donnesPROD and donnesDEV are filled from object variables in the SSIS.

Here's my code so far :

DataTable donnesPROD = GetData(connectionPROD, sql_request);
DataTable donnesDEV = GetData(connectionDEV, sql_request);

Here's the code for GetData :

DataTable GetData(string cs, string query)
        {
            OleDbConnection conn = new OleDbConnection(cs);
            conn.Open();
            OleDbCommand cmd = new OleDbCommand(query, conn);

            DataTable dt = new DataTable();
            dt.Load(cmd.ExecuteReader());
            conn.Close();
            return dt;
        }

I have the list of the columns in another datatable, and I have the name of the primary key in a string variable key. From here I need to be able to do a fullouterjoin of donnesPROD and donnesDEV on key. Can this be done this way? Or is there a way of generating the script code it self dynamically and then execute it?

d_anass
  • 109
  • 2
  • 12
  • It is not clear at all what you are asking or what you are expecting as an answer. For any specific problem you are having please include a [mcve]. Please also read [ask] a good question. Make sure that your questions are specific and not overly broad. – Igor Sep 22 '17 at 20:50
  • Take a look at this approach: https://www.codeproject.com/Articles/37444/Full-Outer-Join-of-Two-DataTables-C-Code – Mateus Schneiders Sep 22 '17 at 20:56

1 Answers1

1

You have two options.

Conditional joins

If you don't know the specific column name, but you do have some idea what the column name might be, you could do a conditional join like this:

CREATE PROCEDURE ExampleDynamicJoin(@JoinColumn AS VarChar(40))
AS
BEGIN
    SELECT *
    FROM   TableA
    JOIN   TableB ON (@JoinColumn = 'ColumnA' AND TableA.ColumnA = TableB.ColumnA)
                  OR (@JoinColumn = 'ColumnB' AND TableA.ColumnB = TableB.ColumnB)
                  OR (@JoinColumn = 'ColumnC' AND TableA.ColumnC = TableB.ColumnC)
END

You may not get the best performance out of this (the conditional joins will confuse the query engine and it may not pick the best index, if it picks one at all). If the table is very large you could also do something like this. It is a bit painful-looking but will get better performance:

CREATE PROCEDURE ExampleDynamicJoin(@JoinColumn AS VarChar(40))
AS
BEGIN
    IF (@JoinColumn = 'ColumnA') BEGIN
        SELECT *
        FROM   TableA
        JOIN   TableB ON TableA.ColumnA = TableB.ColumnA
    END
    IF (@JoinColumn = 'ColumnB') BEGIN
        SELECT *
        FROM   TableA
        JOIN   TableB ON TableA.ColumnB = TableB.ColumnB
    END
    IF (@JoinColumn = 'ColumnC') BEGIN
        SELECT *
        FROM   TableA
        JOIN   TableB ON TableA.ColumnC = TableB.ColumnC
    END
END

If TableA or TableA are part of a larger query, and you'd end up duplicating tons of SQL, you could always extract the resultset for just TableA and TableB into a temporary table, then use the temporary table in the larger query.

Dynamic SQL

If you don't have the foggiest about the column name and there are tons of possibilities, you could construct the SQL as a string and join that way. You should validate the column name that is passed in; not only will that make sure the column actually exists, but it will prevent the dynamic SQL from being constructed when @JoinColumn contains an injection attack, since legal column names do not contain SQL statements. Example:

CREATE PROCEDURE ExampleDynamicJoin(@JoinColumn AS VarChar(40))
AS
BEGIN
    DECLARE @Sql AS VarChar(MAX)

    IF NOT EXISTS 
    (
        SELECT 0 
        FROM syscolumns c 
        JOIN sysobjects o ON o.id = c.id
        WHERE  o.Name = 'TableA'
        AND    c.Name = @JoinColumn
    )
    RAISERROR  (15600,-1,-1, 'ExampleDynamicJoin');  //Throw error if column doesn't exist

    SET @Sql = 
       'SELECT *
        FROM   TableA
        JOIN   TableB ON TableA.' + @JoinColumn + ' = TableB.' + @JoinColumn

    sp_ExecuteSql @Sql
END

Or, if you don't use stored procedures,

DataTable ExampleDynamicJoin(string joinColumn)
{
    if (!ValidateColumn(joinColumn)) throw new ArgumentException();

    var sql = String.Format(
      @"SELECT * 
        FROM TableA 
        JOIN TableB ON TableA.{0} = TableB.{0}", 
        joinColumn
    );

    using (var connection = GetConnectionFromSomewhere())
    {
        using (var cmd = new SqlCommand
        {   
            CommandText = sql,
            CommandType = CommandType.Text,
            Connection = connection
        })
        {
            var reader = cmd.ExecuteReader();
            var table = new DataTable();
            table.Load(reader);
            return table;
        }
    }
}

When using dynamic SQL you should always use parameters if possible. But you can't use parameters as a column name, so in this case you have to concatenate. When concatenating, ALWAYS white list the inputs. That is why I included a function named ValidateColumn which could look like this:

bool ValidateColumn(string columnName)
{
    switch columnName.ToUpper()
    {
        case "COLUMNA":
        case "COLUMNB":
        case "COLUMNC":
            return true;
        default:
            return false;
    }
}
John Wu
  • 50,556
  • 8
  • 44
  • 80
  • The conditional join solution is only good for almost trivially small tables as the join conditions are not SARGable, and therefore will always result in full table or index scanning. – Dave Markle Sep 22 '17 at 21:43
  • Thank you John Wu for all your examples. I will update the question, but what I need to do is join 2 datatables in a C# script. I'm using a Dataflow in an SSIS to get data from a couple of files, and at the end I need to compare the 2 final sets of data. I need to to this on whatever 2 datatables as long as they have the same columns, so I can't finish the process in an SSIS as I need to specify the columns. – d_anass Sep 25 '17 at 14:24