0

I need your help on below situation (using Pig Latin).

I have an input data in following format:

DATE,ERROR,COUNT
20110212,ORA-00600,9
20110212,ORA-03135,1
20110226,ORA-20000,1
20110305,ORA-03135,1
20110319,ORA-01555,1
20110319,ORA-03135,18
20110326,ORA-00600,1
20110409,ORA-00600,1
20110416,ORA-20000,1
20110423,ORA-01555,2
20110507,ORA-00600,1
20110507,ORA-01555,2
20110514,ORA-00600,1
20110514,ORA-03135,4

I am looking for output like below.

DATE,ORA-00600,ORA-03135,ORA-20000....
20110212,9,1,0....
20110514,1,4,0...
20110416,0,0,1...

Basically, I am looking to Pivot the input data; I could not use nested FOREACH (with filter) because Error codes (like ORA-00600, ORA-03135..) is not constant, it's dynamic.

Please suggest.

user7337271
  • 1,662
  • 1
  • 14
  • 23
sndipz
  • 1
  • It looks like similar question was asked before. Take a look at http://stackoverflow.com/questions/11578815/pivoting-in-pig/11594331#11594331 – alexeipab Dec 26 '13 at 19:48
  • alexeipab, its similar but in this case the no. of columns is not predefined while pivoting.. – sndipz Jan 11 '14 at 17:50

1 Answers1

0

I am not surprised at how many questions, regarding the PIVOT functionality in SQL, gets asked. I too have had some projects where I needed the use of a PIVOT result. Spreadsheets are a perfect example (and a huge want in the corporate IT world).

Pivot tables are great, but if you work in a dynamic manufacturing environment, as I do, they often become cumbersome and complex. Needs, in this type of realm, are always subjective. And can cause a programmer to pull his hair out from the roots.

It was for this reason that I decided to create an extension on the DataTable object, which yields a PIVOT table that I have found both quick and useful.

Yes, I am aware of the SQL version, and am aware still that there are ways to make the PIVOT in SQL more dynamic (for changing columns). But I have learned to keep things simple. With simplicity comes stability, useability, extensibility, and scalability.. Which are all HUGE pluses in the real corporate world.

I share this with you now, in hopes you can modify it to serve your needs. It's dynamic and very fast.. and best of all SIMPLE.

/// <summary>
/// Creates a PIVOT table based on values in existing table
/// </summary>
/// <param name="PKey">Field used as PIVOT tables Primary (row) Key.</param>
/// <param name="FKey">Field used to create unique column names in PIVOT table.</param>
/// <param name="VKey">Field used as the value for each associated column in a PIVOT table record.</param>
/// <returns>DataTable</returns>
public static DataTable Pivot(this DataTable tbl, string PKey, string FKey, string VKey)
    {
        DataTable retVal = null;
        DataTable keyTbl = null;
        DataTable fldTbl = null;
        DataColumn dc = null;
        DataRow newRow = null;

        Type pkType = null;
        Type valType = null;

        string strPkeyFilter = null;
        string prevFilter = null;

        try
        {
            // Get previous filter (save)
            prevFilter = tbl.DefaultView.RowFilter;

            if (tbl.Rows.Count > 0)
            {
                // Get unique Primary Keys and Field Names
                keyTbl = tbl.DefaultView.ToTable(true, PKey);
                fldTbl = tbl.DefaultView.ToTable(true, FKey);

                if (fldTbl.Rows.Count > 0)
                {
                    // Get Primary Key data type and create column on Pivot table.
                    pkType = tbl.Columns[PKey].DataType;
                    retVal = new DataTable(tbl.TableName);
                    retVal.Columns.Add(new DataColumn(PKey, pkType));
                    retVal.PrimaryKey = new DataColumn[] { retVal.Columns[PKey] };

                    // Get VALUE columns data type
                    valType = tbl.Columns[VKey].DataType;

                    // Create Primary Key filter
                    strPkeyFilter = "{0} = {1}";
                    if(pkType.Equals(typeof(string)))
                        strPkeyFilter = "{0} = '{1}'";

                    foreach (DataRow dr in fldTbl.Rows)
                    {
                        // Create additional rows in Pivot table based on unique Field Names
                        dc = new DataColumn(Convert.ToString(dr[FKey]), valType);
                        dc.AllowDBNull = true;
                        dc.DefaultValue = DBNull.Value;
                        retVal.Columns.Add(dc);
                    }

                    // Iterate through Primary Key collection
                    foreach (DataRow dr in keyTbl.Rows)
                    {
                        // Set Primary Key row filter on base table
                        tbl.DefaultView.RowFilter = string.Format(strPkeyFilter, PKey, dr[PKey]);
                        if (tbl.DefaultView.Count > 0)
                        {
                            // Create new row on Pivot table and set Primary Key
                            newRow = retVal.NewRow();
                            newRow[PKey] = dr[PKey];

                            // Iterate through records and find VALUE for each FIELD column
                            foreach (DataRowView drv in tbl.DefaultView)
                                newRow[Convert.ToString(drv[FKey])] = drv[VKey];

                            retVal.Rows.Add(newRow);
                        }
                    }

                    // Return previous row filter on base table
                    tbl.DefaultView.RowFilter = prevFilter;
                    // Commit chages to Pivot table (if any)
                    retVal.AcceptChanges();
                }
            }
        }
        catch (Exception)
        {

            throw;
        }
        finally
        {
            if (keyTbl != null)
                keyTbl.Dispose();
            if (fldTbl != null)
                fldTbl.Dispose();
        }

        return retVal;
    }

The usage is simple, as it's an extended method on the DataTable object:

DataTable dt = myTbl.Pivot("DOC_ID", "UI_FIELD", "UI_VALUE");

So if myTbl is layed out like:

DOC_ID  |  ENTRY_DATE  |  UI_FIELD  |  UI_CAPTION  |  UI_VALUE
----------------------------------------------------------------
  1        01/01/2015    EMPL_NAME    Empl. Name     John Doe
  1        01/01/2015    EMPL_PHONE   Empl. Phone    801-555-1212
  1        01/01/2015    EMPL_MNGR    Empl. Mangr.   Jane Doe

It will return:

DOC_ID  |  ENPL_NAME  |  EMPL_PHONE  | EMPL_MNGR
-------------------------------------------------
  1        John Doe     801-555-1212   Jane Doe

The benefit (in a manufacturing environment anyway) of laying out UI entry data in vertical rows is to better accommodate UI field additions, or subtractions.. Which happens a LOT! Because the last thing you want to do is modify the data schema. Especially when a hundred or so (known and unknown) applications are using it as well.

In any case, and for what it's worth, I hope this heads you in the desired direction. I use this quite often.