-1

My problem is similar but more complex than this thread How to Consolidate Data from Multiple Excel Columns All into One Column.

Here is the sample excel

Date       Measure1  A    B     Date       Measure2    A    B   C   Date.....
11/11/11   1234     1     2     11/12/12   5678        1    3   3   12/12/12  ...
12/11/12   234     34    234    12/12/13   345        342   23  33  12/12/13  ...
........

There are hundreds columns in the excel. One date column followed by a measurement column, then some other columns. Now I only want date column, measure name column and value column. The result excel file should like

Date      Measure Name      Value
11/11/11  Measure1          1234
11/12/12  Measure2          5678
12/12/12  ....
....
12/11/12  Measure1          234
12/12/13  Measure2          123

How could I achieve it by VBA? Since I have thousands files like this, VBA seems like the best way to consolidate those files and load into database.

I always get

  Run-time error '1004'
  Application -defined or object -defined eror"

at

  w.Sheets("DataSort").Range("A1").Resize(k, UBound(Arr2, 2)) = Arr2

Here is my code

Sub convertExcel()
Dim Arr1, Arr2()
Dim Rnum As Integer, Cnum As Integer, Tnum As Integer
Dim i As Integer, j As Integer, k As Integer
'Rnum = row number; Cnum = column number; Tnum as  total number

Application.ScreenUpdating = False
Set w = Workbooks.Open("FileNAME~~~~")
Rnum = w.Sheets("Data").Cells(Rows.Count, 1).End(xlUp).Row

Cnum=208
Tnum = Rnum * Cnum / 2
w.Sheets.Add.Name = "DataSort"

Arr1 = Range("A1:GZ" & Rnum)
ReDim Arr2(1 To Tnum, 1 To 3)

For j = 2 To Cnum
  If w.Sheets("Data").Cells(1, j) = "Date" Then
     For i = 2 To Rnum
    If Arr1(i, j) <> "" Then
        k = k + 1:
        Arr2(k, 1) = Arr1(i, j)
        Arr2(k, 2) = Arr1(1, j)
        Arr2(k, 3) = Arr1(i, j + 1)
    End If
    Next
    End If
Next


w.Sheets("DataSort").Range("A1").Resize(k, UBound(Arr2, 2)) = Arr2

w.Close True
Application.ScreenUpdating = True
End Sub
Community
  • 1
  • 1
Decula
  • 494
  • 4
  • 16
  • What is the code you are using so far? This site isn't a code writing service and showing your efforts so far goes a long way in getting better and more useful answers. – enderland Jan 14 '14 at 18:07
  • @enderland my code is manually read column index and first row for the name of measurement. But my iteration doesn't work currently. Since those measurements are confidential, so I don't have valuable code to show. – Decula Jan 14 '14 at 18:11
  • You are going to have a hard time getting help with code no one else can see.. – enderland Jan 14 '14 at 21:47
  • For every `Measurex`, are the counts of dates the same or they also vary from `Measure` to `Measure`? – WGS Jan 16 '14 at 03:58
  • on top of what BK201 said, are the column interval between date and measure column constant? – L42 Jan 16 '14 at 04:55
  • See my answer below. This is assuming that @L42's concern is to the positive: that `Date` is beside `Measure`. – WGS Jan 16 '14 at 05:59
  • @BK201 yes, the measureX always follow data directly. – Decula Jan 16 '14 at 18:17
  • @Decula: Did the answer below help? – WGS Jan 16 '14 at 23:05

1 Answers1

1

As I have a lot of time on my hands today, I decided to devote some time here. I find it a bit challenging, but in the end, it's just proper sequencing of events.

Following is the logic I employed:

  • Delete all non-Date and non-MeasureX columns.
  • Store all the column names with Measure in them in a dictionary (totally unnecessary but, hey, it's quick) as keys.
  • Iterate over the keys of the first dictionary and create a second dictionary to store the date-value pairs as key-value pairs.
  • Every iteration, we print out the key-value pairs in a second sheet.

Kindly read all the comments in the code as well. Also, take note of my set-up below. Finally, test this on a copy of your workbook.

Set-up:

In Sheet2, I have an unabridged data set copied roughly from your example with 1508 columns and 1500 rows of data, excluding the header. After deleting the unwanted columns, the data will be reduced to 734 columns and 1500 rows of data. On testing, the deletion takes about 11-13 seconds on my end. Your mileage may vary here.

Using this filtered data, working on it using a second dictionary takes roughly 8-9 seconds on my end to finish. The whole process basically finishes around ~20 seconds.

Screenshots:

Sheet2 (Sheet with original data):

enter image description here

Sheet3 (Output sheet):

enter image description here

Code:

Sub KamehameWave()

    Dim Sht2 As Worksheet, Sht3 As Worksheet
    Dim Dict As Object, Cell As Range
    Dim Dict2 As Object, Cell2 As Range
    Dim RngToDelete As Range

    Set Sht2 = ThisWorkbook.Sheets("Sheet2") 'Modify accordingly.
    Set Sht3 = ThisWorkbook.Sheets("Sheet3") 'Modify accordingly.

    Application.ScreenUpdating = False

    With Sht2
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The following block will delete unneeded columns. Basically, it will only keep    '
        ' columns that either have "Date" or "MeasureX" in their headers. All else will be  '
        ' discarded. As said in the post, do this on a copy of your worksheet.              '
        '-----------------------------------BK201's Notes-----------------------------------'
        Start = Timer()
        For Each Cell In .Rows(1).Cells
            If InStr(1, Cell.Value, "Date") = 0 And InStr(1, Cell.Value, "Measure") = 0 Then
                If Not RngToDelete Is Nothing Then
                    Set RngToDelete = Union(RngToDelete, .Columns(Cell.Column))
                Else
                    Set RngToDelete = .Columns(Cell.Column)
                End If
            End If
        Next Cell
        RngToDelete.Delete
        Debug.Print Timer() - Start
        Start = Timer()
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The following block will create a dictionary and store all the names of columns   '
        ' with "Measure" in them. This is just so you have a reference. An array or a       '
        ' collection will do as well. I prefer to use this though as I find it easier.      '
        '-----------------------------------BK201's Notes-----------------------------------'
        Set Dict = CreateObject("Scripting.Dictionary")
        For Each Cell In .Rows(1).Cells
            CheckIfMeasure = InStr(1, Cell.Value, "Measure")
            If CheckIfMeasure > 0 Then
                If Not Dict.Exists(Cell.Value) And Not IsEmpty(Cell.Value) Then
                    Dict.Add Cell.Value, Empty
                End If
            End If
        Next Cell
        '-----------------------------------BK201's Notes-----------------------------------'
        ' What we'll do next is to iterate over each "MeasureX" column. We'll iterate over  '
        ' the values on these columns and store them in a *second* dictionary, with their   '
        ' respective dates being the keys.                                                  '
        '-----------------------------------BK201's Notes-----------------------------------'
        For Each Key In Dict
            MColIndex = Application.Match(Key, .Rows(1), 0)
            MColLRow = .Cells(Rows.Count, MColIndex).End(xlUp).Row
            Set MCol = .Range(.Cells(2, MColIndex), .Cells(MColLRow, MColIndex))
            Set Dict2 = CreateObject("Scripting.Dictionary")
            For Each Cell2 In MCol
                If Not Dict2.Exists(Cell2.Value) And Not IsEmpty(Cell2.Value) Then
                    Dict2.Add Cell2.Offset(0, -1).Value, Cell2.Value
                End If
            Next Cell2
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The final phase is to get the next empty row in the output sheet and dump all the '
        ' key-value pairs from our second dictionary there. Since we're also iterating      '
        ' through the keys of the first dictionary, the list will append properly to        '
        ' accommodate each key's own dictionary.                                            '
        '-----------------------------------BK201's Notes-----------------------------------'
            TColNRow = Sht3.Range("A" & Rows.Count).End(xlUp).Row + 1
            Sht3.Range("A" & TColNRow).Resize(Dict2.Count, 1).Value = Application.Transpose(Dict2.Keys)
            Sht3.Range("B" & TColNRow).Resize(Dict2.Count, 1).Value = Key
            Sht3.Range("C" & TColNRow).Resize(Dict2.Count, 1).Value = Application.Transpose(Dict2.Items)
        Next Key
        Debug.Print Timer() - Start
    End With

    Application.ScreenUpdating = True

End Sub

Results after running the code:

enter image description here

enter image description here

First number is runtime of deletion, second is for the transposition. Not bad considering that I am at half a million data points. Sorting the data is on your court.

Let us know if this helps.

WGS
  • 13,969
  • 4
  • 48
  • 51
  • plus one for the effort! :D i mean this should do it. and also the well outlined comments. – L42 Jan 16 '14 at 06:08
  • @L42: Thanks. I can see some improvements here and there, but the code is robust enough. Here's to hoping it works, as this can still break with 1 or 2 conditions. – WGS Jan 16 '14 at 06:11
  • haha also for the sub name lol... `Sub KamehameWave()` yeah. btw, it should work. :) – L42 Jan 16 '14 at 06:14