1

I have a spreadsheet (over 100,000 rows) with 10 columns of data. Two of the columns have comma separate value entries. I need a macro (or series of macros) or VBA script that can automatically duplicate the existing rows of data yet only have a single entry for each such comma separated value entry.

So today I have in a single row, columns A-D:

  A      B        C                       D
John | Smith | Virginia | Apples, Bananas, Grapes, Mangoes

And I want:

  A      B        C         D
John | Smith | Virginia | Apples  
John | Smith | Virginia | Bananas  
John | Smith | Virginia | Grapes  
John | Smith | Virginia | Mangoes  

I need the macro to be "smart enough" to only create duplicate rows for the number of entries in the CSV cell. So, in my example, I had 4 fruit names. If I had 17 fruit names, I'd want 17 rows, each with a single instance of each fruit. If there are two identical fruit names, that's okay - I can live with two duplicate rows of the same exact fruit name.

Advice on how to accomplish this? I'm tried to parse text to columns but don't know enough about macro programming to do this.

pnuts
  • 58,317
  • 11
  • 87
  • 139
user1724444
  • 21
  • 1
  • 2
  • 3
    You can use `Split(fruitValue,",")` to create an array of fruit names from the cell with multiple values, then loop through that array and copy the line for each value. It will be much faster if you do this using arrays instead of writing the new content cell-by-cell. – Tim Williams Jan 15 '14 at 00:33
  • agree with Tim. once you have the array, get the `Ubound` and insert the same rows. then pass the array to range in one go, filling down the contents in `A,B,C`. – L42 Jan 15 '14 at 00:54
  • You should store the splited rows on a new sheet so you can confirm changes, and easier for row tracking. Otherwise, you will have hard time calculating the row to work on, unless you start from bottom and insert rows shifting data down. – PatricK Jan 15 '14 at 01:24
  • As an example of Tims suggestion, look at this. http://stackoverflow.com/questions/8560718/split-comma-separated-entries-to-new-rows/8561481#856148 – brettdj Jan 15 '14 at 01:24
  • What's not clear about this request is if `John|Smith|Virginia` changes while going down. The approach to that will be twofold--one involving the split, and one involving groupings. Still, though... *arrays*. :D – WGS Jan 15 '14 at 01:33

2 Answers2

1

For kicks, here it is with the de-duping

Converts data from A:D to E:H

enter image description here

Sub SliceNDice()
    Dim objRegex As Object
    Dim X
    Dim Y
    Dim lngRow As Long
    Dim lngCnt As Long
    Dim tempArr() As String
    Dim strArr
    Set objRegex = CreateObject("vbscript.regexp")
    objRegex.Pattern = "^\s+(.+?)$"
     'Define the range to be analysed
    X = Range([a1], Cells(Rows.Count, "d").End(xlUp)).Value2
    ReDim Y(1 To 4, 1 To 1000)
    For lngRow = 1 To UBound(X, 1)
         'Split each string by ","
        tempArr = Split(X(lngRow, 4), ",")
        For Each strArr In tempArr
            lngCnt = lngCnt + 1
             'Add another 1000 records to resorted array every 1000 records
            If lngCnt Mod 1000 = 0 Then ReDim Preserve Y(1 To 4, 1 To lngCnt + 1000)
            Y(1, lngCnt) = X(lngRow, 1)
            Y(2, lngCnt) = X(lngRow, 2)
            Y(3, lngCnt) = X(lngRow, 3)
            Y(4, lngCnt) = objRegex.Replace(strArr, "$1")
        Next
    Next lngRow
     'Dump the re-ordered range to columns E:H
    [e1].Resize(lngCnt, 4).Value2 = Application.Transpose(Y)
    ActiveSheet.Range("E:H").RemoveDuplicates Columns:=Array(1, 2, 3, 4), _
        Header:=xlNo
End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
1

Not for points.

Since I have some time on my hands, I want to demo what the others above are saying. However, I'll add a little bit more. Note however, that @brettdj's code is much better than this, but at least this is quite simpler, if altogether not that equipped to solve 100,000 rows (that, I personally leave to you).

The logic:

  1. We split the string using , as a delimiter. We store the result into an array.
  2. We invoke a dictionary and use it to store unique values only. We trim the strings in the array as well.
  3. We then use very simple movements to copy your row a number of times equal to the number of unique fruits now stored in our dictionary. This will give us enough space to post down our new list of fruits.
  4. We transpose the dictionary contents into the resized original location.

Code:

Sub FruitNinja()

    Dim FrootWhere As Range, Dict As Object
    Dim Frooty As String, Froots() As String

    Set FrootWhere = Range("D1")

    Frooty = FrootWhere.Value
    Froots = Split(Frooty, ",")

    Set Dict = CreateObject("Scripting.Dictionary")

    For i = LBound(Froots) To UBound(Froots)
        If Not Dict.Exists(Froots(i)) Then
            Dict.Add Trim(Froots(i)), Empty
        End If
    Next i

    FrootWhere.EntireRow.Copy
    Cells(FrootWhere.Row + 1, 1).Resize(Dict.Count - 1, 1).EntireRow.Insert
    FrootWhere.Resize(Dict.Count, 1).Value = Application.Transpose(Dict.Keys)

    Set FrootWhere = Nothing
    Set Dict = Nothing
    Application.CutCopyMode = False

End Sub

Set-up:

enter image description here

Result:

enter image description here

The concept of my approach is actually very simple. The way I'll do it given your data, if not using the better answer above, is to pass in a range to this sub, for how many relevant ranges you have. Basically, I'll be calling this from another sub.

The upside of this code is that it's pretty easy to check, debug, modify, and manipulate. The downside to this is that it'll be slow versus a large number of rows, it can be error prone in the weirdest of ways, and that it's hard to maintain versus a large number of conditions.

Hope this helps you. :)

WGS
  • 13,969
  • 4
  • 48
  • 51