1

I am trying to read data from a text file using excel/VBA. However when i import the file all the words are in the same column. The text is generated from PDF while preserving the table layout. So the words are space delimited, but the spacing is not consistent. I want code to run through cells and separate the words. Two things are true however for the cells

  1. single words have one spacing
  2. words are separated by two or more spaces

Screenshot 1 Screenshot 2

Uri Goren
  • 13,386
  • 6
  • 58
  • 110

3 Answers3

1

This code takes the first 100 cells in column A, split their content by space, and pastes it to column B

Dim A_row As Integer, B_row as Integer, i As Integer, words()
For A_row = 1 To 100' Last row t o consider
    words = Split(Range("A" & A_row), " ")
    For i = LBound(words) To UBound(words)
        B_row = B_row + 1
        Range("B" & B_row) = words(i)
    Next i
Next A_row

I'm sure you could get the gist, and modify it to your needs

Uri Goren
  • 13,386
  • 6
  • 58
  • 110
  • Thanks @uri, I tried this but it says type mismatch error at 'Split' part of code – isomericharsh Apr 15 '19 at 09:25
  • `code` Dim A_row As Integer, B_row As Integer, i As Integer Dim words As Variant For A_row = 1 To 2 ' Last row t o consider words = Split(Range("A" & A_row), " ") For i = LBound(words) To UBound(words) B_row = B_row + 1 Range("B" & B_row) = words(i) Next i Next A_row `Code` Change words as variant and used double space instead single space. It does the job. – isomericharsh Apr 15 '19 at 09:46
1

I understand that the core challenge is to split by 2+ spaces, but not by one.

Try if this helps you on that:

Const marker As String = "[!°$(])"
Dim rx, s As String, t As String, parts
Set rx = CreateObject("vbscript.regexp")

s = "One Cell   Red  Green"

rx.Pattern = " {2,}" ' match two or more spaces
rx.Global = True ' find all, not only the first match
t = rx.Replace(s, marker)

parts = Split(t, marker)
MsgBox Join(parts, vbCrLf)
KekuSemau
  • 6,830
  • 4
  • 24
  • 34
  • Thanks @kekusemau modified the code as below `code` s = Range("A" & A_row) rx.Pattern = " {2,}" ' match two or more spaces rx.Global = True ' find all, not only the first match t = rx.Replace(s, marker) Range("A" & A_row).Value = t parts = Split(t, marker) `Code` It replaces the original cell value with 2 or more space replaced with the string constant. But how do we get the values to split in subsequent columns? Much appreciate your help – isomericharsh Apr 15 '19 at 10:09
  • Can you help me with a variant in above. If i want to replace single space between words to Underscore. For example, if a cell has value "Total outstanding Amount Name of Company" the regex replace should result in "total_outstanding_amount Name_of_company". I can find the pattern using rx.Pattern = "(\w)+ (?=\w)", how to substitute, right now i tried using t = rx.Replace(s, "_").. which results in "l_g_amount e_f_company" – isomericharsh May 06 '19 at 08:52
1

Thanks @Uri Goren @Kenusemau. Posting Answer for anyone else looking for the same issue.

Sub Macro2()
Const marker As String = "#$"
Dim rx, s As String, t As String, parts
Set rx = CreateObject("vbscript.regexp")

For A_row = 1 To 2 ' Last row t o consider
    s = Range("A" & A_row)
        rx.Pattern = " {2,}" ' match two or more spaces
        rx.Global = True ' find all, not only the first match
        t=rx.Replace(s, marker)
        Range("B" & A_row).Value = t
        'parts = Split(t, marker)
        'Range("B" & A_row).Value = Join(parts, vbCrLf)
                Range("B" & A_row).Select
                Selection.TextToColumns _
                Destination:=Range("C" & A_row), _
                DataType:=xlDelimited, _
                TextQualifier:=xlDoubleQuote, _
                ConsecutiveDelimiter:=False, _
                Tab:=True, _
                Semicolon:=False, _
                Comma:=False, _
                Space:=False, _
                Other:=True, _
                OtherChar:="#$"
Next A_row
End Sub