0

I am working with several large Filemaker databases that contains very dirty data recorded over the course of 20+ years. Unfortunately, every database is essentially a single flat file with uncontrolled text fields with no validation or value lists of any kind. Not surprisingly different users were VERY inconsistent in the way they recorded certain types of data over that long of a time span (i.e., what and how things are entered into fields varies wildly). I need to clean several important fields in the database before I can pull the data into a sensible relational structure.

One field in particular contains long strings of text that are generally delimited by spaces. Is there a way to split records based on the delimiter in this field, duplicating all other fields for each record? For example, Contents is a space delimited field (which creates its own issues, but is beside the point here) that would be the basis for creating individual records. The number of delimited entries in the field varies from one to hundreds.

enter image description here

would become:

enter image description here

Splitting some of the fields into individual records is the only way I'm going to be able to fix a couple of the issues given the massive amount of inconsistency in data entry and the different ways entries will need to be parsed into new fields. Any help would be appreciated.

lambertj
  • 127
  • 1
  • 2
  • 12

2 Answers2

1

There is certainly a way to do it, you would need to write a script for it.

To prepare the data you could do a Replace Field Contents on this field and replace all spaces with a paragraph symbol. That will enable you to use FileMaker’s value functions to extract the values.

The script should contain an outer loop, looping through the found set of records. The records should be sorted, so that new records are inserted sequentially and not at the end of the found set.

For each record, check the relevant field for the number of values using the Valuecount function.

If more than one value, grab the field contents in a variable and loop through the values, duplicating the record for each iteration, replacing the field with the current value index from the variable using the Getvalue function. Decrement the value index at the end of the loop.

Remember to have exit conditions for the loops and always work on a backup when doing such operations.

AndreasT
  • 2,317
  • 1
  • 17
  • 14
  • That general approach makes sense. Any chance you could provide an example of what the script would look like? – lambertj Mar 06 '18 at 17:55
  • I don’t have a ready example, but my description should get you a long way. If you try to set it up, I can help you further with any difficulties you may encounter. – AndreasT Mar 06 '18 at 22:50
0

This was a bit of a workaround, but I eventually found a much easier solution by exporting the Box_id and Contents fields to Excel and running the following VBA code:

Sub SliceNDice()
Dim objRegex As Object
Dim X
Dim Y
Dim lngRow As Long
Dim lngCnt As Long
Dim tempArr() As String
Dim strArr
Set objRegex = CreateObject("vbscript.regexp")
objRegex.Pattern = "^\s+(.+?)$"
 'Define the range to be analysed
X = Range([a1], Cells(Rows.Count, "b").End(xlUp)).Value2
ReDim Y(1 To 2, 1 To 1000)
For lngRow = 1 To UBound(X, 1)
     'Split each string by ","
    tempArr = Split(X(lngRow, 2), " ")
    For Each strArr In tempArr
        lngCnt = lngCnt + 1
         'Add another 1000 records to resorted array every 1000 records
        If lngCnt Mod 1000 = 0 Then ReDim Preserve Y(1 To 2, 1 To lngCnt + 1000)
        Y(1, lngCnt) = X(lngRow, 1)
        Y(2, lngCnt) = objRegex.Replace(strArr, "$1")
    Next
Next lngRow
 'Dump the re-ordered range to columns C:D
[c1].Resize(lngCnt, 2).Value2 = Application.Transpose(Y)
End Sub

This solution was modified from this previous question.

lambertj
  • 127
  • 1
  • 2
  • 12