0

Background

I am not a programmer or technical person I have a project where I need to convert a large text file to an access database. The text file is not in traditional flat file format so I need some help pre processing. The files are large (millions of records) between 100MB and 1GB and seem to be choking all of the editors I have tried (word pad, notepad, vim, em editor)

The following is a sample of the source text file:

product/productId:B000H9LE4U

product/title: Copper 122-H04 Hard Drawn Round Tubing, ASTM B75, 1/2" OD, 0.436" ID, 0.032" Wall, 96" Length

product/price: 22.14

review/userId: ABWHUEYK6JTPP

review/profileName: Robert Campbell

review/helpfulness: 0/0

review/score: 1.0

review/time: 1339113600review/summary: Either 1 or 5 Stars. Depends on how you look at it.

review/text: Either 1 or 5 Stars. Depends on how you look at it.1 Star because they sent 6 feet of 2" OD copper pipe.0 Star because they won't accept returns on it.5 stars because I figure it's actually worth $12-15/foot and since they won't take a return I figure I can sell it and make $40-50 on this deal

product/productId: B000LDNH8I

product/title: Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled

product/price: 84.99

review/userId: A19Y7ZIICAKM48

review/profileName: T Foley "computer guy"

review/helpfulness: 3/3

review/score: 5.0

review/time: 1248307200

review/summary: I recommend this Sling Psychrometer

review/text: Not too much to say. This instrument is well built, accurate (compared) to a known good source. It's easy to use, has great instructions if you haven't used one before and stores compactly.I compared prices before I purchased and this is a good value.

Each line represents a specific attribute of a product, starting at "product/productId:"

What I need

I need to convert this file to a character delimited field (i think @ symbol work) by stripping out each of the codes (i.e. product/productId:, product/title:, etc and replacing with the @ and replacing the line feeds.

I want to eliminate the review/text: line

The output would look like this:

B000H9LE4U@Copper 122-H04 Hard Drawn Round Tubing, ASTM B75, 1/2" OD, 0.436" ID, 0.032" Wall, 96" Length@22.14@ABWHUEYK6JTPP@Robert Campbell@0/0@1.0@1339113600@Either 1 or 5 Stars. Depends on how you look at it.

B000LDNH8I@Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled@84.99@A19Y7ZIICAKM48@T Foley "computer guy"@3/3@5.0@1248307200@I recommend this Sling Psychrometer

B000LDNH8I@Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled@84.99@A3683PMJPFMAAS@Spencer L. Cullen@1/1@5.0@1335398400@A very useful tool

I now would have a flat file delimited with "@" that I can easily import into access.

Sorry for the ramble. I am open to suggestions, but don't understand programming enough to write using the editor language. Thanks in advance

PaulFrancis
  • 5,748
  • 1
  • 19
  • 36
SDG
  • 1
  • 1
  • 1

2 Answers2

0

I'm not sure I understand how you want to map pf your textfile to data base fields.

That's the first thing you need to decide.

Once you've done that I'd suggest putting your text file into columns corresponding to the database columns. Then you should be able to import it into Access.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • Thanks. That's the part I am having problems with. The editor I am using chokes on the file when I manually try to search and replace text strings and remove line feeds. I thought there might be some way to write a simple program (like python, or use see, but I don;t know how to do it) Was looking for some guidance. – SDG Apr 09 '15 at 20:01
0

This is a method I just put together and it comes with no guarantee. It reads the data (you have provided as sample) and displays in the right format as you need.

Public Sub ReadFileAndSave(filePath As String, breakIdentity As String, Optional sepStr As String = "@")
'******************************************************************************
'   Opens a large TXT File, reads the data until EOF on the Source,
'       then reformats the data to be saved on the Destination
'   Arguments:
'   ``````````
'       1. The Source File Path - "C:\Users\SO\FileName.Txt" (or) D:\Data.txt
'       2. The element used to identify new row - "new row" (or) "-" (or) "sam"
'       3. (Optional) Separator - The separator, you wish to use. Defauls to '@'
'*******************************************************************************
    Dim newFilePath As String, strIn As String, tmpStr As String, lineCtr As Long

    'The Destination file is stored in the same drive with a suffix to the source file name
    newFilePath = Replace(filePath, ".txt", "-ReFormatted.txt")

    'Open the SOURCE file for Read.
    Open filePath For Input As #1

    'Open/Create the DESTINATION file for Write.
    Open newFilePath For Output As #2

    'Loop the SOURCE till the last line.
    Do While Not EOF(1)
        'Read one line at a time.
        Line Input #1, strIn
        'If it is a blank/empty line SKIP.
        If Len(strIn) > 1 Then
            lineCtr = lineCtr + 1
            'Create a String of the same ID.
            tmpStr = tmpStr & Trim(Mid(strIn, InStr(strIn, ":") + 1)) & sepStr
            'If a new row needs to be inserted, the BREAK IDENTITY is analyzed.
            If InStr(strIn, breakIdentity) <> 0 And lineCtr > 1 Then
                'Once the new row is triggered, dump the line in the Destination.
                Print #2, Left(tmpStr, Len(tmpStr) - Len(Mid(strIn, InStr(strIn, ":") + 1)) - 1) & vbCrLf
                'Prepare the NEXT ROW
                tmpStr = Trim(Mid(strIn, InStr(strIn, ":") + 1)) & sepStr
            End If
        End If
    Loop

    'Print the last line
    Print #2, Left(tmpStr, Len(tmpStr) - 1) & vbCrLf

    'Close the files.
    Close #1
    Close #2
End Sub

Again, this code works on my system and I have not tested the bulk of the matter, so it might be slower in yours. Let me know if this works alright for you.

PaulFrancis
  • 5,748
  • 1
  • 19
  • 36