
I have a large array of data coming from an external aggregating system. The part relevant to my question is an array of strings. Examples (not real ones, but quite illustrative):

  1. Model: TOYOTA COROLLA VIN: ABC123 Year: 2012 Color: Black
  2. White KIA RIO of 2013 year, transmission: 4AT
  3. Type:TruckModel:MANYear:2010VIN:QWE123Registration number:AZ12345
  4. 30 cows of Milky breed numbered #137
  5. 25 cows of Shello breed numbered #783

There are nearly 100 million strings in total, and their main purpose is to be shown to users on the website.

As you can see, every string either contains key-value pairs naturally or can be transformed into that form. When the aggregator takes this data from other systems, it somehow drops the delimiters. I have encountered more than 20 such key-value pairs in a single string.
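Strings such as examples 4 and 5 contain no keys at all, but they follow a fixed template, so one regex with named groups per template can turn them into key-value form. A minimal sketch in Python (the platform here is C#, but the same idea applies to .NET regexes); the key names Count, Animal, Breed and Number are hypothetical choices, not something present in the data:

```python
import re
from typing import Optional

# Hypothetical template for strings like "30 cows of Milky breed numbered #137".
# The key names are invented for illustration; they are not in the source data.
KEYS = ("Count", "Animal", "Breed", "Number")
COWS_TEMPLATE = re.compile(
    r"^(?P<Count>\d+) (?P<Animal>\w+) of (?P<Breed>\w+) breed numbered #(?P<Number>\d+)$"
)


def template_to_pairs(text: str) -> Optional[str]:
    """Rewrite a templated free-text string as 'Key: value' lines joined by CRLF.

    Returns None when the string does not match the template, so the caller
    can try the next template or fall back to another strategy.
    """
    match = COWS_TEMPLATE.match(text)
    if match is None:
        return None
    # Emit the pairs in a fixed, explicit key order.
    return "\r\n".join(f"{key}: {match.group(key)}" for key in KEYS)
```

In practice there would be a list of such templates tried in order; strings that match none of them are the ones worth inspecting by hand.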

The first problem is how to restore the delimiters (\r\n) where they were dropped. The second is how to replace "," with \r\n only where it is a real delimiter between key-value pairs and not part of a value. Commas inside values are not escaped.
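Assuming the set of keys is known (collected by hand or mined from the data), both problems can be handled in a single pass: insert a delimiter in front of every known key that is followed by a colon, and swallow any comma or spaces immediately before that key. A comma thus becomes \r\n only when a known key follows it, while commas inside values stay untouched. A minimal Python sketch under that assumption; the function names are hypothetical:

```python
import re
from typing import Iterable


def build_key_pattern(keys: Iterable[str]) -> re.Pattern:
    """Compile one regex that finds any known key followed by a colon.

    Keys are tried longest first, so "Registration number" wins over a shorter
    key that is a prefix of it. Commas/whitespace right before the key are
    consumed so they can be replaced by the restored delimiter. No word
    boundary is required, because the aggregator glues keys onto values.
    """
    alternation = "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))
    return re.compile(r"[\s,]*((?:" + alternation + r")\s*:)")


def restore_delimiters(text: str, key_pattern: re.Pattern, delimiter: str = "\r\n") -> str:
    """Put `delimiter` before every known key except one at the very start."""

    def replace(match: re.Match) -> str:
        key_with_colon = match.group(1)
        # A key at position 0 starts the string and needs no delimiter.
        return key_with_colon if match.start() == 0 else delimiter + key_with_colon

    return key_pattern.sub(replace, text)
```

For example, with the keys Type, Model, Year, VIN and Registration number, example 3 becomes five lines starting with "Type:Truck", "Model:MAN" and "Year:2010". The trade-off of not requiring word boundaries is that a value which itself ends in a key name followed by a colon would be split in the wrong place, so the key list needs to be reasonably specific.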

Both problems come down to extracting the patterns and then applying regex replacements. At first I planned to extract the patterns by hand, but that is very time-consuming and, as I found, does not cover some edge cases.
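To avoid listing keys by hand, candidate keys could be mined from the corpus: count every token that looks like a key directly before a colon and keep the frequent ones for review. The heuristic regex below (a word, optionally followed by one lowercase word, or an all-caps abbreviation) and the `min_count` threshold are assumptions to tune against the real data:

```python
import re
from collections import Counter
from typing import Iterable, Set

# Heuristic for "something that looks like a key right before a colon":
#  - a word starting with any letter followed by lowercase letters, optionally
#    followed by one more lowercase word ("Registration number"), or
#  - an all-caps abbreviation of 2+ letters ("VIN").
CANDIDATE_KEY = re.compile(r"([A-Za-z][a-z]+(?: [a-z]+)?|[A-Z]{2,})\s*:")


def mine_candidate_keys(strings: Iterable[str], min_count: int) -> Set[str]:
    """Return candidate keys that occur at least `min_count` times in total."""
    counts: Counter = Counter()
    for text in strings:
        counts.update(CANDIDATE_KEY.findall(text))
    return {key for key, n in counts.items() if n >= min_count}
```

On the first three sample strings, a threshold of 2 keeps Model, VIN and Year. With 100M rows, running this on a random sample would probably be enough, and the surviving candidates still need a human review before they drive any replacements.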

I am looking for programmatic solutions to these problems.

The strings are stored in an MSSQL table as part of a larger database. The data processing platform is written in C#.

v.karbovnichy
  • Please create a minimal, complete, and verifiable example to get quick answers to your problems. Regex questions should contain: 1. the input string, 2. the desired output string, 3. the regex you tried. – Mohammad Yusuf Feb 04 '17 at 12:28
  • @MYGz I am looking for ideas rather than an implementation. Besides, this is not a single-regex question. – v.karbovnichy Feb 04 '17 at 12:32
  • Sure, but if you can create a small example from your real data, it will help you more: you will get real solutions. Just saying. People tend not to invest time in long questions; if you can break yours down into small problems, you will reach your goal quicker. – Mohammad Yusuf Feb 04 '17 at 12:36
