-1

In USQL I am reading csv file. It is telemetry data, so csv file may have any bad data. I am reading that csv file and creating JObject in USQL. I want to replace invalid characters from all values of json objects, so that my script will not fail because of bad data. I want to generate json output.

I am thinking to replace "\\" and "\"" with blank before creating JObject. Please let me know whether this is correct way to remove bad data. If anyone has better solution, let me know. Also apart from "\" and "\"" do I need to remove any other characters. My Usql code is mentioned below -

@Data=
SELECT new JObject(
                        new JProperty("Name", Name),                                                  
                        new JProperty("Description", Description)                      
                   ).ToString() AS Document
FROM @InputData; 
teo van kot
  • 12,350
  • 10
  • 38
  • 70
  • personally, I would read the csv into a data table and then serialize the datatable into a json object. By reading it into your datatable you can then validate each cell as you read it and use a regex to remove what you class as an illegal character .If you want a full code example I can make one for you shortly as I am not directly free right now. – Simon Price Sep 26 '17 at 09:53
  • I am mainly looking for USQL or C# solution – Mahesh Dounde Sep 26 '17 at 10:19
  • Can you provide some sample data sort of how it looks before and precisely how you want it to look after? – wBob Sep 26 '17 at 10:46
  • If Name has string "test\"ing", then using my code above it will give you parsing error. In that case I want to remove \" from that string which may cause issue. Like \" I want to know all other characters (invalid characters) which may case issue and replace them with blank. Mainly I am looking into values of all columns ex. name and description bacause, I have hard coded name of properties while creating JObject so name of properties will not have any issue. I just want to validate all values and correct them by replacing bad characters with blank. – Mahesh Dounde Sep 27 '17 at 08:16

1 Answers1

0

This is how I would do it. This will build up your DataTable and then using JSON.Net, this will serialize it all for you, you just need to build a regex expression to find and replace what you want to remove.

void Main()
{
    var dt = CSVtoDataTable(@"c:\temp\test.csv");

    foreach (DataRow row in dt.Rows)
    foreach (DataColumn col in dt.Columns)
    {
        var str = row[col].ToString();
        row[col] = RegexReplace(str);
    }

    string json = JsonConvert.SerializeObject(dt);
}

public DataTable CSVtoDataTable(string filepath)
{
    DataSet ds = new DataSet("Temp");

    using (OleDbConnection conn = new OleDbConnection($"Provider=Microsoft.Jet.OleDb.4.0; Data Source = {Path.GetDirectoryName(filepath)}; Extended Properties = \"Text;HDR=YES;FMT=Delimited\""))
    {
        conn.Open();
        OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + Path.GetFileName(filepath), conn);
        adapter.Fill(ds);
    }
    return ds.Tables[0];

}

public string RegexReplace(string s)
{
    return Regex.Replace(s, @"\b[a-z]\w+", "*****");
}
Simon Price
  • 3,011
  • 3
  • 34
  • 98
  • Thanks. Yes we can do like this, but wanted to know all bad characters which may cause issue while creating JObject. Ex. \" in "test\"ing" is bad character. – Mahesh Dounde Sep 27 '17 at 08:19
  • how is the system going to know what your bad characters are unless you use a regex to find them? – Simon Price Sep 27 '17 at 09:33