4

I need to extract data from a file, see code below.

@rows =
EXTRACT booking_date string,
        route string,
        channel string,
        pos string,
        venta string,
        flight_date string,
        ancillary string,
        paxs int?
FROM "/ventas/ventas1.csv"
USING Extractors.Text(delimiter:';', silent:true);

@output =
    SELECT booking_date,
           channel,
           Convert.ToDouble(venta.Replace(",", ".")) AS venta,
           paxs
    FROM @rows;

My problem is that the numbers are in Spanish format, meaning "100,234" instead of "100.234". Does anyone know how to change the format in Extractors.Text, or how transform strings in integers in U-SQL?

frictionlesspulley
  • 11,070
  • 14
  • 66
  • 115
JSI
  • 49
  • 3

1 Answers1

0

Import the column as a string and replace the full stops, eg something like this:

@input =
    EXTRACT venta string,
            paxs int
    FROM @inputFile
    USING Extractors.Text(delimiter : ';');


@output =
    SELECT Convert.ToInt32(venta.Replace(".", "")) AS venta
    FROM @input;
wBob
  • 13,710
  • 3
  • 20
  • 37
  • Hi, would `venta.Replace(".","")` instead be: `Decimal.Parse("100,234".Replace(",", ".")) AS venta` ? Ensuring the comma is replaced with a full stop and converted to decimal? – Alex KeySmith Feb 04 '19 at 01:08
  • Hi Alex, I interpreted their (admittedly) slightly confused note as, the integer data was in the flat file with the full stops, which is the Spanish equivalent of commas, and what they were trying to do was import it as *decimal* and convert it to int. What I have suggested is, import it as *string*, then convert it to int. I've added a `Convert.ToInt32` to the script to make that clear. Make sense? – wBob Feb 04 '19 at 11:52
  • Cool fair enough, I can see how it can be interpreted like that and yours make sense. I've interpreted it as "100,234" should be decimal "100.234", I may have misinterpreted it :-), let's await JSI's feedback. – Alex KeySmith Feb 04 '19 at 17:28
  • 2
    Going from experience, it's unlikely JSI will ever check here again : ) – wBob Feb 04 '19 at 20:51
  • Message Error while evaluating expression Convert.ToDecimal(venta.Replace(",", ".")) – JSI Feb 05 '19 at 00:00
  • Can you please post some of your actual values and expected results? Just a few will do. Thanks. – wBob Feb 05 '19 at 00:02
  • i feel you @wBob :) – DraganB Feb 05 '19 at 09:23
  • Example of the Input FIle – JSI Feb 05 '19 at 10:20
  • booking_Date;route;channel_lvl1;POS;Venta_total;Flight_date;Anc_amount;Paxs 2016-12-20;FCOAGP;MOVIL;MA;75,71;20170101;5,00;1 2018-10-01;FLRMAD;WEB;ES;88,13;20181216;73,49;1 2018-11-10;BCNORY;WEB;US;158,32;20190712;,00;4 2017-08-18;OVDACE;OTA;ES;379,02;20170923;28,97;3 – JSI Feb 05 '19 at 10:28
  • @rows = EXTRACT booking_date string, route string, channel string, pos string, venta string, flight_date string, ancillary string, paxs int? FROM "/ventas/ventas1.csv" USING Extractors.Text(delimiter:';', silent:true); – JSI Feb 05 '19 at 10:31
  • Can you provide expected results please? If I import these values as strings I get four rows and the `paxs` column contains values like `5,00,1`, `73,49,1`, `,00,4`, `28,97,3`. Spanish numbering, as I understand it, uses comma as the decimal separator meaning it is impossible for there to be more than one comma per value as in your example. This looks more like an array with three items in. To put it another way, if I take your value of `28,97,3`, what value do you want that to be? – wBob Feb 05 '19 at 14:39
  • Please tell us what the answer was for you. Please consider answering my questions. Please consider either upvoting the answer or marking it as the answer. – wBob Feb 10 '19 at 21:29