3

I have a custom extractor, and I'm trying to log some messages from it.

I've tried obvious things like Console.WriteLine, but I cannot find where the output goes. However, I found some system logs in adl://<my_DLS>.azuredatalakestore.net/system/jobservice/jobs/Usql/.../<my_job_id>/.

How can I log something? Is it possible to specify a log file somewhere in a Data Lake Store or Blob Storage account?

arghtype

2 Answers

6

A recent release of U-SQL has added diagnostic logging for UDOs. See the release notes here.

// Enable the diagnostics preview feature
SET @@FeaturePreviews = "DIAGNOSTICS:ON";


// Extract as one column
@input =
    EXTRACT col string
    FROM "/input/input42.txt"
    USING new Utilities.MyExtractor();


@output =
    SELECT *
    FROM @input;


// Output the file
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);

This was my diagnostic line from the UDO:

Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));

This is the whole UDO:

using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;

namespace Utilities
{
    [SqlUserDefinedExtractor(AtomicFileProcessing = true)]
    public class MyExtractor : IExtractor
    {
        // Encoding and delimiters used when reading the input
        private readonly Encoding _encoding;
        private readonly byte[] _row_delim;
        private readonly char _col_delim;

        public MyExtractor()
        {
            _encoding = Encoding.UTF8;
            _row_delim = _encoding.GetBytes("\n\n");
            _col_delim = '|';
        }

        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
        {
            string s = string.Empty;
            string x = string.Empty;
            int i = 0;

            foreach (var current in input.Split(_row_delim))
            {
                using (System.IO.StreamReader streamReader = new StreamReader(current, this._encoding))
                {
                    while ((s = streamReader.ReadLine()) != null)
                    {
                        //Strip any line feeds
                        //s = s.Replace("\n", "");

                        // Concatenate the lines
                        x += s;
                        i += 1;

                    }

                    Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));

                    //Create the output
                    output.Set<string>(0, x);
                    yield return output.AsReadOnly();

                    // Reset
                    x = string.Empty;

                }
            }
        }
    }
}

And these were my results found in the following directory:

/system/jobservice/jobs/Usql/2017/10/20.../diagnosticstreams

[Screenshot: diagnostic output]
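If you end up writing more than a handful of diagnostic lines, it may help to give them a consistent shape so they are easier to scan in the diagnosticstreams folder. Below is a minimal sketch of that idea. The UdoLog class, its Format method, and the level and source labels are hypothetical names made up for illustration; the only call taken from this answer is DiagnosticStream.WriteLine, and it is shown in a comment so that the helper itself compiles without the U-SQL assemblies.

```csharp
using System;
using System.Globalization;

namespace Utilities
{
    // Hypothetical helper (not part of the U-SQL SDK): builds one log line
    // so that every diagnostic message has the same layout.
    public static class UdoLog
    {
        // level   : free-form severity label, e.g. "INFO" (an assumption, not an SDK concept)
        // source  : name of the UDO writing the message
        // message : the text to log
        // utcNow  : passed in by the caller so the formatter stays deterministic
        public static string Format(string level, string source, string message, DateTime utcNow)
        {
            // The round-trip ("o") format is culture-independent and sorts lexicographically.
            return string.Format(
                CultureInfo.InvariantCulture,
                "{0:o} [{1}] {2}: {3}",
                utcNow, level, source, message);
        }
    }
}

// Inside the UDO, assuming the DIAGNOSTICS preview feature is switched on
// as in the script above:
//
// Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(
//     Utilities.UdoLog.Format("INFO", "MyExtractor", "Concatenations done: " + i, DateTime.UtcNow));
```

Keeping the formatter free of any DiagnosticStream dependency is a deliberate choice: it can be unit tested locally, and the single WriteLine call remains the only line that needs the job runtime.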

wBob
  • Thanks, that's exactly what I was looking for! I didn't find a way to change the destination file, but that's OK for now. – arghtype Oct 20 '17 at 23:13
  • Can you please file a feature request with the motivation why you may want to change the destination file at http://aka.ms/adlfeedback? – Michael Rys Oct 31 '17 at 09:57
  • Is there any way to log directly from USQL without a code-behind? – jatal Nov 01 '17 at 16:29
2

Good question. I have been asking myself the same thing. This is theoretical, but I think it would work (I'll update this if I find otherwise).

One very hacky way is to insert rows into a table with your log messages as a string column. You can then select them out and filter on some log_producer_id column. You also get the benefit of having logs if part of the script works but later parts do not, assuming the failure does not roll back the inserts. The table can also be dumped to a file at the end.

For the error cases, you can use the Job Manager in ADLA to open the job graph and then view the job output. For data-related errors, the messages often include detailed information (e.g. the row number in the file with the error and an octal/hex/ASCII dump of the row, with the issue marked with ###).

Hope this helps,

J

P.S. This isn't really a comment or an answer, since I don't have working code. Please provide feedback if the above ideas are wrong.

jatal
  • Thanks, this should work; we've already considered it. But it looks like a hack, and I'd prefer a more natural solution. – arghtype Oct 18 '17 at 18:27
  • Totally agree, this is quite hacky. I asked Michael Rys (MSoft ADLA/UDO guru) about this in the past, and I bet he'd have the current best practice at hand. It would be great if we had some ability to log to an output file without output statements, e.g. something like daemon/nohup 2>&1 from the ADLA job wrapper to an auto-generated Console.out file that is present in the job output. – jatal Oct 18 '17 at 18:35
  • 1
    As Bob mentions in his answer, we do have a DiagnosticStream capability now. The tools team is working on adding support for it in the tooling as well. – Michael Rys Oct 31 '17 at 09:56