
I have created Azure WebJobs that contain methods for creating a file on Data Lake Store and appending data to it. Development is finished and the WebJobs are published. Now I want to write unit tests that check whether the data I send is successfully appended to the file. How can I perform this kind of test?

My current idea is to clear all data from the Data Lake file and then send test data to it. Based on the value of one column in the data I sent, I would check whether it was appended. Is there a way to get a quick status of whether my test data was written?

Note: What I actually want to know is how to delete a particular row of a CSV file on Data Lake without using U-SQL to search for the row. (I am not sending data to Data Lake directly; it goes through an Azure Service Bus queue, which triggers the WebJobs that append the data to a file on Data Lake.)

2 Answers


Aside from looking at the file, I can see a few other choices. If your unit test is the only writer to the file, you can send appends of varying lengths and check that the file size grows accordingly after each successful append. You can also always read the file and check whether your data made it.
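
The size check can be sketched as below. This is only an illustration: `SizeCheck` and `GrewByPayload` are hypothetical names, and the sketch assumes the payload is written as UTF-8 and that nothing else appends to the file while the test runs.

```csharp
using System;
using System.Text;

static class SizeCheck
{
    // True when the file grew by exactly the UTF-8 byte count of `payload`.
    // Assumption: the test is the only writer, so any growth is its own append.
    public static bool GrewByPayload(long lengthBefore, long lengthAfter, string payload)
    {
        return lengthAfter - lengthBefore == Encoding.UTF8.GetByteCount(payload);
    }
}
```

Comparing byte counts rather than character counts matters as soon as the payload contains non-ASCII characters.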

Amit Kulkarni
  • But @Amit, there is one problem: the file we append to is created by the WebJobs, and multiple users will be sending data at the same time. If my test case runs in production, we can no longer use the size as a measure. – Nasima Akhtar Jul 28 '17 at 07:54
  • Yes, if you want to test in production, where other actors are appending to the file, you cannot rely on length. Here is a better way: (1) query the length of the file, (2) send the append and wait for the stipulated time, (3) read the new data, i.e. from the offset onward, to see whether your data actually made it to the file. That way you read only part of the file rather than all of it. – Amit Kulkarni Jul 31 '17 at 22:23
  • Yes, exactly. I want to know how I can get that offset; in this case I don't need the length or the size. What I actually want to know is how to delete a particular row of a CSV file on Data Lake without using U-SQL to search for the row. – Nasima Akhtar Aug 01 '17 at 10:48
  • You cannot delete a row (or any part) from a file. ADLS is an append-only file system. Data once committed cannot be erased or updated. If you're testing in production, your application needs to be aware of test rows and ignore them appropriately. What you can do is read a file at a specific offset. The read APIs in all SDKs have this functionality. That is what I was referring to above. – Amit Kulkarni Aug 01 '17 at 17:36
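
The three steps from the comments (record the length, append and wait, read from the offset onward) could be wrapped in a small polling helper along these lines. It is a sketch under assumptions, not an SDK feature: `AppendCheck`, `WaitForMarker`, `getLength` and `openFrom` are hypothetical names, and the two delegates stand in for whatever "get file length" and "open at offset" calls your client exposes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

static class AppendCheck
{
    // Polls until `marker` shows up in the bytes written after `startOffset`,
    // or gives up once `timeout` has elapsed.
    // getLength / openFrom are hypothetical wrappers around the store client.
    public static bool WaitForMarker(
        Func<long> getLength,
        Func<long, Stream> openFrom,
        long startOffset,
        string marker,
        TimeSpan timeout,
        TimeSpan pollInterval)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            // Only read when the file has grown past the recorded offset.
            if (getLength() > startOffset)
            {
                using (var stream = openFrom(startOffset))
                using (var reader = new StreamReader(stream, Encoding.UTF8))
                {
                    // Reads only the tail appended since startOffset.
                    if (reader.ReadToEnd().Contains(marker))
                        return true;
                }
            }
            if (DateTime.UtcNow >= deadline)
                return false;
            Thread.Sleep(pollInterval);
        }
    }
}
```

Because other writers may append between your length query and your own append, the helper searches the tail for a unique marker (for example a GUID column) instead of comparing sizes.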

I solved my problem as follows. First I got the length of my file on Data Lake Store using:

var fileoffset = _adlsFileSystemClient.FileSystem.GetFileStatus(_dlAccountName, "/MyFile.csv").FileStatus.Length;

After getting the length, I sent my test data to Data Lake and then got the length of the file again using the same code. The first length (before sending the test data) is my offset, and the length after sending the test data is my destination length (totalfileLength below). I then read the file from the offset up to the destination length using:

Stream Stream1 = _adlsFileSystemClient.FileSystem.Open(_dlAccountName, "/MyFile.csv", totalfileLength, fileoffset);

With the data in a stream, I searched for the test data I had sent using the following code:

Note: My file has a column of GUIDs, so I search the file stream for the GUID I sent. Make sure to convert your search data to a byte array before passing it to ReadOneSrch(..).
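
As a rough illustration of that conversion, the sketch below turns a GUID into search bytes. `SearchBytes` and `FromGuid` are hypothetical names, and it assumes the CSV stores GUIDs as UTF-8 text in the default "D" format (lowercase hex with hyphens); if your file uses another format or casing, the bytes must be built to match.

```csharp
using System;
using System.Text;

static class SearchBytes
{
    // Encodes a GUID the way it is assumed to appear in the CSV:
    // "D" format (e.g. 0f8fad5b-d9cb-469f-a165-70867728950e), UTF-8 encoded.
    public static byte[] FromGuid(Guid id)
    {
        return Encoding.UTF8.GetBytes(id.ToString("D"));
    }
}
```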

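    static bool ReadOneSrch(Stream fileStream, byte[] mydata)
    {
        if (mydata.Length == 0)
            return true; // an empty pattern matches trivially

        // KMP failure table: fail[k] is the length of the longest proper
        // prefix of mydata[0..k] that is also a suffix of it.
        var fail = new int[mydata.Length];
        for (int k = 1, j = 0; k < mydata.Length; k++)
        {
            while (j > 0 && mydata[k] != mydata[j]) j = fail[j - 1];
            if (mydata[k] == mydata[j]) j++;
            fail[k] = j;
        }

        int b, i = 0;
        while ((b = fileStream.ReadByte()) != -1)
        {
            // On a mismatch, fall back to the longest prefix that still
            // matches, so a match that overlaps the failed attempt
            // (e.g. "aab" in "aaab") is not missed.
            while (i > 0 && b != mydata[i]) i = fail[i - 1];
            if (b == mydata[i] && ++i == mydata.Length)
                return true;
        }

        return false;
    }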