
I have a U-SQL script that executes successfully from VS Code with my personal credentials, but fails when triggered from a Data Factory pipeline. My personal account has Owner rights on the Azure subscription. The ADF linked services for both Data Lake Analytics and Data Lake Store use Service Principal authentication.

I am using Data Factory V2 and Data Lake Store Gen1 with the default Integration Runtime, and the ADLA firewall is disabled.

The U-SQL script is very simple: it just reads data from a CSV file and writes it to another CSV file. This is the whole script:

// Placeholder local defaults (hypothetical paths) so the script compiles
// on its own; at run time Data Factory prepends DECLARE statements with
// the real values, which override these EXTERNAL defaults.
DECLARE EXTERNAL @InputFile string = "/stack_test/Companies_input.csv";
DECLARE EXTERNAL @OutputFile string = "/stack_test/Companies.csv";

@companies =
    EXTRACT
        Id string,
        Name string
    FROM @InputFile
    USING Extractors.Csv(skipFirstNRows: 1);

OUTPUT @companies
    TO @OutputFile
    USING Outputters.Csv(outputHeader: true);

The parameters InputFile and OutputFile contain the ADL paths to the input and output data; they are passed in from Data Factory. The first stage of the job ("Extract") executes successfully, and the job graph shows that the error occurs in the "PodAggregate" stage. A similar error occurs if I try to write the output to a managed table instead of a CSV file.

The high-level error message in Data Factory is:

Error Id: VertexFailedFast, Error Message: Vertex failed with a fail-fast error.

Data Lake Analytics gives the more detailed error:

E_STORE_USER_ERROR: A user error has been reported when reading or writing data.

Component: STORE

Description: Operation 'Open::Wait' returned error code -2096559454 'Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.' for stream 'adl://[myadl].azuredatalakestore.net/adla/tmp/8a1495dc-8d80-44b9-a724-f2a0a963b3c8/stack_test/Companies.csv---6F21F973-45B9-46C7-805F-192672C99393-9_0_1.dtf%23N'.

Details:

Stream name 'adl://[myadl].azuredatalakestore.net/adla/tmp/8a1495dc-8d80-44b9-a724-f2a0a963b3c8/stack_test/Companies.csv---6F21F973-45B9-46C7-805F-192672C99393-9_0_1.dtf%23N'

Thu Jul 19 02:35:42 2018: Store user error, Operation:[Open::Wait], ErrorEither the resource does not exist or the current user is not authorized to perform the requested operation 7ffd8c4195b7 ScopeEngine!?ToStringInternal@KeySampleCollection@SSLibV3@ScopeEngine@@AEAA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ + 11b7
7ffd8c39a96d ScopeEngine!??0ExceptionWithStack@ScopeEngine@@QEAA@W4ErrorNumber@1@AEBV?$initializer_list@VScopeErrorArg@ScopeCommon@@@std@@_N@Z + 13d 7ffd8c3abe3e ScopeEngine!??0DeviceException@ScopeEngine@@QEAA@AEAVBlockDevice@1@AEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@J@Z + 1de
7ffd8c3f8c7b ScopeEngine!?GetTotalIoWaitTime@Statistics@Scanner@ScopeEngine@@QEAA_JXZ + 133b 7ffd8c3f87dc ScopeEngine!?GetTotalIoWaitTime@Statistics@Scanner@ScopeEngine@@QEAA_JXZ + e9c
7ffd9157780d ScopeCodeGenEngine!ScopeEngine::CosmosOutput::IssueWritePage + 4d d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd\scopeio.h line:6063 7ffd9156be7d ScopeCodeGenEngine!ScopeEngine::TextOutputStream,ScopeEngine::CosmosOutput>::Write + 2bd d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd\scopeio.h line:7828 7ffd91579290 ScopeCodeGenEngine!ScopeEngine::TextOutputPolicy::SerializeHeader + 30 d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd__scopecodegenengine__.dll.cpp line:514 7ffd91574ba7 ScopeCodeGenEngine!ScopeEngine::Outputer,ScopeEngine::BinaryInputStream,ScopeEngine::ExecutionStats>,SV1_Extract_out0,ScopeEngine::ScopeUnionAll,ScopeEngine::BinaryInputStream,ScopeEngine::ExecutionStats>,SV1_Extract_out0>,3>,SV1_Extract_out0,1>,SV1_Extract_out0,ScopeEngine::TextOutputPolicy,ScopeEngine::TextOutputStream,ScopeEngine::CosmosOutput>,0,ScopeEngine::ExecutionStats,ScopeEngine::DummyStatsWriter>::DoOutput + 27 d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd\scopeoperators.h line:5713 7ffd91582258 ScopeCodeGenEngine!SV2_PodAggregate_execute + 658 d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd__scopecodegenengine__.dll.cpp line:722 7ffd8c36571d ScopeEngine!??1OutputFileInfo@ScopeEngine@@QEAA@XZ + 60d
7ffd8c397aa0 ScopeEngine!?RunUserCode@Vertex@ScopeEngine@@SA_N_NAEBV?$function@$$A6AXXZ@std@@@Z + 1b0
7ffd8c397a4e ScopeEngine!?RunUserCode@Vertex@ScopeEngine@@SA_N_NAEBV?$function@$$A6AXXZ@std@@@Z + 15e
7ffd8c397915 ScopeEngine!?RunUserCode@Vertex@ScopeEngine@@SA_N_NAEBV?$function@$$A6AXXZ@std@@@Z + 25 7ffd8c365c7f ScopeEngine!??1OutputFileInfo@ScopeEngine@@QEAA@XZ + b6f
7ffd8c3950c4 ScopeEngine!?Execute@Vertex@ScopeEngine@@SA_NAEBVVertexStartupInfo@2@PEAUVertexExecutionInfo@2@@Z + 3f4 7ff731d8ae8d scopehost!(no name) 7ff731d8adbd scopehost!(no name) 7ffd8c4274d9 ScopeEngine!?Execute@VertexHostBase@ScopeEngine@@IEAA_NAEAVVertexStartupInfo@2@@Z + 379 7ff731d8d236 scopehost!(no name) 7ff731d6a966 scopehost!(no name) 7ff731d98dac scopehost!(no name) 7ffd9e4713d2 KERNEL32!BaseThreadInitThunk + 22
7ffd9e5e54e4 ntdll!RtlUserThreadStart + 34

The Service Principal has Owner permissions on the Data Lake Analytics account. It also has (default) rwx permissions on the /stack_test subdirectory in Data Lake Store and on all of its files and children, and x permission on the root directory. The error message seems to say that the SP is missing permissions on the destination file (/stack_test/Companies.csv), but I can explicitly see that it has rwx on that file. Which permissions am I still missing?
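For diagnosis, the ACLs on the paths involved can be inspected programmatically. Below is a minimal sketch using the azure-datalake-store Python SDK for Gen1; the store name matches the one in the error message, while the tenant id, client id, secret, and input path are placeholders. Note that /adla/tmp is included because the failing stream in the error message lives under it:

# pip install azure-datalake-store   (Python SDK for Data Lake Store Gen1)
from azure.datalake.store import core, lib

# Placeholder service-principal credentials; substitute real values.
token = lib.auth(tenant_id='<tenant-id>',
                 client_id='<sp-application-id>',
                 client_secret='<sp-secret>')
adl = core.AzureDLFileSystem(token, store_name='myadl')

# Dump the ACLs on every path involved in the job, including the
# parent of the temp stream named in the error message.
for path in ['/', '/stack_test', '/stack_test/Companies.csv', '/adla/tmp']:
    if adl.exists(path):
        print(path, adl.get_acl_status(path))
    else:
        print(path, 'does not exist')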

For reference, the script and the Data Factory resources necessary to reproduce this problem can be found at: https://github.com/lehmus/StackQuestions/tree/master/ADF_ADLA_Auth.

  • I can confirm that we have a Data Lake Analytics instance that started throwing the same errors yesterday. It does work if the job is published and executed with a personal account, but the service principal appears to have lost permissions to the tmp folder. If you get the same error, perhaps there was a service upgrade yesterday? It started between 11:47 (worked) and 12:45 (failed) UTC. – SondreB Jul 20 '18 at 08:44
  • Yes, this issue seems to be related to the tmp folder. Apparently the U-SQL jobs have started to use /adla/tmp as their temporary storage. Since my ADF service principal has only x (default) permission on the root, it does not have write permission on the temporary folder. I can create this folder manually and grant write access on tmp to run the job successfully once (a sketch of this workaround follows these comments). A more permanent fix would be to grant default write access on the root, but this is not good practice. – Lazer Jul 20 '18 at 11:07
  • @SondreB, did you create a support ticket? – Fang Liu Jul 20 '18 at 13:39
  • @FangLiu We have created a ticket with MS. It seems the change in how ADLA employs tmp folders was unannounced. As a workaround, we have temporarily granted write access on the root, but are awaiting a better solution from MS. – larsts Jul 21 '18 at 13:38
  • @larsts What is the status of your ticket, has this issue been addressed? – Lazer Oct 19 '18 at 11:48
  • @Lazer The issue has been addressed and the ticket closed. AFAIK the version that introduced the error was rolled back by Microsoft (although they were never able to supply a date for when this was rolled back/rectified). We have in any case reverted our workaround (write access on the root), and everything is running smoothly. – larsts Nov 07 '18 at 11:15
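Based on the comments above, here is a minimal sketch of the temporary workaround: pre-create /adla/tmp and grant the ADF service principal access on it, including a default ACL entry so streams the job creates underneath inherit the permission. Again this uses the azure-datalake-store Python SDK; the tenant/client ids and the service principal's AAD object id are placeholders:

from azure.datalake.store import core, lib

# Placeholder credentials for an identity allowed to manage ACLs.
token = lib.auth(tenant_id='<tenant-id>',
                 client_id='<admin-application-id>',
                 client_secret='<admin-secret>')
adl = core.AzureDLFileSystem(token, store_name='myadl')

# AAD object id of the service principal used by Data Factory (placeholder).
sp_object_id = '<sp-object-id>'

# Pre-create the temp folder the U-SQL vertices write to...
if not adl.exists('/adla/tmp'):
    adl.mkdir('/adla/tmp')

# ...and grant the SP rwx on it, plus a matching default entry so that
# streams the job creates underneath inherit the permission.
acl_spec = 'user:{0}:rwx,default:user:{0}:rwx'.format(sp_object_id)
adl.modify_acl_entries('/adla/tmp', acl_spec)

As the comments note, granting default write access on the root would also work, but that is a much broader grant than the job needs.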
