
I have a cluster with Windows-based nodes. I followed the guide "Using SMB CSI Driver on Amazon EKS Windows nodes" (Microsoft Workloads on AWS), but when I deployed the Windows pod (step 5.6), the pod stayed in the Pending state. This is the warning I got:

Reason: FailedMount

From: kubelet

Message:

  1. MountVolume.MountDevice failed for volume "pv-smb" : rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix C:\\var\\lib\\kubelet\\plugins\\smb.csi.k8s.io\\csi.sock: connect: No connection could be made because the target machine actively refused it."

  2. Unable to attach or mount volumes: unmounted volumes=[smb], unattached volumes=[smb kube-api-access-5v5p6]: timed out waiting for the condition

I would appreciate if anyone would help me out on this. Thank you :)

EDIT: After checking connectivity and fixing the security group, I ended up with another error:

MountVolume.MountDevice failed for volume "pv-smb" : rpc error: code = Internal desc = volume(FSx_id) mount "//Fsx_id.AD_DNS_name/share" on "\var\lib\kubelet\plugins\kubernetes.io\csi\smb.csi.k8s.io\da35e2ac08d4bd6b3f917c217d32fc33bb4c2b87b9068efb5845c8eb666d8d5d\globalmount" failed with NewSmbGlobalMapping(\Fsx_id.AD_DNS_name\share, c:\var\lib\kubelet\plugins\kubernetes.io\csi\smb.csi.k8s.io\da35e2ac08d4bd6b3f917c217d32fc33bb4c2b87b9068efb5845c8eb666d8d5d\globalmount) failed with error: rpc error: code = Unknown desc = NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : The network path was not found. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n pping], CimException\r\n + FullyQualifiedErrorId : Windows System Error 53,New-SmbGlobalMapping\r\n \r\n", err: exit status 1

1 Answer

Some possible reasons:

- Missing driver: Check that C:\var\lib\kubelet\plugins\smb.csi.k8s.io\csi.sock exists on the Windows nodes. You can SSH into the Windows nodes and check whether the file is present. If it is missing, that indicates a problem with the CSI driver installation.

- Network connection, firewall, or security group issues: Test connectivity to the SMB share from a Windows node. You can use a tool like Test-NetConnection: `Test-NetConnection <address> -Port <port>`.
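If PowerShell is not convenient, the same connectivity check can be sketched in Python and run from any machine in the same network as the nodes. This is only a rough sketch: the function name `check_smb_endpoint` and the example hostname are made up for illustration, and it only checks name resolution and a TCP connection on port 445, not SMB authentication or share permissions.

```python
import socket


def check_smb_endpoint(host: str, port: int = 445, timeout: float = 3.0) -> dict:
    """Resolve `host`, then try to open a TCP connection to the SMB port.

    Returns a dict describing what worked and what failed. This does not
    speak the SMB protocol; it only tests DNS resolution and TCP reachability.
    """
    result = {"host": host, "port": port, "resolved": [], "reachable": False, "error": None}

    # Step 1: name resolution. A failure here points at DNS rather than
    # at security groups or firewalls.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["resolved"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["error"] = f"name resolution failed: {exc}"
        return result

    # Step 2: TCP connection. A timeout or refusal here usually points at
    # a security group, firewall, or routing problem.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"

    return result


# Example call (hypothetical hostname, replace with your file server's name or IP):
# print(check_smb_endpoint("fs-example.corp.example.com"))
```

If the name does not resolve but the IP address is reachable, the problem is more likely DNS than the security group.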

As far as I can tell from the error message, it is probably a security group or network access issue.

If you have already tested these and checked the security rules, please provide more details so we can troubleshoot further.

--- After Edit of Question Above ---

  • Verify FSx access permissions
  • Check credentials and authentication: Ensure that the credentials used to access the FSx file system are valid and have the necessary permissions.
  • Review the SMB configuration: Double-check the configuration parameters for the SMB (Server Message Block) mount. Make sure the share path is accurate, and all necessary SMB-related settings, such as authentication methods and access control, are properly configured.
  • Check this tutorial for step-by-step instructions to see whether you are missing something: https://aws.amazon.com/blogs/storage/accessing-smb-file-shares-remotely-with-amazon-fsx-for-windows-file-server/
Fuat Ulugay
  • I checked my Windows nodes and I can see the driver is present, so the problem is most likely the second point you mentioned. I am new to this, so could you please let me know what I should do? How do I check connectivity to the SMB share? How do I get the details of the SMB share that I need to check connectivity to? Also, what needs to be in the security group for this to work? – Aadit Unni May 26 '23 at 06:44
  • On the nodes, run Test-NetConnection. The SMB ports are 139 and 445. Example command: `Test-NetConnection ipaddress -Port 445`. Run it in PowerShell to see whether the connection works. For the security group, create a new security group or use an existing one, assign it to the FSx file system, and allow traffic from the nodes' security group. This may solve the issue if it is related to security groups. Please also check this link: https://docs.aws.amazon.com/fsx/latest/WindowsGuide/limit-access-security-groups.html – Fuat Ulugay May 26 '23 at 08:15
  • I checked and solved that issue. The connection test passes and the security group is sorted, but now I am getting a new error (posting it as an edit to my post). – Aadit Unni May 26 '23 at 09:33
  • I am adding more details to the answer; please check there. – Fuat Ulugay May 27 '23 at 09:45
  • Hey, I figured out the problem, although I haven't reached the stage I want to yet. The issue was with the volume attribute `source` in the PV. I put the IP of my FSx file system in directly to fix it. Another issue is with the container image I am trying to use to test the mount. I get: Error: failed to create containerd container: rootpath on mountPath C:\Windows\TEMP\ctd-volume1919878911\755, volume c:/drived: CreateFile C:\Windows\TEMP\ctd-volume1919878911\755\c:: The filename, directory name, or volume label syntax is incorrect. Image: windowsimagemicrosoft/sdk-3.5-windowsservercore-ltsc2019:20200824 – Aadit Unni May 29 '23 at 10:17