I've been struggling for quite a while to run a custom shell script on an Azure VM. The shell commands work fine when run individually, but they fail when I bundle them into a shell script. I have defined the shell script in the settings section.

Terraform Code:

resource "azurerm_resource_group" "test" {
  name     = "acctestrg"
  location = "West US"
}

resource "azurerm_virtual_network" "test" {
  name                = "acctvn"
  address_space       = ["10.0.0.0/16"]
  location            = "West US"
  resource_group_name = "${azurerm_resource_group.test.name}"
}

resource "azurerm_subnet" "test" {
  name                 = "acctsub"
  resource_group_name  = "${azurerm_resource_group.test.name}"
  virtual_network_name = "${azurerm_virtual_network.test.name}"
  address_prefix       = "10.0.2.0/24"
}

resource "azurerm_public_ip" "pubip" {
  name                         = "tom-pip"
  location                     = "${azurerm_resource_group.test.location}"
  resource_group_name          = "${azurerm_resource_group.test.name}"
  public_ip_address_allocation = "Dynamic"
  idle_timeout_in_minutes      = 30

  tags {
    environment = "test"
  }
}

resource "azurerm_network_interface" "test" {
  name                = "acctni"
  location            = "West US"
  resource_group_name = "${azurerm_resource_group.test.name}"

  ip_configuration {
    name                          = "testconfiguration1"
    subnet_id                     = "${azurerm_subnet.test.id}"
    private_ip_address_allocation = "dynamic"
    public_ip_address_id          = "${azurerm_public_ip.pubip.id}"
  }
}

resource "azurerm_storage_account" "test" {
  name                     = "mostor"
  resource_group_name      = "${azurerm_resource_group.test.name}"
  location                 = "westus"
  account_tier             = "Standard"
  account_replication_type = "LRS"

  tags {
    environment = "staging"
  }
}

resource "azurerm_storage_container" "test" {
  name                  = "vhds"
  resource_group_name   = "${azurerm_resource_group.test.name}"
  storage_account_name  = "${azurerm_storage_account.test.name}"
  container_access_type = "private"
}

resource "azurerm_virtual_machine" "test" {
  name                  = "acctvm"
  location              = "West US"
  resource_group_name   = "${azurerm_resource_group.test.name}"
  network_interface_ids = ["${azurerm_network_interface.test.id}"]
  vm_size               = "Standard_A0"

  storage_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04-LTS"
    version   = "latest"
  }

  storage_os_disk {
    name          = "myosdisk1"
    vhd_uri       = "${azurerm_storage_account.test.primary_blob_endpoint}${azurerm_storage_container.test.name}/myosdisk1.vhd"
    caching       = "ReadWrite"
    create_option = "FromImage"
  }

  os_profile {
    computer_name  = "hostname"
    admin_username = "testadmin"
    admin_password = "Password1234!"
  }

  os_profile_linux_config {
    disable_password_authentication = false
  }

  tags {
    environment = "staging"
  }
}

resource "azurerm_virtual_machine_extension" "test" {
  name                 = "hostname"
  location             = "West US"
  resource_group_name  = "${azurerm_resource_group.test.name}"
  virtual_machine_name = "${azurerm_virtual_machine.test.name}"
  publisher            = "Microsoft.OSTCExtensions"
  type                 = "CustomScriptForLinux"
  type_handler_version = "1.2"

  settings = <<SETTINGS
  {
  "fileUris": ["https://sag.blob.core.windows.net/sagcont/install_nginx_ubuntu.sh"],
    "commandToExecute": "sh install_nginx_ubuntu.sh"
  }
SETTINGS

  tags {
    environment = "Production"
  }
}

I've removed all sudo calls from the commands in the script, since Azure runs all commands as root. For reference, the shell script is below:

Shell Code:

#!/bin/bash

echo "Running apt update"
apt-get update
echo "Installing nginx"
apt-get install nginx

The only error I get is a timeout message, shown below:

Error:

azurerm_virtual_machine.test: Creation complete after 3m21s (ID: /subscriptions/b017dff9-5685-4a83-80d3-...crosoft.Compute/virtualMachines/acctvm)
azurerm_virtual_machine_extension.test: Creating...
  location:             "" => "westus"
  name:                 "" => "hostname"
  publisher:            "" => "Microsoft.OSTCExtensions"
  resource_group_name:  "" => "acctestrg"
  settings:             "" => "  {\n  \"fileUris\": [\"https://sag.blob.core.windows.net/sagcont/install_nginx_ubuntu.sh\"],\n\t\"commandToExecute\": \"sh install_nginx_ubuntu.sh\"\n  }\n"
  tags.%:               "" => "1"
  tags.environment:     "" => "Production"
  type:                 "" => "CustomScriptForLinux"
  type_handler_version: "" => "1.2"
  virtual_machine_name: "" => "acctvm"
azurerm_virtual_machine_extension.test: Still creating... (10s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (20s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (30s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (40s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (50s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (1m0s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* azurerm_virtual_machine_extension.test: 1 error(s) occurred:

* azurerm_virtual_machine_extension.test: compute.VirtualMachineExtensionsClient#CreateOrUpdate: Failure sending request: StatusCode=200 -- Original Error: Long running operation terminated with status 'Failed': Code="VMExtensionProvisioningError" Message="VM has reported a failure when processing extension 'hostname'. Error message: \"Malformed status file [ExtensionError] Invalid status/status: failed\"."

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

I can confirm that the script is publicly accessible, as I can download it with wget. I'm not sure what's wrong. I have dug around the web a lot, but everywhere I ended up at an open bug or issue. Also, there's not much content available for Azure with Terraform. Any help is appreciated!

Shui shengbao
jagatjyoti

2 Answers

Yes, you need -y in your script.

apt-get install nginx -y

When the Azure custom script extension executes a script, the script must run unattended; it cannot wait for manual input.

In your script, without -y, apt-get stops and waits for you to confirm with yes. The Azure custom script extension waits for a few minutes and then you get the timeout error.
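
A minimal sketch of a fully non-interactive version of the script, assuming the steps are wrapped in a function. The `APT_GET` variable is a hypothetical override that is not part of the original script; it only lets the function be dry-run with a stand-in command. Setting `DEBIAN_FRONTEND=noninteractive` is an extra precaution against package configuration dialogs, not something the original script did.

```shell
#!/bin/bash

# Install nginx without any interactive prompts.
# APT_GET is a hypothetical override (not in the original script);
# it defaults to the real apt-get.
install_nginx() {
  local apt="${APT_GET:-apt-get}"

  # Keep package configuration steps from opening dialogs.
  export DEBIAN_FRONTEND=noninteractive

  echo "Running apt update"
  "$apt" update || return 1

  echo "Installing nginx"
  # -y answers "yes" to the "Do you want to continue? [Y/n]" prompt.
  "$apt" install -y nginx || return 1
}
```

Calling `install_nginx` as the last line of the script performs the installation. Running `APT_GET=echo install_nginx` instead only prints the arguments that would be passed to apt-get.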

Update from comment:

I was unable to find the location where the tar/script will be downloaded. Please can you throw some light here.

All execution output and errors from the scripts are logged in the scripts' download directory, /var/lib/waagent//download//. The tail of the output is logged in the log directory specified in HandlerEnvironment.json and reported back to Azure.

The extension's operation log is the /var/log/azure///extension.log file.

For more information, see this link.
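
One way to locate these files after logging in to the VM, as a rough sketch. The function name `list_extension_files` and its two arguments are hypothetical; the sketch only assumes the two base directories named above, /var/lib/waagent and /var/log/azure, and searches beneath them rather than relying on an exact extension directory name.

```shell
#!/bin/bash

# List script download directories and extension log files.
# Both arguments are hypothetical conveniences; the defaults are the
# base directories mentioned above.
list_extension_files() {
  local download_root="${1:-/var/lib/waagent}"
  local log_root="${2:-/var/log/azure}"

  echo "Download directories:"
  find "$download_root" -type d -name download 2>/dev/null | sort

  echo "Log files:"
  find "$log_root" -type f -name '*.log' 2>/dev/null | sort
}
```

Running `list_extension_files` on the VM (root access may be needed to read these directories) prints candidate paths; using `tail` on the files found there should show where the script stopped.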

Shui shengbao
  • can you please suggest a way wherein I can pull the whole tar from storage account and begin a build using gradle on the VM using Terraform ? – jagatjyoti Jan 19 '18 at 09:51
  • @J.Mishra If I understand correctly, in your script you could use `wget ` to download it and `tar xvf <.tar>` to extract it. – Shui shengbao Jan 19 '18 at 09:54
  • Storing the tar file and your script in the same location is a good idea. – Shui shengbao Jan 19 '18 at 09:54
  • Storing your script file inside the tar file is not possible. – Shui shengbao Jan 19 '18 at 09:57
  • Right now I'm storing the scripts in Blob inside a container, likewise I can keep the tar and script in the same container and access them. Please correct me if I'm wrong. Does Microsoft provide any debug tools for extensions where I just login to the machine and find out where my script failed ? If yes, how can I resume the script execution from there ? I was unable to find the location where the tar/script will be downloaded. Please can you throw some light here. – jagatjyoti Jan 19 '18 at 10:00
  • @J.Mishra This is right. The download path is `/var/lib/waagent/custom-script/download/0/` the log path is `/var/log/azure/custom-script/handler.log`. See this link https://learn.microsoft.com/en-us/azure/virtual-machines/linux/extensions-customscript#troubleshooting – Shui shengbao Jan 19 '18 at 10:02
  • You are correct, except that the directory in between has changed from `custom-script` to `Microsoft.OSTCExtensions.CustomScriptForLinux-1.2.2.0`. The old name is also what the docs you provided earlier show. I often run into poor documentation with Azure. – jagatjyoti Jan 19 '18 at 10:36
  • Thanks. If you check '/var/log/waagent.log' you will find the path. – Shui shengbao Jan 19 '18 at 10:44
  • Sorry, for debugging see this [link](https://github.com/Azure/azure-linux-extensions/tree/master/CustomScript). I have updated my answer. – Shui shengbao Jan 22 '18 at 01:51

It looks like the problem is in your script and not in the Terraform file per se.

Problem

When you run your install_nginx_ubuntu.sh script on an Ubuntu VM, this is the output on the box (showing only the last part):

0 upgraded, 14 newly installed, 0 to remove and 162 not upgraded.
Need to get 3,000 kB of archives.
After this operation, 9,783 kB of additional disk space will be used.
Do you want to continue? [Y/n]

So the script is waiting for user input that never arrives, and Terraform, which is waiting on the extension, eventually reports the failure.

Solution

The solution is simply to approve the package installation automatically, which should be familiar to Linux users. Change the following line in install_nginx_ubuntu.sh:

apt-get install nginx -y

Possible Lessons to be Learnt Over and Above the Question

You might want to look into how to debug Terraform. I feel that if you had at least seen more verbose feedback, you would have been in a position to figure this out.
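
As a small sketch of what that could look like: Terraform reads the `TF_LOG` and `TF_LOG_PATH` environment variables to enable verbose logging. The log file name below is an arbitrary example, not something from the question.

```shell
# Enable verbose Terraform logging for the current shell session.
export TF_LOG=DEBUG
# Write the log to a file instead of stderr (file name is an arbitrary example).
export TF_LOG_PATH="./terraform-debug.log"
```

With these set, re-running `terraform apply` writes the detailed trace to that file. This mostly shows what Terraform and the Azure API exchanged; the script's own output would still be in the extension logs on the VM.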

Shiraaz.M