I've been following the .NET for Apache Spark documentation to get started with the library on Windows. The guide can be found in two places:

On GitHub: https://github.com/dotnet/spark/blob/master/docs/getting-started/windows-instructions.md

On Microsoft documentation: https://learn.microsoft.com/en-us/dotnet/spark/tutorials/get-started

I can't seem to create a Spark session from C#. Spark is installed and I can run it from the command line. Here's the code I've been using, which is the same as in the guide.

using Microsoft.Spark.Sql;

namespace HelloSpark
{
    class Program
    {
        static void Main(string[] args)
        {
            var spark = SparkSession.Builder().GetOrCreate();
            var df = spark.Read().Json("people.json");
            df.Show();
        }
    }
}

When I run the program inside of Visual Studio, I get the error:

System.Net.Internals.SocketExceptionFactory.ExtendedSocketException: 
'No connection could be made because the target machine actively refused it 127.0.0.1:5567'

On the line:

var spark = SparkSession.Builder().GetOrCreate();
Michael Rys
  • From the console, make sure you have an instantiated Spark instance. If the Spark environment is defined in the system environment settings, type `spark-shell` to instantiate Spark, then run your code again. – oetzi Sep 10 '19 at 15:02
  • `SPARK_HOME` is defined in environment variables as well as `HADOOP_HOME`, and I have started the spark-shell before running the code. Same error. – amherst_tratliff Sep 10 '19 at 15:58
  • I think Visual Studio is not letting you connect from the debugger. Pass the JSON file via args and try calling your exe with spark-submit, as here: spark-submit \ --class org.apache.spark.deploy.dotnet.DotnetRunner \ --master local \ \ debug. Then you can attach the exe with the Visual Studio debugger. In this debug mode, DotnetRunner does not launch the .NET application, but waits for it to connect. Leave this command prompt window open. Now you can run your .NET application with any debugger to debug it. – oetzi Sep 10 '19 at 17:19
  • ref: https://github.com/dotnet/spark/blob/master/docs/developer-guide.md#debugging-spark-net-application – oetzi Sep 10 '19 at 17:20

2 Answers

1

I ended up reinstalling the Microsoft.Spark package and then running the debug command given here in PowerShell. After running the command, I was able to attach the Visual Studio debugger to that process and successfully create a Spark session from the C# code.

1

You are trying to debug your solution directly from Visual Studio. You first need to create a deployment environment for it. To do that, launch the application as described in the section "Run your .NET for Apache Spark app" of the second link you posted.

Open PowerShell and type:

spark-submit `
--class org.apache.spark.deploy.dotnet.DotnetRunner `
--master local `
microsoft-spark-2.4.x-<version>.jar `
dotnet HelloSpark.dll

If you just need to debug, type:

spark-submit `
--class org.apache.spark.deploy.dotnet.DotnetRunner `
--master local `
microsoft-spark-2.4.x-<version>.jar `
debug

Leave that window open, then run or debug the application from Visual Studio as usual.
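
The error in the question is a refused TCP connection on 127.0.0.1:5567, which is consistent with nothing listening on that port when the application starts. As a minimal sketch, assuming 5567 is the port the debug-mode DotnetRunner listens on (inferred from the error message, not confirmed by the guide), a small check like the following could be run before creating the session. `BackendCheck` and `IsListening` are hypothetical names introduced here for illustration; they are not part of Microsoft.Spark.

```csharp
using System.Net.Sockets;

// Hypothetical helper, not part of Microsoft.Spark.
public static class BackendCheck
{
    // Returns true if something accepts a TCP connection on host:port,
    // false if the connection attempt fails with a socket error
    // (for example, when it is actively refused).
    public static bool IsListening(string host, int port)
    {
        try
        {
            using (var client = new TcpClient())
            {
                // Blocks until the connection succeeds or fails.
                client.Connect(host, port);
                return true;
            }
        }
        catch (SocketException)
        {
            return false;
        }
    }
}
```

For example, calling `BackendCheck.IsListening("127.0.0.1", 5567)` at the top of `Main`, and printing a reminder to start the `spark-submit ... debug` command when it returns false, would give a clearer message than the raw socket exception.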

Wilmer E. Henao