Questions tagged [.net-spark]

Questions pertaining to the use of Apache Spark (and related distributions) with Microsoft's .NET runtime and associated languages such as C# and F#. Feel free to add platform-specific and language-specific tags as well.

Tag Definition

This tag is used for questions pertaining to the use of Apache Spark (and related distributions) with Microsoft's .NET runtime and associated languages such as C# and F#.

Related platform/code offerings

.NET for Apache Spark is currently an open source offering at the .NET Foundation. See https://github.com/dotnet/spark and https://dot.net/spark for details.

Refining Tag Usage

You can refine the tag's usage by adding tags that narrow down the relevant Apache Spark distribution or service and the specific language(s) relevant to the question.

24 questions
0 votes, 1 answer

Time-efficient gap filling of data in a DataFrame using .NET for Spark

I would like to fill gaps in my DataFrame using .NET for Spark. The current DataFrame (rawData) contains data at one-minute intervals between reportFrom and reportTo: DateTime reportFrom = new DateTime(2021, 3, 4, 0, 0, 0); DateTime reportTo = new…
V. J.
  • 35
  • 5
0 votes, 1 answer

Spark Dataframe API to Select multiple columns, map them to a fixed set, and Union ALL

I have a CSV source file with this schema defined. ["Name", "Address", "TaxId", "SS Number", "Mobile Number", "Gender", "LastVisited"] From this CSV, these are the operations I need to do: Select a subset of columns, one at a time, and map all of…
Abhay Sibal
  • 129
  • 1
  • 12
0 votes, 2 answers

HDInsight/Spark Activity in Azure Data Factory v2 does not have option to specify --files parameter for spark-submit

I have created an HDInsight cluster (v4, Spark 2.4) in Azure and want to run a Spark.Net app on this cluster through an Azure Data Factory v2 activity. In the Spark Activity it is possible to specify the path to the jar, the --class parameter, and arguments…
0 votes, 1 answer

Cannot use Spark.Net UDFs and HDInsight cluster

I have tried to run a simple application in a prod environment containing the code from https://github.com/dotnet/spark/blob/master/examples/Microsoft.Spark.CSharp.Examples/Sql/Batch/Basic.cs. The application runs fine and emits output to stdout until it this…
0 votes, 1 answer

How to perform distributed combinatorial (N choose K) in Spark .NET?

I have a project with a large number of combinations, C(100,20), with minor work being done for each combination set. I am using Spark .NET with Visual Studio as my technology (see setup…
CPGAdmin
  • 29
  • 5
0 votes, 2 answers

Is .NET for Apache Spark in Preview?

I have read many articles while exploring Azure Data Factory and Azure Databricks. I stumbled upon an article (https://learn.microsoft.com/en-us/dotnet/spark/how-to-guides/databricks-deploy-methods) where it is mentioned in the notes that .NET for…
0 votes, 1 answer

Is there a way to change the export filename using .NET SPARK?

I'm trying to export a DataFrame to a CSV file using .NET SPARK, but my export file has the default name 'part-00000-{GUID}'. What I want is to set the file's name according to my business rules, e.g. 'ABC_20200504.csv'. This is my…
0 votes, 2 answers

How to correctly instantiate a spark session with dotnet spark?

I've been following the documentation on dotnet spark to get started with the library on Windows. This guide can be found on GitHub: https://github.com/dotnet/spark/blob/master/docs/getting-started/windows-instructions.md On Microsoft…
0 votes, 1 answer

Method not implemented exception on Take method in Microsoft.Spark

I am trying to set up Spark with the new Microsoft.Spark library. The method DataFrame.PrintSchema works fine; however, the method DataFrame.Take() gives a System.NotImplementedException. A lot of other methods also give this exception. I took a look…
Jan-Wiebe
  • 61
  • 2
  • 11