Questions tagged [collect]

Use this tag for questions about gathering, collecting, or grouping data or results from several nodes, places, or computers into one (or a few main) resources.

For example, in distributed computing, the worker nodes perform their local computations and, eventually, one would like the master to collect the (local) results.

354 questions
2
votes
1 answer

How to collect a list within a list

I have a class Store which contains an array of Items. public class Store extends NamedEntity{ String webAddress; Item[] items; public Store(String name, String webAddress, Set items) { super(name); this.webAddress =…
JurajC
  • 99
  • 2
  • 8
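The question above is about flattening a nested structure (items inside stores) into a single collected list — in Java streams this is typically flatMap plus a toList collector. A minimal pure-Python sketch of the same flatten-and-collect logic, using hypothetical store/item data (the names and shapes are assumptions, not the asker's actual classes):

```python
# Hypothetical data: each store holds a list of item names.
stores = [
    {"name": "StoreA", "items": ["pen", "book"]},
    {"name": "StoreB", "items": ["lamp"]},
]

# Flatten the nested item lists into one collected list,
# analogous to flatMap followed by a toList collector.
all_items = [item for store in stores for item in store["items"]]
```

The inner loop iterates each store's item list, so the result preserves store order and then item order within each store.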
2
votes
3 answers

Transformation after grouping operation using Collectors in java

I have a simple practice code where I have a simple list of Person objects. What I wanted to do was to partition them based on their age being an even number or an odd number, and then transform their names to upper case if they belonged to the even…
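In Java this pattern is usually Collectors.partitioningBy with a downstream mapping collector. A minimal pure-Python sketch of the same partition-then-transform logic, with made-up sample people (names and ages are assumptions for illustration):

```python
# Hypothetical (name, age) pairs standing in for Person objects.
people = [("Ann", 24), ("Bob", 31), ("Cid", 30)]

# Partition on age parity; upper-case names only in the even-aged group,
# mirroring partitioningBy + a downstream mapping collector.
partitioned = {True: [], False: []}
for name, age in people:
    is_even = age % 2 == 0
    partitioned[is_even].append(name.upper() if is_even else name)
```

The two-key dict mirrors partitioningBy's Map&lt;Boolean, List&lt;...&gt;&gt; result shape.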
2
votes
1 answer

How to avoid using collect on a Spark RDD in Scala?

I have a List and have to create a Map from it for further use. I am using an RDD, but with the use of collect(), the job is failing on the cluster. Any help is appreciated. Below is the sample code from List to rdd.collect. I have to use this Map data…
Anu S
  • 21
  • 6
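The underlying operation here is turning a list of key/value pairs into a lookup map (in Spark, collectAsMap on a pair RDD does this directly, and broadcasting the result avoids shipping it repeatedly). A minimal pure-Python sketch of the pairs-to-map step, with assumed sample pairs:

```python
# Hypothetical key/value pairs, as a pair RDD would hold them.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Build the lookup map; later duplicates overwrite earlier ones,
# which matches collectAsMap's last-wins behavior for duplicate keys.
lookup = dict(pairs)
```

If duplicate keys must be merged rather than overwritten, an explicit loop with a combining function is needed instead of the plain dict constructor.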
2
votes
2 answers

Scala: How to use the collect function to get the latest modified entry from a DataFrame?

I have a Scala DataFrame with two columns: id: String updated: Timestamp From this DataFrame I just want to get the latest date, for which I use the following code at the moment: df.agg(max("updated")).head() // returns a row I've just read…
Eve
  • 604
  • 8
  • 26
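The key idea in the question above is reducing a whole column to one scalar (the maximum timestamp) instead of collecting rows. A minimal pure-Python sketch of that reduction, using hypothetical rows (the ids and dates are made up):

```python
from datetime import datetime

# Hypothetical rows with an id and an updated timestamp.
rows = [
    {"id": "a", "updated": datetime(2020, 1, 1)},
    {"id": "b", "updated": datetime(2021, 6, 15)},
]

# Reduce the column to a single scalar rather than collecting whole rows,
# the same shape as df.agg(max("updated")).
latest = max(row["updated"] for row in rows)
```

Only the aggregate crosses back to the caller, which is why agg(max(...)) is preferred over collecting and scanning on the driver.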
2
votes
1 answer

pyspark collect_set of column outside of groupby

I am trying to use collect_set to get a list of strings of categorie_names that are NOT part of groupby. My code is from pyspark import SparkContext from pyspark.sql import HiveContext from pyspark.sql import functions as F sc =…
Oscar Foley
  • 6,817
  • 8
  • 57
  • 90
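collect_set outside of a groupBy amounts to gathering the distinct values of a column across the whole frame. A minimal pure-Python sketch of that aggregation, with assumed category rows (the column name follows the question's spelling):

```python
# Hypothetical rows with a categorie_name column.
rows = [
    {"categorie_name": "food"},
    {"categorie_name": "toys"},
    {"categorie_name": "food"},
]

# collect_set-style aggregation over the whole column:
# distinct values, no grouping key, no guaranteed order.
category_set = {r["categorie_name"] for r in rows}
```

In PySpark the equivalent whole-frame form would be an agg with collect_set and no groupBy, but that exact call is left to the answers above.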
2
votes
1 answer

Getting unique keys of a Map-Column returns different results after every execution

I have a pyspark dataframe with a column of MapType(StringType(), FloatType()) and I would like to get a list of all keys appearing in the column. For example, having this dataframe: +---+--------------------+ | ID| …
olileo
  • 21
  • 2
2
votes
1 answer

Building a list from two observable sources in RxKotlin/RxJava using collectInto

I have a Category data class and a Plan data class. Each Category has a list of plan ids. There are Categories and Plans stored via Room. I am trying to construct a local List where I add each category to a list, and then add each of its…
Tyler Pfaff
  • 4,900
  • 9
  • 47
  • 62
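collectInto folds emitted items into a single mutable container. A minimal pure-Python sketch of that accumulate-into-a-list shape, joining categories to their plans by id (the data classes and field names here are hypothetical stand-ins, not the asker's Room entities):

```python
# Hypothetical category with plan ids, and a plan lookup by id.
categories = [{"name": "Fitness", "plan_ids": [1, 2]}]
plans = {1: "Run", 2: "Lift"}

# collectInto-style fold: start from an empty list, append each
# category, then append each of its resolved plans.
result = []
for cat in categories:
    result.append(cat["name"])
    result.extend(plans[p] for p in cat["plan_ids"])
```

In Rx terms, the empty list is the initial supplier and the loop body is the (container, item) accumulator.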
2
votes
1 answer

Creating an indicator array based on other data frame's column values in PySpark

I have two data frames: df1 +---+-----------------+ |id1| items1| +---+-----------------+ | 0| [B, C, D, E]| | 1| [E, A, C]| | 2| [F, A, E, B]| | 3| [E, G, A]| | 4| [A, C, E, B, D]| +---+-----------------+ and…
carpediem
  • 371
  • 3
  • 11
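Building an indicator array means mapping each row's item list onto a fixed vocabulary of possible values. A minimal pure-Python sketch of that per-row membership test, using the letters visible in the question's sample (row contents beyond the first are assumed):

```python
# Fixed vocabulary of possible items (order defines the indicator positions).
vocab = ["A", "B", "C", "D", "E"]

# One hypothetical row shaped like the question's df1.
rows = [{"id1": 0, "items1": ["B", "C", "D", "E"]}]

# 1 where the vocab item appears in the row's list, else 0.
indicators = [
    [1 if v in set(row["items1"]) else 0 for v in vocab]
    for row in rows
]
```

Converting items1 to a set first keeps each membership test O(1) instead of scanning the list per vocabulary entry.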
2
votes
1 answer

Collect metrics from the openstack environment and show it in grafana

Let me first define my goal: I want to have pretty Grafana dashboards for our OpenStack clusters. We have 5 datacenters with around 3,000-4,000 physical machines and 15k VMs. My task is to create some pretty Grafana dashboards for MySQL things,…
Badb0y
  • 331
  • 2
  • 21
2
votes
3 answers

Is the collectingAndThen method efficient enough?

I have recently started using collectingAndThen and found that it takes a bit longer than the other coding approaches I used to perform similar tasks. Here is my code: …
KayV
  • 12,987
  • 11
  • 98
  • 148
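collectingAndThen is simply "collect, then apply one finishing transform to the collected container" — the finisher runs once, after collection, so by itself it adds little overhead. A minimal pure-Python sketch of that collect-then-finish shape (the sample data is made up):

```python
names = ["ann", "bob"]

# Collect the mapped elements, then apply a finishing transform --
# here freezing the result to an immutable tuple, analogous to
# collectingAndThen(toList(), Collections::unmodifiableList).
collected = tuple(n.upper() for n in names)
```

If a finisher is observed to be slow, the cost is usually in the finishing function itself (e.g. copying), not in the collectingAndThen wrapper.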
2
votes
1 answer

Pyspark sparse vector dataframe to scipy.sparse without collecting

I have this sparse Spark dataframe: In [50]: data.show() +---------+-------+---------+-------+-------+--------+ | pid| 111516| 387745|1211811|1857606| 2187005| +---------+-------+---------+-------+-------+--------+ | 65197201| 0.0| …
xv70
  • 922
  • 1
  • 12
  • 27
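Going from a wide sparse frame to a scipy.sparse matrix usually means emitting (row, column, value) triplets for the non-zero cells, which is exactly the COO construction format. A minimal pure-Python sketch of the triplet extraction, with tiny assumed data shaped like the question's frame (no scipy call shown; the triplets would feed scipy.sparse.coo_matrix):

```python
# Column ids and per-row values; zeros are skipped so only
# non-zero cells are materialized.
col_ids = [111516, 387745]
rows = [("65197201", [0.0, 2.5]), ("65197202", [1.0, 0.0])]

triplets = []  # (row_id, col_id, value) entries, COO-style
for row_id, values in rows:
    for col_id, v in zip(col_ids, values):
        if v != 0.0:
            triplets.append((row_id, col_id, v))
```

Keeping only non-zeros is what lets the driver-side structure stay small even when the dense frame would not fit.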
2
votes
1 answer

Aggregate List into HashMap using Stream API

I have a MultivaluedMap and a list of strings, and I would like to see which of those string are keys in the MultivaluedMap. For each of the strings that are keys in the MultivaluedMap, I want to construct a new Thing out of the value of that key,…
marinatedpork
  • 179
  • 1
  • 9
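The task above is: for each string that is a key in the multivalued map, build a new object from that key's value list, and collect the results into a HashMap. A minimal pure-Python sketch of that filter-and-rebuild step (the data is hypothetical, and joining the values into a string stands in for constructing the question's Thing):

```python
# Hypothetical multivalued map and the keys we want to check.
multimap = {"a": ["x", "y"], "b": ["z"]}
wanted = ["a", "c"]

# Keep only wanted keys that actually exist, building a "Thing"
# (here simply a joined string) from each key's value list.
things = {k: ",".join(multimap[k]) for k in wanted if k in multimap}
```

In Java streams this is a filter on containsKey followed by Collectors.toMap over the surviving keys.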
2
votes
1 answer

How to Accumulate/Collect/Sum list?

I'm really new to using Python. I need to achieve the following. I have a list [ ['1604201722','16/04/2017','22', 100.0, 10.0, 110.0],
['1604201722','16/04/2017','22', 100.0, 10.0, 110.0],
['1604201719','16/04/2017','19', 100.0, 10.0,…
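The rows above share a key in the first column, and the goal is to sum the numeric columns per key. A minimal sketch using a dict accumulator, assuming the three trailing floats are the amounts to sum and the date/hour columns can be carried along unchanged:

```python
rows = [
    ['1604201722', '16/04/2017', '22', 100.0, 10.0, 110.0],
    ['1604201722', '16/04/2017', '22', 100.0, 10.0, 110.0],
    ['1604201719', '16/04/2017', '19', 100.0, 10.0, 110.0],
]

# Accumulate the three numeric columns per first-column key.
totals = {}
for key, date, hour, net, tax, gross in rows:
    acc = totals.setdefault(key, [date, hour, 0.0, 0.0, 0.0])
    acc[2] += net
    acc[3] += tax
    acc[4] += gross
```

setdefault creates the zeroed accumulator on first sight of a key and returns the existing one afterwards, so each row is a constant-time update.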
2
votes
2 answers

How to use collect in Laravel 5.3?

I need to use collect in Laravel 5.3, but I need help. For example: $collection = collect([ 'Apple' => [ ['name' => 'iPhone 6S', 'price' => '200'], ['name' => 'iPhone 7S', 'price' => '250'], ], 'Samsung' => [ ['name' => 'Galaxy…
mySun
  • 1,550
  • 5
  • 32
  • 52
2
votes
2 answers

Does reading multiple files & collect bring them to the driver in Spark?

Code snippet: val inp = sc.textFile("C:\\mk\\logdir\\foldera\\foldera1\\log.txt").collect.mkString(" ") I know the above code reads the entire file & combines it into one string & executes on the driver node (a single execution, not a parallel one). val inp =…
user7264473
  • 163
  • 1
  • 10