Highest Voted 'semi-join' Questions

2

votes

2 answers

Best JOIN performance on Google BigQuery

I have two large tables, a base table and a table I'm using to filter. I want the keys where 'my_filter' is true. Which one gives me the best performance Google BigQuery? Standard Join select [fields] from base_table join filter_table on…

asked Sep 07 '17 at 00:54

Josh Jelin

23
1
4

1

vote

3 answers

How do I rewrite my query to be based on the existence of a given foreign key, rather than the number of times that key occurs?

In the most general of cases, I have a query like below: SELECT tutor_school.name, count(*), tutor_school.schoolid FROM tutor_school, tutor_attends, tutor_tutors_in WHERE tutor_school.schoolid = tutor_attends.schoolid and tutor_school.schoolid…

sql semi-join

asked Oct 04 '11 at 23:16

RonLugge

5,086
5
33
61

1

vote

0 answers

Py4JJavaError: An error occurred while calling o10495.join. : java.lang.StackOverflowError

In Azure Synapse notebook, after running quite a number of functions, I'm trying to do a semi join of two dataframes where DF1 has one column called ID and the DF2 has five columns: ID, SID, Name, Term, Desc. Now the issue is everytime I start the…

pyspark semi-join

asked Dec 15 '22 at 01:49

Mathuchock

11
2

1

vote

3 answers

How to semi_join two dataframes by string column with one being colon-separated

I have two dataframes, dfa and dfb: dfa <- data.frame( gene_name = c("MUC16", "MUC2", "MET", "FAT1", "TERT"), id = c(1:5) ) dfb <- data.frame( gene_name = c("MUC1", "MET; BLEP", "MUC21", "FAT", "TERT"), id = c(6:10) ) which look like…

r dplyr fuzzyjoin anti-join semi-join

asked Nov 13 '19 at 15:01

MonkeyBack

61
6

1

vote

2 answers

MYSQL query on products within variant price range without duplicates

I am trying to query my products table such that my query returns a product set within a certain variant price range -- products have many variants and variants belong to products. I have the following query which is working but it is very slow --…

mysql semi-join

asked Jan 14 '19 at 15:07

robskrob

2,720
4
31
58

1

vote

2 answers

semi_join in R but pull back duplicates

I'm having issues with semi_join from dplyr. Ideally I would like to do a semi join on dfA against dfB. dfA has duplicate values, and so does dfB. I want to pull back all values from dfA that have any matches against dfB even duplicates in dfA. dfA …

r join dplyr inner-join semi-join

asked Jan 10 '17 at 16:26

Matt W.

3,692
2
23
46

1

vote

1 answer

Oracle semi-join with multiple tables in SQL subquery

This question is how to work around the apparent oracle limitation on semi-joins with multiple tables in the subquery. I have the following 2 UPDATE statements. Update 1: UPDATE (SELECT a.flag update_column FROM a, b WHERE a.id =…

sql oracle semi-join

asked Jun 03 '15 at 23:37

user1410910

1,050
6
12

1

vote

1 answer

MYSQL Query double JOIN

This is my query: SELECT count(*) AS COUNT FROM `psttodo-in` p INNER JOIN `deelgebied` d ON d.`Segmentcode` = p.`Segment No_ PST` AND d.`Deelgebied` = p.`Deelgebied` INNER JOIN m2m ON m2m.`deelgebied` = d.`deelgebiedID` WHERE …

php mysql sql inner-join semi-join

asked Apr 22 '14 at 16:17

nielsv

6,540
35
111
215

1

vote

1 answer

sql semi join optimize

there are two tables t1 { id, name, ...} t2 { t1_id , date_time, parameter, value, ...} t1 and t2 are oracle partitioned table. t2 is large. i want to fetch t1 which matches a time range from t2 : select id, name, ... from t1…

sql oracle optimization semi-join

asked Jun 27 '12 at 09:17

ArdenZhao

133
3
16

0

votes

1 answer

Re-attaching a column to a dataframe with fewer rows (i.e. Ignoring a column that isn't removed in the final dataframe)

I have two large datasets, one that has it's grouping column removed, as well as any duplicates, and the original data. My problem is that I then need to reattach the grouping column from the original data after a bunch of data-wrangling/…

r dataframe semi-join

asked Apr 20 '23 at 04:17

MM1

478
15

0

votes

1 answer

I want to select all records from one dataframe where its value exists/not exists in another dataframe. How to do this using pyspark dataframes?

I have the two pyspark dataframes. I want to select all records from voutdf where its "hash" does not exist in vindf.tx_hash How to do this using pyspark dataframe.? I tried a semi join but I am ending up with out of memory errors. voutdf =…

python pyspark semi-join

asked Dec 04 '19 at 08:50

user11534659

3
3

0

votes

1 answer

Decorrelate a query using semi join

I am new to query optimization,how to use semi join while implementing decorrelation I can't totally understand. Consider the query SELECT A, B FROM r WHERE r.B < SOME ( SELECT B FROM s …

database multiset semi-join

asked Apr 22 '19 at 13:28

Md. Shafiqul Islam

1
1

0

votes

1 answer

Merging two data frames horizontally by ID and keep only matches from the second one

I have two data frames which I want to merge horizontally: dat_a a b c 1 1 1 A 2 2 1 A 3 3 1 B 4 4 1 B dat_b a b c 1 3 1 C 2 3 1 C 3 3 1 D 4 4 1 D I want to only keep those rows from dat_a which have a match in dat_b for columns a and b. So…

r join semi-join

asked Apr 08 '19 at 16:51

Banjo

1,191
1
11
28

0

votes

1 answer

Mysql query still to slow for 100 million records in 10gb database size with index

I have a pretty huge data set of products and users and their time of usage. There are around 100 million rows and takes around 10 GB of disk space. the dataset is on the following order: userid itemid purchase_date 1 1 …

mysql database mariadb query-optimization semi-join

asked Jan 20 '19 at 14:50

Arnabhik

21
3

0

votes

2 answers

Using semi_join to find similarities but returns none mistakenly

I am trying to find the similar genes between two columns that I can later work with just the similar genes. Below is my code: top100_1Beta <- data.frame(grp1_Beta$my_data.SYMBOL[1:100]) top100_2Beta<-…

r dataframe dplyr semi-join

asked Nov 29 '18 at 23:53

Liz M.

25
3

Questions tagged [semi-join]