Questions tagged [semi-join]
37 questions
2
votes
2 answers
Best JOIN performance on Google BigQuery
I have two large tables, a base table and a table I'm using to filter. I want the keys where 'my_filter' is true.
Which one gives me the best performance Google BigQuery?
Standard Join
select
[fields]
from base_table
join filter_table
on…

Josh Jelin
- 23
- 1
- 4
1
vote
3 answers
How do I rewrite my query to be based on the existence of a given foreign key, rather than the number of times that key occurs?
In the most general of cases, I have a query like below:
SELECT tutor_school.name, count(*), tutor_school.schoolid
FROM tutor_school, tutor_attends, tutor_tutors_in
WHERE
tutor_school.schoolid = tutor_attends.schoolid and
tutor_school.schoolid…

RonLugge
- 5,086
- 5
- 33
- 61
1
vote
0 answers
Py4JJavaError: An error occurred while calling o10495.join. : java.lang.StackOverflowError
In Azure Synapse notebook, after running quite a number of functions, I'm trying to do a semi join of two dataframes where DF1 has one column called ID and the DF2 has five columns: ID, SID, Name, Term, Desc. Now the issue is everytime I start the…

Mathuchock
- 11
- 2
1
vote
3 answers
How to semi_join two dataframes by string column with one being colon-separated
I have two dataframes, dfa and dfb:
dfa <- data.frame(
gene_name = c("MUC16", "MUC2", "MET", "FAT1", "TERT"),
id = c(1:5)
)
dfb <- data.frame(
gene_name = c("MUC1", "MET; BLEP", "MUC21", "FAT", "TERT"),
id = c(6:10)
)
which look like…

MonkeyBack
- 61
- 6
1
vote
2 answers
MYSQL query on products within variant price range without duplicates
I am trying to query my products table such that my query returns a product set within a certain variant price range -- products have many variants and variants belong to products. I have the following query which is working but it is very slow --…

robskrob
- 2,720
- 4
- 31
- 58
1
vote
2 answers
semi_join in R but pull back duplicates
I'm having issues with semi_join from dplyr. Ideally I would like to do a semi join on dfA against dfB. dfA has duplicate values, and so does dfB. I want to pull back all values from dfA that have any matches against dfB even duplicates in dfA.
dfA …

Matt W.
- 3,692
- 2
- 23
- 46
1
vote
1 answer
Oracle semi-join with multiple tables in SQL subquery
This question is how to work around the apparent oracle limitation on semi-joins with multiple tables in the subquery. I have the following 2 UPDATE statements.
Update 1:
UPDATE
(SELECT a.flag update_column
FROM a, b
WHERE a.id =…

user1410910
- 1,050
- 6
- 12
1
vote
1 answer
MYSQL Query double JOIN
This is my query:
SELECT count(*) AS COUNT
FROM `psttodo-in` p
INNER JOIN `deelgebied` d
ON d.`Segmentcode` = p.`Segment No_ PST`
AND d.`Deelgebied` = p.`Deelgebied`
INNER JOIN m2m
ON m2m.`deelgebied` = d.`deelgebiedID`
WHERE
…

nielsv
- 6,540
- 35
- 111
- 215
1
vote
1 answer
sql semi join optimize
there are two tables
t1 { id, name, ...}
t2 { t1_id , date_time, parameter, value, ...}
t1 and t2 are oracle partitioned table. t2 is large.
i want to fetch t1 which matches a time range from t2 :
select id, name, ... from t1…

ArdenZhao
- 133
- 3
- 16
0
votes
1 answer
Re-attaching a column to a dataframe with fewer rows (i.e. Ignoring a column that isn't removed in the final dataframe)
I have two large datasets, one that has it's grouping column removed, as well as any duplicates, and the original data. My problem is that I then need to reattach the grouping column from the original data after a bunch of data-wrangling/…

MM1
- 478
- 15
0
votes
1 answer
I want to select all records from one dataframe where its value exists/not exists in another dataframe. How to do this using pyspark dataframes?
I have the two pyspark dataframes.
I want to select all records from voutdf where its "hash" does not exist in vindf.tx_hash
How to do this using pyspark dataframe.?
I tried a semi join but I am ending up with out of memory errors.
voutdf =…

user11534659
- 3
- 3
0
votes
1 answer
Decorrelate a query using semi join
I am new to query optimization,how to use semi join while implementing decorrelation I can't totally understand.
Consider the query
SELECT A, B
FROM r
WHERE r.B < SOME (
SELECT B
FROM s
…
0
votes
1 answer
Merging two data frames horizontally by ID and keep only matches from the second one
I have two data frames which I want to merge horizontally:
dat_a
a b c
1 1 1 A
2 2 1 A
3 3 1 B
4 4 1 B
dat_b
a b c
1 3 1 C
2 3 1 C
3 3 1 D
4 4 1 D
I want to only keep those rows from dat_a which have a match in dat_b for columns a and b.
So…

Banjo
- 1,191
- 1
- 11
- 28
0
votes
1 answer
Mysql query still to slow for 100 million records in 10gb database size with index
I have a pretty huge data set of products and users and their time of usage.
There are around 100 million rows and takes around 10 GB of disk space.
the dataset is on the following order:
userid itemid purchase_date
1 1 …

Arnabhik
- 21
- 3
0
votes
2 answers
Using semi_join to find similarities but returns none mistakenly
I am trying to find the similar genes between two columns that I can later work with just the similar genes. Below is my code:
top100_1Beta <- data.frame(grp1_Beta$my_data.SYMBOL[1:100])
top100_2Beta<-…

Liz M.
- 25
- 3