Questions tagged [self-join]

A table can be joined to itself and this is called a self-join. You can use a self-join to create a result set that joins records in a table with other records in the same table for example to filter a specific subset of records from within a single table without returning duplicate or similar rows.

Self-joins are useful in queries or subqueries such as the following:

  • a limited result set of rows is needed
  • a relationship between rows is needed
  • a calculation among rows is needed

Self-joins are also useful when converting vendor specific syntax to vendor agnostic syntax

References

1264 questions
5
votes
2 answers

How to implement self-join/cross-product with hadoop?

It is common task to make some evaluation on pairs of items: Examples: de-duplication, collaborative filtering, similar items etc This is basically self-join or cross-product with the same source of data.
yura
  • 14,489
  • 21
  • 77
  • 126
5
votes
3 answers

Self join issue

I have a table called tblAccInfo, below is the table data. I need output like below. Input PolicyNumber BankAc StorageDate VerNum 6003210400 123 2012-01-01 1 6003210400 164 2012-01-03 2 6003210400 860 …
Simhadri
  • 911
  • 7
  • 18
  • 32
5
votes
1 answer

SQL Query Performance Degrades Depending on the Order of the Search Values

I have a dual self-join query where the performance is severely degraded when the search values are swapped. -- 500,000 i/o & 500ms execution select fooA.ID , fooB.ID from foo AS fooA INNER JOIN bar AS barA ON fooA.barID = barA.barID INNER…
Josh Bond
  • 242
  • 1
  • 5
4
votes
2 answers

mysql update with a self referencing query

I have a table of surveys which contains (amongst others) the following columns survey_id - unique id user_id - the id of the person the survey relates to created - datetime ip_address - of the submission ip_count - the number of…
bigstylee
  • 1,240
  • 1
  • 12
  • 22
4
votes
1 answer

Alternatives to a self FULL OUTER JOIN to obtaint match between two entries

in my db model I have two entities: Entitlement and Entitlement_Data. Each Entitlement is identified by an incrementing ID (IDENTITY), and can have multiple Entitlement_Data entries, each with a different Type - wich can be either 0 (Weekly) or 1…
Andrea Pigazzini
  • 359
  • 1
  • 14
4
votes
2 answers

Estimated size of the self-join operation on a relation R, given a histogram for R

Query optimizers typically use summaries of data distributions to estimate the sizes of the intermediate tables generated during query processing. One popular such summarization scheme is a histogram, whereby the input range is partitioned into…
user966892
4
votes
1 answer

How to perform a conditional join on statement containing both AND and OR operators?

I have a large dataset and would like to form all pairs of rows satisfying some condition and then calculate some variables based on which parts of the condition were satisfied. The following MWE illustrates what I would like to…
s.willis
  • 347
  • 2
  • 11
4
votes
3 answers

Replacing self joins by window functions

I am working with following sample data; dt | ship_id | audit_id | action 2022-01-02 | 1351 | id1 | destroy 2022-01-01 | 1351 | id1 | create 2021-12-12 | 3457 | id2 |…
jay
  • 1,319
  • 6
  • 23
  • 42
4
votes
1 answer

Increase parallelism of reading a parquet file - Spark optimize self join

I want to perform a self join in order to generate candidate matching pairs. Currently, this is not working as this operation is way too slow. Unfortunately, I cannot broadcast the data frames as they are too large. First I aggregate the number of…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
4
votes
1 answer

Translate SQL self join query to data.table syntax

I have the dataset with the following structure: Classes 'data.table' and 'data.frame': 214175 obs. of 12 variables: $ app_id : int 35949 49453 49970 50913 5| __truncated__ ... $ customer_id : int 35948 49452 49452 50912 5|…
Artem Klevtsov
  • 9,193
  • 6
  • 52
  • 57
4
votes
5 answers

Most efficient method for persisting complex types with variable schemas in SQL

What I'm doing I am creating an SQL table that will provide the back-end storage mechanism for complex-typed objects. I am trying to determine how to accomplish this with the best performance. I need to be able to query on each individual simple…
smartcaveman
  • 41,281
  • 29
  • 127
  • 212
4
votes
2 answers

Self join in criteria query

so I have a table called 'User' having fields as userId userName supervisorId I want to fetch the userName of the supervisor of a particular user. CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder(); CriteriaQuery criteriaQuery =…
archMage
  • 41
  • 1
  • 7
4
votes
1 answer

How to do self join with dplyr using different columns?

I want to to implement a classic self-join problem in R using dplyr but unable to do so. Input: Dataframe with empid, empname and managerid Output: Dataframe with managername for each empid library(dplyr) empid = c(1, 2, 3, 13, 11, 9, 8, 7,…
ar7
  • 51
  • 4
4
votes
7 answers

What is wrong with this join of a table to itself?

I have a table called TempAllAddresses with the following columns - ID, Address, State. I want to populate a new table with Address, State, and Count. Count should represent how many records there are in the TempAllAddresses table that have an…
froadie
  • 79,995
  • 75
  • 166
  • 235
4
votes
2 answers

Flatten hierarchy on self-join table

I have data in a self-join hierarchical table where Continents have many Countries have many Regions have many States have many Cities. Self-joining table structure: |-------------------------------------------------------------| | ID | Name …