SQL BigQuery - I have duplicate rows on the primary key that I need to remove (I don't want to permanently delete from the table). I have to GROUP BY several other fields to aggregate results, and am having issues since I have to include the primary key but cannot group by it and remove duplicates at the same time.
I saw examples on stack but am having issues applying to this case. I have tried the below, and that is not resulting in any groupings because I have to group by the ID to leave it in the select statement.
SELECT Distinct (ID),Event_ID,Event_Name,Sum(Quantity), Sum(Price*Quantity)
FROM table
GROUP BY ID, Event_ID, Event_Name
I tried removing ID from the group by clause and adding "Min (Distinct (ID))," as the first select line, but that was not removing the duplicates.
I also tried
SELECT Distinct (ID),Event_ID, Event_Name, Sum(Quantity) ,Sum(Price*Quantity),
FROM table
INNER JOIN
(SELECT ID as DeduplicatedID
FROM table
GROUP BY ID) deduplicatedtable ON table.ID = deduplicatedtable.ID
GROUP BY ID, Event_ID, Event_Name
Thanks in advance!
of the group at the ID level, however some of these IDs have duplicates, which provide incorrect TotalQuantity and TotalRevenue numbers when grouped by the Event
of what I would want the grouped data to look like, but this is incorrect and includes the duplicate IDs when the quantity and rev fields are summed.