1

I would like to create a column in table "starting_pitcher_game_log" for "HomeAway_ID" conditional on values of two columns in another table, "FLD_team_ID" and "Away_Team_ID" such that when "FLD_team_ID" and "Away_Team_ID" are equal to each other, the value for the new column should be "Away" and when they are not equal to each other, the value for the new column should be "Home". The new column's values should provide game-site information ("Home" or "Away") for the "Starting_Pitcher" column in the current table "starting_pitcher_game_log". The "FLD_team_ID" column includes values for the team that the starting pitcher identified in column "Starting_Pitcher" plays for. So, in other words I'd like each row of the "Starting_Pitcher" column, which corresponds to a different game that a given starting pitcher played in, to designate the site of the game, "Home" or "Away", of that game.

Here is the code for the query that goes through, but all but two values for that column are "Home" even though half should be "Home" and half should be "Away"

ALTER TABLE retrosheet.starting_pitcher_game_log ADD HomeAway_ID VARCHAR (4);
UPDATE retrosheet.starting_pitcher_game_log as b, retrosheet.events as g
SET b.HomeAway_ID = IF(g.`AWAY_TEAM_ID`=g.`FLD_TEAM_ID`,"Away","Home")
WHERE b.`Starting_Pitcher` = g.`PIT_ID`

Can someone offer some guidance as to what might be wrong with code that is causing this problem?

UPDATE: Here's the code for the TRIM commands for the four fields in the original code above

UPDATE events SET PIT_ID = TRIM(PIT_ID)

UPDATE starting_pitcher_game_log SET Starting_Pitcher = TRIM(Starting_Pitcher)

UPDATE events SET AWAY_TEAM_ID = TRIM(AWAY_TEAM_ID)

UPDATE events SET FLD_TEAM_ID = TRIM(FLD_TEAM_ID)

Then I ran the following code trying it both with CHAR and VARCHAR for variable type for HomeAway_ID, as well as trying it it with 'AWAY' and 'Home' and then "AWAY" and "Home" in the conditional statement in line 3.

ALTER TABLE retrosheet.starting_pitcher_game_log ADD COLUMN HomeAway_ID CHAR(4);
    UPDATE retrosheet.starting_pitcher_game_log AS b, retrosheet.events AS g
    SET b.HomeAway_ID=IF(g.`AWAY_TEAM_ID`=g.`FLD_TEAM_ID`,'Away','Home')
    WHERE b.`Starting_Pitcher`=g.`PIT_ID`

I'm stumped -- feel like it shouldn't be this hard...It just keeps processing with no end.

Any help with this would be much appreciated.

UPDATE:

I tried the following code, implementing the INNER JOIN command recommended, but it is still taking forever to process query and have yet to see it complete:

ALTER TABLE retrosheet.starting_pitcher_game_log ADD COLUMN HomeAway_ID CHAR (4);
UPDATE retrosheet.starting_pitcher_game_log AS b, retrosheet.events AS g INNER JOIN retrosheet.starting_pitcher_game_log ON g.AWAY_TEAM_ID = g.FLD_TEAM_ID 
SET b.HomeAway_ID=IF(g.AWAY_TEAM_ID=g.FLD_TEAM_ID,'Away','Home') WHERE b.Starting_Pitcher=g.PIT_ID

UPDATE:

ALTER TABLE events ADD INDEX (PIT_ID, FLD_TEAM_ID, AWAY_TEAM_ID)

I didn't create a primary key using any of the referenced fields in the events table since I plan to sort those columns referenced in the query in different ways in the future.

Then I ran this query to create the column:

ALTER TABLE retrosheet.starting_pitcher_game_log ADD COLUMN HomeAway_CD CHAR (4);
UPDATE retrosheet.starting_pitcher_game_log AS b, retrosheet.events AS g JOIN retrosheet.starting_pitcher_game_log ON g.AWAY_TEAM_ID = g.FLD_TEAM_ID 
SET b.HomeAway_CD=IF(g.AWAY_TEAM_ID=g.FLD_TEAM_ID,'Away','Home') WHERE b.Starting_Pitcher = g.PIT_ID

It just keeps going on forever...

What is the best way to index the columns to accommodate the execution of the query in a somewhat timely fashion.

CORRECT CODE: First I indexed the minimum number of fields in the table (in which I wanted to add the new column HomeAway_CD) that could accurately/sufficiently produce that new column:

ALTER TABLE starting_pitcher_game_log ADD INDEX (GAME_ID, Starting_Pitcher);

Here is the code that worked to produce the column. In the where clause, I had tried "b.GAME_ID=g.GAME_ID" by itself and "b.Starting_Pitcher = g.PIT_ID" by itself, and the query took forever each time. But, both the starting pitcher column and the Game_ID column must be linked from the origin table to the destination table:

UPDATE retrosheet.starting_pitcher_game_log AS b, 
       retrosheet.events AS g 
   SET b.HomeAway_CD=IF(g.AWAY_TEAM_ID=g.FLD_TEAM_ID,'Away','Home') 
WHERE b.GAME_ID=g.GAME_ID
   AND b.Starting_Pitcher = g.PIT_ID
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
LeeZee
  • 107
  • 1
  • 12

2 Answers2

0

Trim the fields before comparing them. I think you have some hidden leading or trailing blanks in there.

nicomp
  • 4,344
  • 4
  • 27
  • 60
  • Nicomp, thank you for the suggestion. I went ahead and ran 4 separate TRIM commands above, code in the original post, each one for each of the four fields referenced in the original code to add the column. But, the query continues to process on and on with no end: I ran the code in the morning before work, and after returning home from work after nine hours, it is still running...I had to stop it. – LeeZee Dec 08 '15 at 04:31
0

How many records are in the table? You don't have an explicit join in the UPDATE query: I'm guessing that you have an implicit cross join wherein all the records in starting_pitcher_game_log are joined to all the records in events. For example, if there are 1000 records in each table then the query will process 1000 X 1000 records.

Add an explicit inner or outer join to the UPDATE query

UPDATE retrosheet.starting_pitcher_game_log AS b, retrosheet.events AS g inner JOIN ... SET b.HomeAway_ID=IF(g.AWAY_TEAM_ID=g.FLD_TEAM_ID,'Away','Home') WHERE b.Starting_Pitcher=g.PIT_ID

nicomp
  • 4,344
  • 4
  • 27
  • 60
  • Thanks nicomp. Now trying to figure out which of the 4 fields referenced need to be joined and what type of join. Want to create column for site of game, home or away. 1)The values of "FLD_TEAM_ID" are always the team of "Starting_pitcher" (for each game). 2) "AWAY_TEAM_ID" equals "FLD_TEAM_ID" when game is played away, and when it doesn't, it's played at home. So, join between "FLD_TEAM_ID" and "AWAY_TEAM_ID" be OUTER JOIN, right? And since "FLD_TEAM_ID" is always the team for "Starting_Pitcher" I will need to GROUP them, right? – LeeZee Dec 08 '15 at 20:02
  • There are millions of records for events table and many thousands for starting_pitcher_game_log...I'll check when I get home. But it's clear now I don't want the query to process all combinations of records in those tables.. – LeeZee Dec 08 '15 at 20:07
  • If there are millions of records then you are killing performance with a cross-join and that's why it's running forever. You need to join a foreign key to the primary key. – nicomp Dec 08 '15 at 23:39
  • There are three fields in the table starting_pitcher_game_log (the one in which I want to create a new column) that are identical to fields in the main table events:"Starting_Pitcher", "Game_ID", N "Game_Date". In which table should I create the foreign key(s) and then join to a primary key in the starting_pitcher_game_log? – LeeZee Dec 09 '15 at 15:08
  • Nicomp, I tried to add an inner join between "AWAY_TEAM_ID" and "FLD_TEAM_ID" to the query. Please see above in original post. – LeeZee Dec 10 '15 at 05:57
  • nicomp, in what table should the foreign key be, and in which table should the primary key be? Should this link be between "AWAY_TEAM_ID" and "FLD_TEAM_ID"? – LeeZee Dec 12 '15 at 01:10
  • I think so. The foreign key would be an attribute in one table that refers to a primary key in another table. – nicomp Dec 12 '15 at 03:20
  • I am completely confused about how to properly index the columns (in the starting_pitcher_log and events tables) that the following query will need to access in oder to create a HomeAway_ID column. I indexed PIT_ID, FLD_TEAM_ID and AWAY_TEAM_ID in the events table as follows in the original post above... – LeeZee Dec 12 '15 at 23:15
  • Ok, I above in the original post is the correct code for this which incorporates 2 clauses in the WHERE statement, each of which I tried by itself but not together. – LeeZee Dec 13 '15 at 22:09