0

I want to merge data. Following are my MySQL tables. I want to use Python to traverse though a list of both Lists (one with dupe = 'x' and other with null dupes).

This is sample data. Actual data is humongous.

For instance :

a b c d e f key dupe
--------------------
1 d c f k l 1   x
2 g   h   j 1    
3 i   h u u 2
4 u r     t 2   x

From the above sample table, the desired output is :

a b c d e f key dupe
--------------------
2 g c h k j 1
3 i r h u u 2

What I have so far :

import string, os, sys
import MySQLdb
from EncryptedFile import EncryptedFile

enc = EncryptedFile( os.getenv("HOME") + '/.py-encrypted-file')
user = enc.getValue("user")
pw = enc.getValue("pw")

db = MySQLdb.connect(host="127.0.0.1", user=user, passwd=pw,db=user)

cursor = db.cursor()
cursor2 = db.cursor()

cursor.execute("select * from delThisTable where dupe is null")
cursor2.execute("select * from delThisTable where dupe is not null")
result = cursor.fetchall()
result2 = cursor2.fetchall()

for each record
    for each field
        perform the comparison and perform the necessary updates

             ### How do I compare the record with same key value and update the original row null field value with the non-null value from the duplicate? Please fill this void...


cursor.close()
cursor2.close()
db.close()

Thanks guys!

ThinkCode
  • 7,841
  • 21
  • 73
  • 92
  • Can't figure out the problem. Do you want to get the algorythm, or realization in the terms of specific framework? In fact, you justt need to iterate through fields of cursor and 'coalesce' items. Can you execute plain SQL in this case? Cause if you can, the query is trivial. – Maxim Popravko Apr 09 '10 at 20:26
  • This is plain, simple test data. In reality, there are few thousand rows and few 100 columns, hence this approach. Thanks. – ThinkCode Apr 09 '10 at 20:33
  • update delthistable t set t.a = coalesce(dup.a, t.a), t.b = coalesce(dup.b, t.b)... from ( select * from delthistable where dupe = 'x' ) dup where t.dupe <> 'x' and t.key = dup.key -------------------------------------------------------------- delete from delthistable where dupe <> 'x' – Maxim Popravko Apr 09 '10 at 20:51

1 Answers1

2

OK, let's have some fun...

mysql> create table so (a int, b char, c char, d char, e char, f char, `key` int, dupe char);
Query OK, 0 rows affected (0.05 sec)

mysql> insert into so values (1, 'd', 'c', 'f', 'k', 'l', 1, 'x'), (2, 'g', null, 'h', null, 'j', 1, null), (3, 'i', null, 'h', 'u', 'u', 2, null), (4, 'u', 'r', null, null, 't', 2, 'x');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from so order by a;
+------+------+------+------+------+------+------+------+
| a    | b    | c    | d    | e    | f    | key  | dupe |
+------+------+------+------+------+------+------+------+
|    1 | d    | c    | f    | k    | l    |    1 | x    |
|    2 | g    | NULL | h    | NULL | j    |    1 | NULL |
|    3 | i    | NULL | h    | u    | u    |    2 | NULL |
|    4 | u    | r    | NULL | NULL | t    |    2 | x    |
+------+------+------+------+------+------+------+------+
4 rows in set (0.00 sec)

Python 2.6.5 (r265:79063, Mar 26 2010, 22:43:05) 
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>> db = MySQLdb.connect(host="127.0.0.1", db="test")
>>> c = db.cursor()
>>> c.execute("SELECT a, b, c, d, e, f, `key`, dupe FROM so")
4L
>>> rows = c.fetchall()
>>> rows
((1L, 'd', 'c', 'f', 'k', 'l', 1L, 'x'), (4L, 'u', 'r', None, None, 't', 2L, 'x'), (2L, 'g', None, 'h', None, 'j', 1L, None), (3L, 'i', None, 'h', 'u', 'u', 2L, None))
>>> data = dict()
>>> for row in rows:
...  key, isDupe = row[-2], row[-1]
...  if key not in data:
...   data[key] = list(row[:-1])
...  else:
...   for i in range(len(row)-1):
...    if data[key][i] is None or (not isDupe and row[i] is not None):
...     data[key][i] = row[i]
... 
>>> data
{1L: [2L, 'g', 'c', 'h', 'k', 'j', 1L], 2L: [3L, 'i', 'r', 'h', 'u', 'u', 2L]}
Messa
  • 24,321
  • 6
  • 68
  • 92
  • Thanks for the solution. I have few hundred rows in the actual table. How to adapt your code to my actual table? Thanks again! – ThinkCode Apr 09 '10 at 20:45
  • Will the data in your table fit into your RAM? If so, I think no adaptation is needed. – Messa Apr 09 '10 at 20:53
  • It works! Thanks so much. I am figuring out the best way to dump the final data into MySQL table. Some fields are None and dates are of the format date.datetime. Easy way to dump to MySQL? – ThinkCode Apr 09 '10 at 21:55