Compare two vcards

Question

I have two vcards:

vcard1 = """
BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
END:VCARD
"""

vcard2 = """
BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
EMAIL;CHARSET=UTF-8:my_email@email.com
END:VCARD
"""

As you can see, the only difference is that the second vcard has an additional attribute, EMAIL. Can these two vcards be considered equal using code?

import vobject
print(vobject.readOne(vcard1).serialize()==vobject.readOne(vcard2).serialize())

They can be whatever you want them to be. That's what's great about programming. What you need to do first is develop a complete description of what makes two cards the same. You can either talk about the things that have to be the same, or you can talk about the things that don't have to be the same, or possibly both. Is `EMAIL` the only attribute that could be different and still have the objects be equal? Only once you can describe to someone else (like in this question, for example) what determines a match vs a non-match, does it do any good to talk about code. — CryptoFool, Nov 24 '20 at 21:46
In terms of the code you show, the two values are strings, so they are already serialized. If you compare them as strings, they will obviously test as different. You would probably want to parse these strings and turn them into objects (de-serialize them) in order to do a non-trivial comparison. You'd likely want to create a dict for each one where both the keys and the values are strings. Then you could compare them key by key, deciding which keys to compare and which to skip according to your description of what makes two of these objects the same. — CryptoFool, Nov 24 '20 at 21:51
Hello @steve, thank you for your response. I edited my code so that we are no longer comparing two strings. — faith, Nov 24 '20 at 22:16
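
The dict-based approach suggested in the comments above can be sketched with the standard library alone. This is not a full vCard parser: it assumes one property per line, no folded lines, and no repeated properties, and it drops parameters such as ;CHARSET=UTF-8 from the key. The names parse_vcard and cards_equal are illustrative, not part of any library.

```python
def parse_vcard(text):
    """Parse simple, unfolded vCard text into {PROPERTY_NAME: value}.

    Assumption: one property per line, no folding, no repeated properties.
    """
    fields = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split "N;CHARSET=UTF-8:Name;;;;" into the name part and the value.
        name_and_params, value = line.split(":", 1)
        name = name_and_params.split(";", 1)[0].upper()
        if name in ("BEGIN", "END"):
            continue
        fields[name] = value.strip()
    return fields


def cards_equal(text1, text2, ignore=()):
    """Compare two vCards key by key, skipping the property names in `ignore`."""
    ignored = {name.upper() for name in ignore}
    card1 = {k: v for k, v in parse_vcard(text1).items() if k not in ignored}
    card2 = {k: v for k, v in parse_vcard(text2).items() if k not in ignored}
    return card1 == card2


vcard1 = """
BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
END:VCARD
"""

vcard2 = """
BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
EMAIL;CHARSET=UTF-8:my_email@email.com
END:VCARD
"""

print(cards_equal(vcard1, vcard2))                    # False
print(cards_equal(vcard1, vcard2, ignore=["EMAIL"]))  # True
```

Whether EMAIL belongs in the ignore list is exactly the "what makes two cards the same" decision described in the first comment.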

CypherX · Accepted Answer · 2020-11-25T04:31:33.950

Solution

The first rule for any comparison is to define the basis of comparison. You can even compare apples and oranges, provided you are looking at a quantity that can be compared, such as "how many apples vs. oranges" or "the weight of 5 apples vs. 5 oranges". The point is that the basis of comparison must be defined unambiguously.

Note: I will use the data from the Dummy Data section below.

Extending this concept to your use-case, you can compare the vcards field by field and then aggregate the per-field results. Here are three ways to compare them:

Example A1: compare only the fields common to vcard1 and vcard2.
Example A2: compare all fields found in either vcard1 or vcard2.
Example A3: compare only user-specified fields between vcard1 and vcard2 (a field missing from one card counts as a mismatch).

In this case, comparing the serialized versions of vcard1 and vcard2 returns False, because the contents of the two vcards differ.

vc1.serialize()==vc2.serialize() # False

Example

In each case (A1, A2, A3), the custom function compare_vcards() returns two things:

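match: a dict giving the match result for each compared field.
summary: a dict giving an aggregated absolute match (abs_match: True only if every compared field matched) and a relative match (rel_match: the fraction of compared fields that matched, on a [0, 1] scale; useful for partial matches).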

But you will have to define your own business logic to determine what you consider as a match and what is not. What I have shown here should help you get started though.
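
As one illustration of such business logic, here is a minimal sketch that turns a per-field match dict into a yes/no decision. The rule itself (name and phone must match, everything else is optional) and the names REQUIRED_FIELDS and is_same_contact are hypothetical; substitute whatever rule fits your data.

```python
# Hypothetical rule: two cards describe the same contact when every
# required field matches, regardless of optional fields such as email.
REQUIRED_FIELDS = {"n", "tel"}


def is_same_contact(match, required=REQUIRED_FIELDS):
    """Decide on a match from a per-field result dict such as the one
    returned by compare_vcards()."""
    # A required field absent from `match` was never compared,
    # so it is treated as a failure.
    return all(match.get(field, False) for field in required)


# Per-field results copied from the Example A2 output.
match_all = {'tel': True, 'email': False, 'n': True, 'version': True}

print(is_same_contact(match_all))                    # True
print(is_same_contact({'n': True, 'tel': False}))    # False
print(is_same_contact({'n': True}))                  # False
```

Under this rule the two cards in the question count as the same contact even though abs_match is False for mode='all'.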

## Example - A1
#  Compare ONLY COMMON fields b/w vc1 and vc2
match, summary = compare_vcards(vc1, vc2, mode='common')
print(f'match:   \t{match}')
print(f'summary: \t{summary}')

## Output
# match:    {'n': True, 'tel': True, 'version': True}
# summary:  {'abs_match': True, 'rel_match': 1.0}

## Example - A2
#  Compare ALL fields b/w vc1 and vc2
match, summary = compare_vcards(vc1, vc2, mode='all')
print(f'match:   \t{match}')
print(f'summary: \t{summary}')

## Output
# match:    {'tel': True, 'email': False, 'n': True, 'version': True}
# summary:  {'abs_match': False, 'rel_match': 0.75}

## Example - A3
#  Compare ONLY USER-SPECIFIED fields b/w vc1 and vc2
match, summary = compare_vcards(vc1, vc2, fields=['email', 'n', 'tel'])
print(f'match:   \t{match}')
print(f'summary: \t{summary}')

## Output
# match:    {'email': False, 'n': True, 'tel': True}
# summary:  {'abs_match': False, 'rel_match': 0.6666666666666666}

Code

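def get_fields(vc1, vc2, mode='common'):
    """Return the field names to compare: shared by both cards, or present in either."""
    keys1, keys2 = set(vc1.sortChildKeys()), set(vc2.sortChildKeys())
    if mode == 'common':
        return keys1 & keys2
    # mode == 'all'
    return keys1 | keys2

def compare_vcards(vc1, vc2, fields=None, mode='common'):
    """Compare two vcards field by field; `mode` is ignored when `fields` is given."""
    if fields is None:
        fields = get_fields(vc1, vc2, mode=mode)
    # getChildValue() returns None for a missing field, so a field present
    # in only one card compares as a mismatch.
    match = {
        field: str(vc1.getChildValue(field)).strip() == str(vc2.getChildValue(field)).strip()
        for field in fields
    }
    summary = {
        # With no fields to compare, report no match and avoid dividing by zero.
        'abs_match': bool(match) and all(match.values()),
        'rel_match': sum(match.values()) / len(match) if match else 0.0,
    }
    return match, summary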
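
One limitation worth knowing: getChildValue() returns only the first child of a given name by default, so cards with several TEL or EMAIL lines are compared on their first entry only. Below is a sketch of a multi-value comparison. It assumes that a vobject component exposes a contents dict mapping lower-cased names to lists of objects with a .value attribute. The SimpleNamespace stand-ins are there only so the example runs without vobject installed, and field_values and compare_multi are hypothetical names.

```python
from types import SimpleNamespace


def field_values(card, field):
    """Return the set of all values stored under `field`, as stripped strings."""
    return {str(child.value).strip() for child in card.contents.get(field, [])}


def compare_multi(card1, card2, fields):
    """Per-field match where repeated properties are compared as unordered sets."""
    return {f: field_values(card1, f) == field_values(card2, f) for f in fields}


def line(value):
    # Stand-in for a parsed content line; only the .value attribute is used.
    return SimpleNamespace(value=value)


# Stand-ins for parsed cards; with vobject you would pass vc1 and vc2 instead.
card_a = SimpleNamespace(contents={"tel": [line("111"), line("222")]})
card_b = SimpleNamespace(contents={"tel": [line("222"), line("111")]})
card_c = SimpleNamespace(contents={"tel": [line("111")]})

print(compare_multi(card_a, card_b, ["tel"]))  # {'tel': True}
print(compare_multi(card_a, card_c, ["tel"]))  # {'tel': False}
```

Note that a field missing from both cards yields two empty sets and therefore compares as equal here; whether that should count as a match is again a business decision.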

Dummy Data

vcard1 = """
BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
END:VCARD
"""

vcard2 = """
BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
EMAIL;CHARSET=UTF-8:my_email@email.com
END:VCARD
"""

# pip install vobject
import vobject

vc1 = vobject.readOne(vcard1)
vc2 = vobject.readOne(vcard2)

@faith Does this help? Let me know if you have any questions. — CypherX, Nov 25 '20 at 04:20
@faith: Thank you for accepting the answer. I am glad you found it useful. Please consider voting it up as well. — CypherX, Dec 01 '20 at 22:29