In Django, is it possible to find duplicates with a queryset and a regex?
The question "Django select only rows with duplicate field values" shows how to find duplicates without using a regex:
self.values('Website').annotate(count=Count('id')).order_by().filter(count__gt=1)
I have a model:
from django.db import models

class Company(models.Model):
    Website = models.URLField(blank=True, null=True)
I want to find duplicates with a regex. For example:
Company.objects.create(Website='http://example.com')
Company.objects.create(Website='http://www.example.com')
Both of these are the same website. I'd like to use a regex so that the query returns these companies as duplicates.
I know there are filters that use a regex, but I'm not sure how to update this query to use one:
self.values('Website').annotate(count=Count('id')).order_by().filter(count__gt=1)
I'd like to do something like:
Website__iregex='http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
Update: There was some confusion, so I'll give an example.
Here is what my database looks like:
Company.objects.create(Website='http://example.com')
Company.objects.create(Website='http://www.example.com')
Company.objects.create(Website='http://example.org', Name='a')
Company.objects.create(Website='http://example.org', Name='b')
When I call
Company.objects.all().values('Website').annotate(count=Count('id')).order_by().filter(count__gt=1)
It returns:
- http://example.org (from name=a) and http://example.org (from name=b)
This misses the fact that example.com and www.example.com are the same website.
I want to use a regex so that I can tell Django that example.com and www.example.com are the same website.
I want to modify:
Company.objects.all().values('Website').annotate(count=Count('id')).order_by().filter(count__gt=1)
so that it returns both sets of duplicates:
- http://example.org (from name=a) and http://example.org (from name=b)
- http://example.com and http://www.example.com
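
To make the desired result concrete, here is a minimal sketch of the grouping in plain Python rather than in the ORM. It assumes the rows are fetched first (for instance with Company.objects.values_list('id', 'Website')) and grouped in memory, so it is not a database-side solution. The names HOST_RE, site_key and find_duplicates are hypothetical, the ids are made up, and the pattern only strips the scheme and a leading "www.".

```python
import re
from collections import defaultdict

# Hypothetical pattern: skips the scheme and an optional leading "www."
# and captures the rest of the host name.
HOST_RE = re.compile(r"^https?://(?:www\.)?([^/:?#]+)", re.IGNORECASE)


def site_key(url):
    """Return the lower-cased host without 'www.', or None if the URL does not match."""
    match = HOST_RE.match(url or "")
    return match.group(1).lower() if match else None


def find_duplicates(rows):
    """Group (id, website) pairs by site_key and keep groups with more than one row."""
    groups = defaultdict(list)
    for pk, website in rows:
        key = site_key(website)
        if key is not None:
            groups[key].append(pk)
    return {key: pks for key, pks in groups.items() if len(pks) > 1}


# Sample rows mirroring the example data above (ids are invented).
rows = [
    (1, "http://example.com"),
    (2, "http://www.example.com"),
    (3, "http://example.org"),
    (4, "http://example.org"),
]
duplicates = find_duplicates(rows)
print(duplicates)  # {'example.com': [1, 2], 'example.org': [3, 4]}
```

This is the output I am after: example.com and www.example.com end up in one group, alongside the two example.org rows. Ideally the same grouping would happen inside the queryset.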