1

I am trying to generate a random barcode_list with 6 UNIQUE barcodes in which every pair of barcodes has a hamming distance of at least 3. The issue is that the program generates a barcode list that contains duplicates and barcodes that are closer together than that. Below is the code.

import random

nucl_list = ['A', 'C', 'G', 'T']
length = 6
number = 6
attempts = 1000
barcode_list = []
tested = []

def make_barcode():
    """Generates a random barcode from nucl_list"""
    barcode = ''
    for i in range(length):
        barcode += random.choice(nucl_list)
    return barcode

def distance(s1, s2):
    """Calculates the hamming distance between s1 and s2"""
    length1 = len(s1)
    length2 = len(s2)
    # Initiate 2-D array
    distances = [[0 for i in range(length2 + 1)] for j in range(length1 + 1)]
    # Add in null values for the x rows and y columns
    for i in range(0, length1 + 1):
        distances[i][0] = i
    for j in range(0, length2 + 1):
        distances[0][j] = j

    for i in range(1, length1 + 1):
        for j in range(1,length2 + 1):
            cost = 0
            if s1[i - 1] != s2[j - 1]:
                cost = 1
            distances[i][j] = min(distances[i - 1][j - 1] + cost, distances[i][j - 1] + 1, distances[i - 1][j] + 1)
    min_distance = distances[length1][length2]

    for i in range(0, length1 + 1):
        min_distance = min(min_distance, distances[i][length2])
    for j in range(0, length2 + 1):
        min_distance = min(min_distance, distances[length1][j])
    return min_distance

def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    if new_barcode not in barcode_list:
        for barcode in barcode_list:
            dist = distance(barcode, new_barcode)
            if dist >= 3:
                barcode_list.append(new_barcode)
            else:
                pass
    else:
        pass

# make first barcode

first_barc = ''
for i in xrange(length):
    first_barc += random.choice(nucl_list)
barcode_list.append(first_barc)

while len(tested) < attempts:
    if len(barcode_list) < number:
        compare_barcodes()
    else:
        break

barcode_list.sort()

print barcode_list

I think my issue is with the last while loop: I want compare_barcodes to keep generating barcodes until enough of them fit the criteria (not a duplicate, and at a hamming distance of at least 3 from every barcode already generated).

John Kugelman
    An educated guess at this would be that your problem stems from appending to `barcode_list` while you are looping over it. – MrAlexBailey Apr 21 '15 at 16:05

4 Answers

1

Try something like this in your compare_barcodes().

Essentially we track whether new_barcode is too close to any existing barcode (dist < 3) with too_close. Once we finish looping over barcode_list we check too_close, and only if it is still False do we append to the list.

The old logic appended to barcode_list every time it found dist >= 3, which of course can happen more than once depending on how many barcodes have already been added to the list.

def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    if new_barcode not in barcode_list:
        too_close = False
        for barcode in barcode_list:
            dist = distance(barcode, new_barcode)
            if dist < 3:
                # one close barcode is enough to reject the candidate
                too_close = True
                break
        if not too_close:
            barcode_list.append(new_barcode)

Edit: I just realized you wanted the hamming distance to be 3 or larger, so the flag has to record any distance below 3 and the append happens only when no such distance was found.
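
The point about repeated appends can be checked with a small deterministic sketch. The names mismatches, buggy_add and checked_add are made up for this illustration, and mismatches is a simple position-by-position count used as a stand-in for the question's distance function, so treat this as an assumption-laden demo rather than a drop-in fix.

```python
def mismatches(s1, s2):
    """Count positions where two equal-length strings differ (stand-in metric)."""
    return sum(a != b for a, b in zip(s1, s2))


def buggy_add(barcodes, candidate, min_dist=3):
    """Mimics the question's loop: one append per far-enough barcode."""
    for barcode in barcodes:  # the list grows while we iterate, as in the question
        if mismatches(barcode, candidate) >= min_dist:
            barcodes.append(candidate)


def checked_add(barcodes, candidate, min_dist=3):
    """Appends candidate once, and only if it is far from every barcode."""
    too_close = any(mismatches(b, candidate) < min_dist for b in barcodes)
    if not too_close:
        barcodes.append(candidate)


existing = ["AAAAAA", "CCCCCC"]

# Far from both barcodes: the buggy loop appends it twice.
buggy = list(existing)
buggy_add(buggy, "GGGGGG")
print(buggy)   # ['AAAAAA', 'CCCCCC', 'GGGGGG', 'GGGGGG']

# Close to the first barcode but far from the second: still appended.
near = list(existing)
buggy_add(near, "AAAAAC")
print(near)    # ['AAAAAA', 'CCCCCC', 'AAAAAC']

# The checked version adds the far candidate once and rejects the near one.
fixed = list(existing)
checked_add(fixed, "GGGGGG")
checked_add(fixed, "AAAAAC")
print(fixed)   # ['AAAAAA', 'CCCCCC', 'GGGGGG']
```

The second case is the one that breaks the distance guarantee: a single far-away barcode is enough for the buggy loop to accept a candidate that sits right next to another one.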

MrAlexBailey
1

The answer of @Jkdc is correct, +1 for him. In your original code, you are almost there. Here's my suggestion: fold the duplicate check and the distance check into a single condition, if new_barcode not in barcode_list and all(distance(barcode, new_barcode) >= 3 for barcode in barcode_list). Then you will not add any duplicates to your list, and the distances are calculated only if new_barcode is not already in barcode_list:

def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    # 'and' short-circuits, so distances are only computed for non-duplicates
    if new_barcode not in barcode_list and all(
            distance(barcode, new_barcode) >= 3 for barcode in barcode_list):
        barcode_list.append(new_barcode)

Another suggestion: if you want to avoid duplicates, you can use a set to store your barcodes. A set holds unordered, unique elements.
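
As a rough sketch of the set idea, the generator below keeps accepted barcodes in a set and only adds a candidate that is far enough from all of them. The function names, the seed parameter and the position-by-position hamming helper are assumptions made for this example; in particular, hamming is not the table-based distance function from the question.

```python
import random


def hamming(s1, s2):
    """Number of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s1, s2))


def generate_barcodes(count=6, length=6, min_dist=3, attempts=1000, seed=None):
    """Try up to `attempts` random candidates; return the accepted barcodes, sorted."""
    rng = random.Random(seed)  # private generator so a seed makes runs repeatable
    barcodes = set()           # a set cannot hold the same barcode twice
    for _ in range(attempts):
        if len(barcodes) == count:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        # accept only if the candidate is far enough from every accepted barcode
        if all(hamming(candidate, b) >= min_dist for b in barcodes):
            barcodes.add(candidate)
    return sorted(barcodes)


result = generate_barcodes(seed=1)
print(result)
```

Because a duplicate has distance 0 to its twin, the distance check already rejects it; the set simply makes uniqueness a property of the container instead of something the loop has to remember to enforce. If the attempt budget runs out, the function returns fewer than count barcodes, so the caller may want to check the length of the result.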

Haifeng Zhang
0

The problem is with your compare_barcodes() function. In the old version, as soon as it sees a barcode that is at least 3 steps away from the new string, it adds the new string to the list without checking the remaining barcodes. The code can be modified as follows.

def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    minDist = length
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    if new_barcode not in barcode_list:
        # find the smallest distance to any barcode already accepted
        for barcode in barcode_list:
            dist = distance(barcode, new_barcode)
            if dist < minDist:
                minDist = dist
        # append once, and only if even the closest barcode is far enough;
        # keeping this inside the 'if' stops duplicates from slipping through
        if minDist >= 3:
            barcode_list.append(new_barcode)

pyan
0

I ended up making a new function to calculate the hamming distance...

def compare_distances(new_barcode):
    """Compares the hamming_dist between new barcode and old barcodes"""
    # Count number of distances < 3
    count = 0
    global barcode_list
    for barcode in barcode_list:
        if distance(new_barcode, barcode) < 3:
            count += 1
    return count

def compare_barcodes():
    new_barcode = make_barcode()
    if new_barcode not in barcode_list:
        count = compare_distances(new_barcode)
        if count > 0:
            pass
        else:
            barcode_list.append(new_barcode)
    else:
        pass

# Initiate the functions to generate barcodes 
while len(barcode_list) < number:
    compare_barcodes()