
I have a text file with two columns, as shown below:

LocationIndex   ID
P-1-A100A100    X000PY66QL
P-1-A100A100    X000RE0RRD
P-1-A100A101    X000R39WBL
P-1-A100A103    X000LJ7MX1
P-1-A100A104    X000S5QZMH
P-1-A100A105    X000MUMNOR
P-1-A100A105    X000S5R571
P-1-A100B100    X000MXVHFZ
P-1-A100B100    X000Q18233
P-1-A100B100    X000S6RSZJ
P-1-A100B101    X000K7C4HN
P-1-A100B102    X000RN9U59
P-1-A100B103    X000R4MZE1
P-1-A100B104    X000K9HSKT
P-1-A100C101    X000MCB5DZ
P-1-A100C101    X000O0T0RX
P-1-A100C102    X000RULTGZ
P-1-A100C104    X000O5NXKN
P-1-A100C104    X000RN3G9F
P-1-A100C105    X000D4P1P5
P-1-A100C105    X000QNBKDF
P-1-A100D100    X000FADDHP
P-1-A100D100    X000KR34DB
P-1-A100D100    X000MPCZ1X
P-1-A100D100    X000S6TO0B
P-1-A100D101    B00PANFBJ2
P-1-A100D101    X000Q1IYQD
P-1-A100D101    X000QEMDV7
P-1-A100D101    X000QHRKM1
P-1-A100D101    X000RUGIKR
P-1-A100D102    X000FF656L
P-1-A100D102    X000S13C5J

Taking the LocationIndex as the search index, I need to find which adjacent locations have the same ID.

Adjacent locations are defined as follows:

The left and right locations for a particular LocationIndex are obtained by changing its last character, e.g. for P-1-A100B103, left is P-1-A100B102 and right is P-1-A100B104 (the last digit is in the range 0-5).

The top and bottom locations for a particular LocationIndex are obtained by changing its fourth-from-last character, e.g. for P-1-A100B103, top is P-1-A100C103 and bottom is P-1-A100A103 (the fourth-from-last character is in the range A-E).

I need to find out whether the ID of a given LocationIndex (e.g. P-1-A100B103) matches the ID of any of its left, right, top, or bottom neighbours.
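To make the definition above concrete, here is a minimal sketch that computes the four neighbours of a LocationIndex. The function name `neighbors` and its `cols`/`rows` parameters are made up for illustration; it assumes the last character is always a digit in 0-5 and the fourth-from-last character is always a letter in A-E, and it simply leaves out neighbours that would fall outside those ranges.

```python
def neighbors(loc, cols="012345", rows="ABCDE"):
    """Return the left/right/top/bottom neighbours of a LocationIndex.

    Assumes the last character is the column digit (0-5) and the
    fourth-from-last character is the row letter (A-E).
    Neighbours outside those ranges are omitted from the result.
    """
    r = rows.index(loc[-4])  # position of the row letter
    c = cols.index(loc[-1])  # position of the column digit
    result = {}
    if c > 0:
        result["left"] = loc[:-1] + cols[c - 1]
    if c < len(cols) - 1:
        result["right"] = loc[:-1] + cols[c + 1]
    if r < len(rows) - 1:
        result["top"] = loc[:-4] + rows[r + 1] + loc[-3:]
    if r > 0:
        result["bottom"] = loc[:-4] + rows[r - 1] + loc[-3:]
    return result


print(neighbors("P-1-A100B103"))
```

For a corner location such as P-1-A100A100, only the right and top entries are returned, so the edge cases do not need separate handling.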

This is what I tried:

with open('Test.txt', 'r') as f:
    next(f)  # skip the header line
    for line in f:
        fields = line.split()
        x = fields[0]  # LocationIndex, e.g. P-1-A100B103
        y = fields[1]  # ID
        # eliminating corner cases
        if 0 < int(x[-1]) < 5 and x[-4] not in ('A', 'E'):
            right = x[:-1] + str(int(x[-1]) + 1)
            left = x[:-1] + str(int(x[-1]) - 1)
            top = x[:-4] + chr(ord(x[-4]) + 1) + x[-3:]
            bottom = x[:-4] + chr(ord(x[-4]) - 1) + x[-3:]
            # how to search ID for individual right, left, top and bottom?

I can do this in shell, but I need it done in Python. Any hint or help would be appreciated.

Ashish K
  • What did you try and what problems did you run into? As it stands it looks like a programming service request, which is clearly off-topic. – guidot Feb 01 '18 at 07:50
  • @guidot It's because I am really new to Python and could barely help myself with it. You can see my previous questions: I have always posted code that I tried and that didn't work. Here I could not even get started, so I thought I'd ask for help on Stack Overflow. – Ashish K Feb 01 '18 at 07:51
  • How are the two columns separated in the txt file? – Athul Soori Feb 01 '18 at 07:54
  • They are tab-separated. – Ashish K Feb 01 '18 at 07:56
  • Please show us something (reading the file content and displaying it, anything)... – CristiFati Feb 01 '18 at 08:00
  • Ok give me a minute. I will show what I tried – Ashish K Feb 01 '18 at 08:02
  • @CristiFati, I have updated the question. Request your feedback. – Ashish K Feb 01 '18 at 08:23
  • Your code only reads the items into memory. Because your question is basically *"is the adjacent location present somewhere in the file"* you will probably need to read the entire file into memory, then analyze the structure you end up with. A Python `dict()` allows you to easily store key-value pairs. – tripleee Feb 01 '18 at 08:28
  • @triplee thanks for mentioning this, I am looking into `python dictionaries` and see if I am able to do this. – Ashish K Feb 01 '18 at 08:30
  • You're heading in the right direction. What I'd suggest is using the ID as a key in the dictionary, with the value being a list of all the location indexes that have that ID. When traversing the file content and constructing the dict, you could use `dictionary.setdefault`. Also, when reading each line, do: `line = line.strip().split()`. Then go through the location indexes of each ID and compare their last and fourth-from-last characters (btw, I don't know if it's possible, but (*P-1-A100B100* and *P-1-A100B099*) and (*A* and *Z*) are also neighbors :)). – CristiFati Feb 01 '18 at 08:41
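The dictionary approach suggested in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: the rows are hard-coded instead of read from the file, the repeated IDs are invented so that there is something to find, and the helper names (`group_by_id`, `is_adjacent`, `adjacent_duplicates`) are hypothetical. It does not treat wrap-around cases as neighbours.

```python
from itertools import combinations

# Hypothetical rows (LocationIndex, ID); the repeated IDs are invented
# for illustration and do not come from the file in the question.
rows = [
    ("P-1-A100B102", "X000RN9U59"),
    ("P-1-A100B103", "X000R4MZE1"),
    ("P-1-A100B104", "X000R4MZE1"),
    ("P-1-A100C104", "X000R4MZE1"),
    ("P-1-A100D101", "X000RN9U59"),
]


def group_by_id(rows):
    """Map each ID to the list of locations where it occurs."""
    locations_by_id = {}
    for loc, item_id in rows:
        locations_by_id.setdefault(item_id, []).append(loc)
    return locations_by_id


def is_adjacent(a, b):
    """True if a and b are one step apart in the last character (column)
    or in the fourth-from-last character (row), but not both."""
    if a[:-4] != b[:-4] or a[-3:-1] != b[-3:-1]:
        return False  # the rest of the index must be identical
    row_step = abs(ord(a[-4]) - ord(b[-4]))
    col_step = abs(int(a[-1]) - int(b[-1]))
    return row_step + col_step == 1


def adjacent_duplicates(rows):
    """Return (ID, location, location) for adjacent locations sharing an ID."""
    found = []
    for item_id, locs in group_by_id(rows).items():
        for a, b in combinations(sorted(set(locs)), 2):
            if is_adjacent(a, b):
                found.append((item_id, a, b))
    return found


for match in adjacent_duplicates(rows):
    print(match)
```

Grouping by ID first means only locations that already share an ID are compared, so the pairwise check stays small even for a large file.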

1 Answer

A bit long and not the most efficient, but it gets the job done:

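FILE_PATH = 'Test.txt'  # input file from the question


def getData():
    """Read the file into a dict mapping each location to its set of IDs."""
    loc_keys = {}
    with open(FILE_PATH, 'r') as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            loc, key = fields[0], fields[1]
            loc_keys.setdefault(loc, set()).add(key)

    return loc_keys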


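def is_adjacent(loc1, loc2):
    # Everything except the row letter (fourth from last) and the
    # column digit (last) must be identical
    if loc1[:-4] != loc2[:-4] or loc1[-3:-1] != loc2[-3:-1]:
        return False
    row_diff = abs(ord(loc1[-4]) - ord(loc2[-4]))
    col_diff = abs(int(loc1[-1]) - int(loc2[-1]))
    # Exactly one step left/right or top/bottom; diagonals do not count
    return row_diff + col_diff == 1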


def find_matches(loc, loc_keys):
    """Return the locations adjacent to loc that share at least one ID with it."""
    if loc not in loc_keys:
        return None

    keys = loc_keys[loc]  # Set of keys for the input location
    matches = set()
    for other, other_keys in loc_keys.items():
        # A non-empty intersection means the two locations share an ID
        if is_adjacent(loc, other) and other_keys & keys:
            matches.add(other)

    return matches


# Call find_matches( <some LocationIndex>, getData() )
eicksl