Conceptual: Collect "synonyms" from a list of "words"

Question

This question is inspired by "Generating a list of repetitions regardless of the order" and its accepted answer: https://stackoverflow.com/a/20336020/1463143

Here, an "alphabet" is any set of letters, e.g. '012' or 'EDCRFV'.

"Words" are obtained by taking the Cartesian product of the alphabet with itself. We should be able to specify n to get n-letter words. Example:

from itertools import product

alphabet = '012'
wordLen = 3
# product() yields tuples of letters; join each tuple into a string
wordList = [''.join(letters) for letters in product(alphabet, repeat=wordLen)]
print(wordList)

which gives:

['000', '001', '002', '010', '011', '012', '020', '021', '022', '100', '101', '102', '110', '111', '112', '120', '121', '122', '200', '201', '202', '210', '211', '212', '220', '221', '222']

A "synonym" is obtained by... uh... if only I could articulate this.

These lists contain all the possible "synonyms" within wordList:

['000',
 '111',
 '222'] 

['001',
 '002',
 '110',
 '112',
 '220',
 '221']

['010',
 '020',
 '101',
 '121',
 '202',
 '212']

['011',
 '022',
 '100',
 '122',
 '200',
 '211']

['012',
 '021',
 '102',
 '120',
 '201',
 '210']

Sadly, I'm unable to articulate how I obtained the above lists of "synonyms". I would like to do something as above for an arbitrary alphabet forming n-lettered words.

*unable to articulate how I obtained the above lists of "synonyms"*. Sadly, so are we. Without any rules on how you obtained those synonyms we are left in the dark to guess at the rules involved. I certainly see no pattern between synonyms. — Martijn Pieters, Dec 03 '13 at 12:29
@MartijnPieters In the first case all letters are the same; in the second case the first two are the same; in the third case the first and last are the same; in the fourth case the second and third are the same; in the last case all are different. — thefourtheye, Dec 03 '13 at 12:33
@thefourtheye: Indeed, that pattern seems to fit. The OP should have articulated that himself, however. — Martijn Pieters, Dec 03 '13 at 12:37
I got those sequences by starting with the symbol that has the lowest value in base 2, then "flipping" that symbol, then taking the next lowest-value word, and so on. Flipping might be more evident with alphabet = '01' and the sequence: [000,111],[001,110],[010,101],[100,011]. I didn't say this in the question because I thought it would have attracted more votes to close. Thanks to @thefourtheye I see a pattern in the sequence I hadn't seen, and thanks to thg435 for an incredibly simple answer. By the way, some slot machines pay out according to such patterns ;) I think thg435 could crash many casinos lol — samkhan13, Dec 04 '13 at 16:17
Also, if you were constructing all possible 7- to 14-letter words from an alphabet containing all upper- and lower-case English letters, digits, and special characters because you were trying out brute-force cracking or a [dictionary attack](http://en.wikipedia.org/wiki/Dictionary_attack), you would still need a way to sensibly divide up the dictionary so that you could use multiprocessing :P You can now take each set of "synonyms", find "synonyms" within it, and then start a job with each smaller list :D — samkhan13, Dec 04 '13 at 16:36
@MartijnPieters "Multiple realizability" is when different sets of rules or structures produce the same functional outcome. That phrase is currently used only in "philosophy of mind", but in my view it is a general phenomenon in this universe. E.g. you can obtain electricity from a coal power plant, a solar panel, a hydrogen fuel cell, etc.; the structures differ, but the ultimate outcome, electrical energy, is the same. Likewise, I was hoping to see different ways of obtaining "synonyms". BTW, every point and part of the universe is a "pattern" ;) — samkhan13, Dec 04 '13 at 16:48
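For the two-letter alphabet '01' described in the comments, "flipping" a word means replacing every 0 with 1 and vice versa. A minimal Python 3 sketch of that procedure, assuming words are visited in increasing base-2 value and each not-yet-seen word is paired with its flipped copy (the function name `flip_pairs` is hypothetical):

```python
from itertools import product

def flip_pairs(word_len):
    """Pair each binary word with its "flipped" copy (0 <-> 1)."""
    # product() yields the words in increasing base-2 order
    words = [''.join(bits) for bits in product('01', repeat=word_len)]
    flip = str.maketrans('01', '10')  # swap the two letters
    seen = set()
    pairs = []
    for w in words:
        if w in seen:  # already paired as someone's flipped copy
            continue
        partner = w.translate(flip)
        seen.update((w, partner))
        pairs.append([w, partner])
    return pairs

print(flip_pairs(3))
```

For word_len = 3 this yields the same four pairs as in the comment, with the last one listed as ['011', '100'] because '011' is visited before '100'.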

score 3 · Accepted Answer · answered Dec 03 '13 at 12:48


Looks quite easy:

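import collections

syns = collections.defaultdict(list)

for w in wordList:
    # Replace each letter by the index of its first occurrence,
    # so '110' -> (0, 0, 2): words with the same pattern share a key
    key = tuple(w.index(c) for c in w)
    syns[key].append(w)

print(list(syns.values()))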
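The key `tuple(w.index(c) for c in w)` replaces every letter with the position of its first occurrence, so two words share a key exactly when they have the same pattern of repeated letters. A self-contained Python 3 sketch wrapping the same idea for any alphabet and word length (the function name `group_synonyms` is hypothetical):

```python
import collections
from itertools import product

def group_synonyms(alphabet, word_len):
    """Group all word_len-letter words over alphabet by repetition pattern."""
    groups = collections.defaultdict(list)
    for letters in product(alphabet, repeat=word_len):
        word = ''.join(letters)
        # Each letter becomes the index of its first occurrence,
        # e.g. '221' -> (0, 0, 2) and '001' -> (0, 0, 2).
        key = tuple(word.index(c) for c in word)
        groups[key].append(word)
    return list(groups.values())

for group in group_synonyms('012', 3):
    print(group)
```

With '012' and length 3 this prints the five lists from the question, in the same order. Since `word.index(c)` scans from the start of the word, building a key costs O(n²) for an n-letter word, which is negligible for short words.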

answered Dec 03 '13 at 12:48

georg


score 1 · Answer 2 · answered Dec 03 '13 at 12:52


A:

[ word for word in wordList 
    if  word[0] == word[1]
    and word[0] == word[2] ]

B:

[ word for word in wordList 
    if  word[0] == word[1]
    and word[0] != word[2] ]

C:

[ word for word in wordList 
    if  word[0] != word[1]
    and word[0] == word[2] ]

D:

[ word for word in wordList 
    if  word[0] != word[1]
    and word[1] == word[2] ]

E:

[ word for word in wordList 
    if  word[0] != word[1]
    and word[0] != word[2]
    and word[1] != word[2] ]

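So, it groups words by every combination of equalities between the letters of a word:
'abc' -> a != b, b == c, c != a; a == b, b == c, c == a; etc.

Every combination that cannot occur (e.g. a != b, b == c, c == a, which is contradictory) gives an empty result and is excluded.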
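The five hard-coded comprehensions only cover 3-letter words. One way to generalize the same idea, sketched here as an assumption rather than as akaRem's own code, is to use the full table of pairwise equalities `word[i] == word[j]` for all i < j as the grouping key. Contradictory combinations never arise from real words, so no empty groups are produced (`group_by_equalities` is a hypothetical name):

```python
import collections
from itertools import combinations, product

def group_by_equalities(alphabet, word_len):
    """Group words by which pairs of positions hold equal letters."""
    groups = collections.defaultdict(list)
    pairs = list(combinations(range(word_len), 2))  # all (i, j) with i < j
    for letters in product(alphabet, repeat=word_len):
        word = ''.join(letters)
        # One boolean per position pair, e.g. '011' -> (False, False, True)
        key = tuple(word[i] == word[j] for i, j in pairs)
        groups[key].append(word)
    return list(groups.values())

for group in group_by_equalities('012', 3):
    print(group)
```

For 3-letter words the key has three entries (positions 0-1, 0-2, 1-2), which corresponds to the five cases A to E above.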

answered Dec 03 '13 at 12:52

akaRem


I think this is a really worthy attempt. I'm going to try it with a larger alphabet. I've accepted thg435's answer because it is more Pythonic. – samkhan13 Dec 04 '13 at 16:21
I think I didn't understand your question – akaRem Dec 04 '13 at 20:21

score 0 · Answer 3 · answered Dec 03 '13 at 13:43

It seems the rule you want (for larger n as well) is the following:

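A word u is a synonym of v iff u can be obtained from v by consistently relabeling the letters of the alphabet, i.e. by applying a permutation of the alphabet (a sequence of swaps of two letters). All the words obtained from v under all permutations of the alphabet are synonyms.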

Example: let u = '001' and let the alphabet be '012'.

There are six permutations of the alphabet: '012', '021', '102', '120', '201', '210'. Map u through each of these permutations to get the synonyms of u:

'001'
'002'
'110'
'112'
'220'
'221'
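The rule above can be sketched in Python 3 by translating the word through every permutation of the alphabet with `str.translate`. Collecting the results in a set removes duplicates, which occur when the word does not use every letter. This assumes the alphabet has no repeated letters; `synonyms_of` is a hypothetical name:

```python
from itertools import permutations

def synonyms_of(word, alphabet):
    """Return every word obtained by relabeling letters via a permutation."""
    result = set()
    for perm in permutations(alphabet):
        # Map alphabet[i] -> perm[i] for every letter at once
        table = str.maketrans(alphabet, ''.join(perm))
        result.add(word.translate(table))
    return sorted(result)

print(synonyms_of('001', '012'))  # the six words listed above
print(synonyms_of('000', '012'))
```

A word that uses j distinct letters of a k-letter alphabet has k!/(k-j)! synonyms, because only the images of its own j letters matter: '001' gives 3!/1! = 6 and '000' gives 3!/2! = 3, matching the lists in the question. Enumerating all k! permutations per word grows quickly, so for large alphabets the pattern key from the accepted answer is the cheaper way to build every group at once.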
