
I have a dictionary with long strings as keys and sets as values. I also have a list of keywords. For example,

dict1 = {"This is the long key with 9 in it.": {'value1'}, 'I have another long string with 4 and keyword': {'value2'}} 
list_of_keywords = ['this', 'is', 'a', 'keyword']

I would like to replace each key with a tuple containing only the digits and the keyword-list words found in that key. So the above dictionary would be turned into

final_dict1 = {('9', 'this', 'is'): {'value1'}, ('4', 'keyword'): {'value2'}}

I have two regular expressions below that work and I have a function that does most of what I would like it to do:

import re
digit_regexp = r"\s\b\d{1,3}\b"
keyword_regexp = r"\b({})\b"

def filter_dict_values_for_keyword_digit(dict1, keyword_regexp, digit_regexp, list_of_keywords, sep='|'):
    formatted_regexp = regexp.format(sep.join(keyword_regexp))
    word = re.compile(formatted_regexp)
    word1 = re.compile(digit_regexp)
    filtered_dict = dict1.update(((list(re.findall(word1, k)), list(re.findall(word, k))), v) for k, v in dict1.items())
    return filtered_dict

but whenever I try to run this I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in filter_dict_values_for_two_keywords
  File "<stdin>", line 5, in <genexpr>
  File "/anaconda/lib/python3.6/re.py", line 222, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

Is there something that I am misunderstanding about the composition of my dictionary that is affecting my function? I am having trouble determining whether or not it is a problem in the function or if it is because my initial values are a set as opposed to a string.

JRR
  • This code is a) not runnable and b) probably wrong anyway. What is `regexp` in the first line of the function? And are you sure you meant to join every character in `keyword_regex` with a `|` character? – Daniel Roseman Jun 11 '18 at 18:03
  • Correcting that line to what you presumably meant - `formatted_regexp = keyword_regexp.format(sep.join(list_of_keywords))` - gives a different error, `TypeError: unhashable type: 'list'`. – Daniel Roseman Jun 11 '18 at 18:07
  • The version of your code as pasted in the question is not the one that produced this error. – Thierry Lathuille Jun 11 '18 at 18:09
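
The points raised in these comments can be sketched as a corrected function. This is only one possible reading of what the question intends, not the asker's actual code: the function name `filter_keys_for_keyword_digit` is hypothetical, the leading `\s` is dropped from the digit pattern so that matches do not include the preceding space, and `re.IGNORECASE` plus lowercasing are assumptions made so that `'This'` is reported as `'this'`. Two things change compared with the original: the keys are built as tuples (lists are unhashable and cannot be dictionary keys), and a new dictionary is returned instead of the result of `dict.update`, which is always `None`.

```python
import re

def filter_keys_for_keyword_digit(dict1, list_of_keywords):
    # Join the keywords (not the characters of the pattern) with '|'.
    # re.escape keeps any regex metacharacters in a keyword literal.
    alternatives = "|".join(re.escape(word) for word in list_of_keywords)
    keyword_re = re.compile(r"\b({})\b".format(alternatives), re.IGNORECASE)
    # Assumption: no leading \s, so '9' is captured rather than ' 9'.
    digit_re = re.compile(r"\b\d{1,3}\b")

    # Build a new dict; each key is a tuple (hashable): digits first,
    # then the lowercased keywords in order of appearance.
    return {
        tuple(digit_re.findall(key))
        + tuple(word.lower() for word in keyword_re.findall(key)): value
        for key, value in dict1.items()
    }
```

If two original keys reduce to the same tuple, the later one overwrites the earlier one, so this sketch is only safe when the reduced keys are known to be distinct.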

1 Answer


Instead of using re, you can split each string on whitespace and keep the tokens that are digits or that appear in list_of_keywords:

dict1 = {"This is the long key with 9 in it.": {'value1'}, 'I have another long string with 4 and keyword': {'value2'}} 
list_of_keywords = ['this', 'is', 'a', 'keyword']
new_results = {tuple(i for i in a.split() if i.isdigit() or i.lower() in list_of_keywords):b for a, b in dict1.items()}

Output:

{('This', 'is', '9'): {'value1'}, ('4', 'keyword'): {'value2'}}
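
A possible refinement of the same split-based idea, as a minimal sketch under a few assumptions: the helper name `filter_keys_by_split` is hypothetical, tokens are lowercased so the result uses the spelling from the keyword list, and surrounding punctuation is stripped so that a token such as `keyword.` still matches. Tokens are kept in order of appearance, so digits do not necessarily come first.

```python
import string

def filter_keys_by_split(dict1, list_of_keywords):
    keywords = set(list_of_keywords)  # set membership tests are fast
    result = {}
    for key, value in dict1.items():
        # Split on whitespace, strip surrounding punctuation, lowercase.
        tokens = [t.strip(string.punctuation).lower() for t in key.split()]
        # Keep digit tokens and tokens that appear in the keyword list.
        kept = tuple(t for t in tokens if t.isdigit() or t in keywords)
        result[kept] = value
    return result
```

As with any approach that shortens keys, two different strings can reduce to the same tuple, in which case the later value replaces the earlier one.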
Ajax1234