
I have a typical pattern-searching problem: I need to identify where multiple patterns appear within an array and single them out.

Example: ['horse', 'camel', 'horse', 'camel', 'tiger', 'horse', 'camel', 'horse', 'camel']

The function should return:

['horse', 'camel'], 
['horse', 'camel', 'horse'],
['camel', 'horse', 'camel'],
['horse', 'camel', 'horse', 'camel']

i.e., find the patterns that repeat within an array, where each such pattern is itself a sub-array.

Another way to define it: find all the sub-arrays that occur more than once in the main array.

The resulting arrays should have length > 1. For example:

[1, 2, 3, 1, 2, 1, 4, 5] => [1,2,3] and [1,4,5] are both sub-arrays, but [1,2,3] is a recurring/repeating sub-array and [1,4,5] is NOT.

I'm looking for a suitably efficient algorithm rather than a brute-force looping solution.

Nishutosh Sharma
  • Please be more specific about the output you want. Right now it looks like you want two arrays to be returned. It's better if you provide a detailed description of the problem – pkacprzak Oct 19 '16 at 07:56
  • @pkacprzak I've edited the question to add more explanation, let me know if it explains the problem statement now. – Nishutosh Sharma Oct 19 '16 at 08:01
  • Still not clear. You have to define what it means for you that a subarray occurs in the array. – pkacprzak Oct 19 '16 at 08:05
  • @NishutoshSharma did you try Floyd's cycle-finding algorithm? – Kaidul Oct 19 '16 at 08:07
  • Is a good algorithm any different from a string algorithm enumerating substrings? (Think twice.) – greybeard Oct 19 '16 at 09:41
  • @greybeard, I had thought about it before coming to this conclusion: Array -> ['horse', 'camel', 'horse', 'camel', 'tiger', 'horse', 'camel', 'horse', 'camel'] |=> String -> "horse camel horse camel tiger horse camel horse camel" |=> Substrings -> "horse camel", "orse came", "rse camel", ... It would create 70-80% waste for me and hurt code readability. I have another question in the substring category here: http://stackoverflow.com/questions/40111101/find-all-repeating-substrings-in-a-string-ignoring-spaces-on-left-right-of-final. What do you say? – Nishutosh Sharma Oct 19 '16 at 10:14
  • What if you mapped words to characters ('horse' => 'h', 'camel' => 'c', 'tiger' => 't'), then the word array to the string 'hchcthchc', and looked for good solutions to _enumerate all recurring substrings_? (Think twice.) – greybeard Oct 19 '16 at 12:06
  • @greybeard It will take extra effort to identify unique mnemonics (tiger, tigress, tarantula), but yes, once we have them, that can bring it down to the best substring solution. Why don't you post it as an answer with some gist/logic/code (in any language)? – Nishutosh Sharma Oct 19 '16 at 12:11
  • @KaidulIslam That finds cycles; this is not necessarily a cycle, it is just a recurrence. I had a look at it. For [1,2,1,2,1,2,3,1,2] it will find that (1,2) occurs 3 times in a cycle, whereas I want all recurrences: (1,2) occurs 4 times. – Nishutosh Sharma Oct 19 '16 at 12:27
  • @pkacprzak It should be clear now. Although, by definition, a sub-array is simply a part of an array. – Nishutosh Sharma Oct 19 '16 at 12:35
  • Would `[camel, horse]` be included in the output? – Bobas_Pett Oct 20 '16 at 07:02
  • @Bobas_Pett In this algorithm it is fine to receive all the patterns. Once we have all the patterns and their occurrence counts, they can be refined later on. I think that is the more efficient approach, following the Single Responsibility Principle, instead of one big ball of mud doing everything in the same chunk of code. – Nishutosh Sharma Oct 20 '16 at 15:43
  • The `[1,2,3,1,2,1,4,5]` example is strange: how is `[1,2,3]`, occurring exactly once, any more repeating than `[1,4,5]`? What about `[1 2 1 2 1]`: [1 2] and [2 1] are no-brainers, but do you consider `[1 2 1]` to occur twice? – greybeard Oct 20 '16 at 23:40
  • @greybeard They overlap, hence they are not to be considered. Only non-overlapping recurring patterns are to be taken into account. – Nishutosh Sharma Oct 21 '16 at 10:08
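The last comment threads suggest two small building blocks, sketched below in Java as an illustration rather than anything posted in the thread; the class `RecurrenceHelpers` and its methods are hypothetical names. `encode` follows greybeard's idea of mapping each word to a symbol, but assigns integer ids from a `HashMap` instead of first letters, so 'tiger', 'tigress' and 'tarantula' cannot collide. `countNonOverlapping` counts occurrences under the asker's non-overlapping rule by scanning left to right and jumping past each match; for equal-length windows this greedy choice yields the maximum number of disjoint occurrences.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class illustrating ideas from the comments.
final class RecurrenceHelpers {

    // Map each distinct word to a small integer id (first word seen -> 0,
    // next new word -> 1, ...). Distinct words always get distinct ids,
    // unlike first-letter mnemonics.
    static int[] encode(String[] words) {
        Map<String, Integer> ids = new HashMap<>();
        int[] encoded = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            Integer id = ids.get(words[i]);
            if (id == null) {
                id = ids.size();
                ids.put(words[i], id);
            }
            encoded[i] = id;
        }
        return encoded;
    }

    // Count non-overlapping occurrences of pattern in arr: after a match at i,
    // resume scanning at i + pattern.length so matches never share an element.
    static int countNonOverlapping(int[] arr, int[] pattern) {
        if (pattern.length == 0) return 0; // avoid an endless loop on empty input
        int count = 0;
        int i = 0;
        while (i + pattern.length <= arr.length) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (arr[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += pattern.length;
            } else {
                i++;
            }
        }
        return count;
    }
}
```

With these, `[1,2,1,2,1,2,3,1,2]` gives 4 for the pattern `[1,2]`, matching the count in Nishutosh Sharma's comment, and in `[1 2 1 2 1]` the pattern `[1 2 1]` counts once while `[2 1]` counts twice.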

1 Answer


This probably isn't what you want, but I don't know what you have tried yet, so maybe it will be useful. Here's my direct approach. It probably falls under your "brute-force looping solutions", but I figured I'd give it a try since nobody has posted a full answer.

In Java:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecurringPatterns {

    // use this to avoid adding duplicates to the list
    static boolean contains(List<String[]> patterns, String[] pattern) {
        for (String[] s : patterns)
            if (Arrays.equals(pattern, s)) return true;
        return false;
    }

    /**
     * @param str   String array containing all elements in your set
     * @param start start index of the subarray (inclusive)
     * @param end   end index of the subarray (inclusive)
     * @return true if the subarray occurs again, without overlapping, after index end
     */
    static boolean search(String[] str, int start, int end) {
        // length of pattern
        int len = end - start + 1;

        // how many further occurrences (after the first) are required
        int n = 1;

        // incremented each time the pattern is matched
        int m = 0;

        // shift the pattern down the rest of the array
        for (int i = end + 1; i <= str.length - len; i++) {
            int j;
            for (j = 0; j < len; j++) {
                if (!str[i + j].equals(str[start + j]))
                    break;
            }

            // pattern matched at [i, i + len - 1]
            if (j == len) {
                m++;
                if (m == n) return true;
            }
        }
        return false;
    }

    /**
     * @param str String array containing all elements in your set
     * @return a list of subarrays of the input that are recurring patterns
     */
    static List<String[]> g(String[] str) {
        // put patterns in here
        List<String[]> patterns = new ArrayList<>();

        // iterate through all possible subarrays of length >= 2 in str
        for (int i = 0; i < str.length - 1; i++) {
            for (int j = i + 1; j < str.length; j++) {

                // if a pattern is found, copy it out and keep it unless already listed
                if (search(str, i, j)) {
                    int len = j - i + 1;
                    String[] subarray = new String[len];
                    System.arraycopy(str, i, subarray, 0, len);
                    if (!contains(patterns, subarray))
                        patterns.add(subarray);
                }
            }
        }
        return patterns;
    }

    public static void main(String[] args) {
        String[] str = {"horse", "camel", "horse", "camel", "tiger",
                        "horse", "camel", "horse", "camel"};
        // print out each recurring pattern
        List<String[]> patterns = g(str);
        for (String[] s : patterns)
            System.out.println(Arrays.toString(s));
    }
}
```

Output:

[horse, camel]
[horse, camel, horse]
[horse, camel, horse, camel]
[camel, horse]
[camel, horse, camel]

As mentioned in a comment I posted:

"would [camel, horse] be included in the output?"

My output includes it, since there are two instances of [camel, horse], at indices [1-2] and [6-7]. But maybe I am completely misunderstanding your question and its constraints.
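The `contains(...)` helper in the answer compares every new subarray with every pattern found so far using `Arrays.equals`. A possible refinement, sketched here with a hypothetical `PatternSet` class, is to store each pattern as a `List<String>`: unlike `String[]`, a `List` has value-based `equals`/`hashCode`, so a `LinkedHashSet` can detect duplicates by hashing while keeping the order in which patterns were found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical replacement for the List<String[]> + contains(...) pair.
final class PatternSet {
    // Insertion-ordered set; List<String> compares by content, String[] does not.
    private final Set<List<String>> seen = new LinkedHashSet<>();

    // Adds str[start..end] (inclusive); returns false if it was already present.
    boolean add(String[] str, int start, int end) {
        return seen.add(Arrays.asList(Arrays.copyOfRange(str, start, end + 1)));
    }

    // Patterns in the order they were first added.
    List<List<String>> patterns() {
        return new ArrayList<>(seen);
    }
}
```

In `g(...)`, the copy-then-`contains` step would become a single `add(str, i, j)` call on such a set.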

As for optimizing: the search(...) method, for example, is just a simple substring search, and there are more efficient ways of doing this, e.g. Knuth–Morris–Pratt. Sorry if this is exactly what you didn't want, but maybe it's of some use.
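A minimal sketch of the Knuth–Morris–Pratt idea applied here, assuming it should keep the same contract as the answer's `search(str, start, end)`: return true if `str[start..end]` occurs again starting after index `end`. The class name `KmpSearch` is hypothetical. The failure table lets the scan move forward without re-comparing elements, so one call costs O(n) instead of O(n · len).

```java
import java.util.Arrays;

// Hypothetical KMP-based drop-in for the answer's search(...) method.
final class KmpSearch {

    // fail[k] = length of the longest proper prefix of pattern[0..k]
    // that is also a suffix of pattern[0..k].
    static int[] failureTable(String[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && !pattern[i].equals(pattern[k])) k = fail[k - 1];
            if (pattern[i].equals(pattern[k])) k++;
            fail[i] = k;
        }
        return fail;
    }

    // True if str[start..end] (inclusive) occurs again starting at an index >= end + 1.
    static boolean search(String[] str, int start, int end) {
        String[] pattern = Arrays.copyOfRange(str, start, end + 1);
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern elements currently matched
        for (int i = end + 1; i < str.length; i++) {
            while (k > 0 && !str[i].equals(pattern[k])) k = fail[k - 1];
            if (str[i].equals(pattern[k])) k++;
            if (k == pattern.length) return true; // full match ending at i
        }
        return false;
    }
}
```

Because it has the same signature, it could stand in for the original `search(...)` inside `g(...)`. Note that `g(...)` still tries O(n²) candidate subarrays, so the overall cost stays around O(n³); KMP only speeds up each individual check.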

Bobas_Pett