
I need help grouping these texts. I have a list of merchants like the one below; the first few belong to CENTURYLINK and the next ones to SMART ATT. Is there a way to group these texts under a single label, or to categorize them by the pool they fall into?

Thanks in advance

001 CENTURYLINK IREP
003 CENTURYLINK MY ACCOUNT
003-ClearTalk Wireless
004 CENTURYLINK IVR
005 CENTURYLINK RECURRING
006 CENTURYLINK WIFI
007 CENTURYLINK CABLE
111 SMART ATT
112 SMART ATT
113 - SMART - ATT
114 SMART ATT
120 - SMART - ATT
131 - SMART - ATT
137 - SMART - ATT
A WIRELESS AMERY
A WIRELESS ANNA
A WIRELESS APTOS
A WIRELESS ARCADIA
A WIRELESS ARNOLDS PAR
A WIRELESS ASHLAND
A WIRELESS ATHENS

Has QUIT--Anony-Mousse
pskumar

1 Answer


You have a few options. Among the simplest would be to match vendor substrings, as follows:

import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupVendors {
    public static void main(final String[] args) {
        final List<String> vendors = Arrays.asList(
            "CENTURYLINK",
            "SMART",
            "ATT",
            "A WIRELESS");

        final List<String> uncategorizedVendors = Arrays.asList(
            "001 CENTURYLINK IREP",
            "003 CENTURYLINK MY ACCOUNT",
            "003-ClearTalk Wireless",
            "004 CENTURYLINK IVR",
            "005 CENTURYLINK RECURRING",
            "006 CENTURYLINK WIFI",
            "007 CENTURYLINK CABLE",
            "111 SMART ATT",
            "112 SMART ATT",
            "113 - SMART - ATT",
            "114 SMART ATT",
            "120 - SMART - ATT",
            "131 - SMART - ATT",
            "137 - SMART - ATT",
            "A WIRELESS AMERY",
            "A WIRELESS ANNA",
            "A WIRELESS APTOS",
            "A WIRELESS ARCADIA",
            "A WIRELESS ARNOLDS PAR",
            "A WIRELESS ASHLAND",
            "A WIRELESS ATHENS");

        // One initially empty bin per category; TreeMap keeps the categories sorted.
        final Map<String, List<String>> categorizedVendors = new TreeMap<>();

        for (final String vendor : vendors) {
            categorizedVendors.put(vendor, new LinkedList<String>());
        }

        // Add each entry to every bin whose category name appears in it
        // (case-sensitive). Entries that match no category are not added anywhere.
        for (final String vendor : uncategorizedVendors) {
            for (final Map.Entry<String, List<String>> entry : categorizedVendors.entrySet()) {
                final String category = entry.getKey();
                if (vendor.contains(category)) {
                    final List<String> bin = entry.getValue();
                    bin.add(vendor);
                }
            }
        }

        for (final Map.Entry<String, List<String>> entry : categorizedVendors.entrySet()) {
            final String category = entry.getKey();
            final List<String> bin = entry.getValue();
            System.out.printf("vendors(\"%s\") = {%n", category);
            if (!bin.isEmpty()) {
                System.out.printf("    %s%n",
                    bin.stream()
                        .map((vendor) -> String.format("\"%s\"", vendor))
                        .collect(Collectors.joining(",\n    ")));
            }
            System.out.println("}");
        }
    }
}

Sample run:

% java GroupVendors
vendors("A WIRELESS") = {
    "A WIRELESS AMERY",
    "A WIRELESS ANNA",
    "A WIRELESS APTOS",
    "A WIRELESS ARCADIA",
    "A WIRELESS ARNOLDS PAR",
    "A WIRELESS ASHLAND",
    "A WIRELESS ATHENS"
}
vendors("ATT") = {
    "111 SMART ATT",
    "112 SMART ATT",
    "113 - SMART - ATT",
    "114 SMART ATT",
    "120 - SMART - ATT",
    "131 - SMART - ATT",
    "137 - SMART - ATT"
}
vendors("CENTURYLINK") = {
    "001 CENTURYLINK IREP",
    "003 CENTURYLINK MY ACCOUNT",
    "004 CENTURYLINK IVR",
    "005 CENTURYLINK RECURRING",
    "006 CENTURYLINK WIFI",
    "007 CENTURYLINK CABLE"
}
vendors("SMART") = {
    "111 SMART ATT",
    "112 SMART ATT",
    "113 - SMART - ATT",
    "114 SMART ATT",
    "120 - SMART - ATT",
    "131 - SMART - ATT",
    "137 - SMART - ATT"
}
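The matching above relies on each category string appearing verbatim in the entry, and the raw entries mix several spellings (for example "111 SMART ATT" versus "113 - SMART - ATT", or the mixed-case "003-ClearTalk Wireless"). A minimal sketch of a normalization step that could run before matching, assuming the leading digits are store or terminal numbers that carry no category information; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class NormalizeVendors {
    // Hypothetical helper. Assumes a leading run of digits (plus any separator
    // right after it) is a store/terminal number with no category meaning.
    static String normalize(final String raw) {
        return raw
            .toUpperCase(Locale.ROOT)               // "ClearTalk" -> "CLEARTALK"
            .replaceFirst("^\\d+\\s*-?\\s*", "")    // drop "111 ", "113 - ", "003-"
            .replaceAll("[^A-Z0-9]+", " ")          // collapse " - " and other punctuation to one space
            .trim();
    }

    public static void main(final String[] args) {
        final List<String> samples = Arrays.asList(
            "111 SMART ATT",
            "113 - SMART - ATT",
            "003-ClearTalk Wireless",
            "A WIRELESS ARNOLDS PAR");
        for (final String s : samples) {
            System.out.println(s + " -> " + normalize(s));
        }
        // Prints:
        // 111 SMART ATT -> SMART ATT
        // 113 - SMART - ATT -> SMART ATT
        // 003-ClearTalk Wireless -> CLEARTALK WIRELESS
        // A WIRELESS ARNOLDS PAR -> A WIRELESS ARNOLDS PAR
    }
}
```

After normalization, a single category such as "SMART ATT" would match both spellings, instead of needing separate "SMART" and "ATT" categories.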

I've assumed that the vendor categories you are interested in are "CENTURYLINK", "SMART", "ATT", and "A WIRELESS". As a result, every entry containing both "SMART" and "ATT" lands in both of those bins. If you want each entry in exactly one bin, you will need to decide which category wins when several match. Note also that entries matching no category, such as "003-ClearTalk Wireless", do not appear in any bin.
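One way to get exactly one bin per entry is to treat the category list as a priority order and let the first matching category win, with a fallback bin for entries that match nothing. A minimal sketch under those assumptions, using a subset of the entries; the priority order, the "UNCATEGORIZED" label, and the class name are hypothetical choices, not something the data dictates:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupVendorsOnce {
    // Hypothetical label for entries that contain none of the categories.
    static final String UNCATEGORIZED = "UNCATEGORIZED";

    // Returns the first category (in list order) contained in the entry,
    // so earlier categories take priority when several match.
    static String categorize(final String entry, final List<String> categories) {
        for (final String category : categories) {
            if (entry.contains(category)) {
                return category;
            }
        }
        return UNCATEGORIZED;
    }

    // Puts every entry into exactly one bin. LinkedHashMap keeps bins in the
    // order their first entry is encountered.
    static Map<String, List<String>> group(final List<String> entries, final List<String> categories) {
        final Map<String, List<String>> bins = new LinkedHashMap<>();
        for (final String entry : entries) {
            bins.computeIfAbsent(categorize(entry, categories), k -> new ArrayList<>()).add(entry);
        }
        return bins;
    }

    public static void main(final String[] args) {
        // Assumed priority: "SMART" is listed before "ATT", so "SMART ATT" entries go to SMART.
        final List<String> categories = Arrays.asList("CENTURYLINK", "SMART", "ATT", "A WIRELESS");
        final List<String> entries = Arrays.asList(
            "001 CENTURYLINK IREP",
            "003-ClearTalk Wireless",
            "004 CENTURYLINK IVR",
            "111 SMART ATT",
            "113 - SMART - ATT",
            "A WIRELESS AMERY",
            "A WIRELESS ATHENS");
        for (final Map.Entry<String, List<String>> bin : group(entries, categories).entrySet()) {
            System.out.println(bin.getKey() + " = " + bin.getValue());
        }
        // Prints:
        // CENTURYLINK = [001 CENTURYLINK IREP, 004 CENTURYLINK IVR]
        // UNCATEGORIZED = [003-ClearTalk Wireless]
        // SMART = [111 SMART ATT, 113 - SMART - ATT]
        // A WIRELESS = [A WIRELESS AMERY, A WIRELESS ATHENS]
    }
}
```

Reordering the category list is then the only knob for resolving overlaps; for example, listing "ATT" before "SMART" would send the SMART ATT entries to the ATT bin instead.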

Dylon