Hi, I am parsing XML tags without a parser, using only StringUtils.substringBetween, because I need the values of just two tags. After reading these values I add them to two lists, and from those two lists I build a map of keys and values. I want to write this map to a file, adding an entry only if it does not already exist there. However, I am facing an error when adding to the HashMap and when traversing it to check whether its keys/values already exist in the lines read by the file reader.

public class CompName {

    /**
     * @param args
     * @throws IOException 
     */
    public static void main(String[] args) throws IOException {
        // TODO Auto-generated method stub

        File file = new File("xml/input1.xml");
        ArrayList<String> email = new ArrayList<String>();
        ArrayList<String> comp = new ArrayList<String>();
        Map<ArrayList<String>,ArrayList<String>> compIdmap = new LinkedHashMap<ArrayList<String>,ArrayList<String>>();
        try {
            BufferedReader br = new BufferedReader(new FileReader(file));
            br.readLine();
            while(true){
                String line =br.readLine();
                //System.out.println("line "+line);
                if(line == null) break;
            if(line.contains("<CompanyName>"))
            {
                String compName = StringUtils.substringBetween(line, "<CompanyName>", "</CompanyName>");  //str =" middle "
                System.out.println(compName);
                comp.add(compName);
            }
            if(line.contains("<CorporateEmailAddress>"))
            {
                String emailId = StringUtils.substringBetween(line, "<CorporateEmailAddress>", "</CorporateEmailAddress>");  //str =" middle "
                if(emailId == null || emailId.equals(""))
                    emailId = "unknown";
                System.out.println(emailId);
                email.add(emailId);
            }

               for(int i=0;i<email.size();i++)
               {
                   compIdmap.put(email, comp);
               }
            }
            System.out.println("mapping :"+compIdmap);
            BufferedWriter br1 = new BufferedWriter(new FileWriter("xml/mapping.txt"));
            Iterator it = compIdmap.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry pair = (Map.Entry)it.next();
                System.out.println(pair.getKey() + " = " + pair.getValue());
                br1.write(pair.getKey() + " = " + pair.getValue());
                it.remove(); // avoids a ConcurrentModificationException
            }

        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }


    }

}

The input XML file containing these tags is below:

    <?xml version="1.0" encoding="UTF-8"?>
<!-- Data provided by Bloomberg LP. -->
<FileDump>
<Version>IBXML 1.3</Version>
    <CompanyName>STANDARD CHARTERED B</CompanyName>
<EmailAddress>abc@gmail.com</EmailAddress>
<CorporateEmailAddress></CorporateEmailAddress>


<CompanyName>STANDARD CHARTERED B</CompanyName>
<EmailAddress>abc@gmail.com</EmailAddress>
<CorporateEmailAddress></CorporateEmailAddress>



<CompanyName>DBS BANK LIMITED HON</CompanyName>
<EmailAddress>nnn@bbg.com</EmailAddress>
<CorporateEmailAddress>nicholas@123.com</CorporateEmailAddress>

<CompanyName>DBS BANK LIMITED HON</CompanyName>
<EmailAddress>nnn@bbg.com</EmailAddress>
<CorporateEmailAddress>nicholas@123.com</CorporateEmailAddress>

<CompanyName>DBS BANK LIMITED HON</CompanyName>
<EmailAddress>nnn@bbg.com</EmailAddress>
<CorporateEmailAddress>nicholas@123.com</CorporateEmailAddress>

<CompanyName>DBS BANK (HONG KONG)</CompanyName>
<EmailAddress>www@bbg.com</EmailAddress>
<CorporateEmailAddress>WHEEL@123.com</CorporateEmailAddress>

<CompanyName>DBS BANK (HONG KONG)</CompanyName>
<EmailAddress>www@bbg.com</EmailAddress>
<CorporateEmailAddress>WHEEL@123.com</CorporateEmailAddress>
</FileDump>

I expect the output file mapping.txt to contain:

unknown STANDARD CHARTERED B
nicholas@123.com DBS BANK LIMITED HON
WHEEL@123.com DBS BANK (HONG KONG)
  • You are facing an error? Can you give us a hint what it is? – Sharon Ben Asher Sep 12 '16 at 06:44
  • Hi, I have edited the question to include my file-writing code. I expect the file to contain the proper mapping, but I think the hashmap contains many duplicates, so I am not able to populate it correctly and write it to the file. – mangala udupa Sep 12 '16 at 08:51

1 Answer

There are several problems with the code. The first is that you defined both the key and the value of the map as ArrayList. The key should not be an ArrayList: it has no logical meaning here, and if you want the values to be distinct, use a Set.
Note: the way I understand it, an email belongs to one company, so why not map one-to-one?
And why does it have to be a LinkedHashMap? Do you care about the order of key insertion?
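
To illustrate the first point, here is a minimal sketch (not the asker's exact code; the class name `KeyChoiceDemo` and both helper methods are made up for this example, and `putIfAbsent` assumes Java 8+). `listKeyed` puts the same two list objects into the map on every iteration, so every value in the map is the one shared, ever-growing company list, and the key is mutated after insertion, which makes hash-based lookups unreliable. `oneToOne` uses a plain String key per email instead. A LinkedHashMap is used here only so the printed order follows the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyChoiceDemo {

    // Mirrors the question: the SAME two list objects are put into the map each time.
    static Map<List<String>, List<String>> listKeyed(String[][] pairs) {
        List<String> emails = new ArrayList<>();
        List<String> companies = new ArrayList<>();
        Map<List<String>, List<String>> map = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            emails.add(pair[0]);
            companies.add(pair[1]);
            map.put(emails, companies); // key and value are always the same objects
        }
        return map;
    }

    // One-to-one alternative: one String key per email, first company seen wins.
    static Map<String, String> oneToOne(String[][] pairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            map.putIfAbsent(pair[0], pair[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"unknown", "STANDARD CHARTERED B"},
            {"unknown", "STANDARD CHARTERED B"},
            {"nicholas@123.com", "DBS BANK LIMITED HON"},
        };
        // Every value is the one shared company list, so each entry prints all companies.
        for (List<String> value : listKeyed(pairs).values()) {
            System.out.println(value);
        }
        // One line per distinct email.
        for (Map.Entry<String, String> e : oneToOne(pairs).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```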

Here is a working solution

public static void main(String[] args) throws IOException
{
    File file = new File("xml/input1.xml");

    // main data structure
    // key - corporate email
    // value - set of distinct companies
    // (does this make sense? a corporate email belongs to one company, no?)
    Map<String, Set<String>> compIdmap = new HashMap<String, Set<String>>();

    // making use of Java 7 try-with-resources to auto close the file after use 
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line, compName = "", email = "";
        while ((line = br.readLine()) != null) {
            if (line.contains("<CompanyName>")) {
                compName = StringUtils.substringBetween(line, "<CompanyName>", "</CompanyName>");
            }
            if (line.contains("<CorporateEmailAddress>")) {
                email = StringUtils.substringBetween(line, "<CorporateEmailAddress>", "</CorporateEmailAddress>"); 
                if (email == null || email.equals("")) email = "unknown";
                Set<String> companiesSet = compIdmap.containsKey(email) ? compIdmap.get(email) : new HashSet<>();
                companiesSet.add(compName);
                compIdmap.put(email, companiesSet);
            }
        }
        System.out.println("mapping :" + compIdmap);
        // try-with-resources closes (and therefore flushes) the writer
        try (BufferedWriter br1 = new BufferedWriter(new FileWriter("xml/mapping.txt"))) {
            for (Map.Entry<String, Set<String>> pair : compIdmap.entrySet()) {
                System.out.println(pair.getKey() + " = " + pair.getValue());
                br1.write(pair.getKey() + " = " + pair.getValue());
                br1.newLine(); // one mapping per line
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
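
Note that the map values here are Sets, so writing `pair.getValue()` directly uses `Set.toString()`, which wraps the contents in square brackets. If plain text is wanted instead, one option is to join the set elements yourself. The following is only a sketch: `SetFormatDemo` and `formatEntry` are hypothetical names, `String.join` assumes Java 8+, and skipping empty company names is an assumption about what the output should look like.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetFormatDemo {

    // Hypothetical helper: renders "email company1, company2" without Set.toString() brackets.
    static String formatEntry(String email, Set<String> companies) {
        List<String> names = new ArrayList<>();
        for (String company : companies) {
            if (company != null && !company.isEmpty()) { // skip blank names from empty tags
                names.add(company);
            }
        }
        return email + " " + String.join(", ", names);
    }

    public static void main(String[] args) {
        Set<String> companies = new LinkedHashSet<>();
        companies.add("");
        companies.add("STANDARD CHARTERED B");
        System.out.println(formatEntry("unknown", companies));
    }
}
```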

EDIT:

Here is a solution that produces the exact desired output

public static void main(String[] args)
{
    File file = new File("xml/input1.xml");

    // contains email + " " + company
    // (LinkedHashSet keeps insertion order, so lines come out in the order first seen)
    Set<String> emailAndCompanySet = new LinkedHashSet<>();

    // making use of Java 7 try-with-resources to auto close the file after use
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line, compName = "", email = "";
        while ((line = br.readLine()) != null) {
            if (line.contains("<CompanyName>")) {
                compName = StringUtils.substringBetween(line, "<CompanyName>", "</CompanyName>");
            }
            if (line.contains("<CorporateEmailAddress>")) {
                email = StringUtils.substringBetween(line, "<CorporateEmailAddress>", "</CorporateEmailAddress>");
                if (email == null || email.equals(""))
                    email = "unknown";
                emailAndCompanySet.add(email + " " + compName);

            }
        }
        System.out.println("mapping :" + emailAndCompanySet);
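        // try-with-resources closes (and therefore flushes) the writer
        try (BufferedWriter br1 = new BufferedWriter(new FileWriter("xml/mapping.txt"))) {
            for (String emailAndCompany : emailAndCompanySet) {
                System.out.println(emailAndCompany);
                br1.write(emailAndCompany);
                br1.newLine(); // one mapping per line
            }
        }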
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Sharon Ben Asher
  • Hey, thanks, but the output comes with square brackets. Do I need to split it again, or is there another way to extract the contents? For some input files the output for unknown emails is `unknown = [, STANDARD CHARTERED B]`, with square brackets and a comma before the company name. How can I avoid this? – mangala udupa Sep 12 '16 at 14:13
  • I made this change. How efficient is it, or can you suggest better code to remove the commas and square brackets from the keys/values? `br1.write(pair.getKey().toString().replaceAll("[\\[\\],]","") + " = " + pair.getValue().toString().replaceAll("[\\[\\],]","")+"\n");` – mangala udupa Sep 12 '16 at 14:22
  • This is the result of the String representation of `Set`. If you just want `email + " " + company`, you can build this during XML parsing. See the edited answer. – Sharon Ben Asher Sep 12 '16 at 15:42
  • Hi, if I give a set of files as input, it takes only the last file, like this: `File folder = new File("C:/Users/mangala/workspace/XMLParsing/xml"); File[] listOfFiles = folder.listFiles(); for (int i = 0; i < listOfFiles.length; i++) { File file = listOfFiles[i]; if (file.isFile() && file.getName().endsWith(".xml")) {` – mangala udupa Sep 13 '16 at 12:46
  • If there are multiple input files, only the email IDs from the last file end up in mapping.txt. Is it a looping problem? Do I need to write the file only after the whole hash map is built? – mangala udupa Sep 13 '16 at 12:48
  • Any suggestions for multiple input files? – mangala udupa Sep 14 '16 at 14:10
  • You can run the program on one file at a time. – Sharon Ben Asher Sep 15 '16 at 09:14
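
For the multiple-file case raised in the comments, one possible cause (which the truncated snippet does not confirm) is that the set or the writer is created inside the per-file loop: `new FileWriter(path)` without the append flag truncates the file each time it is opened, so only the last file's entries would survive. A rough sketch of an alternative is to collect the pairs from all files into one set and write it once at the end. Everything named here is an assumption for illustration: the class `MultiFileMapping`, the helper `between` (a plain-Java stand-in for `StringUtils.substringBetween`), and the two command-line arguments.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MultiFileMapping {

    // Hypothetical stand-in for StringUtils.substringBetween; null if a tag is missing.
    static String between(String line, String open, String close) {
        int start = line.indexOf(open);
        if (start < 0) return null;
        start += open.length();
        int end = line.indexOf(close, start);
        return end < 0 ? null : line.substring(start, end);
    }

    // Adds every "email company" pair found in one file's lines to the shared set.
    static void collectLines(List<String> lines, Set<String> pairs) {
        String compName = "";
        for (String line : lines) {
            if (line.contains("<CompanyName>")) {
                String name = between(line, "<CompanyName>", "</CompanyName>");
                compName = (name == null) ? "" : name;
            }
            if (line.contains("<CorporateEmailAddress>")) {
                String email = between(line, "<CorporateEmailAddress>", "</CorporateEmailAddress>");
                if (email == null || email.isEmpty()) email = "unknown";
                pairs.add(email + " " + compName);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: MultiFileMapping <xml-folder> <output-file>");
            return;
        }
        Set<String> pairs = new LinkedHashSet<>(); // created once, outside the file loop
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(args[0]), "*.xml")) {
            for (Path xml : files) {
                collectLines(Files.readAllLines(xml, StandardCharsets.UTF_8), pairs);
            }
        }
        // The output file is opened and written exactly once, after all inputs are read.
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            for (String pair : pairs) {
                out.write(pair);
                out.newLine();
            }
        }
    }
}
```

The order in which a directory stream returns files is not specified, so the line order in the output may vary between runs or platforms.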