So far, I have this code. It takes two text files and a block size as command-line arguments, standardises the text of each file (keeping only letters and lowercasing them), and then splits it into overlapping blocks of the specified size.

import java.io.*;
import java.util.*;

public class Plagiarism {

    public static void main(String[] args) throws Exception {
        // Expects three arguments: <file1> <file2> <blockSize>
        if (args.length < 3) {
            System.out.println("Usage: java Plagiarism <file1> <file2> <blockSize>");
            System.exit(1);
        }

        // Parse the block size once, outside the loop
        int blockSize = Integer.parseInt(args[2]);

        for (int i = 0; i < 2; i++) {
            String foo;
            // try-with-resources closes the file even if simplify() throws
            try (BufferedReader reader = new BufferedReader(new FileReader(args[i]))) {
                foo = simplify(reader);
            }

            // Overlapping blocks: one block starting at every character position
            List<String> list = new ArrayList<String>();
            for (int k = 0; k < foo.length() - blockSize + 1; k++) {
                list.add(foo.substring(k, k + blockSize));
            }
            // System.out.print(list);
        }
    }

    // Joins all lines into one string, keeping only ASCII letters, lowercased
    public static String simplify(BufferedReader input)
            throws IOException {

        StringBuilder sb = new StringBuilder();
        String line = null;
        while ((line = input.readLine()) != null) {
            sb.append(line.replaceAll("[^a-zA-Z]", "").toLowerCase());
        }
        return sb.toString();
    }
}

The next thing I would like to do is use Horner's polynomial accumulation method (with x fixed at 33) to convert each of these blocks into a hash code. I am completely stumped on this and would appreciate some help from you guys!

Thanks for reading, and thanks in advance for any advice given!

user3364788
  • Isn't Horner's method usually about numbers? You only have Strings. How do you plan to implement it? – Gábor Bakos Mar 14 '14 at 12:56
  • @GáborBakos That's the point - I have to convert these string blocks into hash codes using this method :) – user3364788 Mar 14 '14 at 13:00
  • I thought the result of Horner's method would be another String, although strings are usually not considered to be hashcodes. I guess this is the source of my confusion. – Gábor Bakos Mar 14 '14 at 13:09
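These comments hinge on whether a string can be fed to Horner's method. It can, once each character is read as its numeric code: in Java a `char` used in arithmetic is promoted to `int` (its UTF-16 code unit, so `'a'` is 97). For a block of characters $c_0 c_1 \dots c_{k-1}$ and $x = 33$, the result is a single integer, not another string:

$$
h = c_0 x^{k-1} + c_1 x^{k-2} + \cdots + c_{k-2}\,x + c_{k-1}
  = \bigl(\cdots\bigl((c_0 x + c_1)\,x + c_2\bigr)x + \cdots\bigr)x + c_{k-1}
$$

For example, for the block "ab": $h = 97 \cdot 33 + 98 = 3299$. Each level of the nested form is one `hash = x * hash + c` update, which is why a single loop over the characters is enough.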

1 Answer

Horner's method for hash generation is as simple as:

int x = 33;
int hash = 0;
// each char acts as one polynomial coefficient
for (int i = 0; i < str.length(); i++)
  hash = x * hash + str.charAt(i);
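To relate this to the question's code: each element of `list` is a `String`, so the loop above runs over the characters of one block at a time, and here `hash` is reset to 0 for every block so each block gets its own code. This is a minimal sketch, assuming the blocks are already in a `List<String>`; the class name `HornerBlocks`, the methods `hornerHash` and `hashBlocks`, and the sample text are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HornerBlocks {

    // Horner's rule: treat the block's chars as polynomial coefficients
    // and evaluate at x, one multiply-and-add per character.
    static int hornerHash(String block, int x) {
        int hash = 0; // reset for every block
        for (int i = 0; i < block.length(); i++) {
            hash = x * hash + block.charAt(i);
        }
        return hash;
    }

    // Hypothetical helper: one hash code per block, in the same order as the list
    static List<Integer> hashBlocks(List<String> blocks, int x) {
        List<Integer> hashes = new ArrayList<Integer>();
        for (String block : blocks) {
            hashes.add(hornerHash(block, x));
        }
        return hashes;
    }

    public static void main(String[] args) {
        // Same overlapping blocking as in the question, with block size 2
        String text = "abcd";
        int blockSize = 2;
        List<String> list = new ArrayList<String>();
        for (int k = 0; k < text.length() - blockSize + 1; k++) {
            list.add(text.substring(k, k + blockSize));
        }
        // prints [ab, bc, cd] -> [3299, 3333, 3367]
        System.out.println(list + " -> " + hashBlocks(list, 33));
    }
}
```

Resetting per block is one design choice; as noted in the comments below, whether to reset depends on the application.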
Aki Suihkonen
  • Can you expand on this? Is it possible to relate it to my example? My string blocks are stored in the ArrayList called "list", so I guess the for loop will have to iterate over all of its elements? – user3364788 Mar 14 '14 at 13:05
  • You should know that. You can reset `hash` between every string or not, depending on your application. This function, however, is not suitable for hashing very long strings, because the contribution of the first characters is multiplied by x again for every later character and eventually overflows the 32-bit integer. – Aki Suihkonen Mar 14 '14 at 13:12
  • I understand that: I have to use modulo p to resolve it. Also, because I'm using an ArrayList, charAt() doesn't work for this. What is the equivalent for an ArrayList? Could you do an example with the ArrayList instead? – user3364788 Mar 14 '14 at 13:21
  • Thanks for the answer on that, but that's not what I need help with. This code: `int x = 33; int hash = 0; for (int o = 0; o < list.size(); o++) { hash = x*hash + list.get(o); }` fails to compile with the error "required: int, found: String". Can you help? – user3364788 Mar 14 '14 at 13:43
  • Yes you do. `for (String str: list) { for (int i=0;i – Aki Suihkonen Mar 14 '14 at 13:54
  • Ah, OK, thank you. I see what you've done now and it makes sense. However, when I set my block size to 2 (2 chars per block), I get negative values as hash codes. I don't get why :S – user3364788 Mar 14 '14 at 14:02
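On the negative values: with a block size of 2 and lowercase letters, a single block's hash is at most 122 * 33 + 122 = 4148 ('z' is 122), so one block cannot overflow on its own. A likely cause, assuming `hash` is declared once before the `for (String str: list)` loop as in the earlier `int hash = 0;` snippet, is that it is never reset: the value keeps growing across every block (equivalent to hashing the whole text as one string) until it overflows Java's 32-bit `int` and wraps around, often to a negative number. The sketch below contrasts that with a per-block reset, and also shows the "modulo p" variant mentioned earlier, reducing after every step in `long` arithmetic so the result stays in `[0, p)`. The prime `P = 1_000_003` and the names `HornerOverflow`, `accumulated`, `perBlock` and `hornerMod` are illustrative assumptions, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

public class HornerOverflow {

    static final int X = 33;
    // Illustrative prime modulus (an assumption; choose p to suit your needs)
    static final long P = 1_000_003L;

    // Pattern that produces the problem: hash is never reset, so it accumulates
    // over ALL blocks and silently overflows int, often turning negative.
    static int accumulated(List<String> blocks) {
        int hash = 0;
        for (String str : blocks) {
            for (int i = 0; i < str.length(); i++) {
                hash = X * hash + str.charAt(i);
            }
        }
        return hash;
    }

    // Hash of one block, starting from 0 each time (no overflow for short blocks)
    static int perBlock(String block) {
        int hash = 0;
        for (int i = 0; i < block.length(); i++) {
            hash = X * hash + block.charAt(i);
        }
        return hash;
    }

    // Horner's method modulo p: reducing after every step keeps each
    // intermediate value far below Long.MAX_VALUE, and the result in [0, P).
    static long hornerMod(String block) {
        long hash = 0;
        for (int i = 0; i < block.length(); i++) {
            hash = (hash * X + block.charAt(i)) % P;
        }
        return hash;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("th", "he", "eq", "qu", "ui", "ic", "ck");
        System.out.println("accumulated (no reset): " + accumulated(blocks));
        for (String b : blocks) {
            System.out.println(b + " -> " + perBlock(b) + " (mod p: " + hornerMod(b) + ")");
        }
    }
}
```

Reducing modulo p at every step gives the same result as reducing the exact polynomial value once at the end, so the hash is still Horner's polynomial, just kept in a bounded range.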