I need to add strings to an array only if they are not already in it. I also need the indexes to stay fixed, because this array is used alongside another data structure that has a one-to-one relation with each index. At present I am using the ArrayList class: I first call indexOf () to check whether the string exists, and if it does not, I add it to the list with the one-argument add () method. I am not familiar with the Java class library, so I could not work out how to implement this with HashMap or something else (a trie, for example) that would make the loading process faster.
Does indexOf () in ArrayList perform a sequential search?
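To make the cost in question concrete: as far as the documented behaviour goes, ArrayList's indexOf () returns the first position whose element equals the argument, which amounts to walking the list from the front. The loop below is only a rough hand-written equivalent for non-null strings; the names LinearSearchSketch and linearIndexOf are made up for illustration and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class LinearSearchSketch
{
    /* Hypothetical helper: sequential search over a list, assuming word is not null. */
    static int linearIndexOf (List<String> list, String word)
    {
        for (int i = 0; i < list.size (); i++)
        {
            if (word.equals (list.get (i)))
            {
                return i;   /* first match wins */
            }
        }
        return -1;          /* not found, same convention as indexOf () */
    }

    public static void main (String args[])
    {
        List<String> terms = new ArrayList<String> ();
        terms.add ("alpha");
        terms.add ("beta");
        /* Compare the hand-written scan with the library call. */
        System.out.println (linearIndexOf (terms, "beta") == terms.indexOf ("beta"));
    }
}
```

With a scan like this, each lookup takes time proportional to the number of words already stored, so loading n distinct words costs on the order of n squared comparisons.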
My goal is to reduce the processing time when loading the words into the array, without inserting duplicates, while keeping the index of each element fixed. If a word being tested is already in the array, I need the index at which it was inserted, because that index is used to index into another structure for further processing. Any suggestions to make this process better?
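One possible shape for the requirement above, offered as a minimal sketch rather than a definitive answer: keep the ArrayList for index-to-word access and add a HashMap from word to index, so that the "is it already there, and at which index?" check becomes a hash lookup. The class name TermIndexSketch and its methods indexOfOrAdd, termAt and size are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TermIndexSketch
{
    /* index -> word; positions never change because words are only appended */
    private final List<String> terms = new ArrayList<String> ();
    /* word -> index; mirrors the list so lookups avoid a sequential scan */
    private final Map<String, Integer> positions = new HashMap<String, Integer> ();

    /* Returns the existing index of word, or appends it and returns the new index. */
    int indexOfOrAdd (String word)
    {
        Integer index = positions.get (word);
        if (index == null)
        {
            index = terms.size ();
            terms.add (word);
            positions.put (word, index);
        }
        return index;
    }

    String termAt (int index)
    {
        return terms.get (index);
    }

    int size ()
    {
        return terms.size ();
    }

    public static void main (String args[])
    {
        TermIndexSketch vocabulary = new TermIndexSketch ();
        int first = vocabulary.indexOfOrAdd ("apple");
        int second = vocabulary.indexOfOrAdd ("pear");
        int again = vocabulary.indexOfOrAdd ("apple");   /* duplicate: no new entry */
        System.out.println (first + " " + second + " " + again + " " + vocabulary.size ());
    }
}
```

The trade-off is extra memory for the map in exchange for lookups that take roughly constant time on average instead of a scan over the whole list.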
UPDATE
I have some documents from which I need to scan each word and find the unique words. I also need to count the duplicates; in other words, I need the term frequencies of the unique terms occurring in the documents. I maintain an ArrayList<Integer[]> of term frequencies (number of terms x number of docs). I fetch one word and check whether it is in the word list with the indexOf () method. If it is not present, I insert the word into the list, allocate a new row in the 2D array (the ArrayList<Integer[]>), and set the count for the current document in that row to 1. If the word is already in the word list, I use its index to select the row of the ArrayList<Integer[]> matrix, then use the number of the document currently being processed to reach the cell and increment the count.
My question is how to reduce the time spent in indexOf () for each word. I need to get the index of the word in the word array if it is already there, and if it is not, I need to insert it into the array dynamically.
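The counting step described in this update could be sketched as follows, under the assumption that a HashMap from term to row number replaces the indexOf () call and that plain int[] rows (zero-filled on allocation) stand in for Integer[]. The class TermFrequencySketch, its count method and the in-memory sample documents are all hypothetical; the real code reads files instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TermFrequencySketch
{
    /* term -> row number in tf; a row number never changes once assigned */
    final Map<String, Integer> termIndex = new HashMap<String, Integer> ();
    /* one row per term, one column per document */
    final List<int[]> tf = new ArrayList<int[]> ();
    private final int docCount;

    TermFrequencySketch (int docCount)
    {
        this.docCount = docCount;
    }

    /* Records one occurrence of word in document number doc. */
    void count (String word, int doc)
    {
        Integer row = termIndex.get (word);
        if (row == null)
        {
            row = tf.size ();
            termIndex.put (word, row);
            tf.add (new int[docCount]);   /* new int[] starts out all zeros */
        }
        tf.get (row)[doc]++;
    }

    public static void main (String args[])
    {
        String docs[] = { "red green red", "green blue" };   /* sample data */
        TermFrequencySketch counts = new TermFrequencySketch (docs.length);
        for (int i = 0; i < docs.length; i++)
        {
            for (String word : docs[i].split ("\\W+"))
            {
                if (!word.isEmpty ())   /* split can yield an empty leading token */
                {
                    counts.count (word, i);
                }
            }
        }
        int redRow = counts.termIndex.get ("red");
        System.out.println ("red in doc 0: " + counts.tf.get (redRow)[0]);
    }
}
```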
Sample Code
import java.io.*;
import java.util.ArrayList;
import static java.lang.Math.log;

class DocumentRepresentation
{
    private String dirPath;
    private ArrayList<String> fileNameVector;
    private ArrayList<String> termVector;
    private ArrayList<Integer[]> tf; /* store it in natural 2d array */
    private Integer df[]; /* do normal 1d array */
    private Double idf[]; /* do normal 1d array */
    private Double tfIdf[][]; /* do normal 2d array */

    DocumentRepresentation (String dirPath)
    {
        this.dirPath = dirPath;
        fileNameVector = new ArrayList<String> ();
        termVector = new ArrayList<String> ();
        tf = new ArrayList<Integer[]> ();
    }

    /* Later separate the internal works */
    public int start ()
    {
        /* Load the files, and populate the fileNameVector string */
        File fileDir = new File (dirPath);
        int fileCount = 0;
        int index;
        if (fileDir.isDirectory () == false)
        {
            return -1;
        }
        File fileList[] = fileDir.listFiles ();
        for (int i=0; i<fileList.length; i++)
        {
            if (fileList[i].isFile () == true)
            {
                fileNameVector.add (fileList[i].getName ());
                // System.out.print ("File Name " + (i + 1) + ": " + fileList[i].getName () + "\n");
            }
        }
        fileCount = fileNameVector.size ();
        for (int i=0;i<fileNameVector.size (); i++)
        {
            System.out.print ("Name " + (i+1) + ": " + fileNameVector.get (i) + "\n");
        }

        /* Bind the files with a buffered reader */
        BufferedReader fileReaderVector[] = new BufferedReader [fileCount];
        for (int i=0; i<fileCount; i++)
        {
            try
            {
                /* Open by name: fileList may also contain subdirectories, so fileList[i] need not match fileNameVector.get (i) */
                fileReaderVector[i] = new BufferedReader (new FileReader (new File (fileDir, fileNameVector.get (i))));
            }
            /* Not handled */
            catch (FileNotFoundException e)
            {
                System.out.println (e);
            }
        }

        /* Scan the term frequencies in the tf 2d array */
        for (int i=0; i<fileCount; i++)
        {
            String line;
            try
            {
                /*** THIS IS THE PLACE OF MY QUESTION **/
                while ((line = fileReaderVector[i].readLine ()) != null)
                {
                    String words[] = line.split ("[\\W]");
                    for (int j=0;j<words.length;j++)
                    {
                        if ((index = termVector.indexOf (words[j])) != -1)
                        {
                            tf.get (index)[i]++;
                            /* increase the tf count */
                        }
                        else
                        {
                            termVector.add (words[j]);
                            Integer temp[] = new Integer [fileCount];
                            for (int k=0; k<fileCount; k++)
                            {
                                temp[k] = 0; /* autoboxed; new Integer (0) is deprecated */
                            }
                            temp[i] = 1;
                            tf.add (temp);
                            index = termVector.indexOf (words[j]);
                        }
                        System.out.println (words[j]);
                    }
                }
            }
            /* Not handled */
            catch (IOException e)
            {
                System.out.println (e);
            }
        }
        return 0;
    }
}

class DocumentRepresentationTest
{
    public static void main (String args[])
    {
        DocumentRepresentation docSet = new DocumentRepresentation (args[0]);
        docSet.start ();
        System.out.print ("\n");
    }
}
Note: the code has been trimmed to keep the focus on the question.