
I have a string array of stop words and a string array of input texts, i.e.

string[] stopWords = File.ReadAllLines(@"C:\stopWords.txt");

and

con.Open();  // con: an existing SqlConnection (connection string not shown here)
SqlCommand query = con.CreateCommand();
query.CommandText = "select p_abstract from aminer_paper where pid between 1 and 500 and DATALENGTH(p_abstract) != 0";

var summary = new List<string>();

// read every non-empty abstract; the using block closes the reader even on an exception
using (SqlDataReader reader = query.ExecuteReader())
{
    while (reader.Read())
    {
        summary.Add(reader["p_abstract"].ToString());
    }
}

string[] input_Texts = summary.ToArray();

Now I have to use the stopWords array to remove stop words from the texts in the input_Texts array. The technique I tried does not work; accessing the indexes of both arrays behaves strangely. For example, take the first text, at index 0 of the input_Texts array, i.e.

input_Texts[0]

then match it against every word in the stopWords array, i.e.

// every element of stopWords[] has to be matched against input_Texts[0]
stopWords[0], stopWords[1], ..., stopWords[stopWords.Length - 1]

After removing all the stop words from the text at index 0 of input_Texts, I have to repeat the same step for every text in the input_Texts array.
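The procedure described above (take each text, check every word against the stop-word list, repeat for all texts) can be sketched with a HashSet so that each word is looked up once instead of scanning the whole stopWords array. This is a minimal sketch, assuming words are separated by whitespace and that stop words should match regardless of case; the class StopWordFilter and the method RemoveStopWords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Hypothetical helper: returns a new array in which every text has its stop words removed.
    public static string[] RemoveStopWords(string[] texts, IEnumerable<string> stopWords)
    {
        // Case-insensitive set: "The" and "the" are treated as the same stop word.
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);
        var separators = new[] { ' ', '\t', '\r', '\n' };

        return texts
            .Select(text => string.Join(" ",
                text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                    .Where(word => !stopSet.Contains(word))))
            .ToArray();
    }

    public static void Main()
    {
        string[] stopWords = { "a", "the", "of" };
        string[] inputTexts = { "The analysis of a graph", "a survey of the field" };

        foreach (string cleaned in RemoveStopWords(inputTexts, stopWords))
        {
            Console.WriteLine(cleaned);
        }
    }
}
```

With the arrays from the question, the call would be `input_Texts = StopWordFilter.RemoveStopWords(input_Texts, stopWords);`. A single loop over the texts is enough because the set lookup already covers every stop word.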

Any suggestions or modified code samples would be highly appreciated and acknowledged.

Thanks.

Behzad
maliks

4 Answers


Try this:

string[] result = input_Texts.Except(stopWords).ToArray();  // compares whole texts with stop words, not individual words
Hassan
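Enumerable.Except treats each array element as a single unit. The sketch below, with hypothetical sample texts, compares calling it on the array of texts directly with calling it on the words of each text; only the second removes stop words from inside a text. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class ExceptDemo
{
    // Except on the array of texts: a text is dropped only if it equals a stop word exactly.
    public static string[] ExceptOnTexts(string[] texts, string[] stopWords)
    {
        return texts.Except(stopWords).ToArray();
    }

    // Except on the words of each text: stop words inside every text are removed.
    public static string[] ExceptPerText(string[] texts, string[] stopWords)
    {
        return texts
            .Select(text => string.Join(" ", text.Split(' ').Except(stopWords)))
            .ToArray();
    }

    public static void Main()
    {
        string[] stopWords = { "the", "of" };
        string[] inputTexts = { "of", "the role of data" };

        Console.WriteLine(string.Join(" | ", ExceptOnTexts(inputTexts, stopWords)));
        Console.WriteLine(string.Join(" | ", ExceptPerText(inputTexts, stopWords)));
    }
}
```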

You can do this with a nested loop and String.Replace:

        //string[] input_Texts = new string[] { "Ravi Kumar", "Ravi Kumar", "Ravi Kumar" };
        //string[] stopWords = new string[] { "Ravi" };
        for (int i = 0; i < input_Texts.Length; i++)
        {
            for (int j = 0; j < stopWords.Length; j++)
            {
                // replace every occurrence of the current stop word with a space
                input_Texts[i] = input_Texts[i].Replace(stopWords[j], " ");
            }
        }
Ravi Kumar Mistry
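String.Replace matches substrings rather than whole words, so a stop word such as "the" is also cut out of "theory" or "other". A minimal sketch of a word-level alternative, assuming stop words are single words and that case should be ignored, builds one Regex from the stop words with `\b` word boundaries and Regex.Escape; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexStopWords
{
    // Builds a pattern such as \b(?:the|of)\b that matches stop words only as whole words.
    public static Regex BuildPattern(string[] stopWords)
    {
        string alternatives = string.Join("|", stopWords.Select(Regex.Escape));
        return new Regex(@"\b(?:" + alternatives + @")\b", RegexOptions.IgnoreCase);
    }

    public static string RemoveStopWords(string text, Regex pattern)
    {
        // Replace each whole-word match with a space, then collapse runs of whitespace.
        string removed = pattern.Replace(text, " ");
        return Regex.Replace(removed, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        const string text = "The theory of other things";
        Regex pattern = BuildPattern(new[] { "the", "of" });

        // String.Replace mangles "theory" and "other" because it matches substrings.
        Console.WriteLine(text.Replace("the", " "));
        Console.WriteLine(RemoveStopWords(text, pattern));
    }
}
```

Building the pattern once and reusing it for every text avoids recompiling the expression inside the loop.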

It can also be done like this:

for (int i = 0; i < input_Texts.Length; i++)
{
    // keep every word that is not a stop word; Where preserves word order and repeated words
    input_Texts[i] = string.Join(" ", input_Texts[i].Split(' ').Where(w => !stopWords.Contains(w)));
}

This processes each text in input_Texts and removes all the stop words from it. It needs using System.Linq;.

tariq
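Enumerable.Except has set semantics: it returns each distinct element once, so besides removing stop words it also collapses repeated ordinary words, which matters for abstracts where terms recur. The sketch below, with a hypothetical sample sentence and hypothetical method names, contrasts an Except-based filter with a Where-based one.

```csharp
using System;
using System.Linq;

public static class ExceptVersusWhere
{
    public static string WithExcept(string text, string[] stopWords)
    {
        // Except yields distinct words, so later copies of a repeated word are dropped too.
        return string.Join(" ", text.Split(' ').Except(stopWords));
    }

    public static string WithWhere(string text, string[] stopWords)
    {
        // Where tests every word on its own, so word order and repetitions are kept.
        return string.Join(" ", text.Split(' ').Where(word => !stopWords.Contains(word)));
    }

    public static void Main()
    {
        string[] stopWords = { "the" };
        string text = "the cat saw the other cat";

        Console.WriteLine(WithExcept(text, stopWords)); // cat saw other
        Console.WriteLine(WithWhere(text, stopWords));  // cat saw other cat
    }
}
```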
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

namespace StopWords_Removal
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                string[] stopWords = File.ReadAllLines(@"C:\stopWords.txt");
                var summary = new List<string>();

                // "table" and "text" are placeholder names; TABLE is a reserved word, hence [table]
                using (SqlConnection con = new SqlConnection("Data Source=ABC;Initial Catalog=xyz;Integrated Security=True"))
                {
                    con.Open();
                    SqlCommand query = con.CreateCommand();
                    query.CommandText = "select text from [table] where id between 1 and 500 and DATALENGTH(text) != 0";

                    using (SqlDataReader reader = query.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            // read the column the query actually selects
                            summary.Add(reader["text"].ToString());
                        }
                    }
                }

                string[] input_Texts = summary.ToArray();

                // the outer loop only repeats the same pass; the inner loop alone already cleans every text
                for (int i = 0; i < input_Texts.Length; i++)
                {
                    for (int j = 0; j < input_Texts.Length; j++)
                    {
                        input_Texts[j] = string.Join(" ", input_Texts[j].Split(' ').Where(w => !stopWords.Contains(w)));
                    }
                }

                for (int d = 0; d < input_Texts.Length; d++)
                {
                    Console.WriteLine(input_Texts[d]);
                    Console.ReadLine(); // wait for Enter before showing the next text
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception: " + e.Message);
            }
            finally
            {
                Console.WriteLine("Executing finally block.");
            }
        }
    }
}
maliks
  • Why are you using a nested loop? Just use one loop; you are already using for(int j=0..... – tariq Jun 19 '15 at 10:01
  • What I have written is different. – tariq Jun 19 '15 at 10:01
  • The nested loop is there because we have to compare each stop word in the stopWords array with every single text in the input_Texts array. – maliks Jun 19 '15 at 10:06
  • Mr. Tariq, check this question: "stop words removal using c#". It may give you a better understanding of the problem. – maliks Jun 19 '15 at 10:15
  • I have explained the question more clearly there ("stop words removal using c#"); please post your own solution there if you have one. – maliks Jun 19 '15 at 10:17
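Abstracts usually contain punctuation, and when a text is split on spaces a token such as "the," or "(in" no longer equals the stop word "the" or "in", so it slips through the filter. A minimal sketch, assuming a fixed set of punctuation characters (an assumption to adjust to the real data) and case-insensitive matching, compares each token with its surrounding punctuation trimmed while keeping the original token in the output. The class PunctuationAwareFilter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PunctuationAwareFilter
{
    // Assumed punctuation set; extend it to match the characters found in the abstracts.
    private static readonly char[] Punctuation = ".,;:!?()[]\"'".ToCharArray();

    public static string RemoveStopWords(string text, HashSet<string> stopSet)
    {
        var kept = text
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            // compare the token without leading/trailing punctuation, but output it unchanged
            .Where(token => !stopSet.Contains(token.Trim(Punctuation)));
        return string.Join(" ", kept);
    }

    public static void Main()
    {
        var stopSet = new HashSet<string>(new[] { "the", "of", "in" }, StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(RemoveStopWords("Results of the experiment (in the lab).", stopSet));
    }
}
```

Keeping the original token means punctuation that belongs to a content word, such as the final "lab).", is preserved in the cleaned text.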