Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools, such as SQL Server Analysis Services, predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to learning and mining algorithms are variously called cases, samples, examples, instances, events, or observations.

3094 questions
1 vote, 1 answer

Sweep through all machine learning classifiers?

I'm using Weka to perform classification, clustering, and some regression on a few large data sets. I'm currently trying out all the classifiers (decision tree, SVM, naive Bayes, etc.). Is there a way (in Weka or another machine learning toolkit) to…
stackoverflowuser2010
1 vote, 1 answer

Efficient nested set comprehension in clojure

I have a list of 100 transactions, each containing 100 items. I need to find the most frequent sets of items that appear together. One of the things I have to do a lot to accomplish this is calculate the support of various itemsets among the…
Kevin
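The support calculation mentioned in the excerpt above can be sketched briefly. This is a minimal Python illustration, not the asker's Clojure code: the function names `support` and `frequent_pairs`, the toy transactions, and the threshold are all hypothetical, and it assumes each transaction is held as a set so that a subset test is cheap.

```python
from collections import Counter
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    target = frozenset(itemset)
    hits = sum(1 for t in transactions if target <= t)
    return hits / len(transactions)


def frequent_pairs(transactions, min_support):
    """Count every 2-item combination once per transaction, keep the frequent ones."""
    counts = Counter()
    for t in transactions:
        # sorted() gives each pair a canonical order, so ("a", "b") is counted once
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Toy data (hypothetical): four transactions over three items
transactions = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
)]

pair_support = support({"a", "b"}, transactions)
pairs = frequent_pairs(transactions, min_support=0.5)
```

Counting pairs in a single pass over the transactions avoids rescanning the whole list for each candidate itemset, which is where most of the repeated work tends to go.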
1 vote, 1 answer

How to Save an Input table in Mining Model Prediction tab in SSAS

I am using SSAS (Visual Studio 2010) to create a Decision Tree model. After the model has been created I can go to the Mining Model Prediction tab to "score" another data set against the model. However, if I close the Mining Structure (or the…
DmitryV
1 vote, 1 answer

How to handle time series data with other attributes in machine learning?

I'm working on a binary classification problem in which each data instance has several time series of different metrics, along with some other attributes. How should I deal with the time series? Should I treat them as if they were separate attributes? But that…
user1552372
1 vote, 1 answer

unable to upload CSV file for WEKA analysis - java

I am working on a big data analysis project and I am stuck at this point. I am trying to upload a CSV file with data and want to use the WEKA Java API to perform the analysis. I am looking to tokenize the text, remove stop words, identify POS and filter…
pret
1 vote, 1 answer

K means Clustering in R

I have a data frame with the following structure:
District  Value1  Value2  Value3
X         1200    1500    1420
Y         1456    1458    1247
Z         1245    1689    1200
I used the K-means function in R to cluster…
Sudo
1 vote, 1 answer

Database and application design for frequent itemset generation

I'm data mining match data from an online game where each match is 5 on 5, with each player picking a unique character or hero at the start of the match. My ultimate goal is to use frequent itemset generation to determine which hero combinations are…
1 vote, 2 answers

Information mining, classification, modification

Any examples, tips, or guidance for the following scenario? I have retrieved updates from several different news websites. I then analyse that information to predict current trends in the world. I could only find the information on data mining when…
1 vote, 0 answers

Use pre-computed model for text classification in Weka

I have a task of sentiment analysis. I have tweets (labelled as negative or positive) as training data. I created a model out of it using StringToWordVector and NaiveBayesMultinomial. code: try{ TextDirectoryLoader loader = new…
1 vote, 2 answers

Grouping to extract common values in semi-structured data

I've got a 'somewhat' ugly field in a database which holds the names of locations. For instance, Madison Square Gardens which has also been entered as "The Madison Square Gardens", etc. etc. I'm trying to extract the data so that I can get an…
pedalpete
1 vote, 3 answers

Sequential Pattern - Data Mining

I am new to data mining, so I apologize if the answer to this question is obvious. I know there are quite a few data mining algorithms out there, such as sequential pattern mining, or the apriori algorithm. I would like to know if the…
user2554121
1 vote, 1 answer

Difference between input attribute and predictable attribute

Could anyone please clarify the difference between an input attribute and a predictable attribute for the decision tree algorithm in data mining? Thanks.
kewl
1 vote, 1 answer

How to deal with frequent classes?

I'm working on a classification task in Weka and have the problem that the class I want to predict has one value that is very frequent (about 85%). This leads a lot of learning algorithms to simply predict this frequent value of the class for a new…
1 vote, 0 answers

Ask about SMOTE in DMwR package

I use the SMOTE method by typing: New<-SMOTE(Y~.,origin,perc.over=1300,k=5,perc.under=100) But I get an error: Error in colnames<-(*tmp*, value = c("Y", "X1", "X2", "X3", "X4", : 'names' attribute [12] must be the same length as the vector…
1 vote, 1 answer

Why would I train a prediction model on global data, then use regional data for input?

I am working through the Adventure Works data mining examples on the Microsoft website. In it, we are going to train a model using all sales data globally, then use the data for a region and bike model as inputs. Wouldn't this just predict…
Camron B