Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

3094 questions
1 vote, 1 answer

Use data mining in SQL Server 2008 R2

I have SQL Server 2008 R2 on my computer and I want to use data mining with this version of SQL Server. How can I do this? I've read somewhere that I can use data mining in SQL Server Evaluation Edition. I can use data mining…
Mohammad hossein
1 vote, 2 answers

Methods to remove outliers from data using R

I have to remove outliers from the modeling data. I have tried many removal methods, but one outlier still troubles me a lot after applying them. Can anyone please help me with this? I have used…
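One common rule is Tukey's interquartile-range (IQR) fences: drop values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. The sketch below is in Python rather than R. The 1.5 multiplier is the conventional default, not something the question specifies, and the data are made up.

```python
import statistics

def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

data = [10, 12, 11, 13, 12, 11, 95]  # hypothetical sample with one extreme value
print(iqr_filter(data))  # → [10, 12, 11, 13, 12, 11]
```

Applying such a rule repeatedly can itself create the "one outlier that never goes away" effect. Each pass recomputes the quartiles on the already-trimmed data, so values that were inside the fences can fall outside them on the next pass. Filtering once, with fences computed on the original data, avoids that.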
1 vote, 1 answer

KDD1999 dataset features explanation

I'm using the KDD1999 dataset for intrusion prevention, but I have some questions about the features. Can someone explain the meaning of the flags? Here is the list of flags used in the KDD1999 dataset: 'flag' { 'OTH', 'REJ', 'RSTO',…
Nadya Nux
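The KDD'99 `flag` values are commonly described as Bro (now Zeek) connection-state codes, since the dataset's connection records were built with Bro. The sketch below assumes that correspondence holds. The meanings paraphrase Bro's `conn_state` documentation, and `one_hot_flag` is a hypothetical helper showing how the symbolic flag could be fed to a numeric classifier.

```python
# Meanings paraphrased from Bro/Zeek's conn_state documentation.
# Assumption: the KDD'99 'flag' attribute uses these same codes.
FLAG_MEANINGS = {
    "OTH": "No SYN seen; midstream traffic only",
    "REJ": "Connection attempt rejected",
    "RSTO": "Connection established; originator aborted (sent a RST)",
    "RSTOS0": "Originator sent SYN then RST; no SYN-ACK seen from responder",
    "RSTR": "Responder sent a RST",
    "S0": "Connection attempt seen; no reply",
    "S1": "Connection established; not terminated",
    "S2": "Established; close attempt by originator seen, no reply from responder",
    "S3": "Established; close attempt by responder seen, no reply from originator",
    "SF": "Normal establishment and termination",
    "SH": "Originator sent SYN then FIN; no SYN-ACK seen (half-open)",
}

# Fixed column order for the encoding.
FLAG_ORDER = sorted(FLAG_MEANINGS)

def one_hot_flag(flag):
    """Encode a flag as a 0/1 list in FLAG_ORDER; unknown codes raise KeyError."""
    if flag not in FLAG_MEANINGS:
        raise KeyError(f"unknown flag: {flag}")
    return [1 if f == flag else 0 for f in FLAG_ORDER]

print(FLAG_MEANINGS["REJ"])
print(one_hot_flag("SF"))
```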
1 vote, 2 answers

How to classify a small and peculiar subset out of a large database?

I have to perform a data mining task on a database containing information about insurance policies. Each tuple holds data about a single policy, along with information about the agency that issued it, the customer it refers to and…
Totem
1 vote, 0 answers

Arranging dimensions for clustering with SSAS

I am having some trouble with SSAS and data mining, specifically the Microsoft Clustering package. I intend to ultimately do my work in AMO and MDX, but for now I'd be happy just to understand how it works in BIDS via Visual Studio. One step at a…
willy_pond
1 vote, 1 answer

Location mining from text

I'm working on a text mining problem: extracting places from text. A place could be just a state, something more specific such as the name of a neighborhood in Chicago, or even a specific address, but always within the US. I've been trying Yahoo Place…
Dzung Nguyen
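A simple baseline for this kind of extraction is gazetteer lookup. The text is tokenized and then scanned for the longest word sequence that matches a known place name. The gazetteer below is a tiny hypothetical sample; a real one would be loaded from a full list of US states, cities and neighborhoods.

```python
import re

# Hypothetical, tiny gazetteer keyed by lowercase word tuples.
GAZETTEER = {
    ("illinois",): "state",
    ("chicago",): "city",
    ("hyde", "park"): "neighborhood",
    ("logan", "square"): "neighborhood",
}
MAX_LEN = max(len(k) for k in GAZETTEER)

def find_places(text):
    """Return (place, type) pairs using greedy longest-match over word n-grams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        # Try the longest possible n-gram first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                found.append((" ".join(key), GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no match starting at this token
    return found

print(find_places("Great brunch spot in Logan Square, Chicago."))
# → [('logan square', 'neighborhood'), ('chicago', 'city')]
```

This matcher does not resolve ambiguous names (there is a Hyde Park in more than one US city) and does not parse street addresses. Both would need extra context or a dedicated address parser.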
1 vote, 3 answers

How to determine if a current set of data values represent or relate to previous historic data values?

I am trying to develop a method to identify a user's browsing pattern on the basis of page requests. In a simple example, I have created 8 pages, and for each request the user makes to a page I have stored that page's request frequency in the…
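Suppose each session is summarized as a vector of request frequencies over the 8 pages. One simple comparison is cosine similarity between the current vector and the user's average historical vector, plus a threshold. The counts and the 0.9 threshold below are hypothetical; in practice the threshold would be tuned on sessions whose owner is known.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of the historical frequency vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical request counts for pages 1..8 in three past sessions.
history = [
    [5, 3, 0, 0, 1, 0, 2, 0],
    [6, 2, 1, 0, 1, 0, 3, 0],
    [4, 4, 0, 0, 2, 0, 2, 0],
]
current = [5, 3, 1, 0, 1, 0, 2, 0]
other = [0, 0, 4, 6, 0, 5, 0, 3]

profile = mean_vector(history)
THRESHOLD = 0.9  # hypothetical cut-off
for name, v in [("current", current), ("other", other)]:
    s = cosine(profile, v)
    print(name, round(s, 3), "matches" if s >= THRESHOLD else "differs")
```

Cosine similarity compares the shape of the page distribution and ignores the total number of requests. That is usually what "same browsing pattern" means. If volume matters too, a distance such as Euclidean would be the alternative.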
1 vote, 3 answers

Prediction Algorithm for Basketball Stats

I'm working on a project where I need to predict future stats based on past stats of basketball players. I would like to be able to predict next season's statistics based on the statistics of the past three seasons (if there are three previous…
arc
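A common baseline to beat before fitting any model is a recency-weighted average of whichever prior seasons exist. The 3/2/1 weights below are an assumption, not tuned values. A regression fitted on historical player data would be the natural next step.

```python
def weighted_projection(seasons, weights=(3, 2, 1)):
    """Project next season's stat from up to three prior seasons.

    seasons: values ordered most recent first, e.g. [last, two_ago, three_ago].
    Missing seasons (None) are skipped and the remaining weights renormalized.
    """
    pairs = [(w, s) for w, s in zip(weights, seasons) if s is not None]
    if not pairs:
        raise ValueError("need at least one prior season")
    total_w = sum(w for w, _ in pairs)
    return sum(w * s for w, s in pairs) / total_w

# Hypothetical points-per-game history, most recent season first.
print(weighted_projection([24.0, 21.0, 18.0]))  # → 22.0
print(weighted_projection([15.0, None]))        # → 15.0
```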
1 vote, 1 answer

How to find similarity for large number of features

I'm not sure if I am asking this question in the right place, as I'm new to Stack Overflow; please move it if required. I'm trying to solve a link prediction problem for the Flickr dataset. My dataset has 5K nodes and each node has around 27K features, it is…
TechCrunch
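With about 27K features per node, each node typically has only a few of them, and comparing all 5K × 5K pairs on dense vectors wastes work. One option assumes the features are binary (present or absent). It computes Jaccard similarity only for node pairs that share at least one feature, finding those pairs through an inverted index. The node IDs and feature names below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def jaccard_pairs(node_features, min_sim=0.0):
    """Jaccard similarity for node pairs that share at least one feature.

    node_features: dict mapping node id -> set of feature ids.
    Returns {(u, v): similarity} with u < v.
    """
    index = defaultdict(set)  # feature id -> nodes that have it
    for node, feats in node_features.items():
        for f in feats:
            index[f].add(node)

    # Only pairs co-occurring under some feature can have non-zero similarity.
    candidates = set()
    for nodes in index.values():
        for u, v in combinations(sorted(nodes), 2):
            candidates.add((u, v))

    result = {}
    for u, v in candidates:
        a, b = node_features[u], node_features[v]
        sim = len(a & b) / len(a | b)
        if sim >= min_sim:
            result[(u, v)] = sim
    return result

features = {1: {"f1", "f2", "f3"}, 2: {"f2", "f3", "f4"}, 3: {"f9"}}
print(jaccard_pairs(features))  # → {(1, 2): 0.5}
```

A feature shared by most nodes makes candidate generation nearly quadratic again. Dropping such very common features before building the index is a usual precaution.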
1 vote, 2 answers

String comparison of Wikipedia articles

I am retrieving Wikipedia categories for a request with http://en.wikipedia.org/w/api.php?format=json&action=query&prop=categories&cllimit=5000&titles=request. What I am trying to do next is compare the description article of each of the categories…
Evan
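Once the category description texts have been fetched, a standard alternative to raw string comparison is TF-IDF weighting plus cosine similarity. The pure-Python sketch below uses a simple regex tokenizer and made-up sample texts. A real pipeline would also remove stop words and fetch the texts through the API.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf * idf, idf = log(N/df))."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for tf in tokenized for term in tf)
    return [{t: c * math.log(n / df[t]) for t, c in tf.items()} for tf in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [  # hypothetical category descriptions
    "Programming languages created in the 1990s",
    "Object-oriented programming languages",
    "Rivers of Europe",
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```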
1 vote, 3 answers

Arranging documents in a grid according to content similarity

How can I arrange documents in a space (say, multiple grids) so that their positions encode how similar they are to other documents? I looked into K-means clustering, but it is a bit…
jvc
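K-means produces clusters but no spatial layout. A self-organizing map (SOM), by contrast, is designed to place inputs on a 2-D grid so that neighboring cells hold similar inputs. Below is a minimal pure-Python sketch. The grid size, learning-rate schedule and neighborhood radius are arbitrary assumptions. The document vectors are hypothetical; in practice they might be TF-IDF vectors.

```python
import math
import random

def best_cell(grid, x):
    """Grid cell whose weight vector is closest (Euclidean) to x."""
    return min(grid, key=lambda cell: math.dist(grid[cell], x))

def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, radius0=1.5, seed=0):
    """Train a rows x cols SOM on equal-length vectors; return {cell: weights}."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = best_cell(grid, x)
            for cell, w in grid.items():
                d = math.dist(cell, bmu)         # distance measured on the grid
                h = math.exp(-(d * d) / (2 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull weights toward x
    return grid

docs = {  # hypothetical 4-term document vectors
    "doc_a": [1.0, 0.9, 0.0, 0.0],
    "doc_b": [0.9, 1.0, 0.1, 0.0],
    "doc_c": [0.0, 0.1, 1.0, 0.9],
    "doc_d": [0.0, 0.0, 0.9, 1.0],
}
som = train_som(list(docs.values()))
for name, vec in docs.items():
    print(name, best_cell(som, vec))
```

Each document lands in the cell of its best-matching unit. Similar documents should end up in the same or nearby cells.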
1 vote, 0 answers

Data Mining prediction Server Operation has been cancelled

I would appreciate help if anyone has encountered this problem before or has an idea how to resolve it. I am trying to do data mining with Visual Studio 2008, and I am almost done. When I press the Run button, I get the following…
1 vote, 1 answer

Weka: Classifier and ReplaceMissingValues

I am relatively new to data mining and have been experimenting with Weka. I have a dataset of almost 8000 records related to customers and the items they have purchased. 58% of this dataset has missing values for the "Gender"…
user2275504
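Weka's `ReplaceMissingValues` filter fills missing nominal values with the mode of the training data and missing numeric values with the mean. When 58% of `Gender` is missing, mode imputation assigns the majority value to most records, which can distort what a classifier learns. The Python sketch below is not Weka's API. It contrasts that behavior with keeping "missing" as an explicit category. The records are hypothetical.

```python
from collections import Counter

def impute_mode(values):
    """Replace None with the most common observed value (mode imputation)."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def missing_as_category(values, label="unknown"):
    """Keep missingness visible by mapping None to an explicit category."""
    return [label if v is None else v for v in values]

gender = ["F", None, "M", "F", None, None, "F", None]  # hypothetical column
print(impute_mode(gender))          # → ['F', 'F', 'M', 'F', 'F', 'F', 'F', 'F']
print(missing_as_category(gender))
```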
1 vote, 1 answer

How to identify a new pattern in a URL with a machine learning algorithm (Text mining)

I am trying to identify new patterns after analyzing a number of URLs. Let's say I am investigating the hypothetical website Yoohle.com, whose URLs have the following structure: domain = yoohle.com, q = search phrase, lan = language used, pr=…
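A useful first step before any learning is to parse each URL into its query parameters with `urllib.parse`. The parser then flags URLs whose set of parameter names never appeared in a baseline. This is a rule-based sketch, not a machine-learning model. The example URLs, the `/search` path and the `new_param` name are invented for illustration.

```python
from urllib.parse import parse_qs, urlparse

def param_signature(url):
    """Return (domain, sorted tuple of query-parameter names) for a URL."""
    parts = urlparse(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    return parts.netloc, tuple(sorted(params))

def new_patterns(baseline_urls, incoming_urls):
    """Yield incoming URLs whose parameter signature is absent from the baseline."""
    known = {param_signature(u) for u in baseline_urls}
    for url in incoming_urls:
        if param_signature(url) not in known:
            yield url

baseline = [  # hypothetical URLs following the question's structure
    "http://yoohle.com/search?q=data+mining&lan=en",
    "http://yoohle.com/search?q=weka&lan=de",
]
incoming = [
    "http://yoohle.com/search?q=r+outliers&lan=en",
    "http://yoohle.com/search?q=ssas&lan=en&new_param=1",
]
print(list(new_patterns(baseline, incoming)))
# → ['http://yoohle.com/search?q=ssas&lan=en&new_param=1']
```

This only detects new parameter names. Detecting new *value* patterns, such as an unusual language code, would require modeling the values themselves.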
1 vote, 2 answers

How to perform least squares regression in R given training and testing data with class labels?

I have a 63×62 training set with class labels. The test data is 25×62 and also has class labels. Given this, how would I perform least squares regression? I am using the code res = lm(height~age); what does height…
user1403848
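In R's `lm(height ~ age)`, `height` is the response and `age` the predictor. For least squares classification, the response becomes the encoded class label and the predictors are the 62 feature columns. The NumPy sketch below assumes two classes encoded as -1/+1 and uses random placeholder data with the question's shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data with the shapes from the question; replace with real data.
X_train = rng.normal(size=(63, 62))
y_train = rng.choice([-1.0, 1.0], size=63)  # assumed binary labels as -1/+1
X_test = rng.normal(size=(25, 62))
y_test = rng.choice([-1.0, 1.0], size=25)

def add_intercept(X):
    """Prepend a column of ones so the model has a bias term."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

# Solve min ||A w - y||^2 for the weights w (intercept first).
w, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# Classify by the sign of the linear score.
scores = add_intercept(X_test) @ w
y_pred = np.where(scores >= 0, 1.0, -1.0)
accuracy = np.mean(y_pred == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

With 63 training rows and 63 unknowns (62 weights plus the intercept), the system is square. Ordinary least squares therefore fits the training labels exactly and may generalize poorly. Adding a ridge penalty or reducing the number of features is the usual remedy.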