Data Mining Notes
Data mining is the process of finding anomalies, patterns, and correlations within large data sets in order to predict outcomes. It looks for hidden, valid, and potentially useful patterns in huge data sets, with the aim of discovering previously unknown relationships among the data. Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis, information harvesting, etc.
With a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce risks, and more.
Types of Data
- Relational databases
- Data warehouses
- Advanced databases and information repositories
- Object-oriented and object-relational databases
- Transactional and Spatial databases
- Heterogeneous and legacy databases
- Multimedia and streaming databases
- Text databases
- Text mining and Web mining
Data Mining Techniques
- Classification: This analysis is used to retrieve important and relevant information about data and metadata. It assigns data items to predefined classes, typically by learning from examples whose class is already known.
- Clustering: Clustering analysis is a data mining technique that groups data items that are similar to each other. Unlike classification, the groups are not defined in advance. This process helps to understand the differences and similarities between the data.
- Regression: Regression analysis is the data mining method of identifying and analyzing the relationship between variables. It is used to estimate the value of a specific variable, given the values of other variables.
- Association Rules: This data mining technique helps to find associations between two or more items, such as products that are often bought together. It discovers hidden patterns in the data set.
- Outlier detection: This data mining technique identifies data items in the dataset that do not match an expected pattern or expected behavior. It can be used in a variety of domains, such as intrusion detection, fraud detection, or fault detection. Outlier detection is also called outlier analysis or outlier mining.
- Sequential Patterns: This data mining technique helps to discover similar patterns or trends in transaction data over a certain period, such as items that customers tend to buy one after another.
- Prediction: Prediction uses a combination of other data mining techniques, such as trend analysis, sequential patterns, clustering, and classification. It analyzes past events or instances in the right sequence to predict a future event.
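Classification is usually learned from labeled examples. As a minimal sketch, assuming the k-nearest-neighbours method (the notes do not name an algorithm), a new item gets the majority class of the k labeled items closest to it. The function name `knn_classify`, the value of `k`, and the toy data are illustrative assumptions.

```python
from collections import Counter
import math


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (feature_tuple, label) pairs (hypothetical toy format).
    """
    # Sort training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labeled data: (annual_income_k, age) -> credit risk class.
train = [
    ((20, 22), "high"), ((25, 25), "high"), ((30, 30), "high"),
    ((80, 45), "low"), ((90, 50), "low"), ((85, 40), "low"),
]

print(knn_classify(train, (28, 27)))  # → high
print(knn_classify(train, (88, 47)))  # → low
```

Sorting the whole training set is fine for a toy example; real tools use index structures to find neighbours faster.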
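Clustering needs no predefined labels. One common algorithm is k-means, sketched minimally below as an assumption (the notes do not prescribe it): each point is assigned to its nearest centroid, each centroid moves to the mean of its points, and the loop repeats until the centroids stop changing. The function name, seed, and toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, clusters


# Two made-up, well-separated groups of 2-D points.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, 2)
print(sorted(centroids))  # → [(1.5, 1.5), (8.5, 8.5)]
```

k-means can settle in a poor local optimum on harder data, so practical tools usually run it from several starting points.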
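The simplest form of regression fits a straight line y = slope * x + intercept by ordinary least squares, where slope = cov(x, y) / var(x). The sketch below assumes a single numeric input variable; the function name and the advertising/sales figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: advertising spend vs. sales, lying exactly on y = 2x + 1.
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, sales)
print(slope, intercept)       # → 2.0 1.0
print(slope * 6 + intercept)  # estimated sales for spend = 6 → 13.0
```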
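Association rules are usually judged by two standard measures: support (the fraction of transactions containing an itemset) and confidence (support of A together with B, divided by support of A). The sketch below is a brute-force search over item pairs only, not a full algorithm such as Apriori or FP-growth; the thresholds 0.4 and 0.7, the function names, and the shopping baskets are assumptions.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force rules {a} -> {b} over item pairs that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        if support({a, b}, transactions) < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return rules


# Made-up shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]

for lhs, rhs, conf in pair_rules(transactions):
    print(f"{lhs} -> {rhs} (confidence {conf:.2f})")
```

With these baskets the rule butter -> bread has confidence 1.00 (every basket with butter also has bread), while bread -> butter only reaches 0.75.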
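A simple statistical way to flag outliers is the z-score: how many standard deviations a value lies from the mean. This is one technique among many (distance- and density-based methods are others). In the minimal sketch below, the threshold of 2.0 is an arbitrary choice for a tiny sample (with only 8 values, no z-score can exceed about 2.65, so the common threshold of 3 would never fire). The function name and amounts are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return values lying more than threshold standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]


# Made-up daily transaction amounts; one value is far from the rest.
amounts = [10, 12, 11, 13, 12, 11, 95, 12]
print(zscore_outliers(amounts))  # → [95]
```

The outlier itself inflates the mean and standard deviation, which is why robust alternatives (for example, quartile-based fences) are often preferred on real data.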
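The basic counting step behind sequential pattern mining is the support of an ordered pattern: the fraction of sequences that contain the pattern's items in the same order, possibly with other items in between. The sketch below shows only this counting step, not a full mining algorithm such as GSP or PrefixSpan; the function names and purchase histories are hypothetical.

```python
def contains_sequence(sequence, pattern):
    """True if pattern's items appear in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, so order is enforced.
    return all(item in it for item in pattern)


def sequence_support(sequences, pattern):
    """Fraction of sequences that contain pattern as an ordered subsequence."""
    return sum(contains_sequence(s, pattern) for s in sequences) / len(sequences)


# Each list is one customer's purchases in time order (made-up data).
customers = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "phone", "headphones", "case"],
    ["case", "phone"],
]
print(sequence_support(customers, ["phone", "case"]))  # → 0.5
print(sequence_support(customers, ["case", "phone"]))  # → 0.25
```

Order matters: "phone then case" occurs for two customers, while "case then phone" occurs for only one.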
Data Mining Implementation
The data from different sources should be selected, cleaned, transformed, formatted, anonymized, and constructed. Data cleaning is a process to "clean" the data by smoothing noisy data and filling in missing values.
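The two cleaning operations named here can be sketched as follows. This is a minimal illustration under assumptions: missing values (represented as `None`) are filled with the mean of the observed values, and noise is smoothed by bin means (sort the values, split them into equal-size bins, and replace each value by its bin's mean). Both are common textbook choices rather than the only options, and the function names and sample data are hypothetical.

```python
import statistics


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort values, split them into bins of bin_size, and replace each value by its bin mean.

    Note: the result is returned in sorted order, not the original order.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([statistics.mean(bin_values)] * len(bin_values))
    return smoothed


ages = [23, None, 31, 27, None, 39]  # made-up records with two gaps
print(fill_missing_with_mean(ages))  # → [23, 30, 31, 27, 30, 39]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # sample prices
print(smooth_by_bin_means(prices, 3))  # → [9, 9, 9, 22, 22, 22, 29, 29, 29]
```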
- Smoothing: It helps to remove noise from the data.
- Aggregation: Summary or aggregation operations are applied to the data. For example, weekly sales data is aggregated to calculate monthly and yearly totals.
- Generalization: In this step, low-level data is replaced by higher-level concepts with the help of concept hierarchies. For example, a city is replaced by its county.
- Normalization: Normalization is performed when attribute data are scaled up or scaled down so that they fall within a specified range. Example: data should fall in the range -2.0 to 2.0 after normalization.
- Attribute construction: New attributes are constructed from the given set of attributes and added to it when they are helpful for data mining.
- Mathematical models are used to determine data patterns.
- Based on the business objectives, suitable modeling techniques should be selected for the prepared dataset.
- Create a scenario to test and check the quality and validity of the model.
- Run the model on the prepared dataset.
- Results should be assessed by all stakeholders to make sure that the model can meet the data mining objectives.
- Patterns identified are evaluated against the business objectives.
- Results generated by the data mining model should be evaluated against the business objectives.
- Gaining business understanding is an iterative process. In fact, new business requirements may arise while the results are being interpreted, because data mining can reveal them.
- A go or no-go decision is taken to move the model into the deployment phase.
- The knowledge or information discovered during the data mining process should be made easy to understand for non-technical stakeholders.
- A detailed deployment plan for shipping, maintaining, and monitoring the data mining discoveries is created.
- A final project report is created with lessons learned and key experiences during the project. This helps to improve the organization's business policy.
- Data Mining is all about explaining the past and predicting the future for analysis.
- Data mining helps to extract information from huge sets of data. It is the procedure of mining knowledge from data.
- The data mining process includes Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
- Important data mining techniques are classification, clustering, regression, association rules, outlier detection, sequential patterns, and prediction.
- R and Oracle Data Mining are prominent data mining tools.
- Data mining technique helps companies to get knowledge-based information.
- The main drawback of data mining is that much analytics software is difficult to operate and requires advanced training to work with.
- Data mining is used in diverse industries such as communications, insurance, education, manufacturing, banking, retail, service providers, e-commerce, supermarkets, and bioinformatics.
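The data transformation steps listed in the implementation section (aggregation, generalization, and normalization into a range such as -2.0 to 2.0) can be sketched as small functions. This is a minimal illustration under assumptions: normalization is done with min-max scaling (one common choice among several), the city-to-county table is a tiny hand-written concept hierarchy, and all function names and figures are hypothetical.

```python
from collections import defaultdict


def aggregate(records):
    """Sum (key, amount) records per key, e.g. weekly sales into monthly totals."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)


def generalize(values, hierarchy):
    """Replace each low-level value with its parent concept from a concept hierarchy."""
    return [hierarchy.get(v, v) for v in values]  # unknown values are kept as-is


def min_max_normalize(values, new_min=-2.0, new_max=2.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map everything to the middle of the range.
        return [(new_min + new_max) / 2 for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


# Weekly sales as (month, weekly total) pairs (made-up figures).
weekly_sales = [("2024-01", 120), ("2024-01", 95), ("2024-02", 130), ("2024-02", 110)]
monthly = aggregate(weekly_sales)
print(monthly)                # → {'2024-01': 215.0, '2024-02': 240.0}
print(sum(monthly.values()))  # total over the whole period → 455.0

# A tiny hand-written city -> county hierarchy.
city_to_county = {"Pasadena": "Los Angeles County", "Irvine": "Orange County"}
print(generalize(["Pasadena", "Irvine"], city_to_county))

print(min_max_normalize([200, 300, 400, 600]))  # → [-2.0, -1.0, 0.0, 2.0]
```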
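The modeling step that asks for a scenario to test the quality and validity of the model is often implemented as a hold-out test: part of the data is kept aside, the model is built on the rest, and its accuracy is measured on the unseen part. The sketch below assumes a 75/25 split and a 1-nearest-neighbour model on one numeric feature; the split ratio, seed, function names, and data are all illustrative assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows reproducibly and hold out the last test_fraction as a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def nearest_neighbour_predict(train_rows, x):
    """1-NN on a single numeric feature: copy the label of the closest training value."""
    return min(train_rows, key=lambda row: abs(row[0] - x))[1]


def accuracy(train_rows, test_rows):
    """Share of test rows whose label the model predicts correctly."""
    correct = sum(nearest_neighbour_predict(train_rows, x) == label for x, label in test_rows)
    return correct / len(test_rows)


# Made-up, clearly separated data: small values are "low", large ones "high".
rows = [(x, "low") for x in range(10)] + [(x, "high") for x in range(90, 100)]
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))   # → 15 5
print(accuracy(train_rows, test_rows))   # → 1.0
```

On real data the hold-out accuracy would be well below 1.0, and repeating the split (cross-validation) gives a more reliable estimate than a single split.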