Data mining refers to extracting or 'mining' knowledge from large amounts of data.
Many other terms carry a similar or slightly different meaning to data mining, such as knowledge mining from databases, knowledge extraction, etc.
Knowledge Discovery in Databases (KDD)
– Data cleaning (to remove noise and inconsistent data)
– Data integration (where multiple data sources may be combined)
– Data selection (where data relevant to the analysis task are retrieved from the database)
– Data transformation (where data are transformed into forms appropriate for mining by performing summary or aggregation operations)
– Data mining (an essential process where intelligent methods are applied in order to extract patterns)
– Pattern evaluation (to identify the truly interesting patterns representing knowledge, based on some interestingness measures)
– Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
Diagram of KDD
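The KDD steps listed above can be sketched as a chain of small functions. This is only an illustrative toy, not a real mining system: the records, the function names, and the `min_total` threshold are all assumptions made up for the example, and the "mining" step is just a ranking by total amount.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [
    {"customer": "c1", "item": "bread", "amount": 2.5},
    {"customer": "c2", "item": "milk", "amount": None},  # noisy record
]
store_b = [
    {"customer": "c1", "item": "milk", "amount": 1.2},
    {"customer": "c3", "item": "bread", "amount": 2.4},
    {"customer": "c3", "item": "pen", "amount": 0.9},
]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, items):
    """Data selection: keep only records relevant to the task."""
    return [r for r in records if r["item"] in items]

def transform(records):
    """Data transformation: aggregate the total amount per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["amount"]
    return dict(totals)

def mine(totals):
    """Data mining (toy stand-in): rank items by total amount."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def evaluate(patterns, min_total):
    """Pattern evaluation: keep patterns at or above a threshold."""
    return [(item, total) for item, total in patterns if total >= min_total]

def present(patterns):
    """Knowledge presentation: render the result as text lines."""
    return [f"{item}: {total:.2f}" for item, total in patterns]

records = integrate(clean(store_a), clean(store_b))
selected = select(records, {"bread", "milk"})
report = present(evaluate(mine(transform(selected)), min_total=2.0))
print("\n".join(report))
```

Each function corresponds to one step of the list, so a step can be swapped for a more realistic implementation without touching the others.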
A data mining system has the following major components:
• Database, data warehouse, or other information repository
• Database or data warehouse server
• Knowledge base
• Data mining engine
• Pattern evaluation module
• Graphical user interface
Diagram of architecture
How does data mining work?
While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought: classes, clusters, associations, and sequential patterns.
Data mining consists of five major elements:
Extract, transform and load transaction data onto the data warehouse system
Store and manage the data in a multidimensional database system
Provide data access to business analysts and information technology professionals
Analyze the data with application software
Present the data in a useful format, such as a graph or table
Different levels of analysis are available:
Artificial neural networks
Genetic algorithms
Decision trees
Nearest neighbor method
Rule induction
Data visualization
Data mining functionalities and kinds of patterns
Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks. Data mining tasks can be classified into two types:
1. Descriptive: characterizes the general properties of the data in the database.
2. Predictive: performs inference on the current data in order to make predictions.
Data mining functionalities and the kinds of patterns they discover are:
Data characterization and discrimination
Characterization: summarizing the data of the class under study (the target class)
Discrimination: comparing the target class with one or a set of comparative classes
Association analysis
It is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. It is used for market basket and transaction data analysis.
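As a rough illustration of what "occurring frequently together" means, the sketch below computes the two usual measures for an association rule, support and confidence, over a handful of market baskets. The transactions and the function names are assumptions invented for this example.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for antecedent => consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base

print(support({"bread", "milk"}, transactions))       # 2 of 4 baskets
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread baskets
```

A rule such as bread => milk is usually kept only if both values exceed thresholds chosen by the analyst.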
Classification and prediction
Classification is the process of finding a set of models that describe and distinguish data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data.
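To make the idea of predicting an unknown class label from training data concrete, here is a minimal sketch using the nearest neighbor method, one of the techniques named earlier. The training points, the labels, and the `classify` helper are assumptions for illustration; a real classifier would also need evaluation on held-out data.

```python
import math

# Hypothetical training data: (feature vector, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]

def classify(point, training):
    """Return the label of the training example closest to point."""
    nearest = min(training, key=lambda example: math.dist(point, example[0]))
    return nearest[1]

print(classify((2.0, 1.5), training))
print(classify((7.0, 9.0), training))
```

Here the "model" is simply the stored training set; other methods such as decision trees build an explicit model from the same kind of labeled data.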
The prediction of continuous values can be modeled by the statistical technique of regression. Two common types are:
linear regression
multiple regression
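A small sketch of linear regression with one predictor follows, fitting y = intercept + slope * x by ordinary least squares. The data and the `fit_line` name are assumptions chosen so the fit is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # hypothetical data lying exactly on y = 1 + 2x
intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 5  # predicted continuous value for x = 5
print(intercept, slope, prediction)
```

Multiple regression extends the same idea to several predictor variables instead of one.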
Outlier analysis
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. The analysis of outlier data is referred to as outlier mining.
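One simple way to flag such objects, shown here only as an assumed example and not as the method these notes prescribe, is to mark values that lie more than a chosen number of standard deviations from the mean. The data, the `find_outliers` name, and the threshold of 2.0 are all illustrative choices.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than threshold standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

amounts = [10, 12, 11, 13, 12, 11, 95]  # hypothetical transaction amounts
print(find_outliers(amounts))
```

On very small samples a single extreme value inflates the standard deviation, so the threshold needs to be chosen with the sample size in mind.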
Cluster analysis
Clustering analyzes data objects without consulting a known class label. Clustering can also facilitate taxonomy formation.
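The sketch below groups unlabeled numbers with a bare-bones k-means loop, to show clustering working without any class labels. K-means is one clustering technique among many and is an assumption of this example, as are the data, the starting centers, and the `kmeans_1d` name.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean (empty clusters stay put).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

ages = [21, 22, 23, 45, 46, 47]  # hypothetical unlabeled data
centers, clusters = kmeans_1d(ages, centers=[20, 50])
print(centers, clusters)
```

The two groups that emerge could then be named and arranged into a hierarchy, which is the sense in which clustering supports taxonomy formation.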
Data mining applications
Data mining for biomedical and DNA data analysis
Data mining for financial data analysis
Data mining for the retail industry
Data mining for the telecommunication industry
A data warehouse is a database of data gathered from many systems and intended to support management reporting and decision making. It provides architectures and tools for business executives to systematically organize and use their data to make strategic decisions.
The key features are:
• Subject-oriented
• Integrated
• Time-variant
• Nonvolatile
Data warehousing is very useful from the point of view of heterogeneous database integration.
Design and construction of data warehouses
Four different views regarding the design of a data warehouse:
Top-down view
selection of the relevant information
Data source view
information being captured, stored and managed by operational systems
Data warehouse view
fact tables and dimension tables
Business query view
perspective of data in the data warehouse from the viewpoint of the end user.
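The fact tables and dimension tables of the data warehouse view can be illustrated with a tiny star schema in an in-memory SQLite database. The table names, columns, and rows below are assumptions invented for this sketch; a production warehouse would be far larger and run on a dedicated server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    -- The fact table holds the measures plus keys into the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES (1, 'bread', 'food'), (2, 'pen', 'stationery');
    INSERT INTO dim_date VALUES (1, 2008, 'Q4'), (2, 2009, 'Q1');
    INSERT INTO fact_sales VALUES (1, 1, 10, 25.0), (1, 2, 4, 10.0), (2, 2, 3, 6.0);
""")

# A business-style query: revenue per product category and year.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rows)
```

The final query is the kind of question an end user would pose in the business query view, answered by joining the fact table to its dimensions.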
The construction of data warehouses, which involves data cleaning and data integration, can be viewed as an important preprocessing step for data mining. Data warehouses also provide on-line analytical processing (OLAP) tools for interactive analysis, which facilitates effective data mining.
Diagram of three-tier architecture
1. Bottom tier
It is a warehouse database server that is almost always a relational database system.
2. Middle tier
It is an OLAP server that is implemented using either relational OLAP (ROLAP) or multidimensional OLAP (MOLAP).
3. Top tier
It is a client layer, which contains query and reporting tools, analysis tools, and/or data mining tools.
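As a hedged illustration of what the middle-tier OLAP server does for the client tools, the sketch below models a two-dimensional cube as a dictionary and applies two typical operations: a roll-up (here by dimension reduction, summing one dimension away) and a slice. The cube contents and the function names are assumptions for the example, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (city, quarter) with a sales measure.
cube = {
    ("Delhi", "Q1"): 100,
    ("Delhi", "Q2"): 120,
    ("Mumbai", "Q1"): 80,
    ("Mumbai", "Q2"): 90,
}

def roll_up(cube, axis):
    """Sum the measure over every dimension except the one at index axis."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)

def slice_cube(cube, axis, member):
    """Keep only the cells whose dimension at index axis equals member."""
    return {key: value for key, value in cube.items() if key[axis] == member}

by_city = roll_up(cube, axis=0)                  # totals per city
q1_only = slice_cube(cube, axis=1, member="Q1")  # only first-quarter cells
print(by_city)
print(q1_only)
```

A ROLAP server would answer the same requests by generating SQL against relational tables, while a MOLAP server would read them from array-based multidimensional storage.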
Goal of warehouses
……….
Application:
Information processing
Analytical processing
Data mining
Conclusion
Data mining is the extraction of hidden predictive information from large databases. It is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.
Friday, April 17, 2009