Friday, April 17, 2009

DATA MINING

Data mining refers to extracting or “mining” knowledge from large amounts of data.

Several other terms carry a similar or slightly different meaning, such as knowledge mining from databases and knowledge extraction.

Knowledge Discovery in Databases (KDD)
The KDD process consists of the following steps:
– Data cleaning (to remove noise and inconsistent data)
– Data integration (where multiple data sources may be combined)
– Data selection (where data relevant to the analysis task are retrieved from the database)
– Data transformation (where data are transformed into forms appropriate for mining by performing summary or aggregation operations)
– Data mining (an essential process where intelligent methods are applied in order to extract patterns)
– Pattern evaluation (to identify the truly interesting patterns representing knowledge, based on some interestingness measures)
– Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
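
The KDD steps can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the two record sources, the function names and the 0.6 support threshold are all made up for the example, and the "mining" step is reduced to counting how many customers bought each item.

```python
from collections import Counter

# Two hypothetical sources of (customer, item) purchase records.
store_a = [("ann", "bread"), ("ann", "milk"), ("bob", "bread"), ("bob", None)]
store_b = [("cat", "milk"), ("cat", "bread"), ("dan", "eggs"), ("ann", "bread")]


def clean(records):
    """Data cleaning: drop records with a missing item."""
    return [r for r in records if r[1] is not None]


def integrate(*sources):
    """Data integration: combine the sources and remove exact duplicates."""
    return sorted(set().union(*sources))


def select(records, items):
    """Data selection: keep only the items relevant to the task."""
    return [r for r in records if r[1] in items]


def transform(records):
    """Data transformation: aggregate records into one item set per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets


def mine(baskets):
    """Data mining: count how many baskets contain each item."""
    return Counter(item for basket in baskets.values() for item in basket)


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep items whose support meets the threshold."""
    return {item: n / n_baskets for item, n in counts.items()
            if n / n_baskets >= min_support}


def present(patterns):
    """Knowledge presentation: format the patterns for the user."""
    return [f"{item}: {support:.0%}" for item, support in sorted(patterns.items())]


def run_pipeline():
    records = integrate(clean(store_a), clean(store_b))
    records = select(records, {"bread", "milk"})
    baskets = transform(records)
    patterns = evaluate(mine(baskets), len(baskets), min_support=0.6)
    return present(patterns)


print(run_pipeline())  # ['bread: 100%', 'milk: 67%']
```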

Diagram of KDD


A data mining system has the following major components:
• Database, data warehouse or other information repository
• Database or data warehouse server
• Knowledge base
• Data mining engine
• Pattern evaluation module
• Graphical user interface


Diagram of architecture


How does data mining work?
While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning and neural networks. Generally, any of four types of relationships are sought: classes, clusters, associations and sequential patterns.

Data mining consists of five major elements:
• Extract, transform and load transaction data onto the data warehouse system
• Store and manage the data in a multidimensional database system
• Provide data access to business analysts and information technology professionals
• Analyze the data with application software
• Present the data in a useful format, such as a graph or table
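
The five elements can be walked through on a tiny example. The sketch below is an assumption-laden stand-in, not a real warehouse: the CSV text and its column names are invented, and an in-memory SQLite table plays the role of the storage layer (a relational table rather than a true multidimensional database).

```python
import csv
import io
import sqlite3

# Hypothetical raw transaction export; the column names are assumptions.
RAW_CSV = """day,product,amount
2009-04-01,Bread,2.50
2009-04-01,Milk,1.20
2009-04-02,bread,2.50
"""


def extract(text):
    """Extract: parse the raw export into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise product names and convert amounts to numbers."""
    return [(r["day"], r["product"].strip().lower(), float(r["amount"]))
            for r in rows]


def load(rows):
    """Load and store: an in-memory SQLite table stands in for the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def analyze(conn):
    """Analyze: total sales per product."""
    query = ("SELECT product, SUM(amount) FROM sales "
             "GROUP BY product ORDER BY product")
    return conn.execute(query).fetchall()


def present(results):
    """Present: format the results as a plain-text table."""
    return [f"{product:<8}{total:>8.2f}" for product, total in results]


warehouse = load(transform(extract(RAW_CSV)))
totals = analyze(warehouse)
for line in present(totals):
    print(line)
```

Data access for analysts, the third element, corresponds here to handing out the `warehouse` connection so that other queries can be run against it.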


Different levels of analysis are available:
• Artificial neural networks
• Genetic algorithms
• Decision trees
• Nearest neighbor method
• Rule induction
• Data visualization

Data mining functionalities and kinds of patterns
Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks. Data mining tasks can be classified into two types:

1. Descriptive: these tasks characterize the general properties of the data in the database.
2. Predictive: these tasks perform inference on the current data in order to make predictions.

The data mining functionalities and the kinds of patterns they find are:

• Data characterization and discrimination
Characterization summarizes the general features of the data in a target class. Discrimination compares the target class with one or a set of comparative (contrasting) classes.
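
As a rough illustration, characterization can be a per-class summary and discrimination a comparison of two such summaries. The customer records, the field names and the choice of averages as the summary are assumptions made for this sketch.

```python
from statistics import mean

# Hypothetical customer records; "segment" plays the role of the class.
customers = [
    {"segment": "frequent", "age": 34, "spend": 220},
    {"segment": "frequent", "age": 41, "spend": 260},
    {"segment": "occasional", "age": 25, "spend": 60},
    {"segment": "occasional", "age": 29, "spend": 80},
]


def characterize(rows, segment):
    """Summarize the general features of one target class."""
    chosen = [r for r in rows if r["segment"] == segment]
    return {
        "count": len(chosen),
        "avg_age": mean(r["age"] for r in chosen),
        "avg_spend": mean(r["spend"] for r in chosen),
    }


def discriminate(rows, target, contrasting):
    """Compare the target class summary with a contrasting class summary."""
    t = characterize(rows, target)
    c = characterize(rows, contrasting)
    return {key: t[key] - c[key] for key in ("avg_age", "avg_spend")}


print(characterize(customers, "frequent"))
print(discriminate(customers, "frequent", "occasional"))
```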

• Association analysis
Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. It is widely used for market basket and transaction data analysis.
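
Association rules are usually scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for a rule on a made-up set of market baskets; the transactions and function names are assumptions for illustration.

```python
# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the share that also has the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


# Rule: bread => milk
print(support({"bread", "milk"}, transactions))       # 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
```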

• Classification and prediction
Classification is the process of finding a set of models that describe and distinguish data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data.
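
One of the simplest classifiers is the nearest neighbor method: an unlabeled object receives the class of the closest training example. This is a minimal sketch, assuming two numeric attributes and Euclidean distance; the training points and labels are invented.

```python
import math

# Hypothetical training data: (attribute vector, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 4.5), "high"),
]


def classify(point, training):
    """Return the label of the training example nearest to point."""
    nearest = min(training, key=lambda example: math.dist(point, example[0]))
    return nearest[1]


print(classify((1.2, 1.1), training))  # low
print(classify((5.5, 5.0), training))  # high
```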

The prediction of continuous values can be modeled by the statistical technique of regression. Two common types are linear regression and multiple regression.
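
For a single predictor, linear regression can be fitted by ordinary least squares. The sketch below is a minimal version under that assumption; the data points are made up and lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict a continuous value for a new x."""
    return slope * x + intercept


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)              # 2.0 1.0
print(predict(5, slope, intercept))  # 11.0
```
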
• Outlier analysis
A database may contain data objects that do not comply with the general behaviour or model of the data. These data objects are outliers. The analysis of outlier data is referred to as outlier mining.
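
One simple way to flag outliers, among many, is to measure how many standard deviations each value lies from the mean. The threshold of 2.0 and the readings below are assumptions chosen for this sketch.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values lying more than threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 11, 9, 10, 12, 10, 50]
print(find_outliers(readings))  # [50]
```
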
• Cluster analysis
Clustering analyzes data objects without consulting a known class label. Clustering can also facilitate taxonomy formation.
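
A small k-means sketch shows grouping without class labels. It assumes one-dimensional data and caller-supplied starting centers, both simplifications made for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group values around the nearest center, then move each center to its group mean."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


values = [1, 2, 3, 10, 11, 12]
centers, clusters = kmeans_1d(values, centers=[1, 12])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```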


Data mining applications
• Data mining for biomedical and DNA data analysis
• Data mining for financial data analysis
• Data mining for the retail industry
• Data mining for the telecommunication industry

A data warehouse is a database of data gathered from many systems and intended to support management reporting and decision making. It provides architectures and tools for business executives to systematically organize and use their data to make strategic decisions.
The key features are:
• Subject-oriented
• Integrated
• Time-variant
• Nonvolatile
Data warehousing is very useful from the point of view of heterogeneous database integration.

Design and construction of data warehouses


Four different views regarding the design of a data warehouse:
• Top-down view
allows the selection of the relevant information needed for the data warehouse
• Data source view
exposes the information being captured, stored and managed by operational systems
• Data warehouse view
includes fact tables and dimension tables
• Business query view
gives the perspective of data in the data warehouse from the viewpoint of the end user
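
The fact tables and dimension tables of the data warehouse view can be pictured with plain dictionaries: each fact row holds measures plus keys that point into the dimension tables. The table contents, key values and the `revenue_by` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Dimension tables: descriptive attributes, looked up by key.
dim_product = {
    1: {"name": "bread", "category": "bakery"},
    2: {"name": "milk", "category": "dairy"},
    3: {"name": "cheese", "category": "dairy"},
}
dim_date = {
    20090401: {"month": "2009-04"},
    20090402: {"month": "2009-04"},
    20090501: {"month": "2009-05"},
}

# Fact table: one row per sale, with foreign keys and a numeric measure.
fact_sales = [
    {"product_id": 1, "date_id": 20090401, "revenue": 5.0},
    {"product_id": 2, "date_id": 20090401, "revenue": 3.0},
    {"product_id": 3, "date_id": 20090402, "revenue": 8.0},
    {"product_id": 1, "date_id": 20090501, "revenue": 2.5},
]


def revenue_by(product_attr, date_attr):
    """Join facts to both dimensions and total revenue per attribute pair."""
    totals = defaultdict(float)
    for row in fact_sales:
        key = (dim_product[row["product_id"]][product_attr],
               dim_date[row["date_id"]][date_attr])
        totals[key] += row["revenue"]
    return dict(totals)


print(revenue_by("category", "month"))
```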


The construction of data warehouses, which involves data cleaning and data integration, can be viewed as an important preprocessing step for data mining. Data warehouses also provide on-line analytical processing (OLAP) tools for interactive analysis, which facilitates effective data mining.

Diagram of three-tier architecture


1. Bottom tier
A warehouse database server that is almost always a relational database system.
2. Middle tier
An OLAP server that is implemented using either relational OLAP (ROLAP) or multidimensional OLAP (MOLAP).
3. Top tier
A client layer that contains query and reporting tools, analysis tools and/or data mining tools.
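
The difference between the two middle-tier styles can be hinted at in a few lines. In this simplified sketch, which is an assumption rather than how any real OLAP server works internally, the relational style keeps facts as rows and aggregates them at query time, while the multidimensional style precomputes a small cube in which "*" stands for "all values" of a dimension.

```python
from collections import defaultdict

# Hypothetical facts: (category, month, revenue).
rows = [
    ("bakery", "2009-04", 5.0),
    ("dairy", "2009-04", 11.0),
    ("bakery", "2009-05", 2.5),
    ("dairy", "2009-05", 4.0),
]


def rolap_total(category=None, month=None):
    """Relational style: scan the rows and aggregate on demand."""
    return sum(amount for c, m, amount in rows
               if category in (None, c) and month in (None, m))


def build_cube(rows):
    """Multidimensional style: precompute totals for every roll-up level."""
    cube = defaultdict(float)
    for c, m, amount in rows:
        for key in ((c, m), (c, "*"), ("*", m), ("*", "*")):
            cube[key] += amount
    return dict(cube)


cube = build_cube(rows)
print(rolap_total(category="dairy"), cube[("dairy", "*")])  # 15.0 15.0
print(rolap_total(month="2009-04"), cube[("*", "2009-04")])  # 16.0 16.0
```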


Goal of warehouses
……….
Applications:
• Information processing
• Analytical processing
• Data mining

Conclusion

Data mining is the extraction of hidden predictive information from large databases. It is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.
