Posts Tagged ‘Data Mining’

Data Requirement for Advanced Analytics

TDWI author Philip Russom has presented a fantastic checklist of the data requirements for advanced analytics.

First, it comes from a major BI/DW organization and pinpoints the need for different data architectures for reporting and for analytics (particularly advanced analytics).

Second, it serves as an important document for data warehousing and modeling experts, who usually don't consider advanced analytics usage when designing data storage.

Third, it promotes the provisioning of the separate analytical data stores that advanced analytics demand.

Fourth, it serves as a business case for in-memory databases.

Standard reporting and analytics (OLAP) are well served by multidimensional models (high-level, summarized data), while advanced analytics require raw transactional data (low-level, detailed data) along with aggregated and derived data, usually in denormalized form. The exact design is determined by the type of analysis to be carried out.
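To make the contrast concrete, here is a minimal sketch in Python over a small, hypothetical transactions list (the data and the function names `olap_summary` and `analytical_rows` are invented for illustration). The reporting view collapses everything into per-channel totals, while the analytical view keeps one denormalized row per raw transaction and attaches per-customer aggregates and a derived ratio next to the detail.

```python
from collections import defaultdict

# Hypothetical raw transactions: (customer_id, amount, channel)
TRANSACTIONS = [
    ("c1", 120.0, "web"),
    ("c1", 80.0, "store"),
    ("c2", 15.0, "web"),
    ("c2", 25.0, "web"),
    ("c2", 10.0, "store"),
]

def olap_summary(transactions):
    """Reporting-style view: total amount per channel (summarized, high level)."""
    totals = defaultdict(float)
    for _, amount, channel in transactions:
        totals[channel] += amount
    return dict(totals)

def analytical_rows(transactions):
    """Advanced-analytics view: one denormalized row per transaction that keeps
    the raw detail and attaches per-customer aggregates plus a derived ratio."""
    per_customer = defaultdict(list)
    for cust, amount, _ in transactions:
        per_customer[cust].append(amount)
    rows = []
    for cust, amount, channel in transactions:
        amounts = per_customer[cust]
        mean = sum(amounts) / len(amounts)
        rows.append({
            "customer_id": cust,
            "amount": amount,                # raw detail
            "channel": channel,              # raw detail
            "cust_txn_count": len(amounts),  # aggregated
            "cust_mean_amount": mean,        # aggregated
            "amount_to_mean": amount / mean, # derived
        })
    return rows
```

The summary is enough for a channel report, but a model that scores individual transactions needs the second shape, where every row carries both its own detail and its context.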

Data integration also differs between data warehouses serving reporting and analytics and the analytical databases serving advanced analytics. The former mostly rely on ETL, while the latter are better served by ELT, both for practical reasons and because of the nature of the analysis.
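The ETL/ELT distinction can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed, hypothetical data (`SOURCE`) and table names: the ETL path transforms and summarizes outside the database and loads only the result, while the ELT path loads the raw records first and transforms them inside the database with SQL, so the detail remains available for later analyses.

```python
import sqlite3

# Hypothetical source records: (order_id, region, amount_in_cents)
SOURCE = [(1, "north", 1250), (2, "south", 800), (3, "north", 450)]

def etl(conn):
    """ETL: transform outside the database, then load only the summarized result."""
    totals = {}
    for _, region, cents in SOURCE:
        totals[region] = totals.get(region, 0) + cents / 100.0
    conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))

def elt(conn):
    """ELT: load the raw records first, then transform inside the database with SQL,
    keeping the raw table around for whatever analysis comes next."""
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, cents INTEGER)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", SOURCE)
    conn.execute(
        "CREATE TABLE orders_enriched AS "
        "SELECT order_id, region, cents / 100.0 AS amount FROM raw_orders"
    )

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
```

After the ETL step only the regional totals exist; after the ELT step the raw orders are still in the database, which is what an analyst needs when the next question is not known in advance.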

In addition, data integration for data warehousing deals mostly with aggregating, consolidating, and converting the schema from relational to multidimensional. In an analytical database, by contrast, data integration is more mathematical in nature, and activities like discretization of continuous data, binning, reverse pivoting, data sampling, and PCA are heavily employed.
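A few of these preparation steps can be sketched in plain Python. The helpers below are hypothetical and show one simple variant of each: equal-width binning as the discretization method, a wide-to-long reverse pivot, and simple random sampling with a fixed seed. PCA is left out to keep the sketch dependency-free; real pipelines would typically use a numerical library for it.

```python
import random

def equal_width_bins(values, n_bins):
    """Discretize continuous values into n_bins equal-width intervals (0..n_bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def reverse_pivot(rows, id_key):
    """Turn wide rows (one column per attribute) into long (id, attribute, value) records."""
    long_rows = []
    for row in rows:
        for attr, value in row.items():
            if attr != id_key:
                long_rows.append((row[id_key], attr, value))
    return long_rows

def simple_random_sample(rows, k, seed=42):
    """Draw k rows without replacement; a fixed seed keeps the sample reproducible."""
    return random.Random(seed).sample(rows, k)
```

For example, `equal_width_bins([0, 2.5, 5, 7.5, 10], 4)` returns `[0, 1, 2, 3, 3]`, and reverse pivoting `{"id": 1, "q1": 10, "q2": 20}` yields `(1, "q1", 10)` and `(1, "q2", 20)`.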

A similar discussion took place some time ago here.

This white paper makes a strong case.

Data Mining Definition

Here is my definition of Data Mining:

Data Mining is the process of extracting non-trivial patterns from massive datasets that either provide descriptive insights into the data (which would not be perceived without this extraction) or provide actionable intelligence (in the form of reusable patterns that the process extracted). Here, actionable intelligence is a structure of explicitly representable patterns that can be used for decision making, either manually or computationally.
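One familiar example of an "explicitly representable pattern" is an association rule such as "customers who buy butter also buy milk". The sketch below is my own illustration of that idea, not part of the definition itself: a minimal pair-wise rule miner over hypothetical shopping baskets, keeping rules whose support and confidence clear given thresholds. Each surviving rule is a reusable pattern that a person could read, or a program could apply directly.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support, min_confidence):
    """Mine 'if A then B' rules from item pairs.

    support(A, B)    = fraction of baskets containing both A and B
    confidence(A->B) = count(A, B) / count(A)
    Returns sorted (lhs, rhs, support, confidence) tuples.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

baskets = [["bread", "milk"], ["bread", "butter", "milk"], ["beer", "bread"], ["milk", "butter"]]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.9)
```

With these thresholds the only surviving rule is `("butter", "milk", 0.5, 1.0)`: every basket with butter also contains milk.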

What do you think about it? What's missing? What's extra?

An overview of Statistical aspects of Fraud Detection

Here is a video presented by Mr. David Hand on the issues of automated fraud detection systems. It is well worth watching:
David Hand

Statistical techniques for fraud detection, prevention, and evaluation

Complementing this video with his paper makes a good combination.