Archive for the ‘ business intelligence ’ Category

A Presentation on HR Analytics

The Pre-Sales Diary: Great, but Too Expensive, Dear!


The feeling of being dumped by a potential customer is made all too real by the host of options such customers have. Although there is no harm (apparently) in being blunt with the sales team, for the sake of a better euphemism these no-longer-'potential' customers can simply blame their rejection of your products/services, or whatever you do, on your being overpriced!

Although a high price tag is a fashion statement in some novelty industries and even an admired trait, most of enterprise business software treats the 'pricey' label as taboo.

First of all, we all know it: IT notoriously sucks money out of a business, and when there is no enterprise strategy around it, IT is definitely a pure cost center. That is why they invited you to sell them business performance management and intelligence software: to get a share of the corporate 'strategy' cake.

But you don't want to hear any of this. You spent a lot of resources, in time and people, to execute a sales cycle, raised expectations, and probably gave a proof of concept with purpose, all to listen to the once-potential customer spit out the devilish decree. It always starts with 'But…' and is followed by a variation of 'Your solution is too expensive', 'We can't allocate the budget for it', 'We don't make the final purchasing decisions', and so on.

In reality, this is just a polite way of saying that you didn't meet the expectations or weren't able to convey the right value of your products/services.

How do you cater to this catch-22 situation?

Many survivors report some common strategies, including:

  1. Price Justifications (e.g. our product works in zero gravity; our costs are only upfront-heavy; incremental upgrades are very cheap)
  2. Price Distractions (e.g. we have a very low overall TCO)
  3. Competitor Demeaning (e.g. the competitors have lousy products and are thus cheap) (pun intended)
  4. Bargaining (e.g. what's your budget? let us fit something for you; else we will definitely, ultimately come down to your level)
  5. Reinventing the Sales Wheel (e.g. let's try again, let's talk again, let us repeat our efforts to emphasize why we are not so affordable)
  6. Reassessing our own Assumptions about the Expectations and Value Offered (e.g. does the customer really know what they can get as true ROI? is our product redundant? can they solve the pain point using other, less expensive solutions?)

The reality is that most of these techniques are used quite frequently, and some of them are rather demeaning (e.g. 3). In most cases, the bottom line is that you need to set the expectations straight; such an objection only indicates a lack of effectiveness in doing so.

Once the objection is raised, ask the prospect what more the product/service should offer for them to rethink the budget.

They will either give you the points for mending the gaps or acknowledge that your product's fit is good.

In the former case, if the points mentioned can be offered in your products/services with a workaround or a doable approach, go ahead: you have nearly resolved the objection.

If the prospect is unable to provide any missing points, then you need to re-emphasize the need, figure out the real decision makers (if he/she cites others for budget approval), or figure out the true 'champions' and 'villains' in your deal. Most likely, you will find that your current assessment differs from your initial assessment.

Apply only these changes; this will set new expectations, and hopefully you will have the objection resolved and your products/services will be valued the way you wanted, or pretty close to it.

Happy Selling!


The Pre-Sales Diary: Data Profiling before Proof of Concepts

The raison d'être of many pre-sales engineers is to carry out proofs of concept. Although for most potential leads proofs of concept are to be avoided, because they incur greater costs in the sales cycle, increase the sales closing time and increase the chances of failure, there are certain cases where a proof of concept helps the sales cycle much more than anything else.

Some of these cases arise when competitors are involved touting the same lingo/features/capabilities; others involve a genuine customer scenario which needs addressing in a proof of concept, either because the scenario is pretty unique, because it is part of their due diligence, or because your product hasn't been tested in those waters before.

Pre-sales folks are pretty comfortable with their technology, which they like to showcase to such customers, but they are totally new to the customer's scenario. There is always a chance of failure, and failures abound.

Before embarking on a scope for a proof of concept and promising deliverables, it is more than required, in fact mandatory, to analyze not just the customer's organization but also its processes, metrics and, of course, data.

The last part is where I find most proofs of concept depend. Everything is set: you conducted extensive interviews with the stakeholders and know what needs to be 'proved', you scoped out a business process or two, figured out some metrics and one or two KPIs, and they gave you access to the data pertaining to them. Now the ball is in your court, but before you know it, you're doomed!

The data is incomplete and inaccurate, with tons of issues which data governance and MDM were meant to solve but didn't: they don't exist yet. In all likelihood, the customer is quite unaware of such issues; that is why you are offering them a business intelligence solution in the first place, to tap into their data assets. They have either never done so before or done so in too limited a way to uncover such obstacles. In the other scenario, when they are aware of these issues, either they are unable to tackle them, or it is a trick question: they want to check whether you cover this aspect or not.

You can try to beat the 'time' challenge by jumping right into the proof of concept and ignoring all the practices that are standard during project implementations, simply because 'it's just a demo'!


I always carry out a small data survey before promising any value to be shown in the proof of concept, to make sure what we have in store before we do anything. Simple rule: GIGO, Garbage In, Garbage Out. If you want a good-quality, successful demo, profile your data first, understand its strengths and weaknesses and, above all, let the customer know fully about the limitations; if possible, get your data enriched based on the profile to make your demo successful.

This one single step can lead to drastically different outcomes depending on whether or not it is performed.

Data Profiling:

Data Profiling is defined as the set of activities performed on datasets to identify the structure, content behavior and quality of data. The structure will guide you towards what links, what is missing, whether you have all the required master data, whether the data has good domain representation (possible lists of values), and what granularity you can work with. Content behavior tells you the customer's norms in terms of KPI and metric values; e.g. if the dataset only contains age groups of 40+, then there is no need to showcase cross-selling market baskets targeted at toddlers: you can simply skim it out or ask for data enrichment. If you don't have data pertaining to more than one year, then you can't have 'year' as a grain level, which for certain metrics and analyses might be critical. A data quality assessment, albeit a general one, can save you many hours ahead. The most notable quality issues are data formats, mixed units of measurement and spelling variations; e.g. RIAD, RIYADH, RIYAD and RYAD all indicating the same city, or mixed bilingual datasets such as names and addresses.
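As a minimal illustration of the kind of checks involved (plain Python, with a hypothetical sample built around the RIYADH spelling problem mentioned above), a first-pass column profile only needs row counts, null counts, distinct values and top frequencies:

```python
from collections import Counter

def profile(rows, column):
    """Minimal column profile: row count, nulls, distinct values, top frequencies."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top": Counter(non_null).most_common(3),
    }

# Hypothetical sample data illustrating the city-spelling issue.
rows = [
    {"city": "RIYADH"}, {"city": "RIYAD"}, {"city": "RIAD"},
    {"city": "RIYADH"}, {"city": ""},
]
print(profile(rows, "city"))
# A distinct count of 3 for what should be one city is exactly
# the kind of red flag you want to catch before the demo.
```

A dedicated profiler does far more, of course, but even a sketch like this surfaces the null and spelling problems early.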

There are many tools available out there which can aid in data profiling, including the ubiquitous SQL and Excel. However, data profiling, being a means to an end and not the end in itself, does not warrant more time and energy than required; therefore a purpose-built, RAD-enabled data profiler is one of the most critical investments in your toolbox.

One which I have come across recently and which fits the bill very nicely is Talend Open Profiler, a GPL-licensed, open-source and free piece of software engineered with great capabilities and power. You can carry out structure analysis, content analysis, column or field analysis, and pattern-based analysis on most source systems, including many DBMSs, flat files, Excel files, etc., with readily available results in both numerical and visual representations that give you a better sense of your data.

I believe all data quality tools are (or should be) equipped with good data profiling capabilities; most ETL vendors have data profiling capabilities, and some data analysis packages like QlikView can also be used, albeit in limited ways, to profile data in limited time.

The Data Profile can also be later shared with the customer as a value deliverable.


Happy Demoing!

Data Discovery – The BI Mojo

Gartner’s Q1-2011 Magic Quadrant for Business Intelligence was recently released.

Without much surprise, the four quadrants hosted some of the best BI offerings. As expected, QlikTech moved to the Leaders’ Quadrant thanks to its growing customer base, bigger deployments and a successful IPO back in October last year.

Other players also shone, including the likes of Spotfire (TIBCO) and Tableau, which earned the Challengers title. This is a trend we see in the Magic Quadrant: no vendor moves directly into the Leaders' box without entering the Challengers zone first. It is well expected that sooner or later Spotfire and Tableau will join the ranks of the leaders, while it is also quite possible that one or two existing leaders will start fading into history.

The Zeitgeist:

Data discovery tools have the greatest mind share, success and momentum. They have proved to be highly disruptive and have pushed the slow-moving elephants aside. Although elephants might be able to dance, tools like QlikView, Tableau and Spotfire represent the new wave of BI from both adoption and approach perspectives.

These vendors are business-friendly, analyst-savvy, agnostic to (traditional) reporting and have very agile development approaches. That is why the buying criteria are reported to be:

1. Ease of Use

2. Rapid Deployment

3. Functionality

These in-memory offerings compete on OLAP's limitations and thus add value on the functionality front, which is pretty much appreciated by IT as well.

However, the arrival of these new-wave BI tools in the Leaders and Challengers quadrants has caused a chain reaction, with SAP, Microsoft and Cognos innovating with their own in-memory offerings and interactive visual discovery tools. Still, the post-2007 acquisition hangover lingers on, and customer dissatisfaction caused by these acquisitions and mergers into the larger product and service suites of the mega-vendors remains a cause of concern for these players.

For these new-wave BI tools, age-old problems are surfacing, including data governance, data quality, master data management, the single version of the truth and the curse of information silos. Some of these new-age vendors are solving this with a larger portfolio of products (like TIBCO), others focus on OEM partners to deliver these important facets (like QlikView), while still others rely on a symbiotic relationship with existing (traditional) BI deployments (like Tableau).

The Observations:

  1. Both traditional BI and data discovery tools are required; therefore, saturation in the Leaders quadrant is far from reality, and the emergence of new vendors will still be observed.
  2. Overall BI maturity is growing, with the trend shifting from measurement to analysis to forecasting and optimization.
  3. Cost is an increasingly important factor in purchasing, and thus alternatives like open-source offerings and SaaS deployments are gaining traction.
  4. Niche players will continue to flourish but need a viable road map amidst the constant threat of mega-vendors replicating or acquiring (similar) technology.

Google Trends for Business Intelligence Today

An interesting find comparing the traditional giants, open-source competitors and innovative new-generation BI:


It clearly shows that QlikView is gaining steady momentum, Pentaho is also gaining popularity, and there is a steady decline for the traditional powerhouses…

Impact of Business Modeling on Data Warehouse Success

Business modeling may mean something different to process management professionals, but in data management, business models are the logical data structures which capture the meaningful events in context. Here the context refers to dimensions, measures and their specific usage. Data warehousing activities should be agile; that's the evolving zeitgeist which concepts like DW 2.0 (Inmon), Agile BI and self-service demand. The fundamental problem of data warehouse development is catering to change and agility while still serving a diverse user base.

BI processes and tools like QlikView, Spotfire and Tableau, with in-memory associative models as an alternative to OLAP, provide the agility, but the backbone of organizational data warehouses is still locked into fixed, rigid and less scalable subject areas.

The impact of conducting Business Modeling sessions with the Business users prior to BI is effective on many fronts:

1. Business Exceptions are captured early.

One of the trick questions in requirements gathering for data warehouses is extracting the exceptions in business rules, processes and events from business users. This phase is an 'art', and only a few have a good knack for exposing as many business exceptions as possible early in the design process. Realizing business exceptions late results in architectures with patchy workarounds and fixes which reduce the overall strength of the data warehouse.

2. The Essence of Business Processes necessary for Data Warehousing is preserved.

At times, business modeling invokes discussions among business users who, rather than portraying the as-is state, turn to discussions and debates on the 'to-be' state, and thus the BI consultant captures the overall strategy and goals of the business process under scrutiny. This information leads him/her to design solutions which adhere to the general gist and can thus manage both as-is and to-be states with little modification or further effort. Decent data warehouse scalability can be achieved.

3. A common language for communication is established between business and IT/BI.

Oftentimes BI consultants and DW architects force business users into understanding the business data model through the underlying database data models, with naming conventions, physical layouts and dimensional modeling technicalities which overwhelm the business user. Business users love it when IT talks their language, and IT appreciates removing ambiguity and vagueness by using a common lingua franca.

4. Greater sandboxing capabilities

Faster iteration cycles for prototype warehouse models can be achieved, allowing faster convergence to the desired data model(s). An end-to-end prototype, including subject area development followed by BI, can easily be run, often revealing obstacles before the time-consuming complete implementation.

5. Setting expectations becomes more accurate.

Business users, through sandboxing and modeling, see the highlights of things to come, and therefore right-sizing is achieved.

6. An initial scope for BI is understood.

Business modeling will reveal to the BI consultant(s) the major pain points of the business users; having seen the business models with their exceptions and all their facets, a rough time estimate can be set and/or a rough activity and project (or phase) scope can be identified.

7. The A-Team is on-board.

By bringing together a wide array of people, from end users and power users to quality assurance officers, BI consultants and data architects, a company can easily identify the A-Team through the business modeling activity. This A-Team will carry the momentum to execute the project and, in the longer run, participate in the BICC.

8. Nip the evil in the bud: Data Quality and Governance issues start surfacing.

Although they seem to be a distraction and are separate programs in themselves, data governance (and quality) has to be addressed as early in the data warehouse design phase as possible, since the design elements can either cater for cleansing or assume the data arrives cleansed at the warehouse. Secondly, data quality fixes later in the value chain (worst case, in front-end reporting) cause greater deviations from standards and design, and a solution based more on patches and workarounds.

9. Greater Transparency in design and architectural choices is observed.

Thanks to business modeling, the organization can better trace the individual decisions made while designing the micro elements of the design process. Data warehouse implementations which don't start this way usually end up with a long audit trail of documentation and 'memos' which are hard to capture and even harder to manage.

10. Overall project implementation times in principle, reduce.

Since the initial challenges are dealt with, or at least discussed, upfront, the overall project implementation times reduce. Less time is required for exception management, less time is required for debating design choices and communication, and fewer surprises (in principle) have to be dealt with.

All in all, it is vital to involve the business users in the design process using business modeling approaches. There are several business modeling tools available on the market which assist in the process, but at the end of the day, it's the practice of conducting such sessions that bears fruit.

Qlikview Section Access – Some Thoughts

Security, in BI…Is that a misnomer? I usually prefer the term Privacy instead…

Nevertheless, whichever term you use for the two processes, you still have to cater for authentication and authorization to see specific data.

QlikView, going enterprise, now has quite a mature security framework to comply with various standards including SOX, HIPAA and ISO, either directly or through partners like NOAD.

This means that any company which needs to be certified against these standards can rest assured that QlikView follows compliance-friendly and open standards for both authentication and authorization of data.

However, those coming from other data security (privacy) frameworks, like those of traditional BI or ERP environments, will find some familiar patterns but also some differences.

The security patterns and how-tos are very well documented by QlikView; one of the good documents can be found here.

Here, I'd like to highlight the unusual way QlikView implements one of its security models: Section Access. Of course, there are other ways to implement security which resemble traditional approaches, using the QlikView Publisher, but here I'd like to focus on a quite powerful security mechanism built into an app, called Section Access, which serves a number of use cases for security implementations.

Section Access is a part of the load script which basically maps a list of groups/users to authorized fields/conditions and explicitly denied fields/conditions. As a causal phenomenon, using Section Access also results in Data Reduction, i.e. the splitting up (and reduction) of data based on the defined users with their granted and denied authorizations.
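For illustration, a minimal Section Access script might look like the following sketch. ACCESS and USERID are part of QlikView's documented Section Access syntax; the user names and the REGION reduction field are hypothetical, and a real deployment would load this table from a secured store rather than INLINE:

```
Section Access;
LOAD * INLINE [
    ACCESS, USERID, REGION
    ADMIN,  ADMIN,  *
    USER,   ALICE,  EAST
    USER,   BOB,    WEST
];
Section Application;
// ...the regular load script follows here.
```

Because REGION also exists in the data model, each user's copy of the app is reduced to the rows matching their value, while ADMIN (with the * wildcard) sees all values listed in the access table.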

Some Problems:

1 – I hear people concerned about certain drawbacks or unexpected behavior with Section Access. First of all, there is a risk that developers can lock themselves out of the application if they are not careful. Yes, there should be a failsafe mechanism to warn the user beforehand, but then again, the idea behind Section Access is a self-securing, self-controlling QlikView app, independent of a centralized environment responsible for data security (privacy), in a truly disconnected, democratic analysis approach (by the way, still retaining a single version of the truth). The solution: use the document versioning feature built into QlikView 10 and roll back to a previous version of the app if this mishap takes place. Or simply take a backup of the app before implementing security (privacy) on it.

2 – Some people have pointed out that it is quite insecure to define the security matrix (actually the data authorization matrix) within the script. Although samples and demos pull the security matrix in using INLINE data loads, in reality the idea is to keep the data locked in a data store which only the QlikView script is authorized to read, and to place the script within a 'Hidden' script tab so that developers cannot get overtly, or just accidentally, curious. The location of the actual data store can also be concealed by loading it in the hidden script area. Of course, I am not asserting that this is totally failsafe either…

For me, the idea behind Section Access is to add security (privacy) to disconnected analysis, which is usually the preserve of a few within an organization, compared to the bulk of the users, who prefer ready-made implementations and work within the corporate infrastructure when using QlikView applications. The latter can be handled by the more systematic and productive QlikView Publisher.

Bottom Line:

You have two types of security (privacy) mechanisms: the QlikView Publisher, which is more enterprise-friendly, centralized and IT-driven, and Section Access, a unique feature which maintains the security context in disconnected analysis mode as well, providing a self-defining, self-controlling DAR (or MAD) application.


Xcelsius Connectivity – FlyNet Services

This is a part of a mini series on the data connectivity aspects of Crystal Xcelsius dashboard tool, part of SAP BusinessObjects.

Xcelsius dashboards are Shockwave (SWF) files providing rich user interactions and visualizations. The product is a special RAD tool on top of Adobe Flash which focuses exclusively on building dashboards. It has a set of key dashboard visualization widgets including charts, graphs, trends, signals, gauges etc. It is extensible through Adobe Flex.

Xcelsius dashboards can be deployed in various manners and have several usage scenarios, with typically both individuals and large groups of consumers for a particular dashboard. One of the most important areas is the data connectivity features of the product.

This mini series talks in some detail about various connectivity mechanisms. Here I will talk about Flynet Services, a third party component which comes free as an express edition along with Xcelsius Engage and Enterprise licensed media.

Flynet is simply a .NET web services generator (WSDL + ASMX), and writing web services is not, in usual circumstances, a popular skill set among dashboard designers. WSDL, the Web Services Description Language, is an XML format describing the functionality provided by distributed services. Xcelsius dashboards can communicate to and fro with data stores through the web service interface. One can write custom code for communicating between the Xcelsius engine and the data stores using .NET, Java, PHP, etc., but it requires dedicated developer skills.

Flynet Services provides a handy tool to automatically generate web service code for deployment to a web server, usually Microsoft IIS (though it can be deployed on an Apache web server as well).

If you have already purchased Xcelsius Engage or Enterprise, the Flynet setup will be in the folder <Xcelsius Install Path>\AddOns\Flynet or <Xcelsius Install Path>\Connectivity\FlyNet

After installing the FlyNet Web Service Generator and IIS:

Enter the license both in Xcelsius (Help->Update License Key) and FlyNet Web Services Generator (Help-> Enter New Product Key).

If you enter an Xcelsius Enterprise license, you will get this message:
Enterprise License Issue

Make sure you have the Crystal Xcelsius Engage Server License. This particular Xcelsius version connects directly with data sources.

The Enterprise license only works for data connectivity with BusinessObjects Enterprise, not directly with live data sources.

The FlyNet Generator has a catch: it only allows as many 'analytics' updates as the number of CAL licenses available, i.e. for a 10-CAL Engage Server license, FlyNet services will randomly update only 10 analytic widgets (components) in one deployment. Therefore, it is not a viable solution for dashboards requiring data connectivity for more analytic widgets.

If this doesn't suffice, you have the options of Adobe LiveCycle Data Services (LCDS) or hand-coding the web services…

The FlyNet WS Generator is a simple three-tab, and thus three-step, tool.

1. Web Service

Web Services

Enter a name, preferably without spaces (since certain web servers don't deal with white space in URLs very well), a description and a folder location where you want your generated web services to be placed. If you are planning to use IIS and want to deploy the web services using FlyNet, make sure the folder is under the right security domain on the production system. Otherwise, you can generate the web service into any folder and then manually deploy it as a web app in IIS by unchecking the option "Register Web Service with IIS".

If you have a corporate web service deployment policy and/or you want to better organize your deployed web services, you can use the advanced feature to set a .NET namespace for your generated web service code and also the WS namespace URL.

Advanced Setting_WebServices

Once you have filled in this basic information, you can move to the next tab: "Data Source Connection".

2. Data Source Connection

This is where the beauty of this utility lies. You just have to define your query in SQL (with slight variations) and you get yourself a generated web service.


Click on New Data Source to define either an OLE DB or an ODBC connection. FlyNet provides data adapters for numerous data sources, including RDBMSs, OLAP cubes and even Excel files directly. However, it is kind of strange to use web services to connect to Excel through FlyNet: Xcelsius provides XML mapping via Excel already!


You can define the connection string either through the wizard or directly using the Advanced Button. Once completed, you can test your connection string and view the summary.


3. Queries:


The best way to define the queries is using the query wizard, because FlyNet, for unknown reasons, didn't write an ANSI SQL-92 or SQL-2000 parser. For aggregates, stored procedures and case statements, there are special provisions.


For 'Aggregate Values', place a ~ before and after the alias. For 'Case Statements', enclose them in parentheses. For stored procedures, use the EXEC keyword.
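To make these provisions concrete, here is a hypothetical query (table and column names invented for illustration) using the three conventions just described:

```sql
SELECT
    Region,
    SUM(Sales) AS ~TotalSales~,              -- aggregate alias wrapped in ~
    (CASE WHEN SUM(Sales) > 10000
          THEN 'High' ELSE 'Low' END) AS Band -- case statement in parentheses
FROM SalesFacts
GROUP BY Region

-- A stored procedure is called with the EXEC keyword instead:
EXEC GetRegionalSales
```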


Generate Web Service

That's it! Generate the web service once you are done with your queries. Each query will lead to a new variable in Xcelsius; I'll come to that later.

The utility then generates a WSDL file, the associated ASMX file and the web.config file for usage. A pity it does not reveal the C# code…

If you selected "Register Web Service with IIS", you will get a dialog asking if you want to view your web service; that will open the URL in your browser.

Otherwise you can deploy the three files in IIS or Apache manually.

You can test your web service using the "Invoke" button. You should see the generated XML file with the relevant data in it. If things go wrong, the web service will spit out errors within the data elements of the XML.



Now let's hook up the Xcelsius dashboard with the web service.

Go to the 'Data Manager', accessible from the toolbar, and click Add -> Add a Web Service Connection.

Enter the WSDL URL and click Import. This will import all the parameters into Xcelsius for linking purposes.


Notice that each query you defined in FlyNet is converted to a method here. This way you can build multiple input and output parameters. It also helps in running parallel queries. Designing the distribution of data to input and fetch using multiple queries is a pure design decision and depends on many factors, including runtime performance, database performance and dashboard update frequency.

You can now link the webservice with the embedded Excel model by linking cells to parameters.

Once you are done with this, you can design your dashboard interface the usual way, linking cells with various widgets. Since these cells are now linked up with webservices parameters, your dashboards will be dynamically changing based on the source data!

Note: NullPointerException when accessing Dashboard and Analytics Setup (BOXI 3.0)


When using the Dashboard and Analytics setup for the first time, you might end up with this stack trace:

    at org.apache.jsp.jsp.appsHome_jsp._jspService(
    at org.apache.jasper.runtime.HttpJspBase.service(
    at javax.servlet.http.HttpServlet.service(
    at org.apache.jasper.servlet.JspServletWrapper.service(
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(
    at org.apache.jasper.servlet.JspServlet.service(
    at javax.servlet.http.HttpServlet.service(
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(
    at org.apache.catalina.core.StandardWrapperValve.invoke(
    at org.apache.catalina.core.StandardContextValve.invoke(
    at org.apache.catalina.core.StandardHostValve.invoke(
    at org.apache.catalina.valves.ErrorReportValve.invoke(
    at org.apache.catalina.core.StandardEngineValve.invoke(
    at org.apache.catalina.connector.CoyoteAdapter.service(
    at org.apache.coyote.http11.Http11Processor.process(
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(
    at org.apache.tomcat.util.threads.ThreadPool$

This happens because performance management needs a separate user account to function.

This is set in the file //businessobjects/performance management12.0  /

1. Create a new user with no password, belonging to the administrator group, and update the file with this information.

2. Within the Central Configuration Manager, restart both Apache Tomcat and the Server Intelligence Agent.

3. Log in to InfoView using the new credentials and voilà!


Data Requirement for Advanced Analytics

TDWI author, Philip Russom has presented a fantastic checklist on the data requirements for advanced analytics.

First, here is a major BI/DW organization pinpointing the need for different data architectures for reporting and for analytics (particularly advanced analytics).

Second, it serves as an important document for data warehousing and modeling experts, who usually don't consider advanced analytics usage when designing data storage.

Third, it promotes the provisioning of the separate analytical data stores that advanced analytics demands.

Fourth, it serves as a business case for in-memory databases.

Standard reporting and analytics (OLAP) are well served by multidimensional models (high-level, summarized data), while advanced analytics requires raw transactional data (low-level, detailed data) along with aggregated and derived data, usually in denormalized form. The exact nature of the design is determined by the type of analysis to be carried out.

Data integration also differs between data warehousing serving reporting and analytics, and analytics databases serving advanced analytics. The former mostly relies on ETL, while the latter, both in practicality and in the nature of the analysis, is better served by ELT.

Secondly, data integration for data warehousing deals mostly with aggregating, consolidating and changing the schema type from relational to multidimensional, whereas in an analytics database, data integration has a more mathematical nature, with activities like discretization of continuous data, binning, reverse pivoting, data sampling and PCA heavily employed.
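Two of the preparation activities mentioned above, binning a continuous value and reverse-pivoting a wide record into narrow rows, can be sketched in a few lines of plain Python (hypothetical field names, purely for illustration):

```python
def bin_age(age, width=20):
    """Discretize a continuous age into fixed-width bins, e.g. 0-19, 20-39, …"""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def reverse_pivot(record, key_field):
    """Turn one wide row {id, m1, m2, ...} into narrow (id, measure, value) rows."""
    key = record[key_field]
    return [(key, field, value)
            for field, value in record.items() if field != key_field]

print(bin_age(43))  # → '40-59'

wide = {"customer": "C1", "jan_sales": 120, "feb_sales": 90}
print(reverse_pivot(wide, "customer"))
# → [('C1', 'jan_sales', 120), ('C1', 'feb_sales', 90)]
```

The narrow (key, measure, value) shape produced by the reverse pivot is what many mining and statistics routines expect, which is exactly why this work belongs in the analytics store rather than the reporting warehouse.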

A similar discussion was carried out some time ago here.

This white paper makes a strong case.