The Pre-Sales Diary: Data Profiling before Proofs of Concept

The raison d'être for many Pre-Sales Engineers is to carry out proofs of concept. For most potential leads, proofs of concept are best avoided because they add cost to the sales cycle, lengthen the time to close and increase the chances of failure. But there are certain cases where a proof of concept helps the sales cycle more than anything else.

Some of these cases arise when competitors are involved, touting the same lingo, features and capabilities. Others involve a genuine customer scenario which needs addressing in a proof of concept, either because the scenario is pretty unique, because it is part of their due diligence, or because your product hasn’t been tested in those waters before.

Pre-Sales folks are pretty comfortable with the technology they like to showcase to such customers, but they are totally new to the customer’s scenario. There is always a chance of failure, and failures abound.

Before settling the scope of a proof of concept and promising deliverables, it is more than advisable, in fact mandatory, to analyze not just the customer organization but also its processes, metrics and, of course, data.

The last part is what I find most proofs of concept depend on. Everything is set: you held extensive interviews with the stakeholders and know what needs to be ‘proved’, you scoped out a business process or two, figured out some metrics and one or two KPIs, and they gave you access to the data pertaining to them. Now the ball is in your court, but before you know it, you’re doomed!

The data is incomplete, inaccurate, and has tons of issues which data governance and MDM were meant to solve but didn’t, usually because they don’t exist yet. In all likelihood, the customer is quite unaware of such issues; that is why you are offering them a Business Intelligence solution in the first place, to tap into their data assets. They have never done so themselves, or have done so in too limited a way to uncover such obstacles. In the other scenario, where they are aware of these issues, either they are unable to deal with them or it is a trick question for you: they want to check whether you cover this aspect or not.

You can try to beat the ‘time’ challenge by jumping right into the proof of concept and ignoring all (or most) of the practices which are standard during project implementations, simply because ‘it’s just a demo’!

Kaput!!!!

I always carry out a small data survey before promising any value to be shown in the proof of concept, to know what we have in store before we do anything. Simple rule: GIGO, Garbage In, Garbage Out. If you want a good quality, successful demo, profile your data first, understand its strengths and weaknesses and, above all, let the customer know fully about the limitations. If possible, get the data enriched based on your profile to make your demo successful.

Whether or not this one single step is performed can lead to drastically different outcomes.

Data Profiling:

Data Profiling is defined as the set of activities performed on datasets to identify the structure, content behavior and quality of data.

Structure guides you towards what links to what, what is missing, whether you have all the required master data, whether the data has good domain representation (the possible list of values), and what granularity you can work with.

Content behavior guides you on what the customer’s norms are in terms of KPI and metric values. For example, if the dataset only contains age groups of 40+, there is no need to showcase a cross-selling market basket analysis targeted at toddlers. You can simply leave it out, or ask for data enrichment. Likewise, if you don’t have data pertaining to more than one year, then you can’t have ‘year’ as a grain level, which for certain metrics and analyses might be critical.

Data quality assessment, albeit a general one, can save you many hours ahead. The most notable quality issues are inconsistent data formats, mixed units of measurement and spelling variants, e.g. RIAD, RIYADH, RIYAD and RYAD all indicating the same city, as well as mixed bilingual datasets such as names and addresses.
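To make the idea concrete, here is a minimal sketch in Python of a first-pass profile of a single column: how complete it is, how many distinct values it holds, and which values look like spelling variants of each other. Everything in it is an assumption for illustration only: the rows-as-dictionaries layout, the function names `profile_column` and `similar_spellings`, and the similarity threshold of 0.7 are mine, not part of any particular tool.

```python
from collections import Counter
from difflib import SequenceMatcher


def profile_column(rows, column):
    """Return basic completeness and domain statistics for one column.

    `rows` is assumed to be a list of dictionaries, one per record.
    """
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }


def similar_spellings(values, threshold=0.7):
    """Pair up distinct values whose spelling is suspiciously close.

    The threshold is an arbitrary starting point; tune it per dataset.
    """
    distinct = sorted(set(values))
    pairs = []
    for i, left in enumerate(distinct):
        for right in distinct[i + 1:]:
            ratio = SequenceMatcher(None, left.upper(), right.upper()).ratio()
            if ratio >= threshold:
                pairs.append((left, right))
    return pairs
```

Running `profile_column(rows, "city")` on a customer extract tells you quickly whether the column is usable at all, and `similar_spellings` on the non-empty city values should flag groups like the RIAD / RIYADH / RIYAD / RYAD example above for cleanup or enrichment before the demo. The pairwise comparison is quadratic in the number of distinct values, so it suits columns with a small domain, not free-text fields.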

There are many tools out there which can aid in Data Profiling, including the ubiquitous SQL and Excel. However, Data Profiling is a means to an end and not an end in itself, so it does not warrant more time and energy than required. Therefore a purpose-built, RAD-enabled data profiler is one of the most critical investments in your toolbox.

One which I have come across recently and which fits the bill very nicely is Talend Open Profiler, a GPL-licensed, open source and free tool with great capabilities and power. You can carry out structure analysis, content analysis, column or field analysis and pattern-based analysis on most source systems, including many DBMSs, flat files and Excel files, with results readily available in both numerical and visual form to give you a better sense of your data.
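If you have no profiler at hand, the idea behind pattern-based analysis is easy to sketch yourself. The Python snippet below is my own illustration of the general technique, not how Talend implements it: each value is reduced to its shape (letters become `A`, digits become `9`, everything else is kept), and the shapes are counted. The function names `value_pattern` and `pattern_frequencies` are hypothetical.

```python
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: letters -> A, digits -> 9, rest kept."""
    shape = []
    for ch in str(value):
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A")
        else:
            shape.append(ch)
    return "".join(shape)


def pattern_frequencies(values):
    """Count how many values share each shape, most common first."""
    return Counter(value_pattern(v) for v in values).most_common()
```

A date column that yields both `9999-99-99` and `99/99/9999` shapes has mixed formats, which is exactly the kind of issue you want to know about before the demo and not during it.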

I believe all Data Quality tools are (or should be) equipped with good data profiling capabilities. Most ETL vendors offer data profiling as well, and some data analysis packages like QlikView can also be used, albeit in limited ways, to profile data when time is short.

The Data Profile can also be later shared with the customer as a value deliverable.

 

Happy Demoing!
