Recently, Express Analytics was engaged by a client to assist them in selecting an Analytics platform for marketing analytics. This led us to ponder the answers to the following questions.
- What is an Analytics Platform, and how is it different from a Transactional Platform?
- What prevents organizations from exploiting the existing data in their databases for Analytics?
- Who are the new entrants in the market offering analytics platforms?
- What is the long-term direction of the market?
- Can I afford an analytics platform?
- What is the correct measure of the cost of an analytics platform?
Let us ponder the first question:
What is an Analytics Platform, and how is it different from a Transactional Platform?
A bit of Background
For the last 40 years or so, the entire technology industry has been focused on solving just one problem.
- How to improve the ability to record and manage transactions?
This unrelenting focus has led to enormous improvements in the ability to record transactions.
We have become so proficient that today we can record transactions in microseconds and nanoseconds.
This has given rise to high-frequency trading on stock markets, massive-scale collaboration among billions of people on social media, and the recording of sensor data from machines.
It is well understood that the core requirement of a transaction management system is to be able to
- Create a new record
- Read a single record
- Update any field of that record
- Delete a record
- Save the record
These few functions (CRUDS) all operate on a single record.
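The five operations above can be sketched against a relational database; here is a minimal illustration using Python's built-in sqlite3 module (the `orders` table and its fields are invented for illustration):

```python
import sqlite3

# In-memory database for illustration; in practice this would be a server RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

# Create a new record
conn.execute("INSERT INTO orders VALUES (1, 'Acme', 99.5)")

# Read a single record
row = conn.execute("SELECT * FROM orders WHERE id = 1").fetchone()

# Update a field of that record
conn.execute("UPDATE orders SET amount = 120.0 WHERE id = 1")
updated = conn.execute("SELECT amount FROM orders WHERE id = 1").fetchone()[0]

# Delete the record
conn.execute("DELETE FROM orders WHERE id = 1")

# Save (commit) the changes
conn.commit()
```

Note that every statement addresses one record by its primary key — exactly the single-record access pattern these systems are optimized for.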
Hence, we have developed database systems that are highly efficient in selecting a needle from the proverbial haystack.
To achieve this singular objective, we store the complete record as a row. Each row can have fields of number, date, or text type.
In the late 1990s and 2000s, we struggled to modify these database systems to accommodate audio and video content with limited success.
Because data is stored row by row, every time we need to read a few fields of a record, we must bring the entire record into memory before we can operate on it.
So if the record has 100 fields and I mostly query the 10 most frequently used fields, I am moving the whole record even though I have no use for 90% of the fields in my current operation.
When I have to read millions of records, this becomes a wasteful use of precious resources, such as server memory and CPU, at an enormous cost in disk I/O.
The result is a system that is sluggish and doesn’t respond before the train of thought has left the station.
What about Now?
Over the last decade, we have observed a significant increase in the amount of data stored in our databases, making it increasingly difficult to access that data efficiently.
This led to exploring various approaches to storing and accessing the data. First, we analyzed the queries we frequently ran and discovered that only 5-10% of the fields of a record are used in our queries.
This led to a different way to store data in databases: as columns rather than as complete rows.
This approach is called the columnar database system. Since each column of a table has a single data type, we can utilize compression techniques to reduce the database size.
This, in turn, reduces the I/O necessary to retrieve a large volume of data and improves query performance dramatically.
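The difference between the two layouts can be shown with a toy example; this is a sketch only (the record fields and data are invented), but it illustrates why scanning one column touches far less data than scanning full rows, and why a single-typed column compresses so well:

```python
# Toy comparison of row-oriented vs. column-oriented storage.
# Field names and values are invented for illustration.
records = [
    {"id": i, "amount": i * 1.5, "region": "WEST", "notes": "x" * 96}
    for i in range(1000)
]

# Row store: to sum 'amount' we still drag every complete record through memory.
row_bytes_touched = sum(len(str(r)) for r in records)

# Column store: the same data laid out as one list per column.
columns = {
    "id": [r["id"] for r in records],
    "amount": [r["amount"] for r in records],
    "region": [r["region"] for r in records],
    "notes": [r["notes"] for r in records],
}

# Summing 'amount' now touches only that one column.
col_bytes_touched = sum(len(str(v)) for v in columns["amount"])

# Because a column holds a single type, it compresses well; here 1000
# identical 'region' values run-length encode down to a single pair.
def run_length_encode(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

compressed_region = run_length_encode(columns["region"])
```

Real columnar engines use far more sophisticated encodings, but the principle is the same: read only the columns a query needs, and store each column in a form that shrinks I/O.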
Second, we discovered that the clock speeds of the CPU and the memory have hit a wall, so we adopted a parallel processing approach using multicore CPUs.
Taking this one step further, we created massively parallel clusters of commodity servers that work together to crunch a massive amount of data very effectively.
During the last decade, we have also uncoupled the hardware and software in servers.
Today, we can define what a cluster delivers by the software installed on it.
Completely software-defined servers enable us to utilize commodity hardware and open-source software to create Big Data crunching capabilities that are easy to configure and modify.
They bring fluidity to our operations. We have been moving from brittle to flexible architectures.
Good but not good enough!
So, currently, we can record and retrieve large volumes of data (gigabytes, terabytes, and petabytes) efficiently.
However, how do we effectively make sense of the large volumes of data?
We are attempting to develop machine-learning techniques to analyze high-volume data, as it is not feasible for humans to read, understand, and analyze large amounts of granular data.
Along with this increased velocity of data generation, the data is also becoming more unstructured and sparse, and it arrives from all channels. Even once-modern digital channels, such as email, are starting to fall behind.
Today, texting is preferred over email, and written letters are no longer used even to fire employees or deliver divorce notices. Worldwide, three- and four-letter slang acronyms such as LOL, PFA, and GTG are used routinely in communication. Our interactions have become informal, sparse, multi-channel, and asynchronous. Yet our desire to be understood has never been greater.
We expect our service providers to understand our expectations and be prepared to serve us without needing us to express our needs. We are migrating to an era when an organization needs to:
- observe us,
- understand our desires,
- appreciate our tastes,
- analyze our past behavior, and
- serve us graciously,
- in real time.
Or we are ready to change our service providers in a flash.
What should an analytics platform provide?
All this has led to the desire for an Analytics platform that will allow us to analyze the data and extract meaning and nuances from it. The modern analytics platform needs to perform the following functions efficiently:
- Select a few columns from a vast number of records
- Select sets of records based on different criteria
- Create new sets from the intersection or union of these record sets
- Create derived fields from the few columns selected to enrich this data
- Create algorithms to recognize the trends in this data
- Project the discovered trends into the future
- Describe the patterns recognized
- Classify similar data together
- Predict the likelihood of events
- Prescribe corrective action
- Infer meaning from seemingly dissimilar data
- Present data in an easy-to-understand visual image
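The first few operations in the list are set-oriented rather than record-oriented, which is exactly where transactional systems struggle. A minimal sketch of column selection, set intersection, derived fields, and classification, expressed as SQL over SQLite (the `sales` table and its columns are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "WEST", 100.0, 2), (2, "EAST", 250.0, 5), (3, "WEST", 80.0, 1)],
)

# Select a few columns from the records
rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# Select sets of records on different criteria, then intersect them
both = conn.execute(
    "SELECT id FROM sales WHERE amount > 90 "
    "INTERSECT "
    "SELECT id FROM sales WHERE region = 'WEST'"
).fetchall()

# Create a derived field (unit price) to enrich the data
enriched = conn.execute(
    "SELECT id, amount / qty AS unit_price FROM sales ORDER BY id"
).fetchall()

# Classify similar data together (a simple group-by aggregation)
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The later items on the list — trend projection, prediction, prescription, inference — go beyond SQL into statistical and machine-learning tooling, which is precisely what separates an analytics platform from a fast query engine.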
The modern analytics platform has many more requirements, and they conflict with those of transactional platforms.
The list above is incomplete, and I am sure you have many more functions that you feel are important. Please let me know, and I will continue to add them to the list. In the following posts, we will discuss the answers to the remaining questions.
In the next blog, I will discuss the reasons why organizations struggle to leverage existing data within the company. Stay tuned.