This post originally appeared on VoltDB.com in May, 2016.
Top 5 Ways to Better Use Your Data
Sir Francis Bacon is said to have coined the
phrase “scientia potentia est”, translated as ‘knowledge is power’. Four
hundred years later, we might rewrite the phrase as “data potentia est”. Data is power - so how can you better use
your organization’s data? Here are five
suggestions to help you drive value from your organization’s data - quickly.
1) Know what you have.
First and foremost,
inventory your data.
There are two types
of data to identify. First is historical data. Historical data is data you’ve
accumulated over years of doing business. This could include databases, files,
spreadsheets, presentations, transactions, logs, etc. The second type of data
is the data that is being created “right now” - this is real-time data.
Real-time data potentially has immediate value, and then ultimately turns into
historical data.
Catalog and
prioritize the data you have. Ideally, you will also want to identify the
sources of data. Knowing how data is created allows you to capture it, store
it, and eventually extract value from it, at the least cost to the
organization, and with the best ROI.
The value of each
type of data is different. Historical data allows you to analyze and mine past
events. Real-time data gives you the opportunity to calculate analytics,
possibly compare them to historical trends, and perform business actions in
real-time, to capture additional and immediate value.
By way of example,
consider a fraud prevention offering. Fraudulent transaction patterns are mined
from historical data. These historical patterns are applied to real-time
transactions to identify and reject suspected fraudulent transactions.
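The fraud example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the "pattern" mined from history is just each account's average transaction amount, and a real-time transaction is rejected when it exceeds a multiple of that average. The function names, the sample data, and the multiplier of 5 are all hypothetical and not taken from any particular product.

```python
from collections import defaultdict
from statistics import mean


def mine_spending_profiles(historical):
    """Mine a simple pattern from historical data: average amount per account."""
    amounts = defaultdict(list)
    for account, amount in historical:
        amounts[account].append(amount)
    return {account: mean(values) for account, values in amounts.items()}


def is_suspicious(profiles, account, amount, multiplier=5.0):
    """Apply the historical pattern to a single real-time transaction."""
    baseline = profiles.get(account)
    if baseline is None:
        # Assumption for this sketch: accounts with no history are flagged.
        return True
    return amount > multiplier * baseline


# Historical data: (account, amount) pairs accumulated over time.
historical = [("alice", 20.0), ("alice", 40.0), ("bob", 100.0)]
profiles = mine_spending_profiles(historical)

# Real-time data: each incoming transaction is checked as it arrives.
incoming = [("alice", 35.0), ("alice", 900.0), ("bob", 120.0)]
decisions = [
    (account, amount, "reject" if is_suspicious(profiles, account, amount) else "accept")
    for account, amount in incoming
]
```

A production system would mine far richer patterns, but the division of labor is the same: the slow, historical side produces the profile, and the fast side applies it one event at a time.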
2) Architect a data strategy that handles both Big and Fast data.
Creating a historical
archive, perhaps a data lake via a Hadoop cluster, to store your data is only
one step. Today enterprises create data at a tremendous - and growing -
rate. Processing and ingesting data in
batch mode overnight is no longer acceptable. Real-time responsive enterprises
need to process and react to data in seconds to minutes. Many organizations,
including mobile operators, telecom providers, financial services organizations
and advertising technology providers must respond in milliseconds.
3) Choose the appropriate technologies.
There is a plethora
of big data tools, and most are built around established best practices and
optimized to extract value from both historical and real-time data.
Minimally you will
need technologies for these areas:
Big Data: Typically, the main data management platform for big data is Hadoop
or a data warehouse or perhaps a combination of the two to handle both
structured and unstructured data. They act as the repository for all your data,
often called the “data lake”. The data lake stores historical data to be
analyzed and mined.
Fast Data: Data is being created at a dizzying rate every day. Fast data is data
that is being created now and is streaming into your company now. It could be user clicks on your corporate web
page, product downloads, or any operational event occurring in your organization. To deliver this fast data to the systems that
can act on it, consider a message queueing system such as Kafka. To eliminate batch processing (slow data!),
this message queue needs to deliver event data to an operational data store
capable of handling and processing messages at web-scale speed: thousands to
tens of thousands to even millions of events per second. The operational data store’s role is to
ingest the data and process it in real-time.
Real-time processing can include computing real-time analytics, such as
counts, aggregations and leaderboards, issuing real-time alerts, deduping,
enriching and aggregating events, and making transactional decisions on an
event-by-event basis. Both NewSQL and NoSQL operational stores can provide horsepower
for handling real-time processing of event streams. Modern operational data stores range from
strongly-consistent SQL databases to eventually consistent key/value and
document stores. Weigh several
factors when choosing, including transaction support and the query interface,
which matters for your data visualization tooling.
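To make the real-time processing steps concrete, here is a minimal in-memory sketch of event-by-event handling: it dedupes events by ID, keeps a running count per user, and exposes a leaderboard. The `StreamProcessor` class and the event fields (`id`, `user`) are assumptions made for illustration. In practice the events would arrive from a message queue and the state would live in an operational data store rather than in a Python object.

```python
from collections import Counter


class StreamProcessor:
    """Hypothetical event-by-event processor: dedupe, count, and rank."""

    def __init__(self):
        self.seen_ids = set()    # event IDs already processed (for deduping)
        self.counts = Counter()  # running count of events per user

    def process(self, event):
        """Apply one event; return False if it was a duplicate."""
        if event["id"] in self.seen_ids:
            return False
        self.seen_ids.add(event["id"])
        self.counts[event["user"]] += 1
        return True

    def leaderboard(self, n=3):
        """Return the top-n users by event count."""
        return self.counts.most_common(n)


processor = StreamProcessor()
events = [
    {"id": 1, "user": "ann"},
    {"id": 2, "user": "bob"},
    {"id": 1, "user": "ann"},  # duplicate delivery of event 1
    {"id": 3, "user": "ann"},
]
results = [processor.process(event) for event in events]
top = processor.leaderboard(2)
```

Note that this sketch remembers every event ID forever; a real deployment would bound that state, for example by expiring IDs after a time window.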
Data Visualization: Dashboards, charts, leaderboards, pivot tables, and visualizations
all play a key role in understanding your data, both historical and real-time.
Historical visualization
helps you explore, understand patterns, and create predictive analytics.
Real-time visualizations help you understand the current state of your
business, usually in the form of a real-time dashboard.
You will want to
evaluate tools from vendors such as Tableau, Qlik and MicroStrategy for
dashboarding and ad hoc visualizations. User experience is a critical factor
with this kind of software, so having your users try it out is essential.
Data Science: A growing number of tools can help you extract information and
insight from your data. Machine learning packages provide data classification,
clustering, and regression analysis, and allow software to “learn” to identify
and make predictions on data. Consider popular open source offerings such as
Spark (MLlib) or R to get started.
4) Build a Data Pipeline that delivers Data as a Service (DaaS) to internal customers.
Define an
architecture that serves data to your internal customers. Capturing and
analyzing the data is great, but it is only the first step. Data and insights
must be readily available to consumers (people and applications) across your
enterprise. Consumers of your data must be able to tap into both historical
data from the data lake as well as real-time fast data, along with the insights
derived from both together.
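One way to picture such a service is a single access point that answers from both sides at once. The sketch below is a simplified assumption, not a reference design: the hypothetical `DataService` class holds historical daily totals (standing in for the data lake), accumulates today's events as they arrive (standing in for fast data), and returns an insight derived from both together.

```python
class DataService:
    """Hypothetical internal data service combining historical and real-time data."""

    def __init__(self, historical_daily_totals):
        # Stand-in for the data lake: date string -> total for that day.
        self.historical = dict(historical_daily_totals)
        # Stand-in for fast data: running total of today's events.
        self.today_total = 0.0

    def record(self, amount):
        """Ingest one real-time event."""
        self.today_total += amount

    def summary(self):
        """Serve historical data, real-time data, and a combined insight."""
        if self.historical:
            average = sum(self.historical.values()) / len(self.historical)
        else:
            average = None
        ratio = self.today_total / average if average else None
        return {
            "historical_daily_average": average,
            "today_total": self.today_total,
            "today_vs_average": ratio,
        }


service = DataService({"2016-05-01": 100.0, "2016-05-02": 200.0})
service.record(50.0)
service.record(25.0)
report = service.summary()
```

Consumers, whether people or applications, call `summary()` and never need to know which system each number came from.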
5) Begin building applications to extract value from the data - then iterate.
Start small and add
incrementally. Identify opportunities for small quick wins that will prove you
can capture value from your data. Realize that data evolves and new patterns
will emerge. Foster an environment of experimentation, innovation and
continuous improvement and iterate on your data analysis.
Data is valuable. Batch processing is so
1990s. Now you’ve got five ideas for how to extract more value from your data.
Start now and iterate. Think Big, of course, but also Think Fast.