Monday, December 18, 2017

Are you caught up with the real-time (r)evolution?

This post originally appeared on InfoWorld on December 1, 2017.

Are you caught up with the real-time (r)evolution?

Real-time analytics have been around since well before computers. Here's where the evolution of real time is headed in 2018

People often think of “real time” as a new concept, but the need for real-time notification and decisioning has been around forever. Gartner defines real-time analytics as “the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly.” This definition has always held true, it is just the meaning and measurement of real time that has evolved. As we approach the end of 2017, it’s a good time to look at the evolution of real-time decision making and ask yourself—are you caught up?

Real time B.C. (before computers)

Before there were computers, people were finding ways to use real-time information. For instance, in 500 BC, Philippides ran 25 miles from Marathon to Athens, Greece, to alert the magistrates of the victory at the Battle at Marathon, and in 200 BC, smoke signals were used on the Great Wall of China to warn those further down the wall of enemy size and attack direction.

Fast-forward to 1775 when Paul Revere developed a real-time alerting system using lanterns. We all know the famous phrase “one if by land, two if by sea,” and after initiating this real-time alert, he rode on horseback to deliver the information to towns around Boston along with 40 other horseback riders to spread the word throughout eastern Massachusetts.

Real-time A.D. (after digital)

With the invention of computers, we upgraded from “human speed” to “machine speed.” For example, telephone call routing was initially executed by human switchboard operators, but by 1983, computers automated this process, enabling real-time voice communication for almost everyone.

People also began interacting with real-time systems, typically through an intermediary. The stock market is a good example—if you wanted to buy or sell securities, you had to make a phone call to your stockbroker who would execute the trade on your behalf using the real-time stock exchange trading software.

From an organizational standpoint, computers enabled companies to gather transactions into a database and produce monthly, weekly or even daily reports. At this early stage, nightly batch processing of data was generally viewed as “real-time data.”

Today in real time

Every day most of us interact with real-time systems and execute real-time decisions, possibly without even realizing it. For instance, going back to our stock example, trades can be executed in seconds by the consumer without a broker.

Take credit card fraud as another example. What was previously fraud detection has evolved into fraud prevention thanks to advancements in real-time technology, and preventing the unauthorized transaction from being processed can save corporations millions of dollars.

Smartphones have helped advance the capabilities of real-time processing thanks to the personalized data they can ingest. Tracking the location of consumers in real time allows companies to deliver personalized offers based on what they are close to. Mobile traffic applications like Waze provide real-time incident alerts and traffic updates to redirect drivers to the fastest route.

Tomorrow in real time

We are witnessing the commoditization of real-time information, alerts, and decisions, and these capabilities will proliferate across all application domains, including online gaming, casino gaming, financial compliance and risk, fraud prevention, digital ad technology, telecom policy and billing, and more.

As we look ahead to 2018, we’ll see real-time advancements in the following areas:

  • Self-driving cars will process data from countless sensors and ultimately get passengers to their destinations safely and efficiently, saving thousands of lives a year in the process.
  • Real-time threat detection and remedy for data security breaches and DDoS attacks will alert consumers as they occur.
  • Wearable health sensors (they already exist in your smartwatch) will not only monitor health statistics, but will also combine that telemetry with personal medical history to know when and how to notify doctors if necessary.
  • Augmented reality will become part of our everyday lives. When traveling in a foreign country, AR on cell phones and tablets will provide “live” translations of street signs and shops and integrate reviews, recommendations and advertisements on whatever signposts or buildings you scan.
The value of real time is the ability to make decisions as soon as possible based on the most readily available information. Information that used to take days to receive can now be ingested and analyzed in fractions of a second, and 2018 will only cut that time down even more, empowering companies to leverage real-time information to increase corporate profit, protect resources and even save lives.


Thursday, November 16, 2017

Real-Time Computing: 4 Sources of Latency And How To Avoid Them

This post originally appeared on InfoWorld on November 1, 2017.

4 sources of latency and how to avoid them

Even Google and Amazon can’t process data instantly—here’s how to combat latency in your real-time application


Despite all the advances we’ve seen in data processing and database technology, there is no escaping data’s Public Enemy No. 1: latency, the time delay before a response is generated and returned. Even Gartner’s definition of a zero-latency enterprise acknowledges that latency can never actually be zero because computers need time to “think.”  
While you may never truly achieve zero latency, the goal is always to deliver information in the shortest amount of time possible, so ensuring predictable, low latency processing is key when building a real-time application. Often the hardest part, though, is identifying the sources of latency in your application and subsequently eliminating them. If you can’t remove them entirely, there are steps you can take to reduce or manage their consequences.  
Before, during, and after computing the response, there are number of areas that can add unwanted latency. Below are some common sources and tips for minimizing their impact.

Network I/O

Most applications use the network in some manner, whether between the client application and the server or between server-side processes and applications. The important thing to know here is that distance matters—the closer your client is to the server, the lower the network latency. For instance, round-trip latency between nodes within the same datacenter can cost 500 microseconds, while it can be an additional 50 milliseconds for nodes in California and New York.
What to do:
  • Use faster networking, such as better network interface cards and drivers and 10GigE networking.
  • Eliminate network hops. In clustered queuing and storage systems, data can be horizontally scaled across many host machines, which can help you avoid extra network round-trip connections.
  • Keep client and server processes close together, ideally within the same datacenter and on the same physical network switch.
  • If your application is running in the cloud, keep all processing in one availability zone.

Disk I/O

Many real-time applications are data intensive, requiring some sort of database to service the real-time request. Databases, even in-memory databases, make data durable by storing it to persistent storage, but for high-velocity real-time applications, making data durable can add significant unwanted latency, and disk I/O, like network I/O, is costly. Accessing memory can be upwards of 10,000x faster than a single disk seek.
What to do:
  • Avoid writing to disk. Instead use write-through caches or in-memory databases or grids (modern in-memory data stores are optimized for low latency and high read/write performance).
  • If you do need to write to disk, combine writes where possible, especially when using fsync. The goal is to optimize algorithms to minimize the impact of disk I/O. Consider asynchronous durability as a way to avoid stalling main line processing with blocking I/O.
  • Use fast storage systems, such as SSDs or spinning disks with battery-backed caches.

The operating environment

The operating environment in which you run your real-time application—on shared hardware, in containers, in virtual machines, or in the cloud—can significantly impact latency.
What to do:
  • Run your application on dedicated hardware so other applications can’t inadvertently consume system resources and impact your application’s performance.
  • Be wary of virtualization—even on your own dedicated hardware, hypervisors impose a layer of code between your application and the operating system. When configured properly, performance degradation can be minimized, but your application is still running in a shared environment and may be impacted by other applications on the physical hardware.
  • Understand the nuances of the programming language environment used by your application. Some languages, such as Java and Go, use automatic memory management and use periodic garbage collection to reclaim unused memory. The processing impact of garbage collection can be unpredictable and introduce unwanted latency at seemingly random times.

Your code

When it comes to coding, there are some common core functionalities that can pose barriers to speed.
What to do:
  • Inefficient algorithms are the most obvious sources of latency in code. When possible, look for unnecessary loops or nested expensive operations in code—restructuring loops and caching expensive computation results usually help.
  • Multi-threaded locks stall processing and thus introduce latency. Use design patterns that avoid locking, especially when writing server-side applications.
  • Blocking operations cause long wait times, so use an asynchronous (non blocking) programming model to better utilize hardware resources, such as network and disk I/O.
  • Unbounded queues may sound counter-intuitive, but these lead to unbounded hardware resource usage, which no computer has. Limiting the queue depths and providing back pressure typically lead to less wait time in your code resulting in more predictable latencies.

Combating the enemy

Building real-time applications requires that the application developer not only write efficient code, but to also understand the operating environment and hardware constraints of the systems on which your application will be deployed. Provisioning the fastest networking equipment and the fastest CPUs won’t singularly solve your real-time latency requirements.
Thoughtful application architecture, efficient software algorithms and optimal hardware operating environment are all key considerations for fighting latency.


Tuesday, October 17, 2017

What real-time application pattern works for you?

This post originally appeared on InfoWorld on September 27, 2017.

What real-time application pattern works for you?

3 common real-time application patterns that require a real-time decision


At first glance, building a real-time application may sound like a daunting proposition, one that involves technical challenges as well as a significant financial investment, especially when you have an application goal of responding within a fraction of a second. But advances in hardware, networking, and software—both commercial as well as open source—make building real-time applications today very achievable. So what do these real-time applications look like?
This article presents three common real-time application patterns that require a real-time decision, meaning a response returned or transaction executed based on real-time input. To determine which pattern to apply to your application, you must first define your real-time objective. Ask yourself: How fast does the application need to respond?
Each application pattern addresses a particular level of real-time response: sub-millisecond, milliseconds, or 100 milliseconds and greater.

Pattern 1: Embedded applications—delivering responses in sub-milliseconds

To achieve sub-millisecond response, you need to eliminate any server-side networking and embed your application onto a computer or hardware appliance. This is the bleeding edge of real-time processing for more specialized applications that are not very common. This pattern is relevant for areas such as high frequency trading applications, nuclear power plant systems and signal processing and sensor applications.
Delivering sub-millisecond responses involves low-level programming, often at the kernel level. Standard kernels, operating systems, and device drivers can add unwanted processing overhead resulting in extra latency. Applications that care about every microsecond or nanosecond, every clock cycle, should seek to eliminate this overhead and code directly on the hardware. Alternatively, if you can withstand some additional latency, you can forgo writing low-level code and build and run your application directly on the operating system, embedding a data store such as SQLite, if needed.

Pattern 2: High speed OLTP—delivering responses in milliseconds

This is the classic client-server OLTP application architecture where a client application talks to a server-side application and database. These applications are very common —  you have likely interacted with them several times today without even realizing it. These applications detect credit card fraud, compute personalized webpages, and deliver optimized digital ads. For instance, when you use your iPhone or Android phone to make a call, run an app, or access the internet, several decisions (transactions) must be made by the telco provider before your action is allowed to occur: Is your account valid? Do you have enough quota (voice or data)? What policy should apply to the action (throttling etc.)? And each transaction must respond in milliseconds.
Optimizing network performance between the client application and the server allows for low-latency responses for high-speed OLTP application patterns. Low-cost gigabit ethernet (GigE) and relatively low-cost 10GigE networking is readily available to most application developers. Network performance can be further optimized by keeping the application on the same network switch or rack as the server or minimally on the same LAN. In other words, keep the client and server in close proximity. Within the server, the application and database usually minimize blocking disk I/O, either by avoiding it completely, by applying sequential I/O, or by using advanced storage such as SSDs or the newly emerging non-volatile RAM.
One additional point worth noting is that with next generation in-memory data stores and caches, it is even possible to achieve low single-digit millisecond latency with highly available clustered data stores; that is, databases and systems spanning multiple nodes or processes.  Today, many shared-nothing, in-memory databases, data grids, and NoSQL stores offer highly available data stores with predictable low latency (often single-digit millisecond) response times.

Pattern 3: Streaming fast data pipelines—delivering responses in seconds

A fast data pipeline, historically rooted in complex event processing (CEP) applications, is becoming a more broadly deployed real-time application pattern today. In this application pattern, a never-ending stream of immutable events is being ingested with real-time analytics applied.
Typical applications have a queuing or streaming system that delivers events, ultimately feeding the data lake, managed by Hadoop, Spark, or a data warehouse. Before arriving at the historical archive, the event stream is processed by a fast data store or computational engine. It is the role of this engine to aggregate, dedupe, and compute real-time analytics on incoming events and generate real-time alerts or decisions as required. The analytics are often displayed on a dashboard, and alerts or decisions are generated. A person or business process reacts to the alert, in human speed. A few seconds is often enough time to ensure any late data has arrived to inform the decision.
In this pattern, data flows in one direction. This real-time engine often holds a predetermined amount of “hot data,” either in the form of continuously computed analytics or a database of the last hour, day, or week’s worth of data. Older data is delivered to the historic data lake or data warehouse.
Advances in queuing systems like Kafa, in-memory databases, data grids, and NoSQL data stores make implementing this pattern possible. This pattern has broad usage across the internet of things (IoT), electric smart grids, log file management, and mobile in-game analytic processing, among others. We’ll be seeing more of this pattern in future applications.

The age of real time is now

If you are just starting out with your real-time application, first consider what response rate your problem domain requires. If it requires sub-millisecond response, consider an embedded application. If your application is high-velocity OLTP, explore high-performance network configurations and new offerings in low-latency data store and in-memory database technology. If you need to handle relentless streams of data, consider a fast data-pipeline architecture.
Low-cost computing, readily accessible high-speed networking, and numerous open source and commercial data storage software offerings capable of low-latency data processing means that real-time applications are no longer out of reach.


Thursday, September 14, 2017

This post originally appeared on InfoWorld on August 28, 2017

 Measuring real-time

I explore the different types of real-time applications and how real-time is measured based on the application need

The term “real-time” is thrown around a lot these days, but it’s a buzzword that is often surrounded by ambiguity. Every day, it seems a new product is announcing its real-time capability. But how is real-time measured? It certainly isn’t measured in days (or even hours)—so is it measured in:
  • Nanoseconds?
  • Microseconds?
  • Milliseconds?
  • Seconds?
  • Minutes?
  • All of the above?
Everyone, from developers to software corporate marketing departments to even consumers, seems to have a slightly different answer. So let’s explore the question “What does ‘real-time’ really mean?”
Let’s begin with the dictionary definition:
Real-time—“of or relating to applications in which the computer must respond as rapidly as required by the user or necessitated by the process being controlled.”
While this definition continues with the subjective theme, it does confirm that the correct answer to how to measure real-time is “All of the above.” The meaning of the term real-time varies based on application need—the amount of time a computer (the application) takes to respond and the acceptable latency is as fast as required by the problem domain.
Rather than look at applications and determine if they are real-time or not, let’s examine various time units and understand the types of real-time applications that require those response rates:
Nanoseconds: A nanosecond (ns) is one billionth of a second. Admiral Grace Hopper famously explained a nanosecond using an 11.8-inch wire, as that is the maximum distance electricity can travel in one nanosecond. This quick video of Hopper is worth watching if you haven’t yet seen it.
With this in mind, it is easy to see why nanoseconds are the unit used to measure the speed of hardware, such as the time it takes to access computer memory. Worrying about nanosecond latency is at the bleeding edge of real-time computing and is primarily driven by innovation with hardware and networking technology.
Microseconds: A microsecond (┬Ás) is one millionth of a second. Real-time applications that worry about microsecond latency are high-frequency trading (HFT) applications. Financial trading firms spend large sums of money investing in the latest networking and computer hardware to eliminate microseconds of latency within their trading platforms. A trading decision has to be made in as few microseconds as possible in order to execute ahead of competition and thus maximize profit. 
Milliseconds: A millisecond (ms) is one one-thousandth of a second. To put this in context, the speed of a human eye blink is 100 to 400 milliseconds, or between a 10th and half of a second. Network performance is often measured in milliseconds. Real-time applications that worry about latency in milliseconds include telecom applications, digital ad networks, and self-driving cars. The decision on what optimal ad to display or whether there is enough balance to let a cellphone call proceed must be made on the order of 100 milliseconds.
Seconds: We’re starting slow down here. We’re still in the realm of real-time, but we are now venturing into near real-time. Sub-minute processing time is often more than good enough for applications that process log files, computing analytics on event streams, as well as alerting applications. These real-time applications drive actions and decisions that are made in human-reaction time rather than machine-time. Reducing the response time by one tenth of a second (100ms), which may be costly to implement, has no change in value for the application.
Minutes: Waiting minutes may seem like an eternity to a high-frequency trading application. However, consider package shipment and delivery alerts or ecommerce stock availability notifications. Those applications certainly feel real-time to me—the fact that I receive a “delivery notification” text message within 10 minutes of a delivery made to my home is very satisfying.
Finally, though I discounted it up front, let’s briefly consider hours and days. While this time range is generally not regarded as true real-time, if you’ve been getting finance or sales reports on a monthly, weekly, or daily basis, and now you can get up-to-date reports every hour, that may be as real-time as you need. The modernization of these applications is often termed as upgrading from “batch” to “real-time.”
The old proverb is correct: Time is money. Throughout history, the ability to make real-time decisions has meant the difference between life and death, between profit and loss. The value of time has never been higher and therefore speed has never been more critical to business applications of all kinds.
Luckily, we live in an age where fast computing is very affordable and making decisions in real-time is economically achievable for most applications. The first step is determining the appropriate definition of real-time that aligns with the needs of your business applications.

Friday, August 11, 2017

Fast Data Pipeline Design: Updating Per-Event Decisions by Swapping Tables

Fast Data Pipeline Design: Updating Per-Event Decisions by Swapping Tables

VoltDB was one of the first companies to enable a new modern breed of applications, applications that combine streaming, or “fast data”, tightly with big data.We call these applications Fast Data Pipelines.
First, a quick high-level summary of the fast data pipeline architecture:
Fast Data Pipeline

The first thing to notice is that there is a tight coupling of Fast and Big, although they are separate systems. They have to be, at least at scale. The database system designed to work with millions of event decisions per second is wholly different from the system designed to hold petabytes of data and generate Machine Learning (ML) models.
There are a number of critical requirements to get the most out of a fast data pipeline. These include the ability to:
  • Ingest / interact with the data feed in real-time.
  • Make decisions on each event in the feed in real time
  • Provide visibility into fast-moving data with real-time analytics
  • Seamlessly integrate into the systems designed to store Big Data
  • Ability to deliver analytic results (mined “knowledge”) from the Big Data systems quickly to decision engine, closing the data loop. This mined knowledge can be used to inform per event decisions.
Hundreds of Fast Data Pipeline applications have been built and deployed using VoltDB as the fast operational database (the glue) between Fast and Big. These applications provide real-time decisioningengines in financial fraud detection, digital ad tech optimization, electric smart grid, mobile gaming and IoT industries, among others.
This blog is going to drill into how to implement a specific portion of this fast data pipeline, namely the last bullet: the ability to close the data loop, taking knowledge from a Big Data system and applying this knowledge, online, to the real-time decision engine (VoltDB).

Closing the Data Loop

“Per-event decisioning” means that an action is computed for each incoming event (each transaction).  Usually some set of facts informs the decision, often computed from historical data. These “facts” could be captured in machine learning models or consist of a set of generated rules to be executed on each incoming event. Or these facts could represented as rows in a database table, used to filter and generate optimized decisions for each event. This blog post will focus in on the latter, storing and updating facts represented in database tables.

When storing facts in database tables, each row corresponds to some bit of intelligence for a particular value or set of values.  For example, the facts might be a pricing table for airline flights, where each row corresponds to a route and service level.  Or the values might be list of demographic segmentation buckets (median income, marital status, etc) for browser cookies or device ids, used to serve up a demographic-specific ads.

Fact tables are application-specific, can be simple or sophisticated, and are often computed from an historical “big data” data set such as Spark, Hadoop, or commercial data warehouse, etc.  Fact tables can often be quite large and can be frequently recomputed, perhaps weekly, daily, or even hourly.

It is often important that the set of facts changes atomically.  In other words, if airline prices are changing for ten’s of thousands of flights, all the prices should change all at once, instantly. It is unacceptable that some transactions reference older prices and some newer prices during the period of time it takes to load millions of rows of new data.  This problem can be challenging when dealing with large fact tables as transactionally changing millions of values in can be a slow, blocking operation. Locking a table, thus blocking ongoing operations, is unacceptable when your application is processing hundreds of thousands of transactions per second.

VoltDB solves this challenge in a very simple and efficient manner.  VoltDB has the ability to transactionally swap tables in a single operation.  How this works is as follows:

  1. Create an exact copy of your fact table schema, giving it a different name. Perhaps Facts_Table and Facts_Table_2.

  1. Make sure the schemas are indeed identical (and neither is the source of a view).

  1. While your application is running (and consulting rows in Facts_Table to make decisions), populate Facts_Table_2 with your new set of data that you wish future transactions to consult. This table can be populated as slowly (or as quickly) as you like, perhaps over the course of a day.

  1. When your Facts_Table_2 is populated, and you are ready to make it “live” in your application, call the VoltDB System Procedure @SwapTables. This operation essentially switches the data for the table by swapping internal memory pointers. As such it executes in single to sub millisecond range.

  1. At this point, all the data that was in Facts_Table_2 is now in Facts_Table, and the old data in Facts_Table now resides in Facts_Table_2.  You may consider truncating Facts_Table_2 in preparation for your next refresh of facts (and to reduce your memory footprint).

Let’s look at a contrived example using the VoltDB Voter sample application, a simple simulation of an ‘American Idol’ voting system. Let’s assume that each day you are going to feature different contestants for which callers can vote. Voting needs to occur 24x7, each day, with new contestants. The contestants change every day at midnight. We don’t want any downtime - no maintenance window, for example  - when changing our contestant list.

Here’s what we need to do to the Voter sample to effect this behavior:

  1. First we create an exact copy of our CONTESTANTS table, calling it CONTESTANTS_2:

-- contestants_2 table holds the next day's contestants numbers -- (for voting) and names
CREATE TABLE contestants_2
 contestant_number integer     NOT NULL
, contestant_name   varchar(50) NOT NULL

2. The schemas are identical, and this table is not the source of a materialized view.

3. The Voter application pre-loads the CONTESTANTS table at the start of benchmark with the following contestants:

1> select * from contestants;
------------------ ----------------
                1 Edwina Burnam   
                2 Tabatha Gehling
                3 Kelly Clauss    
                4 Jessie Alloway  
                5 Alana Bregman   
                6 Jessie Eichman  

$ cat contestants_2.csv
1, Tom Brady
2, Matt Ryan
3, Aaron Rodgers
4, Drew Brees
5, Andrew Luck
6, Kirk Cousins

$ csvloader contestants_2 -f contestants_2.csv
Read 6 rows from file and successfully inserted 6 rows (final)
Elapsed time: 0.905 seconds
$ sqlcmd
SQL Command :: localhost:21212
1> select * from contestants_2;
------------------ ----------------
                1 Tom Brady       
                2 Matt Ryan       
                3 Aaron Rodgers   
                4 Drew Brees      
                5 Andrew Luck     
                6 Kirk Cousins    

(Returned 6 rows in 0.01s)

4. Now that we have the new contestants (fact table) loaded and staged, when we’re ready (at midnight!) we’ll swap the two tables, making the new set of contestants immediately available for voting without interrupting the application. We’ll do this by calling the @SwapTables system procedure as follows:

$ sqlcmd
SQL Command :: localhost:21212
1> exec @SwapTables contestants_2 contestants;

(Returned 1 rows in 0.02s)
2> select * from contestants;
------------------ ----------------
                6 Kirk Cousins    
                5 Andrew Luck     
                4 Drew Brees      
                3 Aaron Rodgers   
                2 Matt Ryan       
                1 Tom Brady       

(Returned 6 rows in 0.01s)

5. Finally, we’ll truncate the CONTESTANTS_2 table, initializing it once again ready to be loaded with the next day’s contestants:

$ sqlcmd
SQL Command :: localhost:21212
1> truncate table contestants_2;
(Returned 6 rows in 0.03s)
2> select * from contestants_2;
------------------ ----------------

(Returned 0 rows in 0.00s)

Note that steps 3-5, loading, swapping, and truncating the new fact table, can all be done in an automated fashion, not manually as I have demonstrated with this simple example.

Running the Voter sample and arbitrarily invoking @SwapTables during the middle of the run yielded the following results:

A total of 15,294,976 votes were received during the benchmark...
- 15,142,056 Accepted
-   152,857 Rejected (Invalid Contestant)
-        63 Rejected (Maximum Vote Count Reached)
-         0 Failed (Transaction Error)

Contestant Name Votes Received
Tom Brady      4,472,147
Kirk Cousins 3,036,647
Andrew Luck      2,193,442
Matt Ryan      1,986,615
Drew Brees      1,963,903
Aaron Rodgers 1,937,391

The Winner is: Tom Brady

Apologies to those not New England-based! As you might have guessed, VoltDB’s headquarters are based just outside of Boston, Massachusetts.

Just the Facts, Ma’am

Leveraging big data intelligence to make per-event decisions is an important component of a real-time decision engine within your data pipeline. When building fast data pipeline applications using VoltDB, VoltDB provides tools and functionality to make this process easy and also painless to a running application. Two key tasks need to be performed: loading your new fact table into VoltDB, and atomically making that new data “live” to your business logic.
Loading data into VoltDB from an external data source can be done easily via a couple of approaches: you can use one of our loaders such as the CSV, Kafka or JDBC loader; or you can write an application to insert the data.
Swapping tables in VoltDB is a trivial exercise with the @SwapTable system procedure. And most importantly, swapping in new fact table data does not impact ongoing stream processing.