The Big Data Ecosystem and Better Visualization
This is the second blog of a three-part blog series on Big Data Eco-systems. As a quick refresher, last week we connected business-technology domains through a hierarchical set of business terms along with a series of visualizations representing a 24-hour stream of continuous sensor data.
This week’s example, “The Round-table Big Data Eco-system”, uses the same technology domains shown and discussed last week, but this time the BD/BI directives use a circular Eco-system framework. The round-table approach incorporates high-level visualizations of how discrete and asynchronous data evolve into a business solution. We explore how even the stakeholders sitting at the table contribute to the business-technology derivatives at the center of the Eco-system.
The figure below emphasizes the visualization aspects. It begins with the business development task initiated at step 1, and we track the work in a counterclockwise (CCW) loop to the solution given in step 6. In doing so, the roles of the solution stakeholders (BD, System Architecture, ETL, Modeling, and BI) are openly portrayed. An “Eco-System” box is placed in the center and acts as the place for stakeholder contributions and the place where discussions are initiated. The entries to this box may range from Quality Assurance issues to unforeseen gleanings for the BI/BD group.
The task is to develop a schedule reflecting a customer’s home activity. Sensors within the home were used to populate a spreadsheet of activity times in step 2. In a legacy system, where the activity variable is well known, a simple model at this point would have sufficed to provide the answer in step 6 directly. However, we believe filling in the details, which we describe below, enriches the whole process and places the stakeholders on a stage with empowerment and responsibility. We ask one thing as we continue this journey: patience. The approach is not easy at first sight, but soon it will be the only way you want to go.
With Big Data, CGI knows the multiple sources and behaviors are not, in general, consistent. For this example we took the spreadsheet information and engineered a visualization in step 3 that allowed us to see the data at a very high and yet intelligible level. To do this we actually increased the number of data points by an order of magnitude. We aligned 1,000 customers in a matrix that included 2,500 time stamps running down the side. The term “matrix” shouldn’t scare anyone away; instead, it should be embraced. The organization a matrix can offer with 2.5 million numbers in one image is astounding.
We would like to note that these matrices are drawn with Matlab, an engineering/visualization tool, but recognize there are other visualization tools that do a good job too.
Activity from the spreadsheet is indicated in the matrix display by a yellow dot. A blue dot means no activity. A subtle point to catch is that the spreadsheet only shows “what is” whereas our matrix shows “what is and what is not”. Already, we begin to gain a deeper perspective through diversity.
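For readers who want to experiment, here is a minimal Python sketch of the spreadsheet-to-matrix step. It is not the Matlab code used for the figures. The record layout (a customer index paired with a time-stamp index), the function name, and the toy sizes are all assumptions made for illustration.

```python
def build_activity_matrix(records, n_customers, n_timestamps):
    """Build an on/off matrix: one row per time stamp, one column per customer.

    `records` is an assumed spreadsheet layout: (customer_index, time_index)
    pairs, one pair per logged activity.
    """
    # Start with "no activity" everywhere (the blue dots).
    matrix = [[0] * n_customers for _ in range(n_timestamps)]
    # Mark each logged activity (the yellow dots).
    for customer, t in records:
        matrix[t][customer] = 1
    return matrix


# Hypothetical toy data: 4 customers, 5 time stamps.
records = [(0, 1), (0, 3), (2, 0), (2, 1), (2, 4)]
activity = build_activity_matrix(records, n_customers=4, n_timestamps=5)

# Text rendering: "Y" for activity, "." for none.
for row in activity:
    print("".join("Y" if cell else "." for cell in row))
```

The spreadsheet holds only the five logged events, while the matrix holds all twenty cells, which is the “what is and what is not” point made above.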
With these two simple on/off indicators, the round-table can begin to get a feel for the information. In step 3 there are vertical columns entirely coded in blue, meaning no activity was listed at all. In step 4 we reorganize the columns according to amount of activity. We find that there are in fact approximately 400 columns (customers) without activity, as indicated by the red box. The green and yellow boxes indicate gaps in activity that would also be difficult to see with a spreadsheet approach.
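The step 4 reordering can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name, the ascending sort order, and the toy matrix are assumptions.

```python
def reorder_by_activity(matrix):
    """Sort customer columns by total activity, least active first.

    Returns the reordered matrix, the column order that was applied,
    and the number of columns with no activity at all.
    """
    n_cols = len(matrix[0])
    # Total activity per customer (column sums).
    totals = [sum(row[c] for row in matrix) for c in range(n_cols)]
    # Python's sort is stable, so ties keep their original left-to-right order.
    order = sorted(range(n_cols), key=lambda c: totals[c])
    reordered = [[row[c] for c in order] for row in matrix]
    n_inactive = sum(1 for total in totals if total == 0)
    return reordered, order, n_inactive


# Hypothetical toy matrix: rows are time stamps, columns are customers.
activity = [
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
reordered, order, n_inactive = reorder_by_activity(activity)
print("column order:", order)
print("customers without activity:", n_inactive)
```

After sorting, the empty columns sit together on one side of the matrix, so a block of inactive customers shows up as a solid band instead of scattered stripes.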
The plot in step 4 is very important. We see this as a pivotal point in the solution where the pieces of the solution are exposed in a very raw form. It is the main feeder point into the Eco-System box directly above it. At this point all of the round-table stakeholders should be engaged with each other and forming discussion points.
The solution continues by taking a well-defined segment of information from step 4 and feeding it to the modelers in step 5. Now it is apparent that the modeling is based on approximately 200 homes and not the 1,000 homes listed in the spreadsheet. The plot shown in step 5 exhibits the several stages of the modeling process used to produce a solution. These stages are the impetus for feeding the BI/BD portion in the Eco-System, since they contain information about customer activity beyond the BD work schedule task. A solution is passed on to step 6 in the form of a plain text work schedule and one of the plots from step 5.
In closing, we hope the loop from (1) to (6) has convinced you that a good visualization tool brings with it novel visualization techniques that can express information and encourage cross-stakeholder conversation in ways that are complementary to the approach shown in last week’s blog. We really do need both approaches.
Check back next week for the final blog in this series, an audio/visual computer perspective on Big Data.