In an age when every keystroke on your keyboard or swipe on your phone is tracked, the era of Big Data is thriving. The announcement of Microsoft Azure in 2008 allowed the Healthcare Industry to finally have access to information that, up until that point, had only been accessible via large companies such as IBM. The ability for the Healthcare Industry to pull information based on mass amounts of accurate data was nothing short of revolutionary.

The advent of this mammoth data machine altered the face of both the for-profit and non-profit sectors. It changed the way nearly all organizations worked and created entirely new industries. With the arrival and popularity of mobile applications in the late 2000s, the business of tracking data all but exploded. Soon preventative health was being tackled by companies such as Fitbit, whose personal activity tracker measures heart rate, sleep activity and number of steps walked.

Data Lakes - The Next Big Thing

The flood of data coming in from all corners of the world was organized into countless institutional Data Warehouses. Early industry predictions suggested that this mass of data would help healthcare researchers quickly uncover information leading to cures or treatments. While this newfound data assisted greatly, flaws in the Data Warehouse concept were soon discovered.

The modern concept of the Data Warehouse began in the late 1980s. An article published in the IBM Systems Journal in 1988 coined the term “business data warehouse”. Bill Inmon (the ‘father’ of data warehousing) began to discuss Data Warehouses as far back as the 1970s and in the early 1990s published the industry bible, Building the Data Warehouse. Inmon’s model for data warehousing concentrates on a centralized data repository.

Healthcare providers and researchers began to realize that under this model the data was much more difficult to access and often was not helpful to their research. The main issue they faced was that the Data Warehouses were designed and controlled by a diverse range of operators, from hospitals to research centres. These Data Warehouses employed the concept of ‘schema on write’, meaning that the data is organized as it is added to the warehouse. In fact, data is not even loaded until its eventual use is determined. For healthcare providers and researchers, this meant relying on countless institutions and their respective warehouse designs. The information culled from disparate Data Warehouses was at times inconsistent and conflicting. The ‘schema on write’ method also prevented data from being entered in a timely manner, because all information first had to be surveyed and analyzed through individual systems. Healthcare leaders realized that what they needed was access to unstructured data that they could analyze on their own timeline.
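To make ‘schema on write’ concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the VitalsRow schema, the load_into_warehouse function and the sample records are hypothetical and do not come from any real warehouse product. It shows how a schema fixed up front decides what gets loaded, and how data that does not fit is left out.

```python
from dataclasses import dataclass


# Hypothetical warehouse schema, fixed before any data is loaded.
@dataclass(frozen=True)
class VitalsRow:
    patient_id: str
    heart_rate_bpm: int


def load_into_warehouse(raw_records):
    """Schema on write: shape and validate every record at load time."""
    loaded, rejected = [], []
    for record in raw_records:
        try:
            row = VitalsRow(
                patient_id=str(record["patient_id"]),
                heart_rate_bpm=int(record["heart_rate_bpm"]),
            )
        except (KeyError, ValueError, TypeError):
            # The record does not match the schema, so it never reaches the warehouse.
            rejected.append(record)
            continue
        loaded.append(row)
    return loaded, rejected


# Illustrative sample data only.
incoming = [
    {"patient_id": "p1", "heart_rate_bpm": 72},
    {"patient_id": "p2", "steps": 8500},  # no heart rate field
]
loaded, rejected = load_into_warehouse(incoming)
```

In this sketch the step count for the second patient is set aside at load time, simply because the schema's designers did not anticipate it. A researcher who later wants step data has nothing to query.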

The concept of Data Lakes was born.

A Data Lake is a storage system that is able to hold mass amounts of data. Unlike the Data Warehouse with its structured, hierarchical format, the Data Lake holds raw data, intentionally eschewing up-front formatting to give users unfiltered access to the most up-to-date information. Data Lakes use the concept of ‘schema on read’: data is not structured or analyzed until the end user accesses it.
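As a rough illustration of ‘schema on read’, the Python sketch below keeps raw records exactly as they arrived and applies a schema only when someone queries them. The RAW_LAKE list, the read_with_schema function and the sample records are hypothetical names and data invented for this example. A real Data Lake would hold files in a storage service rather than strings in a list.

```python
import json

# Raw events stored as they arrived, one JSON document per line (illustrative data).
RAW_LAKE = [
    '{"patient_id": "p1", "heart_rate_bpm": 72, "source": "tracker"}',
    '{"patient_id": "p2", "steps": 8500, "source": "tracker"}',
    '{"patient_id": "p1", "sleep_hours": 6.5}',
]


def read_with_schema(raw_lines, fields):
    """Schema on read: pick out the fields an analysis needs at query time."""
    for line in raw_lines:
        record = json.loads(line)
        # Keep only records that carry every field this particular reader asked for.
        if all(field in record for field in fields):
            yield {field: record[field] for field in fields}


# Two different readers apply two different schemas to the same raw data.
heart_rates = list(read_with_schema(RAW_LAKE, ["patient_id", "heart_rate_bpm"]))
steps = list(read_with_schema(RAW_LAKE, ["patient_id", "steps"]))
```

Nothing is discarded when the data is stored, so each researcher can decide later which fields matter for their own question.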

With Data Lakes at its disposal, the Healthcare Industry is therefore not constrained by institutional schemas. It is logical that hospitals worldwide created their own Data Warehouses based on their own understanding of what the front-end user required. But each institutional Data Warehouse is managed by a different team of people, and differences in their intake processes can cause wide gulfs in how information is analyzed. In contrast, the Data Lake allows users to pull raw healthcare data unburdened by well-meaning but ineffective filters.

Data Lakes provide numerous advantages over Data Warehouses for the Healthcare Industry beyond data capture.

Healthcare spending in Canada now runs into the billions of dollars annually. A portion of this cost is infrastructure spending to operate Canadian healthcare institutions, including their IT operations and data storage. Adopting Data Lakes can greatly reduce the costs associated with data capture and storage. Not only do operators save on the physical assets required for storage, but they can also avoid the cost of hiring specialized staff for schema design and data input.

Data Lakes also allow practitioners to provide patients with Precision Medicine. Precision Medicine is an emerging medical concept that proposes tailoring healthcare to individual patients. With Data Lakes and health applications such as the previously mentioned Fitbit personal health tracker, unfiltered health information can be captured from individuals and analyzed quickly enough to have an immediate impact for patients. By their very definition, Data Lakes provide the most open, agile format for end users.

The Healthcare Industry can now take advantage of Data Lakes supported by Microsoft Azure.

Azure Data Lakes will enable the Healthcare Industry to create repositories where its data can be held without constraint. Data of any size or format can be held at a much lower cost, and these savings can be put toward improved patient care. Health practitioners and researchers can also access data in real time, increasing the speed at which they can apply this knowledge to produce real-world results. Azure Data Lakes also enable users to invest in new technology without concern that the investment will not sync with their current Data Warehouse.

Big Data provided the Healthcare Industry with volumes of structured information that influenced practitioners and researchers alike. Azure Data Lakes are the bold next step and the future of Healthcare Data.