Today’s world is driven by data. Data is the main source of income for many technology leaders of the world today. Google’s main source of income, directly or indirectly is dependent upon data only.
A huge amount of Data(in range of millions of GBs) is generated every day and is analysed daily to give users the best experience on a specific platform.
So how this data is stored, analysed, extracted, are all part of data warehousing. So what exactly is this data warehousing? In this post we will cover the basics what data warehousing is all about.
Data warehousing is the concept of building and developing a data warehouse.
“A data warehouse is the storage area where data come from various heterogeneous resources and is stored after doing various manipulations such as changing the structure of data, eliminating redundant data, normalizing etc.”
This data is then made available for analytical reporting, ad hoc queries, and decision-making.
Decisions are made by management by analysing data which is available in the data warehouse. Various technologies such as Big Data, machine learning can be applied to this data easily (because it is in structured form) for making useful reports.
Consider it as useful data in the structured format which is stored in a centralised location. Updation of this data can be made on daily basis or weekly basis depending upon the requirement.
For example: Suppose in an e-commerce website, thousands of transactions happen on a daily basis. So data can be selected from these transactions using queries to update it into a data warehouse for analytical analysis like sales in a particular region, sales by a particular customer etc.
The data warehouse is a separate entity and is kept separate from the operational database of an organisation.
In simple terms, it posses historical data and helps business in making decisions in an intelligent manner.
So when we talk about the difference between OLTP and OLAP, a data warehouse is an OLAP system whereas an operational database is an OLTP system.
Need of Data warehouse
Now one can think what is the use of centralised data warehouse when all the tasks performed by it can be done through different databases and other means.
So let’s dive into the benefits of having a data Warehouse:
- It is much more convenient to store relevant data in a single database. This data can be accessed using single query engine.
- Having a central view of data across the enterprise or complex system. This can be of great help when the size of organisation is quite large.
- Decisions are made more quickly using this data as it is available at a faster pace.
- Restructuring of data can be done in a data warehouse without changing the structure of the data in the production.
- Data warehouse provides timely access to data and enhanced business intelligence.
- The data warehouse enables business users and decision makers to have access to data from many different sources as they need to have access to the data. Additionally, business users will spend little time in the data retrieval process.
- It increases the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data.
Stages in Data warehousing
There are various stages in data warehousing in which data moves. Here are the stages in which a construction of a data warehouse is divided:
1. Offline Operational Databases:
This is the initial stage of database warehousing. In this stage data from an operational source is copied to an offline server simply. Simple database replication is done in this process.
2. Offline Data warehouse:
In this stage, data is updated regularly from an operational system and is stored in an integrated report oriented structure.
3. Real-time Data Warehouse:
In this stage, data warehouse is updated on an event-driven method. For example, whenever an order is executed or services are delivered, the database is updated. This is completely dependent upon the operational source. When this source is updated, data in data warehouse is also updated. Hence a real-time system is achieved.
4. Integrated Data Warehouse
The activity or transactions generation which are passed back into the operational system is done in this stage. These transactions or generated transactions are used in the daily activity of the organization.
Functions of Data warehouse tools
Since developing a data warehouse is not an easy task, it involves tools and utilities for integrating data from source to other systems. Here are the main functions which are performed during developing a data warehouse.
Data is gathered from multiple heterogeneous sources. Complex queries are used generally for getting this data. Only data which is relevant for the data warehouse is extracted from operational systems. In some cases, the whole database is replicated to the data warehouse.
Data is cleaned and errors in data are removed. For example: If a row in a database contains unstructured data which do not meet the standards of the table in data warehouse, then it is cleaned and error-free data is obtained.
After data is cleaned, it is transformed into a specific format which is in accordance with the data warehouse format.
Involves loading, sorting, checking integrity, and building indices and partitions on data in data warehouse.
5.Refreshing of data
Data is updated from the source on the weekly or monthly basis.