Tech Volcano: Architecture

Showing posts with label Architecture. Show all posts

Sunday, 8 February 2015

Part 4: Types of Data Warehouses

Posted by Unknown on 8:57:00 pm with No comments

In this series, the previous post was about Part 3: Types of Data Warehouse Architectures. Please go through it, if you missed it.

A data warehouse is a relational/multidimensional database that is designed for query and analysis rather than transaction processing. A data warehouse usually contains historical data that is derived from transaction data. It separates analysis workload from transaction workload and enables a business to consolidate data from several sources.
In addition to a relational/multidimensional database; a data warehouse environment often consists of an ETL solution, an OLAP engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

There are three types of data warehouses:

Enterprise Data Warehouse

ODS (Operational Data Store)

Data Mart

Enterprise Data Warehouse

In an enterprise Data Warehouse, the entire data of the organisation is fetched from the heterogeneous data sources into the ODS, then the data is loaded in to the data warehouse after cleansing and applying business transformations.

The data marts are built from the data warehouse. Each data mart will hold the data specific to the departments in the organization. Each user group will have access to the respective data marts. This mechanism helps to secure the data by avoiding unnecessary access and it also speed up the report generation.The user groups uses analytic tools to generate the reports from the respective data marts.

ODS (Operational Data Store)

ODS is the place where the data from multiple heterogeneous sources are stored for further processing. Data is loaded to the ODS in batches. Data once loaded in the ODS will not go back to the OLTP / Source / Originating system. Some amount of cleansing and standardization is applied to the data before storing it in the Operational Data Store. It act as a single version of truth to the organization. It enables the user to get all the organizational information at one place. ODS is not a data warehouse. But it act as an intermediate storage in the data warehouse. In the above diagram, the data from files, CRM, ERP, Web Service, etc are loaded into the ODS using ETL process.

Data Mart

Data Mart is a subset of Data Warehouse. There are 2 different approaches in Data Warehouse.

Top-Down Approach

Top-Down Approach is introduced by Bill Inmon. In this approach, the data warehouse is created first. Then the data marts are created from the broad data warehouse. For example Sales, Purchase, Marketing, Human Resource, etc.

Advantages

The entire organizational data will be available to uses from the data warehouse.

Disadvantages

Development time is more.
Scalability is complex.

Bottom-Up Approach

Bottom-Up Approach is introduced by Ralph Kimball. In this approach, the data marts are created first. Each data mart serves a particular business area's reporting requirements in an organization. For example Sales, Purchase, Marketing, Human Resource, etc.

Once the data marts are created, all these data marts are integrated together to form the data warehouse.

Advantages

Development time is minimal since it will not create the entire data warehouse at once.
Faster report availability.
Highly scalable.

Disadvantages

The entire information of the organization will not be available since the data is scattered in different data marts.

Thanks for reading my blog.

Part 3: Types of Data Warehouse Architectures

Posted by Unknown on 8:50:00 am with No comments

Data Warehouse Architectures

In this series, the previous post was about Part 2: Data Warehousing Concepts - OLTP Vs Data Warehouse. Please go through it if you did not read.

Below are some of the widely used Data Warehouse architectures.

Basic Data Warehouse Architecture
Data Warehouse Architecture with Staging Area
Data Warehouse Architecture with Staging Area and Data Marts

Basic Data Warehouse Architecture

In the Basic Data Warehouse, the data from the transaction system (OLTP) is loaded in to the Data Warehouse and finally it reaches the end users.

http://www.amazon.in/b?node=1592905031&ref_=as_acmain_dotd&tag=techvolc0d-21

Advantages

Development effort is comparatively less.
By the use enterprise Data Warehouse, provide single version truth for analysis.

Disadvantages

Depends on OLTP system.
OLTP down times affects the Data Warehouse load.
Data Warehouse load may slow down the OLTP performance.
Data needs to be loaded before the next OLTP up time.

Data Warehouse Architecture with Staging Area

In the Data Warehouse with Staging Area, the data from the OLTP System is loaded in to a Staging Area first. The loaded in the Staging Area is cleansed to some extent and the cleansed data is loaded to the Data Warehouse. The frequency of load will be daily, weekly, monthly, etc based on the business requirement.

Advantages

Less dependent on OLTP System since the data is staged.
OLTP Down times will not affect the Data Warehouse load.
Will get enough time to complete the Data Warehouse load.
Data Warehouse load will not affect the OLTP performance.

Disadvantages

Development time is more.

Data Warehouse Architecture with Staging Area and Data Marts

In the Data Warehouse with Staging Area and Data Mart, architecture is same as the previous architecture. The only difference here is the Data Marts. A data mart is a subset of data warehouse that is designed for a particular line of business; such as sales, Purchase, Stock, etc. Each data mart is built from the Enterprise Data Warehouse. Only some group of users will have access to the specific Data Marts. For example, Only Analysts from Sales department will have access to the Sales Data Mart.

Advantages

Less dependent on OLTP System since the data is staged.
OLTP Down times will not affect the Data Warehouse load.
Will get enough time to complete the Data Warehouse load.
Data Warehouse load will not affect the OLTP performance.
User specific Data Marts will speed up the performance for each department.

Disadvantages

Development effort is more.
Not all department will have access to the entire data or Enterprise Data Warehouse.

In my next post, I will explain about the types of Data Warehouses.

If my post helped you understand the concept. Please like and share with your friends and also share your comments with me to make the content better. Thanks for visiting my Blog.

Part 2: Data Warehousing Concepts - OLTP Vs Data Warehouse

Posted by Unknown on 8:25:00 pm with No comments

OLTP Vs Data Warehouse

In this series, the previous post was about Part 1: Data Warehousing Concepts. Please go through it if you did not read.

OLTP and Data Warehouse have many differences in the architecture, process and performance. Here are the differences on various aspects.

Performance
Data Modification
Design
Operation
Historical Data

Performance

Data Warehouse are developed to accommodate adhoc queries. So data warehouse should be optimized to perform well for any variety of possible query operations. Data Warehouse will perform better on queries. Because of the indexes define on the dimensional tables, less number of joins and denormalized tables.

OLTP System is developed to support only predefined set of operations. So the database will be tuned only for those operations. For example, the transactions records are inserted, updated or deleted. If the index exists in the database, it will degrade the insert, update and delete operations.

Data Modification

A data warehouse is updated on a regular basis by the ETL process using bulk data load techniques. The end users of a data warehouse do not directly update the data warehouse.

In Data Warehouse, the data is loaded on bulk mode and the end user will not change any data.

Design

Data warehouses often use denormalized or partially denormalized schema (such as a star schema) to optimize query performance.

OLTP systems often use fully normalized schema to optimize update/insert/delete performance, and to guarantee data consistency. In OLTP systems, end users routinely issue individual data manipulation statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.

Operation

Data Warehouse fetches millions of records at one shot. Whereas the OLTP fetches only handful of records at once.

Historical Data

Data warehouses usually store many months or years of data. This is to support historical analysis.

OLTP systems usually store data from only a few weeks or months. The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction.

In my next post, I will be explaining the different Data Warehouse Architectures.

If my post helped you understand the topic, please like and share and also share your comments with me to make the content better. Thanks for visiting my blog.

Part 1: Data Warehousing Concepts

Informatica Architecture

Posted by Anonymous on 1:21:00 am with No comments

Informatica ETL product, known as Informatica Power Center consists of 3 main components.

1. Informatica PowerCenter Client Tools:
These are the development tools installed at developer end. These tools enable a developer to

Define transformation process, known as mapping. (Designer)
Define run-time properties for a mapping, known as sessions (Workflow Manager)
Monitor execution of sessions (Workflow Monitor)
Manage repository, useful for administrators (Repository Manager)
Report Metadata (Metadata Reporter)

2. Informatica PowerCenter Repository:
Repository is the heart of Informatica tools. Repository is a kind of data inventory where all the data related to mappings, sources, targets etc is kept. This is the place where all the metadata for your application is stored. All the client tools and Informatica Server fetch data from Repository. Informatica client and server without repository is same as a PC without memory/harddisk, which has got the ability to process data but has no data to process. This can be treated as backend of Informatica.

3. Informatica PowerCenter Server:
Server is the place, where all the executions take place. Server makes physical connections to sources/targets, fetches data, applies the transformations mentioned in the mapping and loads the data in the target system.

This architecture is visually explained in diagram below:

This serves the basic knowledge to start with Informatica. Happy learning !!!

Tech Volcano

Anything and everything in Information Technology !

Sunday, 8 February 2015

Part 4: Types of Data Warehouses

Enterprise Data Warehouse ODS (Operational Data Store) Data Mart

Enterprise Data Warehouse

ODS (Operational Data Store)

Data Mart

Top-Down Approach

Bottom-Up Approach

Thursday, 15 January 2015

Part 3: Types of Data Warehouse Architectures

Data Warehouse Architectures

Basic Data Warehouse Architecture

Data Warehouse Architecture with Staging Area

Advantages

Disadvantages

Data Warehouse Architecture with Staging Area and Data Marts

Advantages

Disadvantages

Saturday, 10 January 2015

Part 2: Data Warehousing Concepts - OLTP Vs Data Warehouse

OLTP Vs Data Warehouse

Performance

Data Modification

Design

Operation

Historical Data

Part 1: Data Warehousing Concepts

Wednesday, 8 May 2013

Informatica Architecture

Referral Programme

Ad

Visits

Blogger templates

Data Warehousing Tutorial Download

Popular Posts

Ad

Categories

Bloggers

Blog Archive

Enterprise Data Warehouse

ODS (Operational Data Store)

Data Mart