Enterprise Data Warehouse Migration to Cloud – The Software Testing Perspective

By: Artem Grechishnikov, Senior QA Project Manager, Exactpro, July 2023

Transitioning operations to a cloud infrastructure is a challenge that firms face on their digital transformation journey. Cloud-driven modernisation in the financial-services space involves migrating core parts of the enterprise-grade IT infrastructure, such as the data repository. It entails fundamental changes in the system’s business logic and architecture, data flows, Database Management System (DBMS) functionality and warehouse operation, as well as in their dependencies. The process demands alignment, coordination and careful planning within the organisation.

However, setting up the right migration strategy is only half of the journey: implementing it with no data loss or redundancies, and with minimal performance issues in production, is what a trading, clearing/settlement or core banking platform operator is looking to achieve. Developing end-to-end software testing expertise at an early stage of the cloud migration design and planning helps ensure the quality and reliability of the new cloud infrastructure.

Ahead of the migration, it is also crucial to thoroughly assess and understand the technology characteristics of your preferred cloud provider, as there are differences between operators. This step helps shape the underlying processes going forward and gives an understanding of the level and extent of software testing needed to support the transition. Let this article serve as a brief review of the aspects not to be missed.

What are the forces driving data services usage in the cloud?

Technology

Greater infrastructure flexibility – by eliminating the need to invest in self-hosted, hardware-based data centre infrastructure, cloud computing enables efficient, pay-as-you-go resource management, with the option to deploy new environments and scale capacity up or down in a matter of minutes.

Regulatory Compliance

Advanced analytics and record keeping – the option of unlimited cloud storage makes data control straightforward and data retention capacity effectively unbounded. A side benefit of the cloud setup is the ability to analyse the entire volume of data for patterns and introduce advanced market surveillance mechanisms.

Audit/certification documents – cloud providers participate in various compliance programs and have the required documentation in place.

Product

More room for innovation – new services and technologies can be experimented with, sandbox-tested, adopted or terminated without setting up additional hardware or software, and without the approval chains that typically slow down the deployment of extra resources.

Client 

Lower latency – a globally distributed cloud infrastructure enables deploying applications in regions close to customers and expanding to new regions as the business grows.

Data protection (GDPR, CCPA, CRPA, etc.) – can be ensured by using the appropriate cloud region. 

Improved security – cloud providers apply cyber security best practices across their infrastructure and provide guidelines for their clients.

Better resiliency – the elasticity of the cloud minimises the chances of outages for clients, even at peak load.

Business and Sustainability

Resource and cost optimisation – the principles of resource pooling and supply-side savings allow established technology firms and startups alike to reap the benefits of economies of scale in the cloud. Cloud providers’ energy-efficient infrastructures have been proven to lower clients’ workload carbon footprints.

However, these should be assessed against the impediments intrinsic to a cloud-native infrastructure, and the list below identifies the most pertinent ones.

Cloud limitations and their repercussions

Infrastructure limits

The underlying hardware is neither owned nor managed by the client firm, hence:

Even after resource provisioning testing has completed successfully, there is no guarantee that resources will always be available for the client’s purposes: the infrastructure is not infinitely elastic, and a request for additional resources may not be satisfied at once.

There is little or no control over cloud components or the network bandwidth between them, which limits resiliency testing and confirmation testing for resiliency improvements.

For various cloud services, the underlying infrastructure is assigned randomly and can have different performance or capacity characteristics.

There is no visibility on the versions or release plans for the underlying infrastructure or software, which, from the regression testing perspective, implies the need to create additional automated regression test suites that factor in such upgrades.

Sporadic access or network issues may cause input/output (I/O) operations to fail, so it is highly recommended to always include retry logic, which, in turn, is difficult to simulate and test.
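As an illustration, the sketch below shows one common shape such retry logic can take: exponential backoff with jitter around an I/O call. The function and parameter names are illustrative assumptions rather than part of any particular cloud SDK.

```python
import random
import time

def call_with_retries(io_operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Run an I/O operation, retrying transient failures with exponential backoff and jitter.

    `io_operation` is any callable performing the cloud I/O call (illustrative placeholder).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return io_operation()
        except (ConnectionError, TimeoutError):  # treated as transient here
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Exponential backoff capped at max_delay, with jitter to avoid synchronised retries
            delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
            time.sleep(delay * random.uniform(0.5, 1.5))
```

From the testing perspective, it is this backoff path, rather than the happy path, that is hard to exercise, since transient faults have to be injected deliberately.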

Per-account limits and quotas

Activities in other environments within the same cloud account (or, in the worst case, within the entire cloud region) might affect performance figures. For example, a functional and a non-functional test environment deployed in one account will have to compete for resources.

Increased testing costs

Large volumes of transactions required for continuous end-to-end testing generate massive amounts of traffic, which can be costly. Some cloud components are charged for uptime, others – for the number of calls/runs. Depending on the configuration and number of test environments, costs should be continuously assessed and optimised.

Enterprise data warehouse cloud migration: when to involve testing?

From the technology perspective, the main objective of a data warehouse migration is the correct transfer of years of historical data and all data processing jobs. Due to differences between the legacy system’s formats and configurations and those of the new cloud setup, it is essential to carefully reconcile the two volumes of data. Engaging software testing expertise as early as possible in the migration (preferably, at the ideation phase) helps remediate architectural and documentation-related issues, as well as ensure a more optimal cloud setup.
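A minimal sketch of such a reconciliation check, assuming both datasets can be exported as keyed records, is shown below. The key column name and the use of per-row MD5 digests are illustrative assumptions; the actual reconciliation rules depend on the warehouse schema.

```python
import hashlib

def row_digest(row):
    """Stable checksum of a row, assuming `row` is a dict of column name -> value."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

def reconcile(legacy_rows, cloud_rows, key_column="trade_id"):
    """Compare two datasets keyed by `key_column`: report missing, extra and mismatched records."""
    legacy = {row[key_column]: row_digest(row) for row in legacy_rows}
    cloud = {row[key_column]: row_digest(row) for row in cloud_rows}
    missing_in_cloud = sorted(legacy.keys() - cloud.keys())
    extra_in_cloud = sorted(cloud.keys() - legacy.keys())
    mismatched = sorted(k for k in legacy.keys() & cloud.keys() if legacy[k] != cloud[k])
    return missing_in_cloud, extra_in_cloud, mismatched
```

In practice, such checks are run per table and per migration batch, so that any data loss or duplication surfaces before the legacy source is decommissioned.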

Functional testing scope

During the migration, the legacy system’s direct connectivity (or connection via a dedicated component) to the on-premise database is replaced with connectivity to the cloud infrastructure via an advanced message broker.

The chosen technology stack should feature a high-throughput, high-availability event streaming platform and message broker capable of storing and processing both historical and real-time data. It must allow data consumers to read messages from the broker rather than from the database or the financial system’s gateways. This increases the resiliency of the system, decreases the load on the database, and makes it possible to improve the data consumers’ performance.
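To make the pattern concrete, the sketch below shows a data consumer reading from the event streaming platform instead of querying the database. Apache Kafka and the kafka-python client are used here purely as an assumed example of such a broker; the topic, server and group names are placeholders.

```python
import json

from kafka import KafkaConsumer  # assumes Kafka as the event streaming platform

# Placeholder topic and bootstrap server; real values depend on the chosen cloud setup.
consumer = KafkaConsumer(
    "processed-trades",
    bootstrap_servers="broker.example.internal:9092",
    group_id="reporting-consumer",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Downstream consumers read from the broker, so reporting and surveillance reads
# place no load on the database or the financial system's gateways.
for message in consumer:
    record = message.value
    print(record)  # hand the record to the reporting or surveillance pipeline here
```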

After being submitted to the cloud-native data streaming platform by the financial system, the data goes through several storage areas (raw data, processed data, pre-generated reporting data) that have their corresponding functional testing and verification checks [see Fig. 1].

Fig. 1

Once the new cloud-based infrastructure passes the non-production stage, it is released into the production environment. Both the on-prem legacy system and the new cloud-based system are connected to the financial system’s production environment; test run results are monitored live, and the two systems are observed and compared simultaneously. The functional testing scope should also include the historical data migration.
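During such a parallel run, the comparison of legacy and cloud outputs can be automated. The sketch below is a simplified field-by-field comparison of two result sets keyed by a message identifier; the key and the list of environment-specific fields to ignore are illustrative assumptions.

```python
def compare_parallel_run(legacy_records, cloud_records, key="message_id",
                         ignore_fields=("ingest_timestamp",)):
    """Field-by-field comparison of legacy vs cloud outputs from a parallel production run.

    Both inputs are assumed to be lists of dicts keyed by `key`; fields listed in
    `ignore_fields` (e.g. environment-specific timestamps) are excluded.
    """
    legacy_by_key = {r[key]: r for r in legacy_records}
    cloud_by_key = {r[key]: r for r in cloud_records}
    differences = []
    for k in legacy_by_key.keys() & cloud_by_key.keys():
        for field in legacy_by_key[k]:
            if field == key or field in ignore_fields:
                continue
            if legacy_by_key[k].get(field) != cloud_by_key[k].get(field):
                differences.append((k, field, legacy_by_key[k][field], cloud_by_key[k].get(field)))
    only_legacy = sorted(legacy_by_key.keys() - cloud_by_key.keys())
    only_cloud = sorted(cloud_by_key.keys() - legacy_by_key.keys())
    return differences, only_legacy, only_cloud
```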

Non-functional testing scope

While the nature and scope of functional testing are fairly alike for an on-premise and a cloud system, those of non-functional testing differ – at times, drastically – due to the technical characteristics of cloud-native systems.

The non-functional testing and validation checks include:

Performance and Latency tests

Latency KPIs used as a reference for the cloud setup are driven by the upstream system’s KPIs for different load shapes (simulating peak demand in extreme market conditions). In the case of a downstream system (the one receiving data from the financial system), performance KPIs are identical to those of the main system. Latency tests measure the latency of a message from the moment it exits the financial system to the moment it is recorded in the ‘pre-generated reporting data’ area in the cloud data store.

Performance testing covers report generation. Compared to the on-premise configuration, the cloud setup tends to deliver better performance, with report generation shrinking from next-day to a few hours.
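As a minimal sketch of how such latency measurements can be aggregated, the example below assumes each test message carries a timestamp taken when it exits the financial system and another taken when it lands in the ‘pre-generated reporting data’ area; the percentile levels reported are illustrative.

```python
import statistics

def latency_report(samples):
    """Summarise end-to-end latencies in milliseconds.

    `samples` is assumed to be an iterable of (exit_ts, recorded_ts) pairs in seconds,
    e.g. taken from the financial system's egress log and the cloud data store metadata.
    """
    latencies_ms = sorted((recorded - exited) * 1000.0 for exited, recorded in samples)
    if len(latencies_ms) < 2:
        return {"count": len(latencies_ms)}
    cut_points = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {
        "count": len(latencies_ms),
        "p50_ms": cut_points[49],
        "p95_ms": cut_points[94],
        "p99_ms": cut_points[98],
        "max_ms": latencies_ms[-1],
    }
```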

Capacity tests

Warehouse Capacity KPIs should be in line with the financial system’s metrics. In other words, the downstream system should be able to store and process all the data expected to come from the upstream system (the financial system itself).

The following non-functional testing checks are highly dependent on the particular cloud environment configuration – a challenge also addressed in the Cloud Limitations and Their Repercussions section. 

Resiliency tests (site failover – both financial system and warehouse, network failovers, etc.)

These ensure the system is resilient to failures and can recover on its own when needed. It should also be able to operate efficiently and effectively from a secondary geographic region in case the primary site fails.

Resource provisioning and scalability tests

The flexibility of cloud infrastructures allows firms to save on idle resources and components. The downside, however, is that because resources are shared among many tenants, the service provider may run out of available underlying hardware for deployment at any given time. Resource provisioning and scalability tests therefore focus on the architectural configuration, while actual provisioning at any given moment is subject to the conditions of the subscription plan and/or the general availability of resources at that time.

Cloud components testing

The checks include the deactivation and restart of separate components while monitoring the entire system for failures (chaos simulation). These checks are limited by the access rights a technology provider can have in the cloud infrastructure. For example, the network between components cannot be taken down due to the cloud provider’s restrictions, so verifying system behaviour during an inter-component network connection outage is impossible, even with third-party chaos toolkits.
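Within those access constraints, a component-level chaos check typically amounts to stopping an instance, observing the rest of the system, and restarting it. The sketch below assumes AWS EC2 and the boto3 SDK purely as an example; the instance ID, region and health endpoint are placeholders, and in practice a dedicated chaos toolkit would orchestrate such runs.

```python
import time

import boto3
import requests

ec2 = boto3.client("ec2", region_name="eu-west-1")        # assumed region
INSTANCE_ID = "i-0123456789abcdef0"                        # placeholder component instance
HEALTH_URL = "https://warehouse.example.internal/health"   # placeholder pipeline health endpoint

def system_is_healthy():
    """Return True if the overall pipeline still reports healthy."""
    try:
        return requests.get(HEALTH_URL, timeout=5).status_code == 200
    except requests.RequestException:
        return False

# Deactivate one component and observe whether the rest of the system keeps working.
ec2.stop_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

observations = []
for _ in range(12):                 # observe the pipeline for roughly two minutes
    observations.append(system_is_healthy())
    time.sleep(10)

# Restart the component and confirm recovery.
ec2.start_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

print("system stayed healthy during the component outage:", all(observations))
```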

Data storage testing – replication and tiering.

Daily Life Cycle – the timing and order of all schedule-dependent processes are verified.

Monitoring – checks for alerts and dashboard activity.


To read the full white paper containing our proposed cloud migration testing strategy, a sample technology stack and migration implementation, as well as a brief discussion on why cloud migrations fail, visit us at exactpro.com

Disclaimer:

The views, thoughts and opinions contained in this Focus article belong solely to the author and do not necessarily reflect the WFE’s policy position on the issue, or the WFE’s views or opinions.