Executive Summary
A major financial services and asset management firm with hundreds of billions of dollars in assets under management faced a forcing function: Informatica's announcement of PowerCenter 10.x end-of-standard-support, combined with a multi-year cloud transformation program centered on AWS and Databricks, created an urgent need to migrate its 1,800 PowerCenter mappings and associated workflows. The firm's data pipelines underpin fund NAV calculation, portfolio risk analytics, regulatory capital reporting (Basel III, CCAR), and investor reporting for institutional and retail clients across multiple countries. Over 9 months, MigryX parsed all PowerCenter XML exports, converted every mapping to production PySpark, reconstructed workflow execution logic in Databricks Workflows, and delivered a fully governed estate in Databricks Unity Catalog. The program produced 900,000 lines of PySpark, performance improvements of 4–9X on critical end-of-day NAV and position reconciliation pipelines, and a projected $4.3 million in two-year savings from eliminated Informatica licensing and infrastructure.
Client Overview
The client is a diversified asset management firm with operating entities across North America, Europe, and Asia-Pacific, offering equity, fixed income, multi-asset, and alternative investment strategies to institutional clients including sovereign wealth funds, pension plans, endowments, and foundations, as well as a retail mutual fund and ETF platform. Their data engineering function is responsible for delivering clean, reconciled, and auditable data to fund accounting, risk management, portfolio management, regulatory reporting, and investor relations systems — all subject to strict SLA requirements with zero tolerance for late or incorrect delivery.
The PowerCenter estate had been the firm's primary ETL platform since 2009, consolidated from three legacy ETL platforms following a major acquisition. It had accumulated 1,800 mappings organized across 12 PowerCenter folders representing distinct business domains: trade data, position management, corporate actions, pricing, fund accounting, client reporting, reference data, regulatory capital, compliance monitoring, tax reporting, operations, and risk analytics. The platform ran on a dedicated Linux infrastructure stack with an Oracle-based PowerCenter Repository Service, scheduled and monitored through PowerCenter Workflow Manager.
Business Challenge
The end-of-support timeline imposed a hard deadline on the migration program, yet the estate's technical complexity had led internal estimates to project a 24–36 month timeline, far too long given the support window. Key challenges included:
- PowerCenter 10.x end-of-support: Informatica's standard support for PowerCenter 10.x was ending, with extended support available only at a premium that would increase the already-substantial licensing burden by 25–30%. Neither paying that premium indefinitely nor undertaking the complex upgrade to IDMC (Informatica Intelligent Data Management Cloud) aligned with the firm's strategic direction.
- Complex SCD Type 2 logic: Over 340 mappings implemented Slowly Changing Dimension (SCD) Type 2 logic for reference data management — tracking historical changes to security master records, counterparty details, fund structures, and account hierarchies. PowerCenter's SCD Wizard-generated mappings used a specific pattern of lookup transformations, router transformations, and update strategy expressions that required careful semantic preservation in PySpark to maintain the continuity of historical dimension records.
- Session-level configuration depth: PowerCenter sessions encode a substantial portion of runtime behavior at the session level rather than in the mapping definition: partition counts, connection pools, commit intervals, rejection file paths, DTM buffer sizes, and reader/writer thread configurations. This session-level metadata, embedded in the workflow XML, needed to be parsed and converted to equivalent Databricks cluster and job configurations — a component often overlooked in mapping-only migration approaches.
- Reusable transformation objects: The repository contained 280 shared reusable transformations (Reusable Mapplets and Reusable Transformations) that were referenced across multiple mappings. Converting these correctly required resolving the complete dependency graph to ensure that updates to shared objects propagated correctly to all dependent mappings in the output code.
- Expression transformation complexity: PowerCenter Expression transformations use Informatica's proprietary expression language with built-in functions covering date manipulation, string processing, conditional logic, and financial calculations. The estate used 94 distinct built-in functions, many of which have non-obvious behavioral differences (particularly around null handling, date precision, and string truncation) that required verified equivalence mapping to PySpark's expression functions.
- Regulatory audit requirements: All pipelines feeding regulatory capital reports (Basel III RWA, CCAR stress testing, FINRA net capital) were subject to model validation and change management requirements mandating full documentation of transformation logic, explicit approval by a second-line risk function, and backtesting against at least 12 months of historical data before production promotion.
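To make the expression-language risk concrete: one well-known behavioral difference is string concatenation, where Informatica's CONCAT skips NULL operands while Spark SQL's concat returns NULL if any operand is NULL. The sketch below models both behaviors in plain Python (the function names are illustrative, not MigryX's actual equivalence library):

```python
# Illustrative sketch: Informatica-style vs. Spark SQL-style concatenation
# null semantics, modeled on plain Python values (None stands in for NULL).

def infa_concat(*parts):
    """Informatica-style CONCAT: None operands are ignored; the result is
    None only when every operand is None."""
    non_null = [p for p in parts if p is not None]
    if not non_null:
        return None
    return "".join(non_null)

def spark_concat(*parts):
    """Spark SQL-style concat: any None operand makes the result None."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

print(infa_concat("ACME", None, "-US"))   # ACME-US
print(spark_concat("ACME", None, "-US"))  # None
```

In PySpark, preserving the Informatica behavior typically means using `concat_ws` (which skips nulls) or wrapping each operand in `coalesce(col, lit(''))` rather than translating CONCAT to `concat` one-for-one.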
The MigryX Approach
MigryX structured the engagement around PowerCenter's native XML export format, which encodes the full repository object graph — sources, targets, transformations, mappings, mapplets, sessions, workflows, and worklets — in a single exportable artifact. The MigryX PowerCenter parser processes these exports to reconstruct the complete logical model of each mapping and its associated session and workflow execution context, enabling conversion fidelity that extends beyond the transformation logic to encompass the full runtime configuration.
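The parsing approach can be sketched with the standard library. The element names below follow the PowerCenter repository export schema (POWERMART > REPOSITORY > FOLDER > MAPPING > TRANSFORMATION); the traversal is a simplified illustration of the idea, not the MigryX parser itself, and the sample export content is invented:

```python
# Minimal sketch: index mappings and their transformations from a
# PowerCenter XML export using only the standard library.
import xml.etree.ElementTree as ET

EXPORT = """\
<POWERMART>
  <REPOSITORY NAME="REP_PROD">
    <FOLDER NAME="PRICING">
      <MAPPING NAME="m_load_security_prices">
        <TRANSFORMATION NAME="exp_derive_price" TYPE="Expression"/>
        <TRANSFORMATION NAME="lkp_security_master" TYPE="Lookup Procedure"/>
      </MAPPING>
    </FOLDER>
  </REPOSITORY>
</POWERMART>"""

def index_mappings(xml_text):
    """Return {(folder, mapping): [(transformation name, type), ...]}."""
    root = ET.fromstring(xml_text)
    index = {}
    for folder in root.iter("FOLDER"):
        for mapping in folder.iter("MAPPING"):
            key = (folder.get("NAME"), mapping.get("NAME"))
            index[key] = [(t.get("NAME"), t.get("TYPE"))
                          for t in mapping.iter("TRANSFORMATION")]
    return index

print(index_mappings(EXPORT))
```

A real parser must also resolve INSTANCE and CONNECTOR elements to reconstruct the dataflow between transformation instances, plus the session and workflow XML; the index above is only the first step.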
The SCD Type 2 challenge was addressed through MigryX's semantic SCD library, which recognizes the PowerCenter SCD Wizard pattern and its common variants and converts them to Delta Lake merge operations using the MERGE INTO syntax with explicit effective date management. Delta Lake's native support for ACID transactions made it an ideal target for SCD Type 2 logic, as the merge operations execute atomically — eliminating the risk of partial updates that had occasionally caused reconciliation issues in the PowerCenter environment during network interruptions. The resulting Delta Lake tables also provided time travel capability, enabling point-in-time queries against dimension history that were previously only possible through complex historical snapshots maintained as separate tables.
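The SCD Type 2 semantics that the Delta Lake MERGE implements can be shown in a simplified, pure-Python form: when an incoming record differs from the current dimension row on a tracked attribute, the current row is expired and a new current row is opened. Column names (`effective_date`, `expiry_date`, `is_current`) are illustrative, not the client's actual schema:

```python
# Simplified sketch of SCD Type 2 apply logic (the semantics a Delta Lake
# MERGE INTO with effective/expiry date management preserves).
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # open-ended expiry sentinel

def scd2_apply(dim_rows, incoming, key, tracked, as_of):
    """Apply one batch of incoming records to a dimension row list."""
    out = list(dim_rows)
    for rec in incoming:
        current = next((r for r in out
                        if r[key] == rec[key] and r["is_current"]), None)
        changed = current is None or any(
            current[c] != rec[c] for c in tracked)
        if not changed:
            continue
        if current is not None:
            # Expire the open row as of the batch date.
            current["is_current"] = False
            current["expiry_date"] = as_of
        # Open a new current row for the changed record.
        out.append({**rec, "effective_date": as_of,
                    "expiry_date": HIGH_DATE, "is_current": True})
    return out

dim = [{"security_id": "SEC1", "rating": "AA",
        "effective_date": date(2020, 1, 1),
        "expiry_date": HIGH_DATE, "is_current": True}]
dim = scd2_apply(dim, [{"security_id": "SEC1", "rating": "A"}],
                 key="security_id", tracked=["rating"],
                 as_of=date(2024, 6, 1))
print(len(dim))  # 2: the expired AA row plus the new current A row
```

In the Delta Lake version, the expire-and-insert pair executes inside a single atomic MERGE, which is what removes the partial-update risk described above.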
Session-level configurations were parsed from the workflow XML and mapped to Databricks job cluster configurations and Databricks Workflow task-level settings. Partition counts influenced Spark executor configurations; commit intervals were replaced with Delta Lake checkpoint configurations; connection pool settings were mapped to JDBC connection properties in Databricks Secrets-backed connection objects. This session configuration fidelity ensured that the migrated jobs exhibited equivalent I/O behavior and resource utilization profiles to their PowerCenter predecessors — a critical requirement for production capacity planning validation.
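As a hedged sketch of that translation step, the function below maps a few parsed session attributes to target-side settings. The source attribute names are representative of what appears in workflow XML and the target keys are standard Spark/JDBC options, but the mapping rules shown are illustrative, not MigryX's production rule set:

```python
# Illustrative translation of PowerCenter session-level attributes to
# Databricks/Spark-side settings.
def translate_session_config(session_attrs):
    """Map a dict of parsed session attributes to target-side settings."""
    spark_conf, jdbc_opts, notes = {}, {}, []
    if "partition_count" in session_attrs:
        # Pipeline partition count becomes a shuffle-parallelism hint.
        spark_conf["spark.sql.shuffle.partitions"] = str(
            session_attrs["partition_count"])
    if "connection_pool_size" in session_attrs:
        # Pool size bounds parallel JDBC reads.
        jdbc_opts["numPartitions"] = str(
            session_attrs["connection_pool_size"])
    if "commit_interval" in session_attrs:
        # Row-based commit intervals have no direct Spark analogue;
        # Delta's transactional writes supersede them, so the value is
        # only recorded for capacity-planning comparison.
        notes.append("commit_interval=%s superseded by Delta writes"
                     % session_attrs["commit_interval"])
    return {"spark_conf": spark_conf, "jdbc_options": jdbc_opts,
            "notes": notes}

print(translate_session_config(
    {"partition_count": 8, "commit_interval": 10000,
     "connection_pool_size": 4}))
```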
Reusable transformations and mapplets were converted to Python function libraries and PySpark transformation modules that could be imported across multiple pipeline scripts, preserving the reuse architecture of the PowerCenter design. This produced a maintainable, DRY (Don't Repeat Yourself) codebase where shared logic changes could be applied centrally — actually improving on the PowerCenter model by enabling version-controlled, unit-testable shared functions. The entire migrated estate was organized in a modular Python package structure with a clear domain hierarchy mirroring the original PowerCenter folder organization, deployed to Databricks via Azure DevOps CI/CD pipelines with automated test execution on every merge.
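The shared-library pattern looks roughly like the following. The package layout, module names, and business rule are hypothetical; the point is that a converted reusable transformation becomes an ordinary importable, unit-testable Python function rather than a repository-bound object:

```python
# Hypothetical package layout for converted reusable transformations:
#
#   firm_etl/
#     shared/
#       identifiers.py              <- converted reusable transformations
#     pricing/
#       m_load_security_prices.py   <- imports from shared

def normalize_ticker(raw):
    """Shared rule (illustrative): uppercase, strip whitespace, and map an
    exchange-suffixed ticker like 'ibm.n' to its bare symbol 'IBM'."""
    if raw is None:
        return None
    ticker = raw.strip().upper()
    return ticker.split(".", 1)[0]

# Any pipeline module would consume it the same way:
#   from firm_etl.shared.identifiers import normalize_ticker
print(normalize_ticker("  ibm.n "))  # IBM
```

Because the function is plain Python, a change to the shared rule is made once, covered by unit tests in CI, and picked up by every dependent pipeline on the next deployment.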
PowerCenter Component Mapping Reference
| PowerCenter Component | PySpark / Databricks Equivalent | Conversion Notes |
|---|---|---|
| Source Qualifier (relational) | spark.read.jdbc() with pushdown SQL | SQL override and filter conditions preserved exactly |
| Expression Transformation | withColumn() / select() with PySpark expressions | 94 Informatica built-in functions mapped with null-handling verified |
| Aggregator Transformation | groupBy().agg() | Group By ports mapped to groupBy keys; aggregate ports to agg functions |
| Joiner Transformation | DataFrame.join() | All join types (normal, master outer, detail outer, full outer) supported |
| Lookup Transformation | Broadcast join or Delta Lake lookup | Connected vs. unconnected lookup semantics preserved; caching mapped to broadcast |
| Router Transformation | DataFrame.filter() per output group | Multiple output groups emitted as separate filtered DataFrames |
| Update Strategy Transformation | Delta Lake MERGE INTO with DD_INSERT/DD_UPDATE/DD_DELETE flags | Update strategy expression converted to merge condition predicates |
| SCD Wizard (Type 2) | Delta Lake MERGE INTO with effective/expiry date columns | Time travel queries replace historical snapshot pattern |
| Reusable Transformation | Python module function (importable) | Shared logic centralized in versioned Python package |
| Mapplet | Python function or PySpark pipeline function | Input/output groups mapped to function parameters and return values |
| PowerCenter Workflow | Databricks Workflow task graph | Task dependencies, failure handling, and event triggers preserved |
| Session (with partition config) | Databricks Job cluster config + task settings | Partition count, commit interval, buffer size translated to Spark config |
Key Migration Highlights
- All 1,800 PowerCenter mappings were converted and validated within the 9-month program timeline, enabling the firm to exit PowerCenter 10.x before the standard support end date without incurring extended support fees.
- MigryX's session configuration parser extracted and converted over 12,000 session-level parameters across the estate's workflows, ensuring full runtime configuration fidelity, a dimension of migration that manual rewrite approaches consistently miss.
- All 340 SCD Type 2 mappings were converted to Delta Lake MERGE operations with full audit trail, providing the first true time-travel capability for dimension history queries and simplifying point-in-time data reconstruction for regulatory inquiries.
- The end-of-day NAV calculation pipeline, the firm's most critical operational process, was cut from a 3.5-hour execution window to under 30 minutes, giving fund accounting teams final NAV figures two hours earlier for client reporting and trade settlement.
- 280 reusable transformations and mapplets were converted to a versioned Python shared library, improving maintainability and enabling unit testing of core transformation logic for the first time in the platform's history.
- Regulatory pipeline validation was completed ahead of schedule, with all 12-month backtesting validation runs for Basel III and CCAR pipelines completed and signed off by the Model Risk function within 7 months of program initiation.
Security & Compliance
The client operates under an exceptionally demanding regulatory framework spanning SEC and FINRA requirements in the US, FCA rules in the UK, ESMA directives across the EU, SFC requirements in Hong Kong, and MAS requirements in Singapore. The migration program was subject to formal change management governance under the firm's Model Risk Management (MRM) policy for all pipelines with regulatory capital implications.
- Model Risk Management: All 127 mappings classified as Model inputs (feeding Basel III RWA, CCAR, or FINRA net capital calculations) were subject to the firm's full MRM validation process, including independent technical review, backtesting against 12 months of historical data, and second-line sign-off before production promotion.
- Data classification and Unity Catalog: All 1,800 migrated pipeline outputs were classified according to the firm's data sensitivity taxonomy (public, internal, confidential, restricted) and Unity Catalog column-level security policies were applied accordingly, with restricted data (MNPI, client PII) accessible only to authorized roles.
- AWS private networking: All migration tooling operated within the client's AWS Organization boundary using VPC-isolated Databricks workspaces with no internet egress; PowerCenter XML exports were transferred via AWS Direct Connect from on-premises environments.
- Secrets management: All database credentials and API keys were migrated from PowerCenter's proprietary domain object store to Databricks Secrets backed by AWS Secrets Manager, with automated rotation policies applied.
- Immutable lineage: Unity Catalog's automatic lineage capture provides complete column-level data provenance for all regulatory reporting pipelines, satisfying SEC Rule 17a-4 record-keeping requirements for data processing audit trails.
Results & Business Impact
The 9-month timeline significantly outperformed the client's internal estimate of 24–36 months for a manual rewrite approach. The accelerated timeline was directly attributable to MigryX's automated conversion coverage, which reduced the estimated engineering labor from 84,000 person-hours (the internal estimate for a manual rewrite) to 9,200 person-hours including MigryX-assisted engineering review, validation, and knowledge transfer. This 9X labor reduction translated directly into the compressed 9-month delivery timeline that allowed the client to avoid extended support fees entirely.
The $4.3 million two-year savings projection is based on $2.8 million in eliminated Informatica PowerCenter licensing and repository infrastructure costs, $900K in reduced operational support labor (PowerCenter administration required a dedicated three-person team that has been redeployed to Databricks development), and $600K in avoided Oracle database costs for the PowerCenter Repository Service. The firm's CFO formally recognized the program as delivering ROI in excess of 4X the program investment within the two-year measurement horizon, making it the highest-returning infrastructure modernization initiative in the firm's data engineering history.
"We had been told that migrating an Informatica estate of our size and complexity would take two to three years minimum. MigryX delivered in nine months — and the quality of the output exceeded what we expected. The SCD Type 2 conversions to Delta Lake were particularly impressive: we now have time travel on all our dimension tables, which is something we'd wanted for years but could never justify rebuilding from scratch. Our fund accounting team had NAV figures two hours earlier on the first day of cutover. That was an immediate, visible win for the business."
— Managing Director, Data Engineering & Architecture, Major Asset Management Firm
Ready to Modernize Your Informatica PowerCenter Estate?
See how MigryX can accelerate your migration to Databricks with parser-driven automation. Full session-config fidelity. Delta Lake SCD. Automated validation.
Explore Databricks Migration →