I’m occasionally asked general questions about migrating to AWS, so I thought I’d put together a blog post describing the processes and resources involved in not just migrating a workload to AWS, but also establishing a presence on AWS that supports success and sustained growth as your needs evolve.
As a starting point, AWS has a program called the Migration Acceleration Program, or MAP, which provides prescriptive guidance broken out into three phases (Assess, Mobilize, and Migrate and Modernize). It is intended to help organizations prepare, plan, and successfully execute a workload migration while also driving an adoption of AWS that provides value to the business.
To elaborate on each phase: the Assess phase evaluates an organization’s current state, covering not only what is needed to migrate one or more workloads successfully but also the organization’s readiness to support and evolve those workloads long term. AWS provides a tool, the Migration Readiness Assessment (MRA), to help organize these efforts. Building on the results of the Assess phase, the Mobilize phase puts in place the supporting capabilities that workload migrations will rely on, which include both organizational enablement and technical solutions such as Control Tower (which provides a “Landing Zone” for workloads to operate in). Finally, in the Migrate and Modernize phase, previously evaluated workloads are migrated to the Landing Zone using a defined process (more on that later). A key point here is that this phase includes modernization as well as migration: a workload’s initial position after migration may be, for example, rehosted, but that need not be its final state. As workloads and capabilities on AWS evolve, the workload can be replatformed when conditions are favorable.
A summary of the end-to-end migration process, from blank slate to continuous improvement, might look like this:
- Identify current state of organization, supporting infrastructure and workloads including definition of goals, timeframes, budgets, and success measures via the Migration Readiness Assessment
- Create initial technical and enablement plans
- Execute on technical plan by establishing a Landing Zone via Control Tower, including considerations for Enterprise Integration (e.g. Network, LDAP, PKI, IAM, SAML, Security, Operations, Architecture)
- Execute on enablement plans
- Begin workload evaluation process
  - Inventory (automatic or manual)
  - Dependency Mapping/Resource Scheduling
  - SME Interviews
  - Current State (Configured versus Actual) and Proposed State (e.g. are resources currently under or over provisioned), with corresponding budget
  - Migration approach or the “7 Rs Migration Strategy”
  - Data Volume versus Data throughput (e.g. can the connectivity to AWS sustain the required data throughput for the amount of data to be migrated in the allotted project timeframe)
- Perform test migration of workload with pre-staged data
- Validate test migration results
- Remediation/mitigation of findings from test migration
- Perform actual migration
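The dependency mapping step in the list above feeds directly into scheduling. As a minimal sketch, assuming a hypothetical inventory in which each workload lists the workloads it depends on, a topological sort can group workloads into migration waves. The workload names are made up, and moving dependencies before their dependents is only one possible ordering policy, not a prescribed one.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies):
    """Group workloads into waves; each wave depends only on earlier waves.

    dependencies maps a workload name to the set of workloads it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical inventory: names and relationships are illustrative only.
inventory = {
    "web-frontend": {"order-api"},
    "order-api": {"orders-db", "auth-service"},
    "auth-service": {"ldap"},
    "orders-db": set(),
    "ldap": set(),
}

waves = plan_waves(inventory)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency surfaces as an exception rather than a silent ordering, which is usually a useful prompt for an SME interview.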
Workloads vary almost without limit in size and complexity. As both increase, tools and automation, alongside people and processes, can increase the throughput of migration exercises and reduce the room for human error. Simply put, tools fall into three main categories:
- Discovery, Inventory, Dependency Mapping
  - May be agent-based or agentless and, depending on the specific implementation, may capture high-level or low-level details. Based on the details observed, these tools may also provide recommended sizing.
- Planning, Orchestration
  - Organizing schedules and dependencies, as well as recommended sizing/budgeting based on currently provisioned resources and observed utilization (CPU, RAM, Storage, IO, Network).
- Migration of data, operating system, or other key components of the workload.
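To illustrate the sizing recommendation mentioned above, here is a minimal sketch that compares provisioned capacity with observed peak utilization and proposes a smaller (or larger) size. The 25% headroom, the size steps, and the server figures are all assumptions for illustration; they do not correspond to real instance types or to how any particular tool calculates its recommendations.

```python
from dataclasses import dataclass

# Hypothetical size steps; real instance families offer different combinations.
SIZE_STEPS = (1, 2, 4, 8, 16, 32, 64, 128)


@dataclass
class Observation:
    name: str
    provisioned_vcpus: int
    provisioned_ram_gib: int
    peak_cpu_pct: float  # observed peak CPU utilization, 0-100
    peak_ram_pct: float  # observed peak memory utilization, 0-100


def round_up(value, steps=SIZE_STEPS):
    """Return the smallest step that covers value (or the largest step)."""
    return next((s for s in steps if s >= value), steps[-1])


def proposed_size(obs, headroom=0.25):
    """Return (vcpus, ram_gib) covering the observed peak plus headroom."""
    needed_vcpus = obs.provisioned_vcpus * obs.peak_cpu_pct / 100 * (1 + headroom)
    needed_ram = obs.provisioned_ram_gib * obs.peak_ram_pct / 100 * (1 + headroom)
    return round_up(needed_vcpus), round_up(needed_ram)


# Illustrative server: heavily over-provisioned on both CPU and memory.
server = Observation("app-01", provisioned_vcpus=16, provisioned_ram_gib=64,
                     peak_cpu_pct=25, peak_ram_pct=30)
vcpus, ram = proposed_size(server)
print(f"{server.name}: {server.provisioned_vcpus} vCPU / "
      f"{server.provisioned_ram_gib} GiB -> {vcpus} vCPU / {ram} GiB")
```

The same comparison drives the budget side: the gap between configured and proposed state is where most of the cost difference comes from.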
A variety of tools exist in the market: some AWS native, some AWS acquired, and some simply third-party solutions. A non-exhaustive list is below:
- Discovery, Inventory, Dependency Mapping
  - CloudChomp (typically agentless)
  - Datadog (typically agent-based)
  - New Relic (typically agent-based)
  - Flexera/RISC (supports both agent-based and agentless approaches)
  - Cloudamize (typically agent-based)
  - Application Discovery Service (AWS)
- Planning, Orchestration
  - Migration Evaluator – formerly TSO Logic, now AWS
  - Migration Hub – AWS (which orchestrates related AWS services as well as third-party services)
- Migration
  - Application Migration Service (MGN) – formerly CloudEndure, now AWS
  - Server Migration Service
  - Database Migration Service
  - VMware Cloud
One final consideration is the “how” of migrating data, operating systems, and so on, which can be summarized by these questions:
- How much data needs to be migrated?
- How much time is available to migrate said data?
- What is that data’s rate of change?
- How much downtime, if any, can be afforded for the migration of this workload?
- What is the realistic sustainable throughput? Take into account storage IO, network interfaces, network switching/firewalling/routing, and Internet bandwidth, including variances due to scheduled or event-driven contention with other resources.
These considerations will drive the data migration approach you take: entirely over the wire, or pre-seeded using one of the AWS Snow Family devices. These are essentially hardened storage devices designed to be shipped to the location where the data resides, loaded (and optionally processed) using local, high-throughput facilities, and then returned to AWS for ingestion into S3 (or another final resting place for the data).
A non-exhaustive list of these services includes the following:
- Snow Family
- DataSync
- Transfer for S3
- Accelerated Endpoints for S3
- Storage Gateway
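The questions about data volume, available time, rate of change, and sustainable throughput can be turned into a rough back-of-the-envelope estimate. The sketch below is only that: the 70% utilization factor, the treatment of changed data as something that must be re-sent, and the example figures are assumptions, and real transfers will vary with protocol overhead and contention.

```python
def transfer_days(data_tb, link_mbps, utilization=0.7, daily_change_tb=0.0):
    """Estimate days to move data_tb over a link, or None if it never catches up.

    link_mbps is the nominal bandwidth in megabits per second; utilization is
    the fraction realistically sustained after contention and overhead.
    """
    bits_per_day = link_mbps * 1_000_000 * utilization * 86_400
    tb_per_day = bits_per_day / 8 / 1e12  # decimal terabytes per day
    net_tb_per_day = tb_per_day - daily_change_tb  # changed data must be re-sent
    if net_tb_per_day <= 0:
        return None
    return data_tb / net_tb_per_day


def suggest_approach(data_tb, link_mbps, window_days, **kwargs):
    """Compare the estimate with the allotted window (illustrative rule only)."""
    days = transfer_days(data_tb, link_mbps, **kwargs)
    if days is not None and days <= window_days:
        return "over the wire"
    return "consider pre-seeding (e.g. Snow Family)"


# Illustrative figures: 100 TB over a 1 Gbps link with 1 TB/day of change.
estimate = transfer_days(100, 1000, daily_change_tb=1.0)
print(f"Estimated transfer time: {estimate:.1f} days")
print(suggest_approach(100, 1000, window_days=30, daily_change_tb=1.0))
print(suggest_approach(100, 100, window_days=30, daily_change_tb=1.0))
```

If the estimate comes back as `None`, the data is changing faster than the link can carry it, which is a strong signal that an over-the-wire approach alone will not converge.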
As referenced above, the initial migration approach may not be the final state of the workload configuration. A consideration for both the intermediate and final states is the lifecycle management of the workload, which includes data and operational recoverability on AWS post migration. The AWS Well-Architected Framework provides excellent guidance on evaluating workloads, and recommendations for addressing the findings of those evaluations, across its six pillars, including reliability, in which data and operational recoverability play a role.
From a disaster recovery perspective, the main driver of the solution for any given server is the business’s requirements, specifically the recovery time objective (RTO) and recovery point objective (RPO): how much time is acceptable to recover from a failure of the service or services the server supports, and how much data loss, if any, is acceptable.
From there you should be able to determine which servers are in scope for active/active or active/passive replication, and which are in scope for other approaches (snapshots, copying snapshots to another region, etc.).
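As a minimal sketch of that scoping exercise, the snippet below picks the least demanding approach whose achievable RTO and RPO still meet each server’s requirements. The achievable figures assigned to each approach, the ordering from cheapest to most expensive, and the server names are all hypothetical; real numbers depend on the workload and should come from testing.

```python
from dataclasses import dataclass

# (approach, achievable RTO in minutes, achievable RPO in minutes), cheapest first.
# These figures are placeholders for illustration, not measured values.
APPROACHES = [
    ("snapshots (optionally copied to another region)", 240, 1440),
    ("active/passive replication", 30, 5),
    ("active/active replication", 1, 0),
]


@dataclass
class Requirement:
    server: str
    rto_minutes: int  # maximum acceptable time to recover
    rpo_minutes: int  # maximum acceptable window of data loss


def dr_approach(req):
    """Return the cheapest approach meeting both RTO and RPO, or None."""
    for name, achievable_rto, achievable_rpo in APPROACHES:
        if achievable_rto <= req.rto_minutes and achievable_rpo <= req.rpo_minutes:
            return name
    return None  # no listed approach satisfies the requirement


requirements = [
    Requirement("intranet-wiki", rto_minutes=480, rpo_minutes=1440),
    Requirement("order-api", rto_minutes=60, rpo_minutes=15),
    Requirement("orders-db", rto_minutes=5, rpo_minutes=1),
]
for req in requirements:
    print(f"{req.server}: {dr_approach(req)}")
```

Checking both objectives together matters: a server with a relaxed RTO but a tight RPO still needs replication, even though it could tolerate a slow restore.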
This is also a driver in the AZ vs. Region conversation, but keep in mind that any solution balances complexity against reliability. A default stance may be “replicate everything to another region”, but that means a new region, VPC, connectivity, dependent services, etc. are required to support it. A multi-AZ approach, on the other hand, simplifies things with some added risk (e.g. full region failure of a service, human error, etc.). For multi-region there is also the conversation around which regions to use. For workloads situated on the US East Coast, Ohio has advantages over California or Oregon in terms of data egress (a cost to consider for anything Multi-AZ or Multi-Region), where it’s equivalent to cross-AZ, and latency (Ohio is low enough for synchronous replication).
There are also two modes of failure to consider: lack of service availability and actual data loss. The first could be the outage of an instance, AZ, AWS service, Region, etc., while the second could be caused by a bad keystroke. Snapshots, snapshot retention, and snapshot versioning should therefore be the minimum for all in-scope server storage, with other solutions (third-party replication, native service backups, native service replication) building on top.
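To make the retention idea concrete, here is a minimal sketch of a retention rule applied to a list of snapshot timestamps. The 14-day window and the “always keep the newest three” safeguard are assumptions chosen for illustration; the function only decides what falls outside the policy and does not call any AWS API.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshot_times, now, keep_days=14, keep_minimum=3):
    """Return snapshot timestamps that fall outside the retention policy.

    Snapshots older than keep_days are candidates for deletion, except that
    the newest keep_minimum snapshots are always kept, whatever their age.
    """
    newest_first = sorted(snapshot_times, reverse=True)
    cutoff = now - timedelta(days=keep_days)
    protected = set(newest_first[:keep_minimum])
    return [t for t in newest_first if t < cutoff and t not in protected]


# Illustrative data: one snapshot per day for the first 30 days of January.
now = datetime(2024, 1, 31)
daily = [datetime(2024, 1, day) for day in range(1, 31)]
expired = snapshots_to_delete(daily, now)
print(f"{len(expired)} of {len(daily)} snapshots fall outside the policy")
```

The minimum-count safeguard covers the case where snapshots silently stop being taken: an age-only rule would eventually delete every remaining copy.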
Snapshot management has become more integrated into AWS through a variety of newer services (Lifecycle Management, Backup, and Organization Backup), but not all of them offer the same options. Snapshots by default are not necessarily application-consistent, so for something like a database, writes need to be paused when the snapshot is taken. Some of these services offer that option; others do not.
There are also native approaches to replication (databases, for example, each have their own style of replication), which offer some advantages over a general data replication solution.
Finally, on this subject, disaster recovery, business continuity, etc. are not “set it and forget it” situations; they need to be tested periodically to validate both procedures and functionality.
In conclusion, migrating a workload, especially the first one, benefits from up-front, long-term planning, which includes technical and organizational planning and consideration for the evolution of workloads and AWS services.