Migrating data solutions to the cloud: a checklist. Part 2: Discovery

This post is a continuation of Part 1: Pre-Planning and Evaluation in a nine-part series. If you want to download the full checklist or slides without all the wordy-word stuff: you can find it in this GitHub repository. (The checklist has wordy-word stuff. No getting around that.)

Topics covered today:

  • Digital Estate
  • Data Management
  • Data Lineage
  • Pilot Project

Digital Estate

Understanding your digital estate at the beginning of your project will help you determine what to assess and migrate down the road. Even if you think you already know everything you need to migrate, it’s helpful to check how all of those things may be connected. You need to identify your infrastructure, your applications, and all of their dependencies. You don’t want any surprises! Don’t just rely on old documentation.

Azure Migrate has a Discovery and Assessment tool that can assist in this task, but there are certainly many other ways to acquire this information. You may have other third-party tools or internal processes that already gather this information for you. Just make sure that it is up to date. Personally, I really like the free pre-Azure Migrate solution, the Microsoft Assessment and Planning (MAP) Toolkit, as it dumps everything into Excel sheets that admins and management tend to like to see. But the visual display of Migrate (and ALL the additional tools) is pretty fantastic.

Screenshot: some of the database scenario options available in the MAP Toolkit

Whatever tool you use, from a database perspective you want to know things like what database systems are in your environment, what version and edition they are on, how many databases are on each instance, and what the database names, file sizes, statuses, users, configurations, and other database metadata look like. You will also want performance metric results and additional server details, plus the various components that are installed on your servers, details about those components, and how they are used. Are you REALLY using those SSRS and SSAS components, and if so, how?
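If you prefer rolling your own scripts alongside a tool, here’s a minimal T-SQL sketch of that inventory – just a starting point, run per instance, and it only scratches the surface of the metadata listed above:

```sql
-- Instance basics: what machine, instance, version, and edition am I on?
SELECT
    SERVERPROPERTY('MachineName')    AS machine_name,
    SERVERPROPERTY('InstanceName')   AS instance_name,
    SERVERPROPERTY('ProductVersion') AS product_version,
    SERVERPROPERTY('Edition')        AS edition;

-- Databases on the instance: name, status, and a few key settings,
-- plus total size (data + log) in MB. File size is stored in 8 KB pages.
SELECT
    d.name,
    d.state_desc,
    d.compatibility_level,
    d.recovery_model_desc,
    d.collation_name,
    CAST(SUM(mf.size) * 8 / 1024.0 AS DECIMAL(12, 2)) AS size_mb
FROM sys.databases AS d
JOIN sys.master_files AS mf
    ON mf.database_id = d.database_id
GROUP BY d.name, d.state_desc, d.compatibility_level,
         d.recovery_model_desc, d.collation_name
ORDER BY d.name;
```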

Lastly, you want to make sure you know all of the relationships between applications, instances, database objects, and processes. It’s no fun to find out later that you had a database with hard-coded server names in some stored procedures or unknown linked server requirements. Or a SQL Agent job that PBI Report Server created for each data refresh.
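If you want to go hunting for those gotchas yourself, here’s a rough T-SQL sketch. Note that 'OLDPROD' below is a made-up placeholder – swap in whatever server names you suspect got hard-coded over the years:

```sql
-- Linked servers defined on this instance (is_linked = 1 excludes the local entry).
SELECT name, product, provider, data_source
FROM sys.servers
WHERE is_linked = 1;

-- Search module definitions (procs, views, functions, triggers) in the
-- current database for a hard-coded server name. 'OLDPROD' is hypothetical.
SELECT
    OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
    OBJECT_NAME(m.object_id)        AS object_name
FROM sys.sql_modules AS m
WHERE m.definition LIKE '%OLDPROD%';
```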

The Key Take-Aways here:

1.) Identify the infrastructure: things like servers.

2.) Identify the apps they use – this includes all your SQL Server apps!

3.) Identify the dependencies they may have, internally and across servers. Don’t forget to include things like ports/networking.

Data Management

Now is the time to find out what documentation you have about your data (and what you still need to get). Having this information is essential if you determine you need to move things in parts, or if you have overlapping data that could potentially be consolidated. It will also help down the road when we get into architecture designs with the 5 Rs of rationalization. Our focus here is on having a data dictionary, a business glossary, a data catalog, and classifying your data.

A quick summary of these terms:

  • A data dictionary helps you to understand and trust the data in your databases.
  • A business glossary provides a common language for the organization when it comes to business concepts and metrics.
  • A data catalog helps you to find, understand, trust, and collaborate on data.
  • Data classification groups your data elements to make them easier to sort, retrieve, and store.
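A data dictionary doesn’t have to start fancy, either. Here’s a rough sketch from the script-toolbox school: every column in the current database, with its type and any MS_Description extended property someone may have already documented:

```sql
-- Data dictionary starter: all columns in the current database, plus any
-- descriptions already stored as MS_Description extended properties.
SELECT
    s.name  AS schema_name,
    t.name  AS table_name,
    c.name  AS column_name,
    ty.name AS data_type,
    c.max_length,
    c.is_nullable,
    CAST(ep.value AS NVARCHAR(4000)) AS description
FROM sys.tables AS t
JOIN sys.schemas AS s  ON s.schema_id = t.schema_id
JOIN sys.columns AS c  ON c.object_id = t.object_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
LEFT JOIN sys.extended_properties AS ep
       ON ep.class = 1                 -- object/column-level properties
      AND ep.major_id = c.object_id
      AND ep.minor_id = c.column_id
      AND ep.name = 'MS_Description'
ORDER BY s.name, t.name, c.column_id;
```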

Why are these things important for migration? First off, they are important just from a data governance standpoint. But more than that, knowing this information up front can save you a lot of headaches down the road. You may have business requirements for some of your data to be labeled in a security context. Maybe you are dealing with highly classified government data, health care data, or HR data. Or you may find you have data type mismatches. And data catalogs often reveal hidden dependencies that you may not have otherwise known about.

All is not lost if you don’t have all of this. Azure has tools like Purview to assist with this, and there are plenty of third-party tools. If you are like me, you already carry a script toolbox from the lifetime of your career (some of those scripts from 20 years ago still work!) that you can easily use. Apart from the business glossary, there are so many free options and scripts out there that this should not be a showstopper for you. For the business glossary, you are going to have to go to the source: your subject matter experts (SMEs).
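As an example of the script-toolbox approach, here’s a first-pass sweep for potentially sensitive columns. The name patterns are just examples – tailor them to your own domain – and the second query only applies on SQL Server 2019+ (or Azure SQL), where sensitivity classifications live in a catalog view:

```sql
-- Quick-and-dirty sweep: columns whose names hint at sensitive data.
-- These LIKE patterns are examples only; extend them for your own domain.
SELECT
    OBJECT_SCHEMA_NAME(c.object_id)  AS schema_name,
    OBJECT_NAME(c.object_id)         AS object_name,
    c.name                           AS column_name
FROM sys.columns AS c
WHERE c.name LIKE '%ssn%'
   OR c.name LIKE '%salary%'
   OR c.name LIKE '%birth%'
   OR c.name LIKE '%email%';

-- SQL Server 2019+ / Azure SQL: classifications already applied.
SELECT
    OBJECT_SCHEMA_NAME(sc.major_id)    AS schema_name,
    OBJECT_NAME(sc.major_id)           AS object_name,
    COL_NAME(sc.major_id, sc.minor_id) AS column_name,
    sc.label,
    sc.information_type
FROM sys.sensitivity_classifications AS sc;
```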

Data Lineage

In addition to the previous items we mentioned for data management, I want to call out data lineage specifically.

Data lineage gives you insight into how your data flows. It helps you understand how your data is connected and how changes to your data, processes, and structure affect the flow and quality of your data. KNOW YOUR DATA FLOW. Find out where your data comes from, how it travels, the place(s) it lands, and ultimately, where else it goes.
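You won’t get Purview-style lineage out of a catalog view, but even plain T-SQL can surface flow edges you didn’t know about. A minimal sketch: any module in the current database that references an object in another database, or on another server, is an arrow someone needs to draw on your data-flow diagram:

```sql
-- Cross-database and cross-server references made by objects in the
-- current database. Non-null server/database names are flow edges.
SELECT
    OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
    OBJECT_NAME(d.referencing_id)        AS referencing_object,
    d.referenced_server_name,
    d.referenced_database_name,
    d.referenced_schema_name,
    d.referenced_entity_name
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_database_name IS NOT NULL
   OR d.referenced_server_name   IS NOT NULL;
```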

There are a lot of tools that will help you with data lineage, with various levels of sophistication. Long gone are the days when you must sift through Excel sheets to figure it all out. That’s why graphical tools like Purview are really exciting for me. [Note: from initial insights into Purview costs once it’s past the preview stage – it gets pretty pricey, fast.] This is an image of Azure Purview, and I wanted to show how granular it can get at the column level and how data travels through various processes and databases.

The column-level feature is really, really nice. It’s not necessary at this stage, but it certainly is helpful at the testing and troubleshooting phases. What you really need from your data lineage at this stage – and you can still see it in this graph – is how your data flows between resources. This is a great way to discover things in your data flow process that you may not be aware of and need to pull into your migration plan.

What else can data lineage help with? Reporting considerations. Knowing what can break in a report if you change something at the source is invaluable. And getting a big picture of what reports, models, and applications may be impacted after a migration helps circumvent some nasty surprises.

Pilot Project

If you haven’t previously moved anything related to your infrastructure to the cloud, then consider having a much smaller pilot project – one that will give you a feel for all of these steps but carries a lower risk than your overall project.

What items do you look for in a pilot project?

Maybe you have a database that is only used for a small app that is low risk if the migration doesn’t go as expected. Try to keep your pilot project to applications with just a few dependencies. The goal of this is to a.) help you understand the process and b.) get you a quick win that you can show to stakeholders.

You want one that is low-risk, small enough to manage easily, but still large enough, with a long enough duration, to give you a good understanding of the processes involved. Besides size and duration, the criticality of your project is important. You want to incorporate a visible win that is important to your company and supports making bigger moves.

Finally, if you’ve already done this previously, then this is when you review what you’ve learned from your previous pilot project. What were the gotchas? What went really well? What is easily repeatable, and what do you need to get down on paper?

Welp, we’ve come to the end of part 2. Feel free to hit me up with items you think I’ve missed or you want more clarification on. Next week is a much needed break for me, so [probably] no updates from me. Hope everyone has a wonderful Mother’s Day!

From,

DataWorkMom

Migrating data solutions to the cloud: a checklist Part 1 of 9

If you are a follower of me in the various places or a reader of this blog, you know that I recently presented the session Migrating data solutions to the cloud: a checklist at SQLBits 2023. As promised in the weekly wrap-up from April 7th, here is the first installment of a more in-depth dive into that session. The session had 9 parts to it, so I thought those would make for nice chunks to consume in blog posts. I’ll add links to each one so it will be easy to navigate once all of the posts are up.

The Parts are broken down as follows:

  1. Pre-Planning and Evaluation
  2. Discovery
  3. Assess
  4. Architecture
  5. Costs
  6. Migrate
  7. Testing
  8. Post-Migration
  9. Resources and Closing

This post will focus on the first item in the list: Pre-Planning and Evaluation.

Let’s get right to it! Ok ok. I know you are probably chomping at the bit wanting to immediately get into the technical parts – I know how it goes – but that is not where you start.


King from Alice’s Adventures in Wonderland

“Begin at the beginning,” the King said, very gravely, “and go on till you come to the end: then stop.”

― Lewis Carroll, Alice’s Adventures in Wonderland


Goals

What is your company’s goal in moving to the cloud? Some examples are:

  • Reduction in CapEx (capital expenditures)
  • Lower reliance on complex and/or end-of-support technologies
  • Optimization of internal operations
  • Reduction in vendor or technical complexity
  • Preparation for new technical capabilities

Modernization? Cost-savings? Multiple things? Maybe your current infrastructure is at end-of-life, and it just really makes sense to modernize it and move it over to the cloud before you are dead in the water. Or maybe your goal is to reduce downtime for managing infrastructure and upgrades.

What is the problem you are trying to solve? Is there a business requirement driving this, or maybe something else? Make sure you understand what problems you are trying to solve and whether your goals actually address those problems. That is a business decision that has to be determined WITH your technical team and not necessarily BY your technical team. I’m a technical person, and I’m going to be the first person drawn to the shiny new object – which often is not the best thing for the company strategy.

Image matching goals to problems trying to solve.

Ultimately your problems can drive a portion of your goals, but you may have additional objectives. These items need to be determined upfront so the choices you make during the process align with them. They create value streams for your organization. Higher-ups love to hear about value streams!

Once you have your objectives – you can create a plan and KPIs towards those objectives and then show how you met those at the end. This is our objective, this is the state at the beginning, this is the state at the end. Look how far we’ve come!!!

It’s hard to have project success if you don’t define what your objectives are. This is a project, and if you’ve been working on projects for a while, you know how easy it is to go sideways and lose sight of your goals. Sometimes those goals change a bit, and that’s ok, but ultimately your success will be measured by whether you achieve clearly defined goals that you can apply metrics towards.

Support

Blocks that contain the text "Executive Support" and "Stakeholder buy-in".

Executive Support

Many times, you will go into a migration project that is being driven by the top level. Maybe your CEO went to a conference or read some important articles – and if so, you are ahead of the game (well, until you have to set expectations properly). But if you don’t already have executive support, then you need to identify an executive sponsor. Someone who is high up and can lead a cultural shift within your organization. A Champion. <insert bugle sounds here>

Stakeholder Buy-In

You need to identify and involve key stakeholders early in the process, on both the business and IT sides. Involvement and communication are key.

People don’t like change. Try telling a bunch of financial analysts who have been doing their job for over 20 years and have lived through multiple technology shifts – try telling them they won’t have Excel anymore. I promise you, it won’t be pretty. Rather than focus on what you are taking away, focus on why the change is needed and what it gives them. How can you get their buy-in for your project while still being honest?

If you don’t have that executive support or stakeholder support, then you may need to do a little digging. What are your company’s core values? What is the 5- and 10-year plan for business initiatives? How does moving to the cloud address these? At the department level, what are the primary goals and concerns from a technical perspective, and how can moving to the cloud help? Not everyone’s answer will be the same, but look at how you can align key stakeholder needs with the goals of the C-Suite. Building a business case that addresses top concerns and shows how the cloud will help is ideal for this situation.

Teams

The next thing we need to consider is what we will do in-house versus what we may pull in a partner for.

Internal Team:
  • Determine SME requirements
  • Identify key members
  • Assign roles
  • Assess skills
  • Train

For your internal team: first, determine the different areas you will need an SME (subject matter expert) for, and get a good idea of what they need to know. Second, identify potential key people you may have in-house to fulfill these roles. Ideally you will have people in your organization from across many teams that you can assign to roles. Once you have assigned roles to people, you need to assess their skills. This will allow you to see what gaps you need to fill and then plan for how your org will both fill and monitor those domain gaps.

When possible, identify potential team members that either have the skills or can upskill. Build a skills readiness plan. Investing in your people is far cheaper than hiring outside help. Consultants are great – I know, I’ve been one, but there is no replacement for having in-house knowledge, particularly once those consultants have finished and walked out the door – potentially without much knowledge transfer. The people that will be maintaining and growing your systems need to have the skills.

That said, in some cases, you may want to hire outside help. Maybe many of your team members are really spread thin, or the upskill process doesn’t match the timeline – then definitely bring in consultants.

Common things for partners to be involved in:

  • Strategy: Help with defining business strategy, business cases, and technology strategies
  • Plan: Help with discovery tasks, such as assessment of the digital estate and development of a cloud adoption plan
  • Ready: Support with landing zone tasks
  • Migrate: Support or execute all direct migration tasks
  • Innovate: Support changes in architecture
  • Govern: Assist with tasks related to governance
  • Manage: Help with ongoing management of post-migration tasks

Along the lines of consultants – help them help you. Does your company have standards or best practices that are specific to your industry or workflow? Share that with them. Share any and all documentation that is relevant to your migration and environment.

If your company is able, consider creating a board for key team members to interact and meet regularly: something like a Cloud Center of Excellence (CCoE). Key members are chosen from each IT area to meet at a specified cadence to discuss all things Azure, including identifying domain gaps. Those key members not only gain knowledge about technical aspects, but also get insight into projects that are going on throughout the different areas of the company. And they can take that information back to their teams with a level of consistency, which is super important when you want to establish best practices and standardize policies across teams.

Documentation

Now we need to find out what we have regarding documentation.

From the technical side, you need to assess what your current infrastructure looks like. What type of documentation do you have for your data estate and infrastructure? Where is your data? What does your data require for things like storage and memory? What are all the components that connect to what you want to move? If you don’t know what your current infrastructure looks like, you need to determine how, and by whom, that will be documented.

On the business side: What are the business requirements for your systems? Do you have regulatory requirements? SLAs mandating certain uptimes? What about other business requirement docs for security, compliance, backup, Recovery Point Objectives (RPO), and Recovery Time Objectives (RTO)?
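One small example of checking documentation against reality: if your docs claim a certain RPO, the backup history in msdb will tell you what you’re actually achieving. A quick sketch:

```sql
-- Most recent backup of each type per database, from msdb history.
-- Type codes: D = full, I = differential, L = transaction log.
SELECT
    bs.database_name,
    bs.type                    AS backup_type,
    MAX(bs.backup_finish_date) AS last_backup
FROM msdb.dbo.backupset AS bs
GROUP BY bs.database_name, bs.type
ORDER BY bs.database_name, bs.type;
```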

And finally – have you previously tried to do a cloud migration? What documentation is left over from that? Is it current? What were some of the successes and some of the failures? Any lessons learned?

You don’t have to know everything at this point, but you do need to find out what you have and what you don’t have. And if we are perfectly honest – what may be out-of-date. This is where you need an Enterprise Architect and some in-depth meetings with the different departments. Communication is key to getting the documentation and finding what you have, what you don’t have, and what you may need down the line.

Welp, that concludes Part 1: Pre-Planning and Evaluation. What would you add to this list? Have I missed any additional parts that you’d like to see in the overall series? I love to get feedback to consider, so feel free to reach out.

Stay tuned for Part 2: Discovery.

PBI: When you can’t change from a live connection

First off, I want to say I never intended to start a lot of my posts about Power BI. There are plenty of experts out there and I am merely an accidental PBI Admin and advocate for our Power BI platform. So why should I be writing about a topic I’m constantly learning new things about? Well, here’s the thing: when you are in the middle of learning things and you don’t find a quick and easy answer, that may be a good time to say “hey, maybe I should blog about it”.

And I think it was Kimberly Tripp’s keynote at PASS Data Community Summit 2022 that reminded me that it’s 100% ok to write about things other people have written about. In fact, several people there mentioned this same thing. Share what you have learned – it’s quite possible you bring a different perspective that will help someone along the way. And if all else fails, you may forget the solution and years down the road google it only to find your own post. (#LifeGoals)

Rihanna with ponytail - text: "Yup thats me"

Now that we have that out of the way, let’s talk about WHEN YOU CAN’T CHANGE FROM A LIVE CONNECTION in Power BI.

Recently, I’ve been advocating that we consolidate our reporting systems. We have a ton, and with an extremely small team, it’s a huge headache to manage them all. Admin-ing reports is only supposed to be a small portion of my week-to-week tasks. (Hint: it’s not.) Plus, some of our reporting systems are just not that good. I won’t list all the different systems, but we are lucky enough to have Power BI as part of our stack, and as such, I’m wanting to move as much as we can to our underutilized Power BI service. This includes our Power BI Report Server reports.

Since we needed to make some changes to some reports we had on Power BI RS, and wanted to reap some of the benefits of the Power BI service with them, we decided these reports would be a good test group to move over. The changes were made in the newer version of PBI Desktop (instead of the older version of Desktop we have to use with PBI RS), and we were ready to load them up to our Power BI service. This is where it got a little sticky.

Sticky buns with sticky spoon next to it.

sticky buns… mmmmmmm

FOCUS.

When I uploaded a new report, I remembered it was going to create a “dataset” with the report, even if the lineage showed the dataset went to a live connection to a database. (In the case of our test reports, they were connected to an SSAS database.) Note, these datasets don’t seem to actually take any space when connected via a live connection, hence my quotes.

Shows a newly created dataset when you upload a new report.

A dataset for every report? Given the number of reports we needed to move over, all with the same live connection, this didn’t make sense to me. Even if the dataset was just a passthrough. (Did I mention how I really hate unnecessary objects in my view? It eats away at me in a way I really can’t describe.)

Diagram showing current state of a dataset for each report, even when connected to 1 data source.

So I thought – “why not just create 1 live connection dataset and have all the reports in the workspace connect to that?” (We also use shared dataset workspaces, and if you are using that method, this still applies. In this case I wanted to use deployment pipelines, and as of the writing of this post, that wasn’t doable with multiple workspaces per environment.) I built my new dataset, uploaded it, and prepared to connect my report to the new one.

SCREECH. Nope. After all that, when I opened my shiny, newly updated RS report in PBI Desktop, I didn’t even get the option to change the connection.

Report showing live connection

Darn it. I couldn’t switch it to the new dataset. My only option was the original live connection. I couldn’t even attempt to add another data source.

I grudgingly loaded the report back up to the PBI Service, and now I had to look at 2 datasets while I noodled. Blerg. (I’ll mention again how much I hate a cluttered view.) Technically I could delete my test case dataset, but I wasn’t ready to give up yet. An idea occurred to me: let me download the newly uploaded file from the PBI Service, because logically it had created a new dataset to use when I uploaded it, and the lineage showed it in the path.

I opened the report in the service, chose File –> Download this file, and then selected the “A copy of your report with a live connection to data online (.pbix)” option. (Actually, I tried both, but the other way was a fail.)

Then I opened it in PBI Desktop… Meh. It looked the same.

Wait a minute! What is that at the bottom??? “Connected live to the Power BI dataset”!

I checked the data source settings again under Transform data – BINGO! Now I had the option to switch to the dataset from my Power BI service. Which I happily did.

After this was done, I saved and reloaded the report to the PBI Service and checked the data lineage of the dataset – it was connected to the new dataset! YAY!!!!!! Since all the reports in this workspace used the same SSAS database, I could connect them all to the same singular dataset. Bonus: when it came time to set up the deployment pipeline, I only needed to change the data source rules for one dataset in each subsequent environment.

Some may say this is overly obsessive. Maybe. But when you think along the lines of maintenance or verifying data lineage, I now only needed to check 1 dataset instead of going through a whole list. That can be a timesaver when troubleshooting problems.

AND it’s prettier. AND there may be other reasons you want to change that connection and this should help along the way. AND there was another big reason I was going to list, but now I’m thinking of sticky buns again so we will just leave it right there. It’s time for lunch.