The only advice I can give is: don't lock yourself into that ecosystem. You'll be married to what is in effect a proprietary wrapper around Apache YARN, Beam and Spark. You'll also pay a premium.
If you build it directly against the Apache products listed, you can move it anywhere: other clouds, on-prem, etc. Spend some time hosting it yourself and you save money and get more control.
If you build it on Azure Data Factory, moving means a ground-up rebuild, because all your work will be done through MS abstractions. Very little of it will be reusable.
That is what I was afraid of. Azure Data Factory is too good at locking you in with its GUI.
I was thinking of first moving all the on-site data to cloud storage such as Blob Storage. Then I would do the transformations with Data Factory operations and finally load the result into Azure SQL Database.
What do you think of my approach?
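For what it's worth, here is a minimal sketch of that stage, transform, load flow written as plain Python, so the logic is not tied to any one orchestrator. Everything in it is an assumption for illustration: a temp directory stands in for Blob Storage, an in-memory SQLite database stands in for Azure SQL Database, and the `sales` table, the `team_a_sales` name and the "amounts to integer cents" rule are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def stage(rows, staging_dir, name):
    """Step 1: land the raw source rows unchanged in the staging area."""
    path = Path(staging_dir) / f"{name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def transform(path):
    """Step 2: read staged rows and normalise them.

    Hypothetical rule: ids become integers, amounts become integer cents.
    """
    with path.open(newline="") as f:
        return [
            (int(row["id"]), round(float(row["amount"]) * 100))
            for row in csv.DictReader(f)
        ]


def load(conn, records):
    """Step 3: write the transformed records to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount_cents INTEGER)"
    )
    with conn:  # commits on success, rolls back if the insert raises
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)


def run_pipeline(source_rows):
    """Run stage -> transform -> load and return the warehouse connection."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
    with tempfile.TemporaryDirectory() as staging_dir:  # stand-in for blob storage
        staged = stage(source_rows, staging_dir, "team_a_sales")
        load(conn, transform(staged))
    return conn
```

Keeping each step as an ordinary function means the same code could be called from Data Factory, a cron job, or something else later; only the thin layer that wires up storage and the database would need to change.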
I am tasked with creating a single database (a data warehouse) for analysis, based on multiple databases created by different teams.
I cannot think of any rollback model other than scrapping the whole data warehouse and starting over if I make a mistake in a transformation.
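One possible alternative to a full rebuild, sketched here under assumptions rather than as a recommendation for this exact setup: tag every loaded row with the id of the load that produced it, so a bad transformation can be undone by deleting just that batch. The table names `load_batches` and `fact_sales` and the helper functions are hypothetical, and SQLite stands in for the real warehouse.

```python
import sqlite3


def init(conn):
    """Create a batch log and a fact table whose rows carry their batch id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_batches ("
        "batch_id INTEGER PRIMARY KEY AUTOINCREMENT, source TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales ("
        "id INTEGER, amount_cents INTEGER, batch_id INTEGER)"
    )


def load_batch(conn, source, records):
    """Insert one batch atomically and return its batch id."""
    with conn:  # either the whole batch is committed or none of it is
        cur = conn.execute(
            "INSERT INTO load_batches (source) VALUES (?)", (source,)
        )
        batch_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            [(rec_id, cents, batch_id) for rec_id, cents in records],
        )
    return batch_id


def rollback_batch(conn, batch_id):
    """Remove every row written by one batch, leaving other batches intact."""
    with conn:
        conn.execute("DELETE FROM fact_sales WHERE batch_id = ?", (batch_id,))
        conn.execute("DELETE FROM load_batches WHERE batch_id = ?", (batch_id,))
```

Usage would look like `bad = load_batch(conn, "team_b", rows)` followed by `rollback_batch(conn, bad)` once the mistake is spotted. This only covers insert-only loads; if a transformation updates existing rows in place, it would need something more, such as keeping the previous versions of those rows. If the raw data is still sitting untouched in staging storage, the corrected transformation can then be re-run for that one batch.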