7

So I did this https://devrant.io/rants/797965/... which works fine up to medium-sized data.

However, for large data the ETL pegs a 6-core Xeon (2.2 GHz) with 50 GB of RAM, because it ends up doing six threaded compares, so 12 different data sets in memory at once. Other than "pull less data", any tips?

The code (C#) is basically a LINQ multi-column join between two DataTables; rows where the compared columns don't match come back as the query result, which gets turned into a third DataTable and SqlBulkCopy-loaded into the DB. Rough sketch below.

Table1 is external API return data (no windowing) and Table2 is from our DW.
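
The rough shape of the compare, with placeholder names (UserId/Date as the join key, Amount/Status as the compared columns — not the real schema):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

class DiffLoader
{
    // Join the API pull to the DW pull on key columns and keep rows whose
    // compared columns differ. Column names here are placeholders.
    static DataTable BuildDiff(DataTable apiTable, DataTable dwTable)
    {
        // Third table gets the API table's schema so SqlBulkCopy maps 1:1.
        var diff = apiTable.Clone();

        var mismatches =
            from api in apiTable.AsEnumerable()
            join dw in dwTable.AsEnumerable()
                on new { Id = api.Field<int>("UserId"), Dt = api.Field<DateTime>("Date") }
                equals new { Id = dw.Field<int>("UserId"), Dt = dw.Field<DateTime>("Date") }
            where api.Field<decimal>("Amount") != dw.Field<decimal>("Amount")
               || api.Field<string>("Status") != dw.Field<string>("Status")
            select api;

        foreach (DataRow row in mismatches)
            diff.ImportRow(row);

        return diff;
    }

    static void Load(DataTable diff, string connectionString)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.DiffStage"; // placeholder target
            bulk.WriteToServer(diff);
        }
    }
}
```

Since it's LINQ to Objects, the join builds a hash lookup of one side, so the compare itself is roughly linear; the real pain is holding both DataTables (times six threads) in memory at once.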

Comments
  • 0
    Which DBMS?
  • 0
    @iAmNaN SQL Server 2016. Getting the data from there is easy. It's doing the compares on the data from the API that is killing it.
  • 0
    @bkwilliams can you pull the data from the API into SQL Server, into a temp table, and let SQL Server do the work? I'm assuming that SQL is your DW.
  • 0
    @iAmNaN that was going to be my next test. I don't know if I can bulk load into a temp table, plus there would have to be X of them, one for each thread. (See the temp-table sketch at the end of the thread.)
  • 0
    And no way to reduce the number of threads, right?
  • 0
    @iAmNaN no, it's based on the number of user logins to the API that I have to pull. A control table sets the active users and I automate pulling the data for them. (The API doesn't keep history; my DW does.)
  • 1
    @bkwilliams Can you write the API data to flat files, and use an ETL process to load to the DB? I've seen that as one option. We extract data from a Mainframe database (ADABAS) to flat files, and then use ETL to clean and load the data to Oracle.
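
A sketch of the temp-table idea from the thread: a local #temp table is scoped to its session, so when each thread opens its own connection you get "one per thread" for free, and SqlBulkCopy can target the temp table as long as it reuses the same open connection that created it. All table and column names here are placeholders.

```csharp
using System.Data;
using System.Data.SqlClient;

class StagingCompare
{
    // Bulk load the API pull into a session-local #temp table, then let
    // SQL Server do the compare server-side.
    static void CompareOnServer(DataTable apiTable, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            using (var create = new SqlCommand(
                "CREATE TABLE #ApiStage (UserId int, [Date] datetime, " +
                "Amount decimal(18,2), Status varchar(20));", conn))
            {
                create.ExecuteNonQuery();
            }

            // SqlBulkCopy can write to the local temp table because it
            // reuses the connection that created it.
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "#ApiStage";
                bulk.WriteToServer(apiTable);
            }

            // Server-side mismatch detection. Note <> won't flag NULL vs
            // value; switch to EXCEPT if NULLs matter.
            using (var insert = new SqlCommand(@"
                INSERT INTO dbo.DiffStage (UserId, [Date], Amount, Status)
                SELECT s.UserId, s.[Date], s.Amount, s.Status
                FROM #ApiStage s
                JOIN dbo.DwTable d
                  ON d.UserId = s.UserId AND d.[Date] = s.[Date]
                WHERE s.Amount <> d.Amount OR s.Status <> d.Status;", conn))
            {
                insert.ExecuteNonQuery();
            }
        } // connection closes; #ApiStage is dropped automatically
    }
}
```

And the flat-file route, for comparison: dump each pull to a delimited file and let an ETL step (SSIS, bcp, BULK INSERT) do the load. Path and delimiter are arbitrary choices here.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

class FlatFileDump
{
    // Write a DataTable out pipe-delimited for an ETL job to pick up.
    static void Dump(DataTable apiTable, string path)
    {
        using (var writer = new StreamWriter(path))
        {
            foreach (DataRow row in apiTable.Rows)
                writer.WriteLine(string.Join("|",
                    row.ItemArray.Select(v => Convert.ToString(v))));
        }
    }
}
```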