Comments
-
bkwilliams: @iAmNaN SQL Server 2016. Getting the data from there is easy. It's doing the compares against the data from the API that is killing it.
-
iAmNaN: @bkwilliams Can you pull the data from the API into SQL Server, into a temp table, and let SQL Server do the work? I'm assuming SQL Server is your DW.
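A minimal sketch of that "let SQL Server do the compare" idea: once the API rows are staged in the database, the mismatch detection becomes a single set-based statement instead of row-by-row work in C#. All table and column names below (ApiStaging, DwHistory, CompareResults, UserId, LoginTime, Value) are hypothetical, since the real schema isn't shown in the thread.

```csharp
using System.Data.SqlClient;

static void CompareInSqlServer(string connectionString)
{
    // Set-based compare: rows whose joined keys match but whose compared value differs.
    const string sql = @"
        INSERT INTO dbo.CompareResults (UserId, LoginTime, ApiValue, DwValue)
        SELECT a.UserId, a.LoginTime, a.Value, d.Value
        FROM   dbo.ApiStaging a
        JOIN   dbo.DwHistory  d
               ON d.UserId    = a.UserId
              AND d.LoginTime = a.LoginTime
        WHERE  ISNULL(a.Value, '') <> ISNULL(d.Value, '');";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        cmd.CommandTimeout = 0;   // large compares can run long
        cmd.ExecuteNonQuery();    // the compare runs entirely inside SQL Server
    }
}
```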
-
bkwilliams: @iAmNaN That was going to be my next test. I don't know if I can bulk load into a temp table, plus there would have to be X of them, one for each thread.
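For what it's worth, SqlBulkCopy can target a local temp table as long as the table is created and used on the same open connection, and the per-thread requirement can be handled by naming. The column layout and names below are assumptions; the DataTable being loaded would need a matching schema.

```csharp
using System.Data;
using System.Data.SqlClient;

static void BulkLoadToTemp(SqlConnection openConn, DataTable apiRows, int threadId)
{
    string tempName = $"#ApiStaging_{threadId}";   // one temp table per thread

    // Create the temp table on the already-open connection so it stays visible.
    using (var create = new SqlCommand(
        $"CREATE TABLE {tempName} (UserId INT, LoginTime DATETIME2, Value NVARCHAR(400));",
        openConn))
    {
        create.ExecuteNonQuery();
    }

    // Stream the DataTable into tempdb; column order/types must match the temp table.
    using (var bulk = new SqlBulkCopy(openConn) { DestinationTableName = tempName })
    {
        bulk.WriteToServer(apiRows);
    }

    // The temp table lives for the lifetime of openConn (or until dropped),
    // so the compare query can run afterwards on this same connection.
}
```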
-
bkwilliams: @iAmNaN No, it's based on the number of users' logins to the API that I have to pull. A control table sets the active users, and I automate pulling the data for them. (The API doesn't keep history; my DW does.)
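A small sketch of the control-table pattern described above: read the active users, then drive one API pull per login. The table and column names (ApiPullControl, UserLogin, IsActive) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

static List<string> GetActiveUsers(string connectionString)
{
    var users = new List<string>();
    const string sql = "SELECT UserLogin FROM dbo.ApiPullControl WHERE IsActive = 1;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                users.Add(reader.GetString(0));   // each active login gets its own API pull
        }
    }
    return users;
}
```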
-
iAmNaN: @bkwilliams Can you write the API data to flat files and use an ETL process to load them into the DB? I've seen that as one option. We extract data from a mainframe database (ADABAS) to flat files, and then use ETL to clean and load the data into Oracle.
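A sketch of that flat-file option: dump each API DataTable to a delimited file that an ETL job (SSIS, BULK INSERT, or similar) can pick up. The pipe delimiter and file layout here are assumptions, and a real export would also need to escape delimiters inside values.

```csharp
using System.Data;
using System.IO;
using System.Linq;

static void WriteToFlatFile(DataTable apiRows, string path)
{
    using (var writer = new StreamWriter(path))
    {
        // Header row with the column names.
        writer.WriteLine(string.Join("|", apiRows.Columns
            .Cast<DataColumn>()
            .Select(c => c.ColumnName)));

        // Pipe-delimited data rows.
        foreach (DataRow row in apiRows.Rows)
            writer.WriteLine(string.Join("|", row.ItemArray));
    }
}
```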
Related Rants
So I did this https://devrant.io/rants/797965/... which works fine up to medium-sized data.
However, for large data the ETL pegs a 6-core Xeon (2.2 GHz) with 50 GB of RAM, because it ends up doing six threaded compares, so 12 different data sets. Other than "pull less data", any tips?
The code (C#) is basically a LINQ multi-column join between two DataTables; when the compared columns don't match, the result (a var) is turned into a third DataTable that gets SqlBulkCopy-loaded into the DB.
Table1 is the external API return data (no windowing) and Table2 is from our DW.
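A rough reconstruction of the pattern the rant describes: a LINQ join over two DataTables on multiple key columns, keeping rows whose compared column differs, then materializing a third DataTable for SqlBulkCopy. The key and value column names are invented for illustration; the per-row field extraction and anonymous-key allocation in this style of join is exactly the work that ends up burning CPU on large inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static DataTable FindMismatches(DataTable apiTable, DataTable dwTable)
{
    // Multi-column join on (UserId, LoginTime); keep rows where the compared value differs.
    var mismatches = from a in apiTable.AsEnumerable()
                     join d in dwTable.AsEnumerable()
                         on new { UserId = a.Field<int>("UserId"),
                                  LoginTime = a.Field<DateTime>("LoginTime") }
                         equals new { UserId = d.Field<int>("UserId"),
                                      LoginTime = d.Field<DateTime>("LoginTime") }
                     where a.Field<string>("Value") != d.Field<string>("Value")
                     select a;

    // CopyToDataTable throws on an empty sequence, so guard for that case.
    List<DataRow> rows = mismatches.ToList();
    return rows.Count > 0 ? rows.CopyToDataTable() : apiTable.Clone();
}
```

The resulting DataTable is what would then be handed to SqlBulkCopy, as described in the rant.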
Tags: ask a dev, peg a server, scalability