performance - C# Comparing values in 2 DataTables -
i'm processing 2 datatables:
- ssfe: contains values want find
- ffe: larger, smaller or equally large ssfe, not contain every value of ssfe
the values need match between these tables integers, both tables sorted small large. idea start searching on first item in ffe, start looping through ssfe, , when find match -> remember current index -> save match -> select next item ffe , continue previous index.
also ffe can contain integers, can contain strings, why cast values string , compare these.
i made code, takes, time. take minute compare ssfe(1.000 items) ffe(127.000) items.
int whereami = 0; bool firstiteration = true; (int = 0; < ffedata.rows.count - 1; i++) { (int j = 0; j < ssfedata.rows.count - 1; j++) { if (firstiteration) { j = whereami; firstiteration = false; } if (ssfedata.rows[j][0] == ffedata.rows[i][0].tostring()) { found++; whereami = j; firstiteration = true; break; } } }
i'm storing how many occurences have found testing. in example find 490 matches, not relevant.
any suggestions great!
could try datarelation class. creates foreign key / join between 2 datatables in dataset.
using system.data; using system.text; public int getmatches(datatable table1, datatable table2) { dataset set = new dataset(); //wrap tables in dataset. set.tables.add(table1); set.tables.add(table2); //creates foreignkey join between 2 tables. //table1 parent. table2 child. datarelation relation = new datarelation("idjoin", table1.columns[0], table2.columns[0], false); //have dataset perform join. set.relations.add(relation); int found = 0; //loop through table1 without using linq. for(int = 0; < table1.rows.count; i++) { //if rows in table2 have same id current row in table1 if (table1.rows[i].getchildrows(relation).length > 0) { //add counter found++; //for debugging, proof of match: //get id's matched. string id1 = table1.rows[i][0].tostring(); string id2 = table1.rows[i].getchildrows(relation)[0][0].tostring(); } } return found; }
i randomly populated 2 non-indexed tables nvarchar(2) strings, 10,000 rows each. match took sub 1 second, including time spent populating tables. between 3500 , 4000 matches run on average.
however, major caveat datacolumns being matched must same data type. if both columns string, or @ least integers stored string, work.
but if 1 column integer, have add new column, , store integers string in column first. string translations add hefty amount of time.
another option uploading tables database , performing query. large upload take few seconds, query under 1 second well. still better 60 seconds+.
Comments
Post a Comment