python - How to speed up bulk insert to MS SQL Server from CSV using pyodbc
Below is the code I'd like some help with. I am having to run it on roughly 1,300,000 rows, and it takes up to 40 minutes to insert ~300,000 rows.

I figure a bulk insert is the route to go to speed it up? Or is it because I'm iterating over the rows via the for data in reader: portion?
import csv
import os

# cnxn is an already-open pyodbc connection
#opens the prepped csv file
with open(os.path.join(newpath, outfile), 'r') as f:
    #hooks csv reader to the file
    reader = csv.reader(f)
    #pulls out the columns (which match the SQL table)
    columns = next(reader)
    #trims any extra spaces
    columns = [x.strip(' ') for x in columns]
    #starts the SQL statement
    query = 'insert into SpikeData123({0}) values ({1})'
    #puts the column names into the SQL query 'query'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print('Query is: %s' % query)
    #starts the cursor from cnxn (which works)
    cursor = cnxn.cursor()
    #uploads everything by row
    for data in reader:
        cursor.execute(query, data)
        cursor.commit()
I am dynamically picking my column headers on purpose (to create the most pythonic code possible).
SpikeData123 is the table name.
BULK INSERT will almost certainly be much faster than reading the source file row-by-row and doing a regular INSERT for each row. However, both BULK INSERT and BCP have a significant limitation regarding CSV files in that they cannot handle text qualifiers (ref: here). That is, if your CSV file does not have qualified text strings in it ...
1,Gord Thompson,2015-04-15
2,Bob Loblaw,2015-04-07
... then you can BULK INSERT it, but if it contains text qualifiers (because some text values contain commas) ...
1,"Thompson, Gord",2015-04-15
2,"Loblaw, Bob",2015-04-07
... then BULK INSERT cannot handle it. Still, it might be faster overall to pre-process such a CSV file into a pipe-delimited file ...
1|Thompson, Gord|2015-04-15
2|Loblaw, Bob|2015-04-07
... or a tab-delimited file (where → represents the tab character) ...
1→Thompson, Gord→2015-04-15
2→Loblaw, Bob→2015-04-07
... and then BULK INSERT that file. For the latter (tab-delimited) file the BULK INSERT code would look something like this:
import pypyodbc
conn_str = "DSN=myDb_SQLEXPRESS;"
cnxn = pypyodbc.connect(conn_str)
crsr = cnxn.cursor()
sql = """
BULK INSERT myDb.dbo.SpikeData123
FROM 'C:\\__tmp\\biTest.txt'
WITH (
    FIELDTERMINATOR='\\t',
    ROWTERMINATOR='\\n'
    );
"""
crsr.execute(sql)
cnxn.commit()
crsr.close()
cnxn.close()
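As for the pre-processing step itself, a minimal Python 3 sketch using the standard csv module might look like the following. The file paths are hypothetical, and skipping the header row is an assumption based on the question's file layout (BULK INSERT loads every line of the file by default):

import csv

# hypothetical paths -- adjust to match your environment
src = 'C:\\__tmp\\biTest.csv'   # comma-delimited input, with quoted text values
dst = 'C:\\__tmp\\biTest.txt'   # tab-delimited output for BULK INSERT

with open(src, 'r', newline='') as fin, open(dst, 'w', newline='') as fout:
    reader = csv.reader(fin)  # the csv module parses "quoted, text" values correctly
    writer = csv.writer(fout, delimiter='\t', quoting=csv.QUOTE_NONE)
    next(reader)  # assumption: skip the header row so BULK INSERT sees only data
    for row in reader:
        # csv.QUOTE_NONE raises an error if a value contains an actual tab,
        # which would corrupt a tab-delimited file anyway
        writer.writerow(row)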
Note: As mentioned in a comment, executing a BULK INSERT statement is only applicable if the SQL Server instance can directly read the source file. For cases where the source file is on a remote client, see this answer.
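For that remote-client case, one commonly suggested alternative is pyodbc's fast_executemany flag (available in pyodbc 4.0.19 and later), which batches parameterized INSERTs far more efficiently than a plain row-by-row loop. A minimal sketch, assuming hypothetical column names col1, col2, col3 and reusing the DSN from the example above (note that this uses pyodbc, not pypyodbc):

import pyodbc

cnxn = pyodbc.connect("DSN=myDb_SQLEXPRESS;")  # same DSN as above
crsr = cnxn.cursor()
crsr.fast_executemany = True  # requires pyodbc 4.0.19 or newer

# col1/col2/col3 are hypothetical; use the real column names of SpikeData123
sql = "INSERT INTO myDb.dbo.SpikeData123 (col1, col2, col3) VALUES (?, ?, ?)"
rows = [
    (1, 'Thompson, Gord', '2015-04-15'),
    (2, 'Loblaw, Bob', '2015-04-07'),
]
crsr.executemany(sql, rows)  # sends the rows as a batch instead of one round trip per row
cnxn.commit()
crsr.close()
cnxn.close()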