I designed data warehousing application, but I struggle with poor performance when fetching data from the source and saving them to db. - only approximately 150 kB/s.
Because of the limitations imposed by the customer, I am forced to use Django on 64bit Win machine and save data to MS SQL Express (exact versions below). I am using django-mssql (1.7) backend.
Original data are stored in .dbf
file (Visual FoxPro), dbfread
returns each row from a file as Python dict
(this is not the issue, tested by running just reader discarding data). This dictionary is then checked for data quality (function sanitize_value_for_db()
below), data are copied to Django data model (attributes of the object populated; tables are wide, hence each table/object has about 100 columns/attributes) and objects are saved to db. using Django objects.bulk_create()
(in batches of 50-100).
I run the code through profiler using cProfile
and pstats
modules. Results are below. I see that most of the time is spent in PyWin. But I have no clue if there is something I can do. Any hints or opinions will be greatly appreciated. Thanks.
Configuration:
- Xeon E5-2403 v2 @ 1,8 GHz, 30 GB RAM
- Windows Server 2012 R2 (64bit)
- MS SQL Express 64bit, v 11.02.2100.60
- Python (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32#
- pywin32-219.win-amd64-py3.4
- Django (1.7.10)
- django-mssql (1.7)
Profile (on smaller data sample):
ncalls tottime percall cumtime percall filename:lineno(function)
2306881 332.596 0.000 332.596 0.000 {built-in method compile}
4592205 186.028 0.000 186.028 0.000 {method 'InvokeTypes' of 'PyIDispatch' objects}
39558280 176.963 0.000 176.963 0.000 {method 'Bind' of 'PyITypeComp' objects}
9889570 173.905 0.000 712.091 0.000 c:\Python34\Lib\site-packages\win32com\client\dynamic.py:390(_LazyAddAttr_)
17464949 151.818 0.000 279.692 0.000 c:\Python34\Lib\site-packages\win32com\client\build.py:151(_AddFunc_)
7203 142.584 0.020 2215.686 0.308 c:\Python34\Lib\site-packages\sqlserver_ado\dbapi.py:587(execute)
4 120.012 30.003 120.012 30.003 {built-in method sleep}
11484250 100.385 0.000 1269.561 0.000 c:\Python34\Lib\site-packages\win32com\client\dynamic.py:444(__getattr__)
12167658 67.229 0.000 67.229 0.000 {method 'Invoke' of 'PyIDispatch' objects}
22100535 50.127 0.000 53.982 0.000 <string>:12(__init__)
4599456 48.409 0.000 225.498 0.000 c:\Python34\Lib\site-packages\win32com\client\__init__.py:18(__WrapDispatch)
27390427 45.257 0.000 65.391 0.000 c:\Python34\Lib\site-packages\win32com\client\build.py:420(_ResolveType)
7189 44.294 0.006 2296.538 0.319 c:\Python34\Lib\site-packages\django\db\models\query.py:911(_insert)
9889570 43.621 0.000 759.155 0.000 c:\Python34\Lib\site-packages\win32com\client\dynamic.py:381(__LazyMap__)
2306881 43.605 0.000 130.943 0.000 c:\Python34\Lib\site-packages\win32com\client\build.py:303(MakeDispatchFuncMethod)
5275612 33.300 0.000 402.786 0.000 c:\Python34\Lib\site-packages\win32com\client\dynamic.py:524(__setattr__)
17464949 32.793 0.000 32.793 0.000 c:\Python34\Lib\site-packages\win32com\client\build.py:58(__init__)
4599456 32.494 0.000 124.536 0.000 c:\Python34\Lib\site-packages\win32com\client\dynamic.py:120(Dispatch)
2292471 30.889 0.000 843.841 0.000 c:\Python34\Lib\site-packages\sqlserver_ado\dbapi.py:266(_configure_parameter)
7007079 27.918 0.000 53.176 0.000 c:\Python34\Lib\site-packages\win32com\client\build.py:516(MakePublicAttributeName)
2306881 27.907 0.000 46.266 0.000 c:\Python34\Lib\site-packages\win32com\client\build.py:483(_BuildArgList)
15201179 25.593 0.000 25.593 0.000 {method 'GetTypeAttr' of 'PyITypeInfo' objects}
68338972 24.271 0.000 24.758 0.000 {built-in method isinstance}
2306881 23.056 0.000 538.037 0.000 c:\Python34\Lib\site-packages\win32com\client\dynamic.py:314(_make_method_)
17464949 21.683 0.000 21.683 0.000 {method 'GetNames' of 'PyITypeInfo' objects}
7007079 20.396 0.000 20.396 0.000 c:\Python34\Lib\site-packages\win32com\client\build.py:546(<listcomp>)
4599456 18.166 0.000 18.166 0.000 c:\Python34\Lib\site-packages\win32com\client\dynamic.py:172(__init__)
3422203 15.624 0.000 20.870 0.000 C:\MRI\mri\dwh\daq\daq_utils.py:152(sanitize_value_for_db)