Category Archives: Software - Page 2

Interesting python + MySQL speed test

I’m currently working on a project involving a large amount of database processing on the backend. The last week has been dedicated to getting a script running to take the crappy data from multiple DBs, process it into something nice, and inserting it into my working database. The original stab at the code with no attempt to speed it up wound up taking about 4 seconds per processed item. This particular database had about 170,000 items to process, so the import script would take something near 189 hours. I broke out the profiler and found over 99% of the time was wasted while waiting for mysql select queries to come back. My initial round of changes helped immensely (from 4.0 to 0.25 seconds per item), but we were still at almost 12 hours to run. That’s when I ported the code to python and started testing threads.

Before I get into it, let’s make this crystal clear: THIS IS NOT GREAT TEST DATA. DRAW NO CONCLUSIONS FROM THIS. This was not a robust test in any way. I continued to make time saving changes and bug fixes as they were discovered, so even the time between runs can’t be relied upon. Nor was this on an isolated server — this was done on a production server currently hosting dozens of websites with unknown traffic and load. Even the test numbers are too small to make real assessments of performance. Finally, this applies ONLY to my precise script and the algorithm I’m using. You will get different results with a different project. It would be foolish to take anything concrete from these numbers.

That being said, the results are still interesting to consider and they should encourage you to do your own testing when building your next database heavy project.

Up to this point I had been looking at the time/processed item, so I started there after throwing together the threaded importer. Here are the results of running the script with a set number of entries for each thread to process with varying numbers of threads.

Seconds Per Processed Item
Threads 100 Items per Thread (Total) 250 Items per Thread (Total)
1 .118 (100) .106 (100)
2 .097 (200) .106 (250)
3 .095 (300) .110 (500)
4 .102 (400) .114 (750)
5 .106 (500) .115 (1250)
8 .112 (800)
10 .116 (1000)
12 .120 (1200)
15 .120 (1500)
Seconds/ Item by Items/Thread


Not at all what I expected, but with the vastly different number of total items being processed, I decided to take another approach. Here is the same script, but this time with a set TOTAL number of entries to process (5000). In other words, the above data did X items per thread, so the total number of processed items was threads * X. Now, I’m assigning each thread 5000/threads entries. This makes much more sense for the project (the first test was plain stupid) and, as you can see, the results are more what you would expect.

5000 Items Evenly Across All Threads
Threads Total Time Time per Item
1 08:17.21 0.0994
2 07:13.33 0.0867
3 06:41.97 0.0804
4 06:32.97 0.0786
5 06:27.36 0.0775
8 06:23.59 0.0767
10 06:24.21 0.0768
12 06:24.24 0.0769
15 06:23.25 0.0767
5000 Items - Time/Item by Threads


Now you can see that the threading is making a sizeable difference – almost 23% at its best. An interesting thing to note is that, for this script, the benefits of threading drop off at about 8. Obviously at that point the threads just get in each other’s way instead of keeping the database fully loaded. Overall, the lessons to be learned here are 1. test your project in multiple different ways and 2. threading will likely help out if you’re wasting the majority of your time waiting for database selections (as opposed to processing data).

DokuWiki / WordPress 3.x Integration Fix

As many people noticed, wordpress 3.0 broke the DokuWiki/Wordpress integration I had up. I didn’t find an elegant solution, but I got a hack working and posted it here.

Another Chronolapse Update

I’ve been feeling productive and I knocked down the rest of the todo list for Chronolapse.

You can see the project page here

Here is the change log for version 1.0.3

  • fixed a bug with times under 1 second being fired 1000 times too fast
  • silently fails if set to capture 2 monitors but they are duplicated
  • added system wide hotkey — doesnt seem to work right in wx so it is only part implemented
  • added audio tab to dub audio onto your timelapse
  • added a drop shadow option for annotations
  • added a schedule tab for scheduling starts and/or stops

Building CX_Freeze on the Mac

This isn’t my info — it is from the SourceForge mailing list — but it is so damn helpful that I wanted to re-post it in case it helps someone find it in the future.

Where I originally saw it: http://www.mail-archive.com/cx-freeze-users@lists.sourceforge.net/msg00334.html

I had to make the following changes to get cx_Freeze to work on the Mac

Before building

change ldd in freezer.py to otool -L
mkdir -p Python.framework/Versions/2.6/
ln -s /Library/Frameworks/Python.framework/Versions/2.6/Python
Python.framework/Versions/2.6/

Before running a built executable

mkdir /lib /include
ln -s /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6
/lib/python2.6
ln -s /Library/Frameworks/Python.framework/Versions/2.6/include/python2.6
/include/python2.6

Jeffery G. Smith
MedPlus, A Quest Diagnostics Company | Senior SCM Specialist | 4690 Parkway
Drive | Mason, OH 45040 USA | phone +1.513.204.2601 | fax +1.513.229.5505 |
mobile +1.513.335.1517 | jsm…@medplus.com | www.MedPlus.com

Updated DokuWiki + WordPress Integration

I finally got around to updating the dokuwiki/wordpress integration plugin to work with the newer versions of WP and DW. It ended up being only one additional step which was tracked down by the commenters, so a big thanks to them for keeping this up to date.

http://keeyai.com/projects-and-releases/dokuwiki-tools/dokuwiki-and-wordpress-integration/

My test installs I just used are WordPress 2.9.1 and Dokuwiki 2009-12-25, which are the two latest stable releases as of this moment.

Good luck everyone and if you have any problems, please post in the comments.