Chronolapse has a new test release that needs your help. Most importantly, this is the linux/mac code (no webcams yet, sorry). Please download and leave your feedback.
http://code.google.com/p/chronolapse
Chronolapse has hit version 1.0.4!
The last version was built without the console window, but when frozen into an executable this was causing the mencoder process to freeze indefinitely. This addresses the issue, fixes the update checking code, and makes the mencoder feedback not hang the GUI while it runs.
Just in time for LD!
I’m currently working on a project involving a large amount of database processing on the backend. The last week has been dedicated to getting a script running to take the crappy data from multiple DBs, process it into something nice, and inserting it into my working database. The original stab at the code with no attempt to speed it up wound up taking about 4 seconds per processed item. This particular database had about 170,000 items to process, so the import script would take something near 189 hours. I broke out the profiler and found over 99% of the time was wasted while waiting for mysql select queries to come back. My initial round of changes helped immensely (from 4.0 to 0.25 seconds per item), but we were still at almost 12 hours to run. That’s when I ported the code to python and started testing threads.
Before I get into it, let’s make this crystal clear: THIS IS NOT GREAT TEST DATA. DRAW NO CONCLUSIONS FROM THIS. This was not a robust test in any way. I continued to make time saving changes and bug fixes as they were discovered, so even the time between runs can’t be relied upon. Nor was this on an isolated server — this was done on a production server currently hosting dozens of websites with unknown traffic and load. Even the test numbers are too small to make real assessments of performance. Finally, this applies ONLY to my precise script and the algorithm I’m using. You will get different results with a different project. It would be foolish to take anything concrete from these numbers.
That being said, the results are still interesting to consider and they should encourage you to do your own testing when building your next database heavy project.
Up to this point I had been looking at the time/processed item, so I started there after throwing together the threaded importer. Here are the results of running the script with a set number of entries for each thread to process with varying numbers of threads.
| Seconds Per Processed Item | ||
|---|---|---|
| Threads | 100 Items per Thread (Total) | 250 Items per Thread (Total) |
| 1 | .118 (100) | .106 (100) |
| 2 | .097 (200) | .106 (250) |
| 3 | .095 (300) | .110 (500) |
| 4 | .102 (400) | .114 (750) |
| 5 | .106 (500) | .115 (1250) |
| 8 | .112 (800) | |
| 10 | .116 (1000) | |
| 12 | .120 (1200) | |
| 15 | .120 (1500) | |
Not at all what I expected, but with the vastly different number of total items being processed, I decided to take another approach. Here is the same script, but this time with a set TOTAL number of entries to process (5000). In other words, the above data did X items per thread, so the total number of processed items was threads * X. Now, I’m assigning each thread 5000/threads entries. This makes much more sense for the project (the first test was plain stupid) and, as you can see, the results are more what you would expect.
| 5000 Items Evenly Across All Threads | ||
|---|---|---|
| Threads | Total Time | Time per Item |
| 1 | 08:17.21 | 0.0994 |
| 2 | 07:13.33 | 0.0867 |
| 3 | 06:41.97 | 0.0804 |
| 4 | 06:32.97 | 0.0786 |
| 5 | 06:27.36 | 0.0775 |
| 8 | 06:23.59 | 0.0767 |
| 10 | 06:24.21 | 0.0768 |
| 12 | 06:24.24 | 0.0769 |
| 15 | 06:23.25 | 0.0767 |
Now you can see that the threading is making a sizeable difference – almost 23% at its best. An interesting thing to note is that, for this script, the benefits of threading drop off at about 8. Obviously at that point the threads just get in each other’s way instead of keeping the database fully loaded. Overall, the lessons to be learned here are 1. test your project in multiple different ways and 2. threading will likely help out if you’re wasting the majority of your time waiting for database selections (as opposed to processing data).
As many people noticed, wordpress 3.0 broke the DokuWiki/Wordpress integration I had up. I didn’t find an elegant solution, but I got a hack working and posted it here.
I’ve been feeling productive and I knocked down the rest of the todo list for Chronolapse.
You can see the project page here
Here is the change log for version 1.0.3
Recent Comments