Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Summary
- This is a simple python script that can measure python program running time in fine levels.
- It’s simpler than a full profiler, easier to use than other currently available similar scripts.
The need of a simple timing script
I have been revisiting Python recently and walked through Google’s tutorial. Interestingly my first solution to one of the exercise word count took a lot of time to analysis a 600K text.
I wanted some profile tool to locate the bottle neck. There are lots of profiler tools but they seemed to be too “heavy weighted” for my simple purpose. This great Python profiler guide introduced a simple timing context manager script to measure script execution. You will need to
wrap blocks of code that you want to time with Python’s with keyword and this Timer context manager. It will take care of starting the timer when your code block begins execution and stopping the timer when your code block ends.
Like this
from timer import Timer from redis import Redis rdb = Redis() with Timer() as t: rdb.lpush("foo", "bar") print "=> elasped lpush: %s s" % t.secs with Timer() as t: rdb.lpop("foo") print "=> elasped lpop: %s s" % t.secs |
I used this method and found out the problem in my script is that I usedif key in dict.keys()
which seems natural since I saw this before:for key in sorted(dict.keys()):
However, this is a linear look up in a list which didn’t utilize the constant time access of HashMap. The right way is to use dict directly without keys().if key in dict
So the problem was solved but I still found the timing context manager not simple and intuitive enough. Adding timer means lots of edits and indents of original code, and moving checkpoints involves more editing and indenting.
My version of an even simpler timing script
Therefore, I wrote a simple script that have similar function of timing code blocks by checkpoints, also kept the usage effort minimal.
- import the script
- Place
times.start(digit)
in the place that you want the timer start.digit
control the digits after the decimal point, default at 7. - Use
times.seg_start("msg")
andtimes.seg_stop("msg")
enclose a code block and print the time of start and stop.msg
can be used to identify the code block in the output. - You can also add single checkpoint
times.last_seg
anywhere, it will print the time elapsed since last checkpoint which could be of any type.
Here is an example of using timing script in Wordcount. You can search and highlight times
to see all the edits needed in script.
#!/usr/bin/python -tt # Copyright 2010 Google Inc. # Licensed under the Apache License, Version 2.0 # http://www.apache.org/licenses/LICENSE-2.0 # Google's Python Class # http://code.google.com/edu/languages/google-python-class/ """Wordcount exercise Google's Python class """ import sys from basic.util import times ### # This basic command line argument parsing code is provided and # calls the print_words() and print_top() functions which you must define. def count_words(filename): """helper method for the other 2, return dict""" # read file into words times.seg_start('count words start') f = open(filename, 'rU') word_count = {} for line in f: words = line.split() for word in words: word = word.lower() if word in word_count: # 0.069 for 640k #if word_count.get(word): # 0.03s for 250k text, 0.084 for 640k word_count[word] += 1 else: word_count[word] = 1 times.seg_stop('count words stop') return word_count def print_words(filename): "count words" #sort dict and print word_count = count_words(filename) for word in sorted(word_count.keys()): print word, word_count[word] def print_top(filename): word_count = count_words(filename) word_count_pairs = word_count.items() times.seg_start('sorting start') word_count_pairs = sorted(word_count_pairs, key = lambda pair: pair[-1], reverse=True) times.seg_stop('sorting stop') for i in range(3): print word_count_pairs[i][0], word_count_pairs[i][1] times.last_seg() def main(): if len(sys.argv) != 3: print 'usage: ./wordcount.py {--count | --topcount} file' sys.exit(1) times.start(5) option = sys.argv[1] filename = sys.argv[2] if option == '--count': print_words(filename) elif option == '--topcount': print_top(filename) else: print 'unknown option: ' + option sys.exit(1) times.end() if __name__ == '__main__': main() |
I put some efforts in the print indenting to make the output aligned:
==>| Timer start | set to 5 digits after decimal point => << | 0 count words start | 0.06900 s count words stop >> => << | 0 sorting start | 0.00800 s sorting stop >> the 5027 to 3353 and 2831 => | 0.00100 s since last point => | 0.00300 s since last point. Timer end. | 0.08600 s Total time elapsed | |
Here is the script. You can also download it from the link in right top corner of code block.
import time __author__ = 'dracodoc' # measure script time duration in segments. http://dracodoc.github.io/ # the clock value list: start, segments, end # usage: import times, put times.start, seg_start, end etc in line. T = [] Digit = [7] def start(digit=7): """Timer start. digit control the number width to align""" del T[:] # clean up first Digit[0] = digit T.append(time.time()) print '==>| Timer start | set to', Digit[0], 'digits after decimal point' def last_seg(s='since last point'): """calculate the duration between last point till this one""" T.append(time.time()) duration = T[-1] - T[-2] print "=> | %.*f s" % (Digit[0], duration), s def seg_start(s='start...'): """set a segment start, always used with seg_stop in pairs""" T.append(time.time()) print "=> << | 0", ' ' * (Digit[0] + 3), s def seg_stop(s='...stop'): """set a segment end, always used with seg_start in pairs""" T.append(time.time()) duration = T[-1] - T[-2] print " | %.*f s " % (Digit[0], duration), s, ' >>' def end(s='since last point. Timer end.'): T.append(time.time()) duration = T[-1] - T[-2] total = T[-1] - T[0] print "=> | %.*f s" % (Digit[0], duration), s print "==>| %.*f s" % (Digit[0], total), 'Total time elapsed' |
Of course sometimes you want more features like execution frequency and memory analysis, then you can always use more powerful profiler like line_profiler and memory profiler. The profiler guide I mentioned earlier have detailed introductions on them.
That being said, I still found my simple timing script is often useful enough and easy to use with minimal overhead.
Version History
- 2015-01-15 : The first version of this article on wordpress
- 2015-10-20 : Some edits when I moved my blog to github pages.
- 2016-05-10 : Added Summary.
- 2016-08-19 : Syntax highlighting for all code chunks
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.