Am I slowing down the interpreter whenever I leave a comment in my code? According to my limited understanding of an interpreter, it reads program expressions in as strings and then converts those strings into code. It seems that every time it parses a comment, that is wasted time.
Is this the case? Is there some convention for comments in interpreted languages, or is the effect negligible?
For the case of Python, source files are compiled before being executed (the .pyc files), and the comments are stripped in the process. So comments could slow down the compilation time if you have gazillions of them, but they won’t impact the execution time.
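A quick way to check this claim is to compile the same statements with and without comments and compare the resulting code objects. This is only a sketch in Python 3; the two source strings are made up for illustration.

```python
# Two versions of the same program: one bare, one heavily commented.
plain = "a = 30\nb = 40\nc = a * b\n"
commented = (
    "# set up the operands\n"
    "a = 30  # first operand\n"
    "b = 40  # second operand\n"
    "# multiply them\n"
    "c = a * b\n"
)

code_plain = compile(plain, "<plain>", "exec")
code_commented = compile(commented, "<commented>", "exec")

# Compare the raw bytecode and the constants. Line-number metadata is
# stored separately and is expected to differ between the two versions.
same_bytecode = code_plain.co_code == code_commented.co_code
same_consts = code_plain.co_consts == code_commented.co_consts
print(same_bytecode, same_consts)
```

If both comparisons come out true, the comments left no trace in what actually gets executed.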
Comments are usually stripped out in or before the parsing stage, and parsing is very fast, so effectively comments will not slow down the initialization time.
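To see comments being separated out before parsing, the standard-library `tokenize` module can be used. It is a rough illustration rather than the interpreter's internal code path, and the sample source is invented for the demo.

```python
import io
import tokenize

source = "x = 1  # trailing comment\n# full-line comment\ny = x + 1\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Comments show up as their own token type, so they are easy to drop.
comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

# Everything that remains once comment tokens (and the NL tokens that
# follow comment-only lines) are filtered out.
kept = [tok.string for tok in tokens
        if tok.type not in (tokenize.COMMENT, tokenize.NL)]

print(comments)
print(kept)
```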
Well, I wrote a short Python program like this:
for i in range (1,1000000): a = i*10
The idea is, do a simple calculation loads of times.
By timing that, it took 0.35±0.01 seconds to run.
I then rewrote it with the whole of the King James Bible inserted like this:
for i in range (1,1000000):
    """
    The Old Testament of the King James Version of the Bible
    The First Book of Moses: Called Genesis
    1:1 In the beginning God created the heaven and the earth.
    1:2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
    1:3 And God said, Let there be light: and there was light.
    ... ... ... ...
    Even so, come, Lord Jesus.
    22:21 The grace of our Lord Jesus Christ be with you all. Amen.
    """
    a = i*10
This time it took 0.4±0.05 seconds to run.
So the answer is yes. 4MB of comments in a loop make a measurable difference.
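One caveat worth checking: a triple-quoted block inside a loop is a string literal, not a comment, so the parser treats the two differently. The sketch below (Python 3, with a short stand-in for the Bible text) compares the syntax trees.

```python
import ast

with_string = "for i in range(3):\n    '''some long text'''\n    a = i * 10\n"
with_comment = "for i in range(3):\n    # some long text\n    a = i * 10\n"

loop_string = ast.parse(with_string).body[0]
loop_comment = ast.parse(with_comment).body[0]

# The string version carries an extra expression statement in the loop
# body; the comment version has only the assignment.
print(len(loop_string.body), type(loop_string.body[0]).__name__)
print(len(loop_comment.body), type(loop_comment.body[0]).__name__)
```

So the experiment above measures the cost of a string expression in the loop rather than a true `#` comment. Whether the compiler later discards that string is an implementation detail.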
I put together a script like Rich’s with some comments (only about 500 KB of text):
# -*- coding: iso-8859-15 -*-
import timeit

no_comments = """
a = 30
b = 40
for i in range(10):
    c = a**i * b**i
"""
yes_comment = """
a = 30
b = 40

# full HTML from http://en.wikipedia.org/
# wiki/Line_of_succession_to_the_British_throne

for i in range(10):
    c = a**i * b**i
"""
loopcomment = """
a = 30
b = 40

for i in range(10):
    # full HTML from http://en.wikipedia.org/
    # wiki/Line_of_succession_to_the_British_throne

    c = a**i * b**i
"""

t_n = timeit.Timer(stmt=no_comments)
t_y = timeit.Timer(stmt=yes_comment)
t_l = timeit.Timer(stmt=loopcomment)

print "Uncommented block takes %.2f usec/pass" % (
    1e6 * t_n.timeit(number=100000)/1e5)
print "Commented block takes %.2f usec/pass" % (
    1e6 * t_y.timeit(number=100000)/1e5)
print "Commented block (in loop) takes %.2f usec/pass" % (
    1e6 * t_l.timeit(number=100000)/1e5)

C:\Scripts>timecomment.py
Uncommented block takes 15.44 usec/pass
Commented block takes 15.38 usec/pass
Commented block (in loop) takes 15.57 usec/pass

C:\Scripts>timecomment.py
Uncommented block takes 15.10 usec/pass
Commented block takes 14.99 usec/pass
Commented block (in loop) takes 14.95 usec/pass

C:\Scripts>timecomment.py
Uncommented block takes 15.52 usec/pass
Commented block takes 15.42 usec/pass
Commented block (in loop) takes 15.45 usec/pass
Edit as per David’s comment:
# -*- coding: iso-8859-15 -*-
import timeit

init = "a = 30\nb = 40\n"
for_ = "for i in range(10):"
loop = "%sc = a**%s * b**%s"

historylesson = """
# <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
# blah blah...
# --></body></html>
"""
tabhistorylesson = """
    # <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    # blah blah...
    # --></body></html>
"""

s_looped = init + "\n" + for_ + "\n" + tabhistorylesson + loop % (' ','i','i')

s_unroll = init + "\n"
for i in range(10):
    s_unroll += historylesson + "\n" + loop % ('',i,i) + "\n"

t_looped = timeit.Timer(stmt=s_looped)
t_unroll = timeit.Timer(stmt=s_unroll)

print "Looped length: %i, unrolled: %i." % (len(s_looped), len(s_unroll))
print "For block takes %.2f usec/pass" % (
    1e6 * t_looped.timeit(number=100000)/1e5)
print "Unrolled it takes %.2f usec/pass" % (
    1e6 * t_unroll.timeit(number=100000)/1e5)

C:\Scripts>timecomment_unroll.py
Looped length: 623604, unrolled: 5881926.
For block takes 15.12 usec/pass
Unrolled it takes 14.21 usec/pass

C:\Scripts>timecomment_unroll.py
Looped length: 623604, unrolled: 5881926.
For block takes 15.43 usec/pass
Unrolled it takes 14.63 usec/pass

C:\Scripts>timecomment_unroll.py
Looped length: 623604, unrolled: 5881926.
For block takes 15.10 usec/pass
Unrolled it takes 14.22 usec/pass
The effect is negligible for everyday usage. It’s easy to test, but if you consider a simple loop such as:
For N = 1 To 100000: Next
Your computer can process that (count to 100,000) quicker than you can blink. Ignoring a line of text that starts with a certain character will be more than 10,000 times faster.
Don’t worry about it.
It depends on how the interpreter is implemented. Most reasonably modern interpreters do at least a bit of pre-processing on the source code before any actual execution, and that will include stripping out the comments so they make no difference from that point onward.
At one time, when memory was severely constrained (e.g., 64K total addressable memory, and cassette tapes for storage) you couldn’t take things like that for granted. Back in the day of the Apple II, Commodore PET, TRS-80, etc., it was fairly routine to strip out comments (and even white-space) to improve execution speed.
Of course, it also helped that those machines had CPUs that could only execute one instruction at a time, had clock speeds around 1 MHz, and had only 8-bit processor registers. Even a machine you’d now find only in a dumpster is so much faster than those were that it’s not even funny…
Having comments will slow down the startup time, as the scripts will get parsed into an executable form. However, in most cases comments don’t slow down runtime.
Additionally, in Python you can compile the .py files into .pyc files, which won’t contain the comments (I should hope). This means that you won’t get a startup hit either if the script is already compiled.
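The "I should hope" part can be tested directly. Here is a minimal Python 3 sketch that byte-compiles a small file and searches the resulting .pyc for the comment text. The file names and the marker string are made up for the demo.

```python
import os
import py_compile
import tempfile

MARKER = "UNIQUE_COMMENT_MARKER_12345"  # hypothetical text, for illustration only

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo.py")
    pyc_path = os.path.join(tmp, "demo.pyc")
    with open(src_path, "w") as f:
        f.write("# " + MARKER + "\nvalue = 6 * 7\n")

    # Byte-compile the source file to an explicit .pyc path.
    py_compile.compile(src_path, cfile=pyc_path, doraise=True)

    with open(pyc_path, "rb") as f:
        pyc_bytes = f.read()

# Look for the comment text anywhere in the compiled file.
marker_in_pyc = MARKER.encode() in pyc_bytes
print(marker_in_pyc)
```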
"My limited understanding of an interpreter is that it reads program expressions in as strings and converts those strings into code."
Most interpreters read the text (code) and produce an Abstract Syntax Tree data structure.
That structure contains no code, in text form, and of course no comments either. Just that tree is enough for executing programs. But interpreters, for efficiency reasons, go one step further and produce byte code. And Python does exactly that.
We could say that the code and the comments, in the form you wrote them, are simply not present when the program is running. So no, comments do not slow down programs at run-time.
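The text-to-tree-to-bytecode pipeline can be walked through by hand with the standard `ast` and `dis` modules. This is a sketch in Python 3 with an invented two-line program, not a description of every interpreter.

```python
import ast
import dis

source = "# remember to multiply\nresult = 6 * 7  # the answer\n"

tree = ast.parse(source)                  # text -> abstract syntax tree
code = compile(tree, "<demo>", "exec")    # tree -> bytecode
namespace = {}
exec(code, namespace)                     # bytecode -> execution

tree_dump = ast.dump(tree)
disassembly = dis.Bytecode(code).dis()

print(tree_dump)
print(disassembly)
print(namespace["result"])
```

Neither the tree dump nor the disassembly should mention the comment text, which is the point made above.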
(*) Interpreters that keep the code only as text, with no other internal structure such as a syntax tree, must do exactly what you mentioned: interpret the code again and again at run-time.
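To make the footnote concrete, here is a toy text-only interpreter. The `add <n>` mini-language and the function name are invented for this sketch. It re-scans every source line on every pass, so comment lines cost something each time unless they are stripped beforehand.

```python
def run_from_text(lines, repeats):
    """Toy interpreter that re-reads the source text on every pass."""
    total = 0
    lines_scanned = 0
    for _ in range(repeats):
        for line in lines:
            lines_scanned += 1
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # comment or blank line: skipped, but still scanned
            total += int(stripped.split()[1])  # only statement: "add <n>"
    return total, lines_scanned


program = ["# start", "add 2", "# a long explanation", "add 3"]

# Pre-processing step: drop comment lines once, before running.
preprocessed = [ln for ln in program if not ln.strip().startswith("#")]

print(run_from_text(program, 1000))       # (5000, 4000)
print(run_from_text(preprocessed, 1000))  # (5000, 2000)
```

Both runs compute the same total, but the unstripped program touches twice as many lines.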
As the other answers have already stated, a modern interpreted language like Python first parses and compiles the source into bytecode, and the parser simply ignores the comments. This clearly means that any loss of speed would only occur at startup when the source is actually parsed.
Because the parser ignores comments, the compiling phase is basically unaffected by any comments you put in. But the bytes in the comments themselves are actually being read in, and then skipped over during parsing. This means, if you have a crazy amount of comments (e.g. many hundreds of megabytes), this would slow down the interpreter. But then again this would slow any compiler as well.
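A rough way to separate the two costs is to time `compile()` and `exec()` independently. The sketch below is Python 3; the filler comment and the repeat counts are arbitrary choices, and the numbers will vary by machine.

```python
import timeit

body = "a = 30\nb = 40\nfor i in range(10):\n    c = a**i * b**i\n"
padding = "# filler comment line\n" * 50_000   # roughly 1 MB of comments
padded = padding + body

# Parse + compile cost: the comment bytes have to be read and skipped.
t_compile_plain = timeit.timeit(
    lambda: compile(body, "<plain>", "exec"), number=5)
t_compile_padded = timeit.timeit(
    lambda: compile(padded, "<padded>", "exec"), number=5)

# Run cost: execute code objects that were compiled ahead of time.
code_plain = compile(body, "<plain>", "exec")
code_padded = compile(padded, "<padded>", "exec")
t_run_plain = timeit.timeit(lambda: exec(code_plain, {}), number=10_000)
t_run_padded = timeit.timeit(lambda: exec(code_padded, {}), number=10_000)

print("compile: plain %.6f s, padded %.6f s" % (t_compile_plain, t_compile_padded))
print("run:     plain %.6f s, padded %.6f s" % (t_run_plain, t_run_padded))
```

One would expect the compile timings to diverge as the padding grows while the run timings stay close, but that is something to measure rather than assume.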
I wonder if it depends on how comments are used. For example, triple quotes make a docstring. If you use them, the content is validated. I ran into a problem a while back where I was importing a library into my Python 3 code and got a syntax error regarding \N. I looked at the line number, and it was content within a triple-quote comment. I was somewhat surprised. Being new to Python, I never thought a block comment would be checked for syntax errors.
Simply put, if you type:
''' (i.e. \Device\NPF_..) '''
Python 2 doesn’t throw an error, but Python 3 reports:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 14-15: malformed \N character escape
So Python 3 is evidently interpreting the triple quote, making sure it’s valid syntax.
However, if turned into a single line comment: # (i.e. \Device\NPF_..)
No error results.
I wonder whether a performance change would be seen if the triple-quote comments were replaced with single-line comments.
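The behaviour described above can be reproduced in Python 3 without touching any files by handing the three variants to `compile()`. The helper name `compiles` is made up for this sketch, and the raw-string variant is an addition showing one common workaround.

```python
import warnings

string_version = "''' (i.e. \\Device\\NPF_..) '''\n"
comment_version = "# (i.e. \\Device\\NPF_..)\n"
raw_version = "r''' (i.e. \\Device\\NPF_..) '''\n"


def compiles(source):
    """Return True if the source compiles, False on SyntaxError."""
    with warnings.catch_warnings():
        # Silence the separate "invalid escape sequence" warning, if any.
        warnings.simplefilter("ignore")
        try:
            compile(source, "<demo>", "exec")
            return True
        except SyntaxError:
            return False


print(compiles(string_version))   # False: \N is parsed as a string escape
print(compiles(comment_version))  # True: comment text is never decoded
print(compiles(raw_version))      # True: raw strings leave backslashes alone
```

The triple-quoted text is a real string literal, so its escapes are processed, while the `#` comment is skipped entirely.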