Remove comments from C/C++ code

Posted by: admin November 30, 2017

Questions:

Is there an easy way to remove comments from a C/C++ source file without doing any preprocessing? (I.e., I think you can use gcc -E, but that will expand macros.) I just want the source code with comments stripped; nothing else should be changed.

EDIT:

I would prefer an existing tool. I don’t want to write this myself with regexes; I foresee too many surprises in the code.

Answers:

Run the following command on your source file:

gcc -fpreprocessed -dD -E test.c

Thanks to KennyTM for finding the right flags. Here’s the result for completeness:

test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo
/* comments? comments. */
// c++ style comments

gcc -fpreprocessed -dD -E test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo

Questions:
Answers:

gcc -fpreprocessed -dD -E did not work for me, but this program does the job for /* ... */ comments (it leaves // comments in place):

#include <stdio.h>

static void process(FILE *f)
{
 int c;
 while ( (c=getc(f)) != EOF )
 {
  if (c=='\'' || c=='"')            /* literal */
  {
   int q=c;
   do
   {
    putchar(c);
    if (c=='\\') putchar(getc(f));
    c=getc(f);
   } while (c!=q);
   putchar(c);
  }
  else if (c=='/')              /* opening comment ? */
  {
   c=getc(f);
   if (c!='*')                  /* no, recover */
   {
    putchar('/');
    ungetc(c,f);
   }
   else
   {
    int p;
    putchar(' ');               /* replace comment with space */
    do
    {
     p=c;
     c=getc(f);
    } while (c!='/' || p!='*');
   }
  }
  else
  {
   putchar(c);
  }
 }
}

int main(int argc, char *argv[])
{
 process(stdin);
 return 0;
}

Questions:
Answers:

It depends on how perverse your comments are. I have a program scc to strip C and C++ comments. I also have a test file for it, and I tried GCC (4.2.1 on MacOS X) with the options in the currently selected answer – and GCC doesn’t seem to do a perfect job on some of the horribly butchered comments in the test case.

NB: This isn’t a real-life problem – people don’t write such ghastly code.

Consider this subset (36 of 135 lines total) of the test case:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

On my Mac, the output from GCC (gcc -fpreprocessed -dD -E subset.c) is:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

The output from ‘scc’ is:

The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.

The regular C comment number 2 has finished.

This is followed by regular C comment number 3.

The output from ‘scc -C’ (which recognizes double-slash comments) is:

The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.

The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.

The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
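
The test cases above hinge on backslash-newline splicing: C deletes every backslash that is immediately followed by a newline before it looks for comments, so a slash and a star separated only by a splice still open a comment. A minimal sketch of that splicing step follows. This is not scc's code; remove_splices is a hypothetical helper name, and unlike scc it also removes splices outside comments.

```c
#include <stddef.h>

/* Delete every backslash that is immediately followed by a newline
 * (the "line splicing" step that happens before comment recognition).
 * dst must have room for at least strlen(src) + 1 bytes.
 * Hypothetical helper for illustration; not taken from scc. */
void remove_splices(const char *src, char *dst)
{
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n')
            src += 2;           /* drop the backslash-newline pair */
        else
            *dst++ = *src++;    /* everything else is copied unchanged */
    }
    *dst = '\0';
}
```

After splicing, the first test case reads as an ordinary block comment, while the second becomes slash, backslash, slash, which opens no comment at all.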

Source for SCC now available on GitHub

The current version of SCC is 6.60 (dated 2016-06-12), though the Git versions were created on 2017-01-18 (in the US/Pacific time zone). The code is available from GitHub at https://github.com/jleffler/scc-snapshots. You can also find snapshots of the previous releases (4.03, 4.04, 5.05) and two pre-releases (6.16, 6.50) — these are all tagged release/x.yz.

The code is still primarily developed under RCS. I’m still working out how I want to use sub-modules or a similar mechanism to handle common library files like stderr.c and stderr.h (which can also be found in https://github.com/jleffler/soq).

SCC version 6.60 attempts to understand C++11, C++14 and C++17 constructs such as binary constants, numeric punctuation, raw strings, and hexadecimal floats. It defaults to C11 mode operation. (Note that the meaning of the -C flag — mentioned above — flipped between version 4.0x described in the main body of the answer and version 6.60 which is currently the latest release.)

Questions:
Answers:

There is a stripcmt program that can do this:

StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the grand tradition of Unix text processing programs, it can function either as a FIFO (First In – First Out) filter or accept arguments on the command line.

(per hlovdal’s answer to: question about Python code for this)

Questions:
Answers:

This is a perl script to remove //one-line and /* multi-line */ comments

  #!/usr/bin/perl

  undef $/;
  $text = <>;

  $text =~ s/\/\/[^\n\r]*(\n\r)?//g;
  $text =~ s/\/\*+([^*]|\*(?!\/))*\*+\///g;

  print $text;

It requires your source file as a command-line argument.
Save the script to a file, let’s say remove_comments.pl,
and call it using the following command: perl -w remove_comments.pl [your source file]

Hope it will be helpful
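
Regexes like the ones above also fire inside string literals, so a string such as "http://x" loses everything after the double slash. A minimal sketch of the alternative, a small state machine that tracks whether it is inside a string, a character literal, or a comment, might look like this. The function name strip_comments is an assumption for illustration, and the sketch ignores backslash-newline splices, trigraphs and C++ raw strings.

```c
#include <stddef.h>

/* Copy src to dst with comments removed. A block comment is replaced by
 * one space so that neighbouring tokens do not merge; a line comment is
 * dropped but its terminating newline is kept.
 * dst must have room for at least strlen(src) + 1 bytes.
 * Hypothetical sketch: no splice, trigraph or raw-string handling. */
void strip_comments(const char *src, char *dst)
{
    enum { CODE, STRING, CHARLIT, LINE_COMMENT, BLOCK_COMMENT } state = CODE;
    size_t i = 0;

    while (src[i] != '\0') {
        char c = src[i];
        char next = src[i + 1];     /* safe: src[i] is not the terminator */

        switch (state) {
        case CODE:
            if (c == '/' && next == '/') { state = LINE_COMMENT; i += 2; continue; }
            if (c == '/' && next == '*') { state = BLOCK_COMMENT; *dst++ = ' '; i += 2; continue; }
            if (c == '"') state = STRING;
            else if (c == '\'') state = CHARLIT;
            *dst++ = c;
            break;
        case STRING:
        case CHARLIT:
            *dst++ = c;
            if (c == '\\' && next != '\0') {
                *dst++ = next;      /* copy the escaped character verbatim */
                i++;
            } else if ((state == STRING && c == '"') ||
                       (state == CHARLIT && c == '\'')) {
                state = CODE;       /* closing quote */
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') { state = CODE; *dst++ = c; }
            break;
        case BLOCK_COMMENT:
            if (c == '*' && next == '/') { state = CODE; i++; }
            break;
        }
        i++;
    }
    *dst = '\0';
}
```

Because the quote states are checked before the comment states, a double slash inside a string literal is copied through untouched.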

Questions:
Answers:

I had this problem as well. I found this tool (Cpp-Decomment), which worked for me. However, it does not handle a comment line that continues onto the next line, e.g.:

// this is my comment \
comment continues ...

In this case, I couldn’t find a way to handle it in the program, so I just searched for the lines it missed and fixed them manually. I believe there may be an option for that, or you could change the program’s source to handle it.
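
For anyone patching a tool to cope with this case, the rule is that a backslash immediately before the newline splices the next line onto the comment. A minimal sketch of a scanner for that rule follows. The name skip_line_comment is hypothetical and has nothing to do with Cpp-Decomment's source, and the sketch assumes plain \n line endings.

```c
#include <stddef.h>

/* p points just past the two slashes that open a line comment.
 * Returns a pointer to the newline that really ends the comment, or to
 * the terminating NUL if the input ends first. A backslash directly
 * followed by a newline continues the comment onto the next line.
 * Hypothetical helper; assumes \n line endings (no \r\n handling). */
const char *skip_line_comment(const char *p)
{
    while (*p != '\0') {
        if (p[0] == '\\' && p[1] == '\n') {
            p += 2;             /* spliced line: the comment continues */
            continue;
        }
        if (*p == '\n')
            break;              /* unescaped newline ends the comment */
        p++;
    }
    return p;
}
```

Applied to the example above, the scan passes over "comment continues ..." as well and stops only at the newline after it.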

Questions:
Answers:

I believe a one-line command is enough to remove simple comments from C:

perl -i -pe 's{/\*.*?\*/}{}g' file.c     (removes /* C-style */ comments that start and end on the same line)
perl -i -pe 's{//.*}{}' file.cpp         (removes // C++-style comments)

The only problem is that these commands cannot remove comments that span more than one line (and they also match inside string literals), but you can build multi-line comment removal on top of the same regexes.

Questions:
Answers:

Because you use C, you might want to use something that’s “natural” to C. You can use the C preprocessor to just remove comments. The examples given below work with the C preprocessor from GCC. They should work the same or in similar ways with other C preprocessors as well.

For C, use

cpp -dD -fpreprocessed -o output.c input.c

It also works for removing comments from JSON, for example like this:

cpp -P -o - - <input.json >output.json

In case your C preprocessor is not accessible directly, you can try to replace cpp with cc -E, which calls the C compiler telling it to stop after the preprocessor stage.
In case your C compiler binary is not cc you can replace cc with the name of your C compiler binary, for example clang. Note that not all preprocessors support -fpreprocessed.

Questions:
Answers:

Recently I wrote some Ruby code to solve this problem. I have considered the following cases:

  • comment markers inside strings
  • a multi-line (/* ... */) comment on a single line, avoiding a greedy match
  • multi-line comments spanning multiple lines

Here is the code: GitHub, Remove comments

It uses the following placeholder strings to preprocess each line in case comment markers appear inside strings. If one of these placeholders already appears in your code, uh, bad luck; you can replace it with a more complex string.

  • MUL_REPLACE_LEFT = "MUL_REPLACE_LEFT"
  • MUL_REPLACE_RIGHT = "MUL_REPLACE_RIGHT"
  • SIG_REPLACE = "SIG_REPLACE"

Usage: ruby -w inputfile outputfile

Questions:
Answers:

I know it’s late, but I thought I’d share my code and my first attempt at writing a compiler.

Note: this does not account for "*/" inside a multiline comment, e.g. /*...."*/"...*/. Then again, gcc 4.8.1 doesn’t either.

void function_removeComments(char *pchar_sourceFile, long long_sourceFileSize)
{
    long long_sourceFileIndex = 0;
    long long_logIndex = 0;

    int int_EOF = 0;

    for (long_sourceFileIndex=0; long_sourceFileIndex < long_sourceFileSize;long_sourceFileIndex++)
    {
        if (pchar_sourceFile[long_sourceFileIndex] == '/' && int_EOF == 0)
        {
            long_logIndex = long_sourceFileIndex;  // log "possible" start of comment

            if (long_sourceFileIndex+1 < long_sourceFileSize)  // array bounds check given we want to peek at the next character
            {
                if (pchar_sourceFile[long_sourceFileIndex+1] == '*') // multiline comment
                {
                    for (long_sourceFileIndex+=2;long_sourceFileIndex < long_sourceFileSize; long_sourceFileIndex++)
                    {
                        if (pchar_sourceFile[long_sourceFileIndex] == '*' && pchar_sourceFile[long_sourceFileIndex+1] == '/')
                        {
                            // since we've found the end of multiline comment
                            // we want to increment the pointer position two characters
                            // accounting for "*" and "/"
                            long_sourceFileIndex+=2;  

                            break;  // terminating sequence found
                        }
                    }

                    // didn't find terminating sequence so it must be eof.
                    // set file pointer position to initial comment start position
                    // so we can display file contents.
                    if (long_sourceFileIndex >= long_sourceFileSize)
                    {
                        long_sourceFileIndex = long_logIndex;

                        int_EOF = 1;
                    }
                }
                else if (pchar_sourceFile[long_sourceFileIndex+1] == '/')  // single line comment
                {
                    // since we know its a single line comment, increment file pointer
                    // until we encounter a new line or its the eof 
                    for (long_sourceFileIndex++; pchar_sourceFile[long_sourceFileIndex] != '\n' && pchar_sourceFile[long_sourceFileIndex] != '\0'; long_sourceFileIndex++);
                }
            }
        }

        printf("%c",pchar_sourceFile[long_sourceFileIndex]);
    }
}

Questions:
Answers:
#include <stdio.h>

int main(void)
{
        int c;                  /* int rather than char, so EOF can be told apart from data */
        char tmp = '\0';        /* holds a pending '/' until the next character is seen */
        while((c = getchar()) != EOF) {
                if(tmp) {
                        tmp = '\0';
                        if(c == '/') {
                                /* line comment: skip to the end of the line */
                                while((c = getchar()) != '\n' && c != EOF);
                                putchar('\n');
                                continue;
                        }else if(c == '*') {
                                /* block comment: skip until the closing star-slash */
                                int prev = '\0';
                                while((c = getchar()) != EOF && !(prev == '*' && c == '/'))
                                        prev = c;
                                continue;
                        }else {
                                putchar('/');   /* the pending '/' was not a comment opener */
                                putchar(c);
                                continue;
                        }
                }
                if(c == '/') {
                        tmp = c;
                } else {
                        putchar(c);
                }
        }
        if(tmp) putchar('/');   /* a lone '/' at the very end of the input */
        return 0;
}

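This program handles both kinds of comment, i.e. // and /* ... */. It does not recognize string literals, so comment markers inside strings are stripped as well.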