Rovol

Nested Multiline Comment

Nested multiline comments are an extension of multiline comments, and like them they are handled during lexical analysis. Interestingly, most programming languages do not support nesting. My guess is that their designers consider it unnecessary and an added complication for the lexer; they may also have inherited the non-nesting style from C, or kept it for other historical reasons. Still, nestable multiline comments can be very useful in some cases, for example when commenting out a large region of code that already contains multiline comments.
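To make the problem concrete, here is a minimal sketch of how a non-nesting scanner (the C-style rule) treats a comment that wraps another comment. The function name flat_comment_end is hypothetical and only illustrates the rule; it is not taken from any particular compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: src must start with the opening marker. Returns the
// index just past the FIRST closing marker, which is where a non-nesting
// scanner considers the comment finished, or -1 if there is none.
long flat_comment_end(const char *src) {
  assert(src != NULL);
  assert(strncmp(src, "/*", 2) == 0);

  // Start searching after the opening marker so that "/*/" is not
  // mistaken for a complete comment.
  const char *close = strstr(src + 2, "*/");
  if (close == NULL)
    return -1;
  return (long)(close - src) + 2;
}
```

For the input "/* outer /* inner */ rest */", the function returns 20, so the text " rest */" is handed back to the lexer as ordinary code. This is exactly the situation that nesting support avoids.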

Implementations of nested multiline comments usually (perhaps always) use an accumulator or a stack to track the nesting depth, check that every comment is properly terminated, and report an error to the programmer when one is not.

This write-up describes the basic principles and an implementation of nested multiline comments, and discusses how to design user-friendly, intuitive error handling for them. It also serves as practice for my writing.

Implementation

A minimal implementation of nested multiline comments can be very simple and clear. Consider the following C code:

typedef struct {
  FILE *in;
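  int   ch;  // current character; an int rather than a char so it can hold EOF
  int   ne;  // next character (lookahead); an int for the same reason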
  size_t ln;
  size_t col;
} lex_t;

void adv(lex_t *lex) { ... }

void skip_multiline_comment(lex_t *lex) {
  adv(lex); // skip /
  adv(lex); // skip *
  
  size_t depth = 1;
  
  while (depth > 0) {
    if (lex->ch == EOF) {
      // TODO: error handling here
      return;
    }
    
    if (lex->ch == '/' && lex->ne == '*') {
      adv(lex); // skip /
      adv(lex); // skip *
      depth++;
    } else if (lex->ch == '*' && lex->ne == '/') {
      adv(lex); // skip *
      adv(lex); // skip /
      depth--;
    } else {
      adv(lex);
    }
  }
}

Because this function assumes that the beginning of the multiline comment has already been matched, we first advance the lexical analyzer twice to skip its first two characters, '/' and '*'.
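The body of adv is elided in the listing, so its real behavior is unknown. As a rough sketch only, one possible shape is shown below. It assumes that ch and ne form a one-character lookahead window, that ln and col are 1-based, and that both characters are stored as int so they can hold EOF. The initializer lex_init is a hypothetical helper that does not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Same fields as lex_t in the text; ch and ne are declared as int here so
// that they can represent EOF (an assumption of this sketch).
typedef struct {
  FILE *in;
  int ch;      // current character, or EOF
  int ne;      // next character (lookahead), or EOF
  size_t ln;   // 1-based line of ch
  size_t col;  // 1-based column of ch
} lex_t;

// Hypothetical initializer: loads the first two characters of the input.
void lex_init(lex_t *lex, FILE *in) {
  assert(lex != NULL && in != NULL);
  lex->in = in;
  lex->ch = fgetc(in);
  lex->ne = (lex->ch == EOF) ? EOF : fgetc(in);
  lex->ln = 1;
  lex->col = 1;
}

// One possible adv: update the position according to the character being
// left behind, then shift the lookahead into ch and read a new lookahead.
void adv(lex_t *lex) {
  assert(lex != NULL);
  if (lex->ch == EOF)
    return;  // nothing left to consume

  if (lex->ch == '\n') {
    lex->ln++;
    lex->col = 1;
  } else {
    lex->col++;
  }

  lex->ch = lex->ne;
  lex->ne = (lex->ch == EOF) ? EOF : fgetc(lex->in);
}
```

With this shape, lex->ln and lex->col always describe the character currently held in lex->ch, which is what the location reporting later in this write-up relies on.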

After advancing, we declare a variable depth that records how deeply nested the comment being scanned is; this is our “accumulator”. A while loop then updates the accumulator by these rules: if the analyzer encounters a beginning marker (/* in this example), the accumulator is increased by 1; if it encounters an ending marker (*/), the accumulator is decreased by 1; otherwise, we ignore the content of the comment and keep advancing.

The loop runs until the accumulator reaches zero, which means that the comment has terminated normally. If the comment is not terminated, the accumulator never reaches zero, and the loop keeps running until the analyzer encounters EOF, the end of the file.
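The accumulator rules can also be tried in isolation. The following is a small sketch that applies them to an in-memory string instead of a lex_t; the function name nested_comment_end and the string-based interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: src must start with the opening marker. Returns the
// index just past the matching closing marker, or -1 if the input ends
// while the depth is still positive.
long nested_comment_end(const char *src) {
  assert(src != NULL && src[0] == '/' && src[1] == '*');

  size_t i = 2;      // skip the outermost opening marker
  size_t depth = 1;  // the accumulator

  while (depth > 0) {
    if (src[i] == '\0')
      return -1;     // unterminated: the depth never reached zero

    // src[i + 1] is safe to read here because src[i] is not the terminator.
    if (src[i] == '/' && src[i + 1] == '*') {
      depth++;       // a nested comment begins
      i += 2;
    } else if (src[i] == '*' && src[i + 1] == '/') {
      depth--;       // the innermost open comment ends
      i += 2;
    } else {
      i++;           // ordinary comment content
    }
  }
  return (long)i;
}
```

For "/* a /* b */ c */ x" the function returns 17, the index just past the outer ending marker, while "/* a /* b */ c" yields -1 because one level is still open at the end of the input.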

You may have noticed that this example does not show the scanner's error handling. That is the subject of the next section.

Error Handling

The main error is the unterminated multiline comment. Because comments can nest, handling it requires more thought. The core question is: how do we best locate and report these errors?

As a first attempt, we could record the most recently opened comment. To implement this, we need to record the line and column of every comment we encounter. Let's incorporate this into the code:

// placed before the two adv() calls, so this is the location of the opening '/'
size_t ln = lex->ln, col = lex->col;

// ...

if (lex->ch == EOF) {
  LOG(ERRO, "unterminated multiline comment occurred at %zu:%zu", ln, col);
  return;
}

Now we have two variables, ln and col, that record the location of the comment. Accordingly, we need to update them in the loop:

if (lex->ch == '/' && lex->ne == '*') {
  ln = lex->ln;
  col = lex->col;
  // ...
}

Assume there is a source file rem.txt; the code then behaves like this:

$ cat rem.txt
/*
/*
*/
$ ./lexer rem.txt
[ERRO] unterminated multiline comment occurred at 2:1

There is a problem: in rem.txt, the unterminated comment actually starts on the first line, but the logger reports the second line. In fact, we only need to record the outermost comment of the whole nested structure, because whenever a nested multiline comment is unterminated, its outermost comment is necessarily unterminated too. So instead of recording every beginning marker, we record only the outermost one:

if (lex->ch == '/' && lex->ne == '*') {
  // ln and col are no longer updated here
  // ...
}

Now the error message is much clearer:

$ cat rem.txt
/*
/*
*/
$ ./lexer rem.txt
[ERRO] unterminated multiline comment occurred at 1:1
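
The record-once rule can be sketched as a self-contained function that works on a string. This is an illustration under assumptions, not the lexer from this write-up: the name check_comments and its interface are hypothetical, it scans a whole source text in one loop, and it ignores string literals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: scans a whole source text and skips everything that
// is not a comment. Returns true if every comment is terminated. Otherwise
// it stores the location of the outermost beginning marker of the
// unterminated comment in *err_ln and *err_col and returns false.
bool check_comments(const char *src, size_t *err_ln, size_t *err_col) {
  assert(src != NULL && err_ln != NULL && err_col != NULL);

  size_t ln = 1, col = 1;            // 1-based position of src[i]
  size_t open_ln = 0, open_col = 0;  // location of the outermost marker
  size_t depth = 0;

  for (size_t i = 0; src[i] != '\0';) {
    if (src[i] == '/' && src[i + 1] == '*') {
      if (depth == 0) {              // record only the outermost marker
        open_ln = ln;
        open_col = col;
      }
      depth++;
      i += 2;
      col += 2;
    } else if (depth > 0 && src[i] == '*' && src[i + 1] == '/') {
      depth--;
      i += 2;
      col += 2;
    } else {
      if (src[i] == '\n') {
        ln++;
        col = 1;
      } else {
        col++;
      }
      i++;
    }
  }

  if (depth > 0) {
    *err_ln = open_ln;
    *err_col = open_col;
    return false;
  }
  return true;
}
```

Called on the contents of rem.txt ("/*\n/*\n*/\n"), the function returns false and reports 1:1, matching the output above.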

In fact, this implementation still has a defect: the reporter does not handle multiple unterminated comments well. Consider this example:

/*
/*

Given our current implementation, only the beginning of the whole structure is reported:

[ERRO] unterminated multiline comment occurred at 1:1

This is not very helpful. Ideally, the reporter would go through the comments, find every unterminated one, and report them one by one. To do that, we need a stack that records the entire structure of the comment. There is a trade-off to weigh: is the current solution enough, or is the stack worth the extra complexity? I think the simpler solution is enough in most cases, but in this write-up I will still show my implementation of the “ideal solution”.

Stack-based Error Handling

As mentioned above, we need a stack to implement this solution. The implementation is shown in the following C code:

#define COMMENT_MAX_DEPTH 8

typedef struct {
  size_t ln;
  size_t col;
} loc_t;

void skip_multiline_comment(lex_t *lex) {
  loc_t stack[COMMENT_MAX_DEPTH];
  size_t depth = 0;

  // record the location of the opening '/' before skipping the marker
  stack[depth].ln = lex->ln;
  stack[depth].col = lex->col;
  depth++;

  adv(lex); // skip /
  adv(lex); // skip *
  
  while (depth > 0) {
    if (lex->ch == EOF) {
      for (size_t i = 0; i < depth; i++)
        LOG(ERRO, "unterminated multiline comment occurred at %zu:%zu",
          stack[i].ln, stack[i].col);
      return;
    }
    
    if (lex->ch == '/' && lex->ne == '*') {
      // check for room only when a new comment is about to be pushed,
      // so that a nesting depth of exactly COMMENT_MAX_DEPTH is accepted
      if (depth >= COMMENT_MAX_DEPTH) {
        LOG(ERRO, "comment nesting too deep, maximum depth is %d",
          COMMENT_MAX_DEPTH);
        return;
      }

      stack[depth].ln = lex->ln;
      stack[depth].col = lex->col;
      depth++;
      
      adv(lex); // skip /
      adv(lex); // skip *
    } else if (lex->ch == '*' && lex->ne == '/') {
      adv(lex); // skip *
      adv(lex); // skip /
      depth--;
    } else {
      adv(lex);
    }
  }
}

In this implementation, we declare a structure loc_t that holds the location of a comment, and an array stack that records the locations of the comments we have encountered. The variable depth serves as the index of the top of the stack.

No further work is needed to maintain the stack; we only have to record the locations of the beginning markers, because the stack is indexed by depth. When the analyzer encounters a beginning marker, its location is stored in the slot indexed by the current depth, and depth is increased by 1. When the analyzer encounters an ending marker, depth is decreased by 1. From then on, the next beginning marker overwrites that slot, which is safe because the comment it described has already been verified as well terminated.

After this process, the elements remaining in the stack are always unterminated comments. So we can simply iterate through the stack and report each location in an error message.
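
The slot-reuse behavior is easier to see on a concrete input. Below is a sketch of the same idea on an in-memory string; the function name unterminated_comments, the MAX_DEPTH macro, and the convention of returning a count are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8  // plays the role of COMMENT_MAX_DEPTH in the text

typedef struct {
  size_t ln;
  size_t col;
} loc_t;

// Hypothetical helper: src must start with the opening marker at 1:1.
// Returns the number of comments still open at the end of the input and
// leaves their locations, outermost first, in out[]. Returns 0 when the
// comment is well terminated and -1 when the nesting exceeds MAX_DEPTH.
int unterminated_comments(const char *src, loc_t out[MAX_DEPTH]) {
  assert(src != NULL && out != NULL);
  assert(src[0] == '/' && src[1] == '*');

  size_t ln = 1, col = 1, i = 0;
  size_t depth = 0;

  do {
    if (src[i] == '\0')
      return (int)depth;    // out[0] .. out[depth - 1] are unterminated

    if (src[i] == '/' && src[i + 1] == '*') {
      if (depth >= MAX_DEPTH)
        return -1;
      out[depth].ln = ln;   // push: overwrite the slot at the current depth
      out[depth].col = col;
      depth++;
      i += 2;
      col += 2;
    } else if (src[i] == '*' && src[i + 1] == '/') {
      depth--;              // pop: the slot becomes reusable
      i += 2;
      col += 2;
    } else {
      if (src[i] == '\n') {
        ln++;
        col = 1;
      } else {
        col++;
      }
      i++;
    }
  } while (depth > 0);

  return 0;
}
```

For "/* a /* b */ /* c", the second slot first holds 1:6 (the comment around b). That comment is closed, so the slot is overwritten with 1:14 when the comment around c opens, and the function reports two unterminated comments, at 1:1 and 1:14.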

Note that because the locations are kept in a fixed-size array on the call stack, this introduces a depth limit, defined by the macro COMMENT_MAX_DEPTH, which is 8 in this implementation. In practice comments are rarely nested deeply, so the limit is sufficient for most uses. Still, compared with the previous implementation, which allows almost unlimited depth, it is somewhat annoying.
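
If the limit turns out to matter, one alternative (not what the implementation above does) is to keep the locations in a heap-allocated array that grows on demand. The following is a minimal sketch of such a stack; the type loc_stack_t and its functions are hypothetical names, and the price is a possible allocation failure that the caller has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
  size_t ln;
  size_t col;
} loc_t;

// Hypothetical growable stack of locations. Initialize it with
// `loc_stack_t s = {0};` before the first push.
typedef struct {
  loc_t *items;
  size_t len;
  size_t cap;
} loc_stack_t;

// Pushes loc. Returns 0 on success, or -1 if memory cannot be allocated,
// in which case the stack is left unchanged.
int loc_stack_push(loc_stack_t *s, loc_t loc) {
  assert(s != NULL);
  if (s->len == s->cap) {
    size_t new_cap = s->cap ? s->cap * 2 : 8;
    loc_t *p = realloc(s->items, new_cap * sizeof *p);
    if (p == NULL)
      return -1;
    s->items = p;
    s->cap = new_cap;
  }
  s->items[s->len++] = loc;
  return 0;
}

// Drops the top element; the stack must not be empty.
void loc_stack_pop(loc_stack_t *s) {
  assert(s != NULL && s->len > 0);
  s->len--;
}

// Releases the storage and resets the stack to its empty state.
void loc_stack_free(loc_stack_t *s) {
  assert(s != NULL);
  free(s->items);
  s->items = NULL;
  s->len = 0;
  s->cap = 0;
}
```

In the scanner, a push would replace the assignment to stack[depth], a pop would replace depth--, and s.len would take over the role of depth. Whether this is worth it is the same trade-off as before: more generality for more code.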

Conclusion

In this write-up, we have explained the principles of handling nested multiline comments and built friendly error handling for them.

We first used an accumulator to record the depth of the comments and check that they terminate. Then we introduced two ways to report unterminated comments, both of which come down to recording where the unterminated comments begin. The first simply records the outermost comment, because whenever a nested multiline comment is unterminated, its outermost comment is unterminated. Its limitation soon became apparent, so we introduced the second: a stack that records the locations of all unterminated comments, maintained by using the accumulator depth as the stack index.


No rights reserved; published in 2026.
Licensed under CC0 1.0 Universal, i.e., Public Domain. The license to this article was changed from CC BY 4.0 International to CC0 1.0 Universal, effective Feb. 2, 2026. This declaration takes effect at 00:00 (midnight) on Feb. 3, 2026.