INET Framework for OMNeT++/OMNEST
inet::PatternMatcher Class Reference

Glob-style pattern matching class, adapted to special OMNeT++ requirements.

#include <PatternMatcher.h>

Classes

struct  Elem
 

Public Member Functions

 PatternMatcher ()
 Constructor.

 PatternMatcher (const char *pattern, bool dottedpath, bool fullstring, bool casesensitive)
 Constructor.

 ~PatternMatcher ()
 Destructor.

void setPattern (const char *pattern, bool dottedpath, bool fullstring, bool casesensitive)
 Sets the pattern to be used by subsequent calls to matches().

bool matches (const char *line)
 Returns true if the line matches the pattern with the given settings.

const char * patternPrefixMatches (const char *line, int suffixoffset)
 Similar to matches(), but returns non-nullptr if and only if all of the following hold. (1) The pattern ends in a string literal (and not, say, '*' or '**') that contains the line's suffix, which is the part of line starting suffixoffset characters in. (2) The pattern matches the whole line, except that (3) when matching the pattern's last string literal, the line is also accepted if it is shorter than the pattern.

std::string debugStr ()
 Returns the internal representation of the pattern as a string.

void dump ()
 Prints the internal representation of the pattern on the standard output.
 

Static Public Member Functions

static bool containsWildcards (const char *pattern)
 Utility function to determine whether a given string contains wildcards.
 

Private Types

enum  ElemType {
  LITERALSTRING = 0, ANYCHAR, COMMONCHAR, SET,
  NEGSET, NUMRANGE, ANYSEQ, COMMONSEQ,
  END
}
 

Private Member Functions

void parseSet (const char *&s, Elem &e)
 
void parseNumRange (const char *&s, Elem &e)
 
void parseLiteralString (const char *&s, Elem &e)
 
bool parseNumRange (const char *&str, char closingchar, long &lo, long &up)
 
std::string debugStrFrom (int from)
 
bool isInSet (char c, const char *set)
 
bool doMatch (const char *line, int patternpos, int suffixlen)
 

Private Attributes

std::vector< Elem > pattern
 
bool iscasesensitive = false
 
std::string rest
 

Detailed Description

Glob-style pattern matching class, adapted to special OMNeT++ requirements.

One instance represents a pattern to match.

Pattern syntax:

  • ? : matches any character except '.'
  • * : matches zero or more characters except '.'
  • ** : matches zero or more characters (any character)
  • {a-z} : matches a character in range a-z
  • {^a-z} : matches a character NOT in range a-z
  • {32..255} : any number (i.e. sequence of digits) in range 32..255 (e.g. "99")
  • [32..255] : any number in square brackets in range 32..255 (e.g. "[99]")
  • backslash \ : takes away the special meaning of the subsequent character

The "except '.'" phrases in the above rules apply only in "dottedpath" mode (see below).

There are three option switches (see setPattern() method):

  • dottedpath: dottedpath=true is the mode used in omnetpp.ini for matching module parameters, like this: "**.mac[*].retries=9". In this mode, '*' cannot "eat" a dot, so it can only match one component (module name) of the path; '**' can be used to match more components. (This is similar to, e.g., Java Ant's use of the asterisk.) In dottedpath=false mode, '*' will match anything.
  • fullstring: selects between full string and substring match. The pattern "ate" will match "whatever" in substring mode, but not in full string mode.
  • casesensitive: selects between case-sensitive and case-insensitive matching.
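The following minimal standalone sketch makes the three switches concrete. It is not PatternMatcher's own code. The function name `dottedMatch()` is hypothetical. It understands only '?', '*', '**' and plain characters, and behaves roughly like dottedpath=true, fullstring=true, casesensitive=true.

```cpp
#include <cassert>

// Hypothetical sketch (NOT the INET implementation): recursive full-string
// glob match in dottedpath mode, supporting only '?', '*', '**' and literals.
// '?' and '*' refuse to cross a '.', while '**' may.
bool dottedMatch(const char *pat, const char *s)
{
    if (*pat == '\0')
        return *s == '\0';  // full-string match: input must be used up too
    if (pat[0] == '*' && pat[1] == '*') {
        // '**': try every split point, including ones past dots
        for (const char *p = s; ; p++) {
            if (dottedMatch(pat + 2, p))
                return true;
            if (*p == '\0')
                return false;
        }
    }
    if (*pat == '*') {
        // '*': like '**', but gives up at the first '.'
        for (const char *p = s; ; p++) {
            if (dottedMatch(pat + 1, p))
                return true;
            if (*p == '\0' || *p == '.')
                return false;
        }
    }
    if (*pat == '?')  // any single character except '.'
        return *s != '\0' && *s != '.' && dottedMatch(pat + 1, s + 1);
    return *pat == *s && dottedMatch(pat + 1, s + 1);  // literal character
}
```

With this sketch, `dottedMatch("**.mac.retries", "net.host1.mac.retries")` is true. `dottedMatch("*.mac.retries", "net.host1.mac.retries")` is false, because a single '*' cannot consume the dotted prefix "net.host1".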

Rule details:

  • sets, negated sets: they can contain several character ranges as well as enumerated characters, for example "{_a-zA-Z0-9}" or "{xyzc-f}". To include '-' in the set, put it at a position where it cannot be interpreted as a character range, for example "{a-z-}" or "{-a-z}". To include '}' in the set, it must be the first character: "{}a-z}", or in a negated set "{^}a-z}". Within set definitions, a backslash is always taken as a literal backslash (and NOT as an escape character). When doing case-insensitive matching, avoid ranges that include both alphabetic (a-zA-Z) and non-alphabetic characters: both ends of every range are converted to uppercase, which can produce unexpected results.
  • numeric ranges: only nonnegative integers can be matched. The start or the end of the range (or both) can be omitted: "{10..}", "{..99}" and "{..}" are valid numeric ranges (the last one matches any number). The specification must use exactly two dots. Caveat: "*{17..19}" will also match "a17", "117" and "963217", because '*' may consume leading digits and leave only the trailing "17" for the numeric range.
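The next sketch illustrates the numeric-range rule and assumes the range text is given on its own. `NumRange`, `parseRange()` and `inRange()` are illustrative names, not part of INET. As in parseNumRange(), a missing bound is stored as -1. Unlike doMatch(), `inRange()` tests a whole string rather than a digit run inside a longer line.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical representation of "{lo..hi}"; -1 means "no bound".
struct NumRange {
    long lo = -1;
    long hi = -1;
};

// Parses exactly "{n..m}" with n and m optional; false if malformed.
bool parseRange(const char *s, NumRange &r)
{
    if (*s++ != '{')
        return false;
    r = NumRange();
    if (std::isdigit((unsigned char)*s)) {
        r.lo = std::atol(s);
        while (std::isdigit((unsigned char)*s))
            s++;
    }
    if (s[0] != '.' || s[1] != '.')
        return false;  // exactly two dots are required
    s += 2;
    if (std::isdigit((unsigned char)*s)) {
        r.hi = std::atol(s);
        while (std::isdigit((unsigned char)*s))
            s++;
    }
    return s[0] == '}' && s[1] == '\0';
}

// True if the whole string is a nonnegative integer within the range.
bool inRange(const char *num, const NumRange &r)
{
    if (!std::isdigit((unsigned char)*num))
        return false;
    long n = std::atol(num);
    for (const char *p = num; *p; p++)
        if (!std::isdigit((unsigned char)*p))
            return false;
    return (r.lo < 0 || n >= r.lo) && (r.hi < 0 || n <= r.hi);
}
```

For example, after parsing "{32..255}", "99" is accepted and "300" is rejected. "{1.5}" is refused because it does not contain two consecutive dots.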

Member Enumeration Documentation

enum inet::PatternMatcher::ElemType
private

Enumerator
LITERALSTRING 
ANYCHAR 
COMMONCHAR 
SET 
NEGSET 
NUMRANGE 
ANYSEQ 
COMMONSEQ 
END 

{
    LITERALSTRING = 0,
    ANYCHAR,
    COMMONCHAR,    // any char except "."
    SET,
    NEGSET,
    NUMRANGE,
    ANYSEQ,    // "**": sequence of any chars
    COMMONSEQ,    // "*": seq of any chars except "."
    END
};

Constructor & Destructor Documentation

inet::PatternMatcher::PatternMatcher ( )

Constructor.

{
}
inet::PatternMatcher::PatternMatcher ( const char *  pattern,
bool  dottedpath,
bool  fullstring,
bool  casesensitive 
)

Constructor.

{
    setPattern(pattern, dottedpath, fullstring, casesensitive);
}
inet::PatternMatcher::~PatternMatcher ( )

Destructor.

{
}

Member Function Documentation

bool inet::PatternMatcher::containsWildcards ( const char *  pattern)
static

Utility function to determine whether a given string contains wildcards.

If it does not, a simple strcmp() might be a faster option than using PatternMatcher.

{
    return strchr(pattern, '?') || strchr(pattern, '*') ||
           strchr(pattern, '\\') || strchr(pattern, '{') ||
           strstr(pattern, "..");
}
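The strcmp() fast path suggested above could be organized as follows, as a sketch. `hasWildcards()` copies the checks from the listing. `matchOrCompare()` and its `fullMatch` callback are hypothetical and stand in for a PatternMatcher-based comparison. Note that '[' alone is not treated as a wildcard; a bracketed numeric range such as "[0..3]" is detected through its "..".

```cpp
#include <cassert>
#include <cstring>

// Same checks as containsWildcards(): '?', '*', '\\', '{' or "..".
bool hasWildcards(const char *pattern)
{
    return std::strchr(pattern, '?') || std::strchr(pattern, '*') ||
           std::strchr(pattern, '\\') || std::strchr(pattern, '{') ||
           std::strstr(pattern, "..");
}

// Hypothetical dispatcher: plain keys are compared with strcmp(); only
// keys containing wildcards go to the (caller-supplied) pattern matcher.
template <typename Matcher>
bool matchOrCompare(const char *key, const char *line, Matcher fullMatch)
{
    if (!hasWildcards(key))
        return std::strcmp(key, line) == 0;  // cheap exact comparison
    return fullMatch(key, line);
}
```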
std::string inet::PatternMatcher::debugStr ( )
inline

Returns the internal representation of the pattern as a string.

May be useful for debugging purposes.

{ return debugStrFrom(0); }
std::string inet::PatternMatcher::debugStrFrom ( int  from)
private
{
    std::string result;
    for (int k = from; k < (int)pattern.size(); k++) {
        Elem& e = pattern[k];
        switch (e.type) {
            case LITERALSTRING:
                result = result + "\"" + e.literalstring + "\"";
                break;

            case ANYCHAR:
                result += "?!";
                break;

            case COMMONCHAR:
                result += "?";
                break;

            case SET:
                result = result + "SET(" + e.setchars + ")";
                break;

            case NEGSET:
                result = result + "NEGSET(" + e.setchars + ")";
                break;

            case NUMRANGE: {
                char buf[100];
                sprintf(buf, "%ld..%ld", e.fromnum, e.tonum);
                result += buf;
            } break;

            case ANYSEQ:
                result += "**";
                break;

            case COMMONSEQ:
                result += "*";
                break;

            case END:
                break;

            default:
                ASSERT(0);
                break;
        }
        result += " ";
    }
    return result;
}
bool inet::PatternMatcher::doMatch ( const char *  line,
int  patternpos,
int  suffixlen 
)
private

Referenced by matches(), and patternPrefixMatches().

In the implementation file, the parameters are named s (the line) and k (the pattern position); the listing below uses those names.

{
    while (true) {
        Elem& e = pattern[k];
        long num;    // case NUMRANGE
        int len;    // case LITERALSTRING
        switch (e.type) {
            case LITERALSTRING:
                len = e.literalstring.length();
                // special case: last string literal with prefix match: allow s to be shorter
                if (suffixlen > 0 && k == (int)pattern.size() - 2)
                    len -= suffixlen;
                // compare
                if (iscasesensitive ?
                    strncmp(s, e.literalstring.c_str(), len) :
                    strncasecmp(s, e.literalstring.c_str(), len)
                    )
                    return false;
                s += len;
                break;

            case ANYCHAR:
                if (!*s)
                    return false;
                s++;
                break;

            case COMMONCHAR:
                if (!*s || *s == '.')
                    return false;
                s++;
                break;

            case SET:
                if (!*s)
                    return false;
                if (!isInSet(*s, e.setchars.c_str()))
                    return false;
                s++;
                break;

            case NEGSET:
                if (!*s)
                    return false;
                if (isInSet(*s, e.setchars.c_str()))
                    return false;
                s++;
                break;

            case NUMRANGE:
                if (!opp_isdigit(*s))
                    return false;
                num = atol(s);
                while (opp_isdigit(*s))
                    s++;
                if ((e.fromnum >= 0 && num < e.fromnum) || (e.tonum >= 0 && num > e.tonum))
                    return false;
                break;

            case ANYSEQ:
                // potential shortcuts: if pattern ends in ANYSEQ, rest of the input
                // can be anything; if pattern ends in ANYSEQ LITERAL, it's enough if
                // input ends in the literal string
                if (k == (int)pattern.size() - 2)
                    return true;
                if (k == (int)pattern.size() - 3 && pattern[k + 1].type == LITERALSTRING)
                    return opp_stringendswith(s, pattern[k + 1].literalstring.c_str());

                // general case
                while (true) {
                    if (doMatch(s, k + 1, suffixlen))
                        return true;
                    if (!*s)
                        return false;
                    s++;
                }
                break;    // at EOS

            case COMMONSEQ:
                while (true) {
                    if (doMatch(s, k + 1, suffixlen))
                        return true;
                    if (!*s || *s == '.')
                        return false;
                    s++;
                }
                break;

            case END:
                return !*s;

            default:
                ASSERT(0);
                break;
        }
        k++;
        ASSERT(k < (int)pattern.size());
    }
}
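The second shortcut in the ANYSEQ branch rests on a simple observation. In case-sensitive full-string matching, when the only elements left are ** followed by a literal (and END), the input matches exactly when it ends with that literal. Below is a standalone sketch of that ends-with test. `endsWith()` is a hypothetical stand-in for OMNeT++'s opp_stringendswith(), whose implementation is not shown here, and it compares case-sensitively.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical equivalent of the "** literal END" shortcut:
// true if s ends with 'ending' (case-sensitive).
bool endsWith(const char *s, const char *ending)
{
    std::size_t slen = std::strlen(s);
    std::size_t elen = std::strlen(ending);
    return slen >= elen && std::strcmp(s + slen - elen, ending) == 0;
}
```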
void inet::PatternMatcher::dump ( )
inline

Prints the internal representation of the pattern on the standard output.

May be useful for debugging purposes.

{ printf("%s", debugStr().c_str()); }
bool inet::PatternMatcher::isInSet ( char  c,
const char *  set 
)
private

Referenced by doMatch().

{
    ASSERT((strlen(set) & 1) == 0);
    if (!iscasesensitive)
        c = opp_toupper(c);    // set is already uppercase here
    while (*set) {
        if (c >= *set && c <= *(set + 1))
            return true;
        set += 2;
    }
    return false;
}
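The pair encoding that isInSet() relies on can be shown in isolation. This sketch assumes the set string was built as in parseSet(): each single character X is stored as "XX", each range A-Z as "AZ", and everything is uppercased in case-insensitive mode. `inPairSet()` is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical sketch of the isInSet() lookup: 'set' is a sequence of
// inclusive character ranges, two characters per range.
bool inPairSet(char c, const char *set, bool caseSensitive)
{
    assert(std::strlen(set) % 2 == 0);  // must consist of whole pairs
    if (!caseSensitive)
        c = (char)std::toupper((unsigned char)c);  // set assumed uppercased
    for (; *set; set += 2)
        if (c >= set[0] && c <= set[1])
            return true;
    return false;
}
```

For "{A-Z0-9_}" the stored string would be "AZ09__". 'Q' and '_' are members, 'q' is a member only in case-insensitive mode, and '-' is never one.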
bool inet::PatternMatcher::matches ( const char *  line)

Returns true if the line matches the pattern with the given settings.

See setPattern().

Referenced by inet::ospf::OSPFConfigReader::loadConfigFromXML(), and inet::IPv4RoutingTable::updateNetmaskRoutes().

{
    ASSERT(pattern[pattern.size() - 1].type == END);

    // shortcut: omnetpp.ini keys often begin with "*" or "**"
    // but end in a string literal. So it's usually a performance win to
    // first check that the last string literal of the pattern matches
    // the end of the string. (We do the shortcut only in the case-sensitive
    // case. omnetpp.ini is case sensitive.)

    if (pattern.size() >= 2 && iscasesensitive) {
        Elem& e = pattern[pattern.size() - 2];
        if (e.type == LITERALSTRING) {
            // return if last 2 chars don't match
            int pattlen = e.literalstring.size();
            int linelen = strlen(line);
            if (pattlen >= 2 && linelen >= 2 && (line[linelen - 1] != e.literalstring.at(pattlen - 1) ||
                line[linelen - 2] != e.literalstring.at(pattlen - 2)))    //FIXME why doesn't work for pattlen==1 ?
                return false;
        }
    }

    // perform full-blown pattern matching
    return doMatch(line, 0, 0);
}
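The two-character pre-check in matches() acts as a cheap rejection filter, and the sketch below isolates it. `mayMatch()` is a hypothetical helper that receives the pattern's final literal directly. Like the listing, it only rejects when both strings are at least two characters long. In matches(), the check is additionally limited to case-sensitive mode.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical sketch: returns false only when the line provably cannot
// match (its last two chars differ from the final literal's last two);
// true means the full matcher still has to run.
bool mayMatch(const std::string &lastLiteral, const char *line)
{
    std::size_t pattlen = lastLiteral.size();
    std::size_t linelen = std::strlen(line);
    if (pattlen < 2 || linelen < 2)
        return true;  // too short for the shortcut; defer to full matching
    return line[linelen - 1] == lastLiteral[pattlen - 1] &&
           line[linelen - 2] == lastLiteral[pattlen - 2];
}
```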
void inet::PatternMatcher::parseLiteralString ( const char *&  s,
Elem &  e 
)
private

Referenced by setPattern().

{
    e.type = LITERALSTRING;
    while (*s && *s != '?' && *s != '{' && *s != '*') {
        long dummy;
        const char *s1;
        if (*s == '\\')
            e.literalstring += *(++s);
        else
            e.literalstring += *s;
        if (*s == '[' && parseNumRange((s1 = s), ']', dummy, dummy))
            break;
        s++;
    }
}
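The escape handling in parseLiteralString() can be sketched on its own. `scanLiteral()` is a hypothetical helper. It collects characters until it reaches '?', '{' or '*', and turns "\x" into a literal 'x'. It deliberately omits the '[' numeric-range check. It also stops at a lone trailing backslash, a guard added by this sketch: the listing above appears to step past the string terminator in that case.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: scan a literal run, advancing s to the first
// special character ('?', '{', '*') or to the end of the string.
std::string scanLiteral(const char *&s)
{
    std::string lit;
    while (*s && *s != '?' && *s != '{' && *s != '*') {
        if (*s == '\\') {
            if (!s[1]) {  // lone trailing backslash: nothing to escape
                s++;
                break;
            }
            s++;  // skip the backslash, keep the escaped character
        }
        lit += *s++;
    }
    return lit;
}
```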
void inet::PatternMatcher::parseNumRange ( const char *&  s,
Elem &  e 
)
private

Referenced by parseLiteralString(), and setPattern().

bool inet::PatternMatcher::parseNumRange ( const char *&  str,
char  closingchar,
long &  lo,
long &  up 
)
private
{
    //
    // try to parse "[n..m]" or "{n..m}" and return true on success.
    // str should point at "[" or "{"; on success return it'll point to "]" or "}",
    // and on failure it'll be unchanged. n and m will be stored in lo and up.
    // They are optional -- if missing, lo or up will be set to -1.
    //
    lo = up = -1L;
    const char *s = str + 1;    // skip "[" or "{"
    if (opp_isdigit(*s)) {
        lo = atol(s);
        while (opp_isdigit(*s))
            s++;
    }
    if (*s != '.' || *(s + 1) != '.')
        return false;
    s += 2;
    if (opp_isdigit(*s)) {
        up = atol(s);
        while (opp_isdigit(*s))
            s++;
    }
    if (*s != closingchar)
        return false;
    str = s;
    return true;
}
void inet::PatternMatcher::parseSet ( const char *&  s,
Elem &  e 
)
private

Referenced by setPattern().

{
    s++;    // skip "{"
    e.type = SET;
    if (*s == '^') {
        e.type = NEGSET;
        s++;
    }
    // Note: to make "}" part of the set, it must be first within the braces
    const char *sbeg = s;
    while (*s && (*s != '}' || s == sbeg)) {
        char range[3];
        range[2] = 0;
        if (*(s + 1) == '-' && *(s + 2) && *(s + 2) != '}') {
            // store "A-Z" as "AZ"
            range[0] = *s;
            range[1] = *(s + 2);
            s += 3;
        }
        else {
            // store "X" as "XX"
            range[0] = range[1] = *s;
            s++;
        }
        if (!iscasesensitive) {
            // if one end of range is alpha and the other is not, funny things will happen
            range[0] = opp_toupper(range[0]);
            range[1] = opp_toupper(range[1]);
        }
        e.setchars += range;
    }
    if (!*s)
        throw cRuntimeError("unmatched '}' in expression");
    s++;    // skip "}"
}
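Below is a standalone sketch of the same parsing steps. It produces the pair-encoded string that isInSet() consumes. The name `parseSetBody()` is hypothetical, and negation is reported through an output flag rather than an Elem type. std::runtime_error replaces cRuntimeError so that the sketch compiles without OMNeT++.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical sketch of parseSet(): s must point at '{'; on return it
// points just past the closing '}'. Returns the pair-encoded set.
std::string parseSetBody(const char *&s, bool caseSensitive, bool &negated)
{
    s++;  // skip '{'
    negated = (*s == '^');
    if (negated)
        s++;
    std::string pairs;
    const char *begin = s;  // a '}' in first position is a member, not the end
    while (*s && (*s != '}' || s == begin)) {
        char lo = *s, hi = *s;
        if (s[1] == '-' && s[2] && s[2] != '}') {
            hi = s[2];  // "a-z" -> range a..z
            s += 3;
        }
        else {
            s++;  // single character x -> range x..x
        }
        if (!caseSensitive) {
            lo = (char)std::toupper((unsigned char)lo);
            hi = (char)std::toupper((unsigned char)hi);
        }
        pairs += lo;
        pairs += hi;
    }
    if (!*s)
        throw std::runtime_error("missing '}' in set");
    s++;  // skip '}'
    return pairs;
}
```

Parsing "{a-z-}" yields "az--", which is a range plus a literal '-'. Parsing "{^}a}" yields a negated set stored as "}}aa".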
const char * inet::PatternMatcher::patternPrefixMatches ( const char *  line,
int  suffixoffset 
)

Similar to matches(), but returns non-nullptr if and only if all of the following hold. (1) The pattern ends in a string literal (and not, say, '*' or '**') that contains the line's suffix, which is the part of line starting suffixoffset characters in. (2) The pattern matches the whole line, except that (3) when matching the pattern's last string literal, the line is also accepted if it is shorter than the pattern.

If the above conditions hold, it returns the rest of the pattern. The returned pointer is valid until the next call to this method.

This method is used by cIniFile's getEntriesWithPrefix(), used e.g. to find RNG mapping entries for a module. For that, we have to find all ini file entries (keys) like "net.host1.gen.rng-NN" where NN=0,1,2,... In cIniFile, every entry is a pattern ("**.host*.gen.rng-1", "**.*.gen.rng-0", etc.). So we'd invoke patternPrefixMatches("net.host1.gen.rng-", 13) (i.e. suffix=".rng-") to find those entries (patterns) which can expand to "net.host1.gen.rng-0", "net.host1.gen.rng-1", etc.

See matches().

{
    if (!iscasesensitive)
        throw cRuntimeError("PatternMatcher: patternPrefixMatches() doesn't support case-insensitive match");

    // pattern must end in a literal string...
    ASSERT(pattern[pattern.size() - 1].type == END);
    if (pattern.size() < 2)
        return nullptr;
    Elem& e = pattern[pattern.size() - 2];
    if (e.type != LITERALSTRING)
        return nullptr;

    // ...with the suffixlen characters at the end of 'line'
    const char *pattstring = e.literalstring.c_str();
    const char *p = strstr(pattstring, line + suffixoffset);
    if (!p)
        return nullptr;
    p += strlen(line + suffixoffset);
    rest = p;
    int pattsuffixlen = e.literalstring.size() - (p - pattstring);

    // pattern, if we cut off the 'rest', must exactly match 'line'
    return doMatch(line, 0, pattsuffixlen) ? rest.c_str() : nullptr;
}
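The suffix bookkeeping in the second half of the listing can be checked against the cIniFile example from the description. This sketch isolates it. `locateSuffix()` is a hypothetical name, and it takes the pattern's final literal as a string instead of reading it from the Elem vector.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical sketch: find the line's suffix inside the final literal.
// On success, 'rest' gets the part of the literal after the suffix and
// 'pattSuffixLen' its length (the value later passed to doMatch()).
bool locateSuffix(const std::string &lastLiteral, const char *line,
                  int suffixOffset, std::string &rest, int &pattSuffixLen)
{
    const char *patt = lastLiteral.c_str();
    const char *suffix = line + suffixOffset;
    const char *p = std::strstr(patt, suffix);
    if (!p)
        return false;  // literal does not contain the suffix
    p += std::strlen(suffix);
    rest = p;
    pattSuffixLen = (int)(lastLiteral.size() - (p - patt));
    return true;
}
```

For the ini key "**.host*.gen.rng-1", the final literal is ".gen.rng-1". With line "net.host1.gen.rng-" and suffixoffset 13, the suffix is ".rng-". So rest becomes "1" and pattsuffixlen is 1, and doMatch() then compares only the first 9 characters of ".gen.rng-1".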
void inet::PatternMatcher::setPattern ( const char *  pattern,
bool  dottedpath,
bool  fullstring,
bool  casesensitive 
)

Sets the pattern to be used by subsequent calls to matches().

See the general class description for the meaning of the remaining arguments. Throws cException (cRuntimeError) if the pattern is malformed, e.g. if a set is missing its closing '}'.

Referenced by PatternMatcher().

In the implementation file, the first parameter is named patt; the listing below uses that name.

{
    pattern.clear();
    iscasesensitive = casesensitive;

    // "tokenize" pattern
    const char *s = patt;
    while (*s != '\0') {
        Elem e;
        switch (*s) {
            case '?':
                e.type = dottedpath ? COMMONCHAR : ANYCHAR;
                s++;
                break;

            case '[':
                if (pattern.empty() || pattern.back().type != LITERALSTRING || !parseNumRange(s, ']', e.fromnum, e.tonum))
                    parseLiteralString(s, e);
                else
                    e.type = NUMRANGE;
                break;

            case '{':
                if (parseNumRange(s, '}', e.fromnum, e.tonum)) {
                    e.type = NUMRANGE;
                    s++;
                }
                else
                    parseSet(s, e);
                break;

            case '*':
                if (*(s + 1) == '*') {
                    e.type = ANYSEQ;
                    s += 2;
                }
                else {
                    e.type = dottedpath ? COMMONSEQ : ANYSEQ;
                    s++;
                }
                break;

            default:
                parseLiteralString(s, e);
                break;
        }
        pattern.push_back(e);
    }

    if (!fullstring) {
        // for substring match, we add "**" at both ends of the pattern (unless already there)
        Elem e;
        e.type = ANYSEQ;
        if (pattern.empty() || pattern.back().type != ANYSEQ)
            pattern.push_back(e);
        if (pattern.front().type != ANYSEQ)
            pattern.insert(pattern.begin(), e);
    }
    Elem e;
    e.type = END;
    pattern.push_back(e);
}
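A reduced sketch can summarize how pattern characters map to element types, including the substring-mode wrapping. `Tok` and `tokenize()` are hypothetical. The sketch recognizes only '?', '*' and '**', collapses every run of other characters into a single Literal token, and ignores '{', '[' and backslash escapes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical token kinds mirroring a subset of ElemType.
enum class Tok { Literal, AnyChar, CommonChar, AnySeq, CommonSeq, End };

std::vector<Tok> tokenize(const char *s, bool dottedPath, bool fullString)
{
    std::vector<Tok> out;
    while (*s) {
        if (*s == '?') {
            out.push_back(dottedPath ? Tok::CommonChar : Tok::AnyChar);
            s++;
        }
        else if (s[0] == '*' && s[1] == '*') {
            out.push_back(Tok::AnySeq);  // "**" crosses dots in every mode
            s += 2;
        }
        else if (*s == '*') {
            out.push_back(dottedPath ? Tok::CommonSeq : Tok::AnySeq);
            s++;
        }
        else {
            if (out.empty() || out.back() != Tok::Literal)
                out.push_back(Tok::Literal);  // start a new literal run
            s++;
        }
    }
    if (!fullString) {
        // substring match: wrap the pattern in "**" unless already there
        if (out.empty() || out.back() != Tok::AnySeq)
            out.push_back(Tok::AnySeq);
        if (out.front() != Tok::AnySeq)
            out.insert(out.begin(), Tok::AnySeq);
    }
    out.push_back(Tok::End);
    return out;
}
```

For instance, "ate" in substring mode becomes AnySeq, Literal, AnySeq, End, which is why it matches "whatever".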

Member Data Documentation

bool inet::PatternMatcher::iscasesensitive = false
private
std::vector<Elem> inet::PatternMatcher::pattern
private
std::string inet::PatternMatcher::rest
private

Referenced by patternPrefixMatches().


The documentation for this class was generated from the following files: