Fixing dependency check and adding more regex
jah12014 committed Apr 10, 2018
1 parent e0069ca commit d4bcffe
Showing 52 changed files with 132,998 additions and 2,214 deletions.
51 changes: 51 additions & 0 deletions IncrementalMinimization/regex/PowerEN_PME/README
@@ -0,0 +1,51 @@
PowerEN PME Synthetic Workloads
Timothy Heil
timothy.heil@us.ibm.com
IBM Corporation
March 20th, 2012


OVERVIEW

This distribution contains workloads used to drive the Pattern
Matching Engine (PME) of the PowerEN processor. Each workload
consists of:

* A set of regular expression patterns
* A set of search traces

The patterns represent the patterns searched for. The search traces
represent the data to be searched.

DIRECTORY HIERARCHY

./simple : Simple patterns - all fixed strings.
./cmplex : More complex regular expressions.

./(simple|cmplex)/single_ctx : All patterns in one context.
./(simple|cmplex)/multi_ctx : Multiple contexts, each with 1000 patterns.

./(simple|cmplex)/(single_ctx|multi_ctx)/patterns : *.pat pattern files.
./(simple|cmplex)/(single_ctx|multi_ctx)/traces : *.strace.gz compressed
search trace files

PATTERN FILE FORMAT

The *.pat file format is fairly straightforward. Patterns use only
normal well-understood regular expression syntax.

Some pattern files use the Options keyword.

Option s : "." construct matches any character, rather than any
character other than carriage-return and line-feed.

Option i : Case-insensitive pattern
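
As a rough analogy only, the two options resemble the DOTALL and
IGNORECASE flags of Python's re module. The mapping and the helper name
compile_with_options are assumptions made for illustration and are not
part of this distribution. Note also that Python's default "." excludes
only line-feed, whereas the default described above excludes
carriage-return as well.

```python
import re

# Hypothetical mapping from the option letters to Python re flags.
OPTION_FLAGS = {"s": re.DOTALL, "i": re.IGNORECASE}

def compile_with_options(pattern, options=""):
    """Compile `pattern` with the re flags that loosely mirror the options."""
    flags = 0
    for letter in options.lower():
        flags |= OPTION_FLAGS[letter]  # KeyError for an unknown option letter
    return re.compile(pattern, flags)
```

For instance, compile_with_options("a.b", "s") matches "a", a line-feed,
then "b", while the same pattern compiled without options does not.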

SEARCH TRACE FORMAT

The *.strace.gz files are (when uncompressed) a simple ASCII readable
format. See ./Search_Trace_Format.txt for details.

The search traces provided here do not use many of the features of the
full trace format, as described in Search_Trace_Format.txt.

249 changes: 249 additions & 0 deletions IncrementalMinimization/regex/PowerEN_PME/Search_Trace_Format.txt
@@ -0,0 +1,249 @@
Search Trace Format
Timothy H. Heil
March 16, 2012

INTRODUCTION

Search traces represent a sequence of search commands over data. The
trace specifies:

* Data to be searched, including multiple searches on the same data
* Which contexts (sets of patterns) to search for
* Dependencies between searches in the form of flows

To define a search, the trace first defines one or more buffers
of data, and then defines searches on that data. A search may
be over part or all of one buffer.

Flows represent the continuation of a search on a new buffer at a
later time. Search state is saved at the end of one search,
and restored for the next search in the same flow. The search results
are identical to performing all searches in the flow in a single
long search.
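
The equivalence between a flow and one long search can be illustrated
with a toy fixed-string matcher. This is only a sketch: the PME's real
search state is not described here, so the saved "state" is assumed to
be the last len(pattern)-1 bytes seen plus a running offset, and the
name search_chunk is hypothetical. The pattern is assumed non-empty.

```python
def search_chunk(pattern, chunk, carry=b"", offset=0):
    """Search one buffer of a flow.

    Returns (match_end_offsets, new_carry, new_offset), where the last
    two values are the state to restore for the next search in the flow.
    """
    window = carry + chunk
    base = offset - len(carry)            # absolute offset of window[0]
    ends = []
    pos = window.find(pattern)
    while pos != -1:
        ends.append(base + pos + len(pattern))
        pos = window.find(pattern, pos + 1)
    keep = len(pattern) - 1               # longest prefix a match can straddle
    new_carry = window[-keep:] if keep else b""
    return ends, new_carry, offset + len(chunk)
```

Because the carried bytes are shorter than the pattern, no match is
reported twice, and splitting the data at any point yields the same
match offsets as searching it in one piece.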

To define a flow, the trace first defines a flow ID. The trace
then defines multiple searches on that flow.

GENERAL FORMAT

See the EXAMPLE FILE at the end of this document for a quick overview
of the format.

The file format is an ASCII readable format that contains one
"command" per line. The first character of the line defines the
command.

The file format is completely case-insensitive.

A hash sign ('#') starts a comment. A comment may contain any ASCII
codes, except carriage return and new line.

Blank lines are ignored.

The maximum legal line length is 1024 bytes, including terminating
carriage return and line feed.

All identifiers are C-like identifiers, made up of letters, digits,
and the underscore ('_'). Identifiers may not start with a digit.

IDs are case insensitive. All identifiers must be 128B long or less.
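
A reader might validate and normalize identifiers along these lines.
This is a minimal sketch; the function name normalize_id and the choice
to fold IDs to upper case are assumptions, not part of the format.

```python
import re

# Letters, digits and underscore; the first character may not be a digit.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")
MAX_ID_BYTES = 128

def normalize_id(text):
    """Check ID syntax and length, then fold case so lookups ignore it."""
    if len(text) > MAX_ID_BYTES or not ID_PATTERN.match(text):
        raise ValueError(f"invalid identifier: {text!r}")
    return text.upper()
```

With this helper, "b1" and "B1" normalize to the same key, which is what
the case-insensitivity rule requires.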

Flow IDs and Buffer IDs are different name spaces. It is legal to
have flows and buffers with the same name.

IDs may be reused after being deleted.

All integers are unsigned 64b values. The legal range is zero to
0xffffffffffffffff = 18446744073709551615 inclusive. All integers
are expressed in decimal.
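
The integer rule can be checked as follows. This is a sketch under the
rules stated above; the helper name parse_uint64 is hypothetical.

```python
UINT64_MAX = 0xFFFFFFFFFFFFFFFF  # 18446744073709551615

def parse_uint64(text):
    """Parse a decimal field and enforce the unsigned 64-bit range."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a decimal integer: {text!r}")
    value = int(text)
    if value > UINT64_MAX:
        raise ValueError(f"integer out of range: {text!r}")
    return value
```

Hexadecimal input such as "0x10" and signed input such as "-1" are
rejected, since all integers are expressed in decimal and are unsigned.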

Command fields are separated by one or more spaces or tabs.

Whitespace is permitted at the start and end of each line.
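
The line-level rules above (comments, blank lines, whitespace, the line
length limit, and the command letter) can be sketched as a small
tokenizer. The function name split_command is hypothetical, and fields
are returned unmodified so that callers can apply ID or integer rules
themselves.

```python
MAX_LINE_BYTES = 1024  # includes the terminating CR/LF

def split_command(line):
    """Return (command, fields) for one trace line.

    Returns None for blank lines and lines holding only a comment.
    """
    if len(line.encode("ascii")) > MAX_LINE_BYTES:
        raise ValueError("line exceeds 1024 bytes")
    body = line.split("#", 1)[0].strip()  # drop comment and outer whitespace
    if not body:
        return None
    command = body[0].upper()             # first character names the command
    fields = body[1:].split()             # one or more spaces or tabs
    return command, fields
```

For example, the line "s 101 0 32 b1" yields the command "S" and the
fields "101", "0", "32" and "b1".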

COMMANDS

V <Version>

Indicates the file format and version.

<Version> follows ID syntax.

This must be the first line of the file.
The current and only legal version is "SEARCH_TRACE_1_0".
Like everything else, the version is case insensitive.
The trace reader should make sure the version string is a version/format
that it understands.

The V command can occur multiple times in a file. This allows
traces to be concatenated easily. This implies, in theory, that
the format version can change in the middle of a file.

B <Buffer_ID>

Define a new data buffer

Buffer_ID may be '-', in which case it is an "anonymous" buffer.
Anonymous buffers can only be referenced by the "-" feature in the
search commands (S, N, T). Anonymous buffers are deleted
automatically as soon as the next buffer is defined.

Q <Buffer_ID>

Delete a buffer

This should be done as soon as the writer knows there are no more
searches on a named buffer. This allows the reader to free the
memory used for the buffer.

Anonymous buffers cannot be explicitly deleted with the Q command.
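
A reader could track buffers roughly as follows. This is a sketch, not
the reference reader: the class name BufferTable is hypothetical, and
rejecting the redefinition of a live Buffer_ID is an assumption drawn
from the rule that IDs may be reused after being deleted.

```python
class BufferTable:
    """Tracks named buffers and the most recently defined buffer."""

    def __init__(self):
        self.named = {}     # upper-cased Buffer_ID -> bytearray
        self.latest = None  # most recently defined buffer, None once deleted

    def define(self, buffer_id):                    # B command
        data = bytearray()
        if buffer_id != "-":
            key = buffer_id.upper()
            if key in self.named:                   # assumed to be an error
                raise ValueError(f"buffer already defined: {buffer_id}")
            self.named[key] = data
        # Replacing `latest` drops any previous anonymous buffer.
        self.latest = data
        return data

    def delete(self, buffer_id):                    # Q command
        if buffer_id == "-":
            raise ValueError("anonymous buffers cannot be deleted with Q")
        data = self.named.pop(buffer_id.upper())    # KeyError if undefined
        if data is self.latest:
            self.latest = None

    def resolve(self, reference):                   # used by S, N and T
        if reference == "-":
            if self.latest is None:
                raise ValueError("most recent buffer has been deleted")
            return self.latest
        return self.named[reference.upper()]
```

The D commands that follow a B command would append to the bytearray
returned by define.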

D <Data...>

Specify data

The data bytes in the buffer are specified following the B command.
Any other command terminates the list of data bytes for the buffer.
The size of the buffer is implied by the amount of data in the
following D commands. D commands may only be placed after a B
command.

Bytes are encoded as pairs of hex digits. Hex digits must come in
pairs -- small ASCII values may not be encoded with a single digit.

One or more bytes can be specified per line, up to the line size limit.
The pairs can be separated by spaces/tabs, but this is not necessary.

The following conventions make files more readable, but are not
required for correctness:

* 16 bytes per D command (excepting the final one for a buffer).
* One space between adjacent bytes
* Place a comment at the end of the line giving the
ASCII form of the bytes.

Note : Zero byte buffers are allowed (no D commands), but zero-byte
D commands are not.
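
Decoding the fields of one D command can be sketched as follows. The
helper name decode_data is hypothetical; it takes the whitespace-separated
fields that follow the command letter.

```python
def decode_data(fields):
    """Decode the hex fields of one D command into bytes."""
    if not fields:
        raise ValueError("zero-byte D commands are not allowed")
    if any(len(field) % 2 for field in fields):
        raise ValueError("hex digits must come in pairs")
    # bytes.fromhex raises ValueError on any non-hex character.
    return bytes.fromhex("".join(fields))
```

Both the spaced form "50 30 43" and the packed form "503043" decode to
the same three bytes.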

F <Flow_ID>

Define a new flow

The flow ID can be referenced by later search commands. See N and T
below.

S <context> <start> <len> (<Buffer_ID>|-)

Search - Not part of flow

<context> is the context to search. It may be an integer or an ID.

<start> and <len> define the position and length of the search within
the buffer. Both are decimal integers.

The first byte in the buffer is 0. The searched range must be
within the buffer. Zero-byte scans (<len> = 0) are allowed.

If buffer ID is "-", then the most recently defined buffer (anonymous
or not) is used. Note that it is an error to use "-" if the most
recently defined buffer has been deleted with the Q command.

Search state is neither restored nor saved, since the search
is not part of a flow.
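
The range rule for <start> and <len> amounts to a bounds check before
the search. A minimal sketch, with the hypothetical helper name
search_range:

```python
def search_range(data, start, length):
    """Return the bytes covered by a search, checking the range first."""
    if start < 0 or length < 0:
        raise ValueError("start and len are unsigned")
    if start + length > len(data):
        raise ValueError("searched range must lie within the buffer")
    return data[start:start + length]   # byte 0 is the first byte
```

A zero-length range is accepted and yields an empty byte string.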

N <context> <start> <len> (<Buffer_ID>|-) <Flow_ID>

Search - Next search in flow.

If Flow_ID has not been defined, this command will define it.
No explicit F command is required.

If this is not the first search in the flow, search state will be
restored from the flow ID.

Search state will be saved to the flow ID.

T <context> <start> <len> (<Buffer_ID>|-) <Flow_ID>

Search - Terminate flow.

Continue searching in a flow, and terminate the flow. No more
searches will be allowed to the flow. This allows the trace reader
to deallocate state associated with the flow. It also allows
end-anchored ('$') patterns to be properly reported.

The Flow_ID must be defined or it is an error. It is illegal to
define and terminate a flow in the same search. This would be
equivalent to S, which should be used instead.

If this is not the first search in the flow, search state will be
restored from the flow ID.

Search state will not be saved.

The flow ID will be deleted.

X <Flow_ID>

Terminate flow

The flow ID will be deleted.

Most applications are expected to use T. In some cases, the writer
may not know until later that a flow terminates, or a flow may
terminate unexpectedly.

Note that end-anchored patterns ('$') would not be reported when X is
used. It may be more appropriate to use a 0-length T command to
terminate the flow.
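
The flow life cycle defined by F, N, T and X can be sketched as a small
table of saved states. Everything named here is hypothetical: FlowTable
is an illustrative class, and `search` stands for any callable that
takes (data, restored_state) and returns the new state. A repeated F on
a live flow is treated as a no-op, which the text does not specify.

```python
class FlowTable:
    """Tracks the saved search state of each live flow."""

    def __init__(self):
        self.state = {}  # upper-cased Flow_ID -> saved state (None before any search)

    def define(self, flow_id):                          # F command
        self.state.setdefault(flow_id.upper(), None)

    def next_search(self, flow_id, search, data):       # N command
        key = flow_id.upper()
        restored = self.state.get(key)                  # None on first search or implicit define
        self.state[key] = search(data, restored)        # save state for the next search
        return self.state[key]

    def terminate_search(self, flow_id, search, data):  # T command
        key = flow_id.upper()
        if key not in self.state:
            raise KeyError(f"flow not defined: {flow_id}")
        restored = self.state.pop(key)                  # the flow ID is deleted
        return search(data, restored)                   # resulting state is not saved

    def terminate(self, flow_id):                       # X command
        del self.state[flow_id.upper()]                 # KeyError if undefined
```

Passing a function that merely counts bytes as `search` is enough to
watch state carry from an N command into the T command that ends the
flow.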

EXAMPLE FILE

=========================================================================
V SEARCH_TRACE_1_0 # Must have the version line on the first line.

# Define an anonymous 10B Buffer
B -
D 50 30 43 51 55 44 44 43 52 48 # P0CQUDDCRH

# Scan the first 5B with context C1
S C1 0 5 -

# Scan the second 5B with context C2
S C2 5 5 -

#-------------------------------------------------------------------------

B B1 # Define a buffer named B1, containing 32B
D 44495674 57745851 # DIVtWtXQ
D 725a5356 4c513342 # rZSVLQ3B
D 49234b61 6d724334 # I.KamrC4
D 36796437 67786244 # 6yd7gxbD

B B2 # Define a second buffer B2, containing 20B
D 4b4f3176 6e56626f 3959695a 7946666d # KO1vnVbo9YiZyFfm
D 306c616c # 0lal

# Scan with contexts 101 and 102

s 101 0 32 b1
s 101 0 20 b2
s 102 0 32 b1
s 102 0 20 - # '-' = B2, the most recently defined buffer

#-------------------------------------------------------------------------

F Flow_1 # Define flow Flow_1
N 201 0 32 b1 Flow_1 # Search B1

N 202 0 32 B1 Flow_2 # Implicit definition of Flow_2

T 201 0 20 b2 Flow_1 # Continue searching B2 and terminate Flow_1

X Flow_2 # Delayed/unexpected termination of Flow_2

# Delete buffers
Q B1
Q B2

=========================================================================