The CRL2 framework uses a number of standard attributes for exchanging information. This document is a first attempt to document which attributes exist, what they mean, which applications write them, and which ones use them.
Note that some attributes may be special, i.e., the CRL2 library takes special care of them. This may either be due to the need for interpretation in the library, or because there is special CRL file syntax for them (usually to improve readability), or because the attributes are a special conceptual part of the CRL structure (e.g. the address of an instruction).
For special attributes, the CRL2 library provides member functions for direct access to the attribute values. These should be preferred over the generic Structure::find_sym() interface (although there is transparent access to them via that interface, too).
Please check the Items' documentation about which special attributes exist. Typically, there is a slot for storing the special attribute.
When using the generic Structure::find_sym interface to access special attributes, you must use the result casting versions that output the correct type (e.g. find_sym_bool) and not those that return a generic CrlValue*. This is because the CRL2 library interface may change in the future without prior notice so that only the result casting versions return a value while the generic ones fail. This will be included in future library versions to improve overall performance and to avoid generating Values on the fly.
In the following, attributes are often specified to be ranges. Do not rely on this exact type! For efficiency reasons, a plain scalar number (Unsigned, Signed or Float) may be used instead. So always use the generic interface ValueNumeric::minimum()/maximum() to retrieve the value instead of casting to ValueRange!
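The advice above can be illustrated with a small, self-contained Python model. The classes `Scalar` and `Range` and the helper `bounds` are hypothetical stand-ins and not the CRL2 classes; only the accessor names `minimum()` and `maximum()` are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    """Hypothetical stand-in for a plain Unsigned, Signed or Float value."""
    value: float

    def minimum(self):
        return self.value

    def maximum(self):
        return self.value


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for a range value."""
    low: float
    high: float

    def minimum(self):
        return self.low

    def maximum(self):
        return self.high


def bounds(numeric):
    """Read both bounds through the shared interface, without a type check."""
    return numeric.minimum(), numeric.maximum()
```

A caller written this way keeps working whether the attribute is stored as a range or as a single number, which is the point of the recommendation.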
Many of the user attributes will have AIS annotations that lead to those attributes. These are meant to be examples. For a full AIS syntax specification, please refer to the AbsInt manuals.
In CRL2, resource information at operations can be stored hierarchically to reflect the nature of many processors to have complex operands. Typically, each operand is assigned an extension. For further clean-up, extensions may be nested, so the full description structure of an operation is a tree.
Extensions are stored in the attribute 'ext', which is a vector, each element representing one extension.
Each extension in the vector can have attributes just like any other item in CRL2. It is therefore implemented as a map from Symbol keys to arbitrary CRL2 Values. Thus, the vector of extensions contains Maps as elements.
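As a rough sketch of this tree shape, the following Python model uses plain dicts for attribute maps and lists for vectors. The attribute names 'ext', 'genname' and 'op' come from this document; the function `walk_extensions` and the sample operand values are made up for illustration.

```python
def walk_extensions(item, depth=0):
    """Yield (depth, attribute_map) for an item and all nested extensions."""
    yield depth, item
    for extension in item.get("ext", []):
        yield from walk_extensions(extension, depth + 1)


# Illustrative operation: two extensions, the second one nested further.
operation = {
    "genname": "add",
    "ext": [
        {"op": ["r1"]},
        {"op": ["Mem"], "ext": [{"op": ["r2"]}, {"op": [8]}]},
    ],
}

for depth, attributes in walk_extensions(operation):
    print("  " * depth + str(attributes.get("op", attributes.get("genname"))))
```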
In the following, attributes will be listed for every item. Those attributes that can occur at operations as well as in extensions will be listed in a separate section.
These are used across architectures and analyses, and they are present directly after decoding, i.e., exec2crl produces them.
These attributes may be found at operations and in extensions. Thus they constitute the hierarchical information stored at operations.
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| op_id | Unsigned | A unique numerical identification of the operation. It can be used to map semantics to the operation in an analysis. For the TF14Net framework, which is used by AbsInt to define machine decoders and to specify which attributes the decoders generate, this attribute has a special meaning: it represents the bits in the machine instruction. Together with a significance mask, this is used to decode operations. Insignificant bits are occasionally used to make the 'op_id' unique. | |||
| genname | Symbol | A symbolic identification of the operation. It need not be unique, but it is defined to identify a conceptual class of operation, like a family of 'add' instructions. This makes it easier for many analyses to quickly map semantics, taking the precise information from the other attributes. An analysis can choose freely whether the 'op_id' or the 'genname' is better suited for mapping semantics, and it may even choose to use a combined approach. | |||
| ext | Vector of Map of Symbol to Value | The list of extensions of an operation | |||
| src | Vector of Symbol or Numeric | Source resources of an operation. If this is a Symbol, then it marks a register or an abstract resource like memory ('Mem'), stack ('Stack') or cache ('Cache'), or the whole register file, etc. If this attribute value is a numeric, it means the operation reads a constant. An 'src' resource that is also written will be found in the 'dst' Vector at the same index. | |||
| dst | Vector of Symbol | Destination resources of an operation. The values are like those of 'src', except that constants are, due to their nature, not allowed here: only non-constant resources may be specified. A 'dst' resource that is also read will be found in the 'src' Vector at the same index. | |||
| op | Vector of Symbol or Numeric | Any operand resource of an operation. The indices correspond to the 'src' and 'dst' Vectors. Further, in rare cases, there can be operands that exist conceptually, or are listed for consistency reasons. In these cases it may happen that only the 'op' entry exists, but neither an 'src' nor a 'dst' resource. | |||
| mnemonic | String | A string to be shown to framework users. It gives a human-readable textual representation of the operation that should in no case be used to get information about the operation in analyses. It is solely for humans. | |||
| assembly | String | A template for making the mnemonic: all operands/extensions are represented as a dollar sign, possibly followed by a number in braces to give an explicit index. These dollars have to be replaced with the assembly string of the corresponding operand/extension to get the mnemonic. This is currently not used in the framework, but is thought to be useful when changing the CRL2 structure to provide the human user with an updated 'mnemonic' string. |
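The 'assembly' template expansion described in the table can be sketched as follows. This is only an illustration under stated assumptions: a bare dollar sign is taken to mean "the next operand in order", an explicit index is taken to be zero-based, and `expand_assembly` is a hypothetical helper rather than a CRL2 function.

```python
import re

# A dollar sign, optionally followed by an index in braces, e.g. "$" or "${1}".
_PLACEHOLDER = re.compile(r"\$(?:\{(\d+)\})?")


def expand_assembly(template, operand_strings):
    """Replace each placeholder with the assembly string of its operand."""
    next_index = 0

    def substitute(match):
        nonlocal next_index
        if match.group(1) is not None:
            # Explicit index (assumed zero-based).
            return operand_strings[int(match.group(1))]
        # Bare "$": take the operands in order (assumption).
        text = operand_strings[next_index]
        next_index += 1
        return text

    return _PLACEHOLDER.sub(substitute, template)
```

For example, `expand_assembly("add $, $, $", ["r1", "r2", "r3"])` yields `"add r1, r2, r3"` under these assumptions.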
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| cat | Map of Symbol to Boolean or Unsigned | A set of simple categorisations of the machine instruction. This is sometimes used by analyses to find out simple facts about the operation. The available keys are: 'boring', 'branch', 'call', 'return', which define the instruction class wrt. the CFG; 'call_conditional', 'immediate_return' for additional classification of a call; 'taken', 'computed', 'predictable' for further classification of branches, calls and returns. These are all booleans. Further, there are the unsigned values 'mem_read' and 'mem_write', which count the number of read and write accesses of the operation. This is redundant information which is retrieved from the src/dst attributes by exec2crl. |
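The table states that 'mem_read' and 'mem_write' are redundant with the src/dst attributes. How exec2crl derives them is not specified here, so the following Python sketch only shows one plausible reading: count the occurrences of the abstract memory resource 'Mem' in 'src' and 'dst', including nested extensions. The function `count_memory_accesses` and the sample operation are hypothetical.

```python
def count_memory_accesses(operation, memory_symbol="Mem"):
    """Return (reads, writes) counted over an operation and its extensions."""
    reads = operation.get("src", []).count(memory_symbol)
    writes = operation.get("dst", []).count(memory_symbol)
    for extension in operation.get("ext", []):
        sub_reads, sub_writes = count_memory_accesses(extension, memory_symbol)
        reads += sub_reads
        writes += sub_writes
    return reads, writes


# Illustrative operation: one extension reads memory, another writes it.
sample_operation = {
    "src": ["r1"],
    "dst": [],
    "ext": [
        {"src": ["Mem"], "dst": []},
        {"src": [], "dst": ["Mem"]},
    ],
}
```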
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| address | Address | The instruction address in a linear form. This is the bus address of the instruction before address translation by MMUs. I.e., for paged architectures, this is the address that combines page and base addresses in the way the manual states it. It is not necessarily the bit-wise concatenation of the two, but the address as seen on the (conceptual) bus before address translation. This address always counts in bytes, so for architectures that count in words (e.g. C33), this address is a multiple of the surface address. This address is called the 'linear' address throughout this document. | |||
| surface_address | String | A string for the human user that represents the surface address of this instruction when considering the programming level. For paged architectures, this contains both the page and the base part, separated by a colon. For multiple instruction set architectures, the instruction set of this instruction is prepended with a double colon. Never interpret the contents of the string in analyses! This is for the human user only! | |||
| page | Address | For paged architectures: the page part of the surface address. | |||
| base_address | Address | For paged architectures: the base part of the surface address. | |||
| instructions_set | Symbol | For multiple instruction set architectures: the instruction set of this instruction. | |||
| width | Unsigned | The instruction width in bytes | |||
| bytes | Vector of Unsigned | The bytes the instruction is composed of. This is the raw machine code. Note that in the future, this attribute might use a specialised vector that can store the bytes more efficiently, so use nth_byte() to get them, instead of nth()->get_byte(). |
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| address | Address | Like for instructions. Note that empty blocks still have an address. It is assigned conceptually by closest correspondence and is likely (but not necessarily) equal to the address of some non-empty block. | |||
| surface_address | String | Like for instructions. | |||
| page | Address | For paged architectures: the page part of the surface address. | |||
| base_address | Address | For paged architectures: the base part of the surface address. | |||
| instructions_set | Symbol | For multiple instruction set architectures: the instruction set of this instruction. | |||
| block_type | Symbol | At some nodes, defines what kind of block this is. The following values may be found here: | |||
| delayed | Boolean | This marks a delayed branch instruction. The block does not contain the edges of the branch, but only a fall-through edge into the delay block. | |||
| no_calls | Boolean | Marks the presence of a call that has no targets. | |||
| more_calls | Boolean | Marks the presence of more, yet unknown call edges in the CFG. This is important information for most analyses, since it tells them that they do not see the whole reality here. Some can take special care, some will fail (e.g. a stack analysis). | |||
| more_branches | Boolean | Marks the presence of more, yet unknown branch edges in the CFG. This is very sad information: most analyses will have to fail as almost anything might happen. | |||
| implicit_calls | Vector of Routine | Marks the presence of routine entries that are revealed in this block. This block does not invoke those routines itself, but somewhere else in the program, an unresolved call may call such a routine. A typical example of a call that has this attribute is the atexit() system call, whose only argument is a function pointer eventually invoked somewhere else in the program. | |||
| orig_calls | Vector of Routine | Contains an edge to a routine that is called by the block but which is not the logical control flow. That logical control flow is represented by a standard edge. This situation happens when the architecture uses a routine to implement a call, which is often the case on small architectures for implementing computed calls: inlined code would be too large, so a routine implements it. That routine is then found in this attribute, while the implemented target is found by traversing the standard call edge. | |||
| implicit_branches | Vector of Block | Like 'implicit_calls' for branches. | |||
| orig_branches | Vector of Routine | Like 'orig_calls' for branches. Will be filled in with the original branch targets when the user overrides them. |
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| address | Address | Like for instructions. Note that empty routines still have an address. It is assigned conceptually by closest correspondence and is likely (but not necessarily) equal to the address of some non-empty block. | |||
| surface_address | String | Like for instructions. | |||
| page | Address | For paged architectures: the page part of the surface address. | |||
| base_address | Address | For paged architectures: the base part of the surface address. | |||
| instructions_set | Symbol | For multiple instruction set architectures: the instruction set of this instruction. | |||
| name | Symbol | The human-readable name of this routine as extracted from e.g. an executable's symbol table. Do not use this to identify routines in analyses as the names need not be unique. Only if the user is to read or identify a routine should this attribute be used for reference. In CRL2, always use pointers to CRL2 structures for identification. | |||
| section | Symbol | The name of the section of the executable in which this routine was found. | |||
| external | Boolean | If set, the routine's body is not part of the analyses. Also see 'reason' attribute. | |||
| reason | Symbol | For external routines: the reason why this is not included. 'EXCLUDED': the user has manually excluded the routine from the analysis (AIS: routine X is not analysed). 'EXTERNAL': the routine is not part of the executable or the user has forced the routine to be assumed not to be part of the executable (AIS: routine X is external). | |||
| no_return | Boolean | This routine does not return to the caller. exit() is a typical candidate that has this attribute. |
These attributes are found at the graph.
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| input_file_name | String | Name of the input file leading to this CRL2 file. | |||
| start | Vector of Routine | The start points selected for the coming analyses. |
Yet to be documented.
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| infeasible | Symbol | This block is infeasible, i.e., it cannot be reached during program execution. This attribute may also be set by other analyses when new information is found, e.g. by a value analysis. exec2crl generates this from user annotations. |
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| infeasible | Boolean | This routine is infeasible. This information may be either found automatically, or a user annotation might have been used. |
These attributes are found at the graph.
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| errors | Unsigned | The number of errors that occurred during decoding. | |||
| warnings | Unsigned | The number of warnings that occurred during decoding. | |||
| reader_name | String | Name of the reader module used, i.e., the file format. | |||
| decoder_name | String | Name of the decoder module, i.e., the architecture. | |||
| compiler_name | String | The name of the compiler that compiled the input file in case exec2crl could find that out. | |||
| paging | Boolean | Whether the architecture uses paged addresses. |
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| mem_access | Vector of Ranges | List of address ranges the operation may access. AIS: instruction X accesses Y | |||
| mem_access_by_step | Vector of Vector of Ranges | Similar to 'mem_access', but distinguished by sub-step of operation. This is interesting for some processors with several stages in the memory interface. AIS: instruction X accesses Y in step Z | |||
| spec_mem_access | Vector of Ranges | Same as 'mem_access' for speculative accesses. AIS: instruction X accesses Y speculatively | |||
| spec_mem_access_by_step | Vector of Vector of Ranges | Same as 'mem_access_by_step' for speculative accesses. AIS: instruction X accesses Y in step Z speculatively | |||
| user_reg_values | Map of Symbol to Range | Maps registers to their value when entering this operation. AIS: instruction X is entered with r1=Y, r2=Z, ...; | |||
| user_* | Value | User defined additional attributes. The AIS names get a 'user_' prefix when moved into the CRL2 structure. AIS: instruction X features KEY=VAL, KEY=VAL, ...; |
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| loop_annotations | Map of Symbol to Value | Read by the loop conversion: a map of attributes to be moved to the routines resulting from loop conversion of this block. All blocks of a loop routine are searched for this attribute and all the found attributes are then copied to the routine's attribute map. The possible attributes that may occur here are: 'iteration_count', 'user_execution_time', and 'external'. See the documentation of routine attributes for an explanation. Note that the loop conversion is usually performed by exec2crl already. AIS: loop X iterates Y; | |||
| add_cycles | Unsigned | Number of cycles to add to the execution time of a given block. This is read by the path analysis. AIS: instruction X additionally takes Y; |
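The copying step described for 'loop_annotations' can be pictured with the following Python sketch, which models attribute maps as dicts. The helper `collect_loop_annotations` is hypothetical, and the document does not say what happens when two blocks carry the same key; here the later block simply wins, which is an assumption.

```python
def collect_loop_annotations(routine_attributes, blocks):
    """Copy every block's 'loop_annotations' entries into the routine's map."""
    for block in blocks:
        # Blocks without the attribute contribute nothing.
        routine_attributes.update(block.get("loop_annotations", {}))
    return routine_attributes


# Illustrative loop routine with three blocks, two of them annotated.
blocks = [
    {"loop_annotations": {"iteration_count": (1, 10)}},
    {},
    {"loop_annotations": {"external": True}},
]
routine = collect_loop_annotations({"name": "loop_1"}, blocks)
```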
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| loop_annotations | Vector of Map of Symbol to Value | Like the block attribute 'loop_annotations' but applying to all loops found in the routine. Each element of the vector corresponds to one loop in the routine; they are sorted by linear address of the loop header (=entrance address inside the loop). | |||
| iteration_count | Range | The iteration count for a loop starting at this loop routine. The value is a range to be able to specify minimum and maximum. AIS: loop X iterates Y | |||
| user_execution_time | Range | A user annotated execution time for a loop (i.e., a bound on the time, not the number of iterations). This is currently not implemented. This attribute is also used for excluded routines. AIS: routine X is not analysed and takes Y | |||
| default_read_access | Value of Range | Sets a default for accesses in this routine when no range of memory access can be found out by an analysis. AIS: read default X from Y | |||
| default_write_access | Value of Range | Same as 'default_read_access' for write accesses. AIS: write default X to Y | |||
| restrict_read_access | Value of Range | Sets a restriction for accesses in this routine. All access ranges found will be intersected with the ranges given here. AIS: read restrict X to Y | |||
| restrict_write_access | Value of Range | Same as 'restrict_read_access' for write accesses. AIS: write restrict X to Y | |||
| default_loop_iteration | Range | If no loop iteration is found for loops in this routine, then this value is used. AIS: loop iteration default X is Y | |||
| restrict_loop_iteration | Range | A restriction of loop iterations for all loops in this routine. AIS: loop iteration restrict X is Y | |||
| incarnation_count | Range | How many times the routine may be alive at the same time, i.e., how many stack frames of this routine may exist in parallel. AIS: routine X incarnates Y | |||
| stack_usage | Unsigned | Stack usage of the given function. AIS: routine X uses Y bytes of stack | |||
| stack_effect | Unsigned | Stack difference of the given function. AIS: routine X leaves behind Y bytes of stack | |||
| cc_violation | Bool | Violation of calling conventions. AIS: routine X violates calling conventions | |||
| user_* | Value | User defined additional attributes. The AIS names get a 'user_' prefix when moved into the CRL2 structure. AIS: routine X features KEY=VAL, KEY=VAL, ...; |
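The interplay of the default and restrict access attributes in the table can be sketched in Python as follows. Ranges are modelled as inclusive `(low, high)` tuples, and both helper functions are hypothetical. Whether a restriction also applies to a default range is not stated in this document; the sketch applies it only to ranges that were actually found.

```python
def intersect(a, b):
    """Intersect two inclusive (low, high) ranges; None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


def effective_accesses(found, default=None, restrict=None):
    """Combine analysis results with default and restrict annotations."""
    if not found:
        # Nothing found by the analysis: fall back to the default, if any.
        return [default] if default is not None else []
    if restrict is None:
        return list(found)
    clipped = (intersect(access, restrict) for access in found)
    return [access for access in clipped if access is not None]
```

For instance, a found range `(0x100, 0x1ff)` restricted to `(0x180, 0x2ff)` is narrowed to `(0x180, 0x1ff)`.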
These attributes are found at the graph.
| Name | Type(s) | Description | |||
|---|---|---|---|---|---|
| clock_rate | Range | The clock rate of the processor. AIS: clock X | |||
| default_read_access | Value of Range | Sets a default for accesses for the whole graph. Also see the routine attribute with the same name. AIS: global read default from Y | |||
| default_write_access | Value of Range | Same as 'default_read_access' for write accesses. AIS: global write default to Y | |||
| restrict_read_access | Value of Range | Sets a global restriction for accesses. Also see the routine attribute with the same name. AIS: global read restrict to Y | |||
| restrict_write_access | Value of Range | Same as 'restrict_read_access' for write accesses. AIS: global write restrict to Y | |||
| default_loop_iteration | Range | See the routine attribute with the same name. The global attribute works for the whole graph. AIS: global loop iteration default is Y | |||
| restrict_loop_iteration | Range | A restriction of loop iterations for all loops. Also see the routine attribute with the same name. AIS: global loop iteration restrict is Y | |||
| flow_relations | >"Flow" | Flow relations to include in the ILP in Pathan. The attribute structure is a bit complex, so there is a special Section "Flow". AIS: flow X / Y is Z | |||
| cache | >"Cache" | The user defined cache specification. This is quite complex, so there is a separate document for this. | |||
| mapping | >"PAG" | The mapping to be used for the analyses. This is a string representation that PAG can parse, so please refer to the Section "PAG". |
A flow relation is currently a relation between linear combinations of program points, i.e., sums of program point counts multiplied by constant coefficients, compared with =, <= or >=.
The linear combinations are simply represented as a vector. Each element is an Application with type INFIX and functor '*'. The relations are represented by an infix Application with one of the functors '>=', '<=' or '='.
If the relation holds for each context, not only for the sum over all contexts, then another prefix Application with the functor 'each' is applied.
These relations are arranged in a vector and are found in this form in the global 'flow_relations' attribute.
Here's an example:
CRL2:
[
([ i8 ] '>=' [ ([ 5 ] '*' [ i10 ]) ]),
([ i8 ] '<=' [ ([ 0xa ] '*' [ i10 ]) ]),
('each'[
([ i354 ] '>=' [ ([ 6 ] '*' [ i356 ]) ])
]),
('each'[
([ i354 ] '<=' [ ([ 0x16 ] '*' [ i356 ]) ])
])
]
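To make the meaning of such relations concrete, the following Python sketch models the first two relations of the example as plain tuples and checks them against given execution counts. The tuple encoding and the helpers `linear_value` and `holds` are illustrative assumptions and not the CRL2 representation; the 'each' wrapper is not modelled.

```python
import operator

# The three comparison functors named in the text.
_COMPARE = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def linear_value(terms, counts):
    """Evaluate a sum of coefficient * program-point-count terms."""
    return sum(coefficient * counts[point] for coefficient, point in terms)


def holds(relation, counts):
    """Check one relation (lhs_terms, functor, rhs_terms) against counts."""
    lhs, functor, rhs = relation
    return _COMPARE[functor](linear_value(lhs, counts), linear_value(rhs, counts))


# i8 >= 5 * i10 and i8 <= 0xa * i10, as in the example.
relations = [
    ([(1, "i8")], ">=", [(5, "i10")]),
    ([(1, "i8")], "<=", [(0xA, "i10")]),
]
```

With counts such as `{"i8": 30, "i10": 4}` both relations hold, since 20 <= 30 <= 40.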
FIXME: add more docu.
Here's an example:
CRL2:
mapping="VIVU-4,len=inf,def_unroll=2"
(Copied from the internal document exec2crl/doc/spec_format.txt)
The cache specification allows setting the cache parameters for different levels of a cache. Each level may distinguish instruction, data and unified cache.
An example in AIS looks as follows:
cache
first level data
set-count = 128, associativity = 4, line-size = 16
and second level data
size = 262144, associativity = 8, line-size = 32;

The general syntax is:
'cache'
Cache specification (non-empty, 'and'-separated list):
(
Cache Level (optional):
(
('first' | 'second' | 'third' | 'fourth' | 'fifth' ) 'level'
| 'level' INT
)?
If this is missing, 'first level' is assumed.
Cache Scope (optional):
( 'data' | 'instruction' | 'unified' )?
If this is missing, 'unified' is assumed.
Feature List (non-empty, comma-separated list):
( 'set-count' '=' INT
| 'associativity' '=' INT
| 'line-size' '=' INT
| 'line-count' '=' INT
| 'size' '=' INT
| 'policy' '=' ( 'plruppc' | 'lru' )
| 'may' '=' ( 'none' | 'empty' | 'chaos' )
| 'pers' '=' ( 'none' | 'empty' | 'chaos' )
| 'must' '=' ( 'true' | 'false' )
)','+
)'and'+

The cache geometry must be specified by an arbitrary selection of the parameters 'set-count', 'associativity', 'line-size', 'line-count' and 'size' in such a way that the other parameters can be inferred.
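How the remaining geometry parameters can be inferred is sketched below in Python. It relies on the usual cache identities (line count = set count × associativity, size = line count × line size); that exec2crl uses exactly these rules is an assumption, and `infer_geometry` is a hypothetical helper. The sketch does not check over-specified input for consistency.

```python
# Each rule: (target, inputs, function computing the target from the inputs).
RULES = [
    ("line_count", ("set_count", "associativity"), lambda s, a: s * a),
    ("set_count", ("line_count", "associativity"), lambda n, a: n // a),
    ("associativity", ("line_count", "set_count"), lambda n, s: n // s),
    ("size", ("line_count", "line_size"), lambda n, l: n * l),
    ("line_count", ("size", "line_size"), lambda z, l: z // l),
    ("line_size", ("size", "line_count"), lambda z, n: z // n),
]
PARAMETERS = {"set_count", "associativity", "line_size", "line_count", "size"}


def infer_geometry(**given):
    """Fill in missing geometry parameters; raise if they cannot be inferred."""
    geometry = dict(given)
    changed = True
    while changed:
        changed = False
        for target, inputs, rule in RULES:
            if target not in geometry and all(name in geometry for name in inputs):
                geometry[target] = rule(*(geometry[name] for name in inputs))
                changed = True
    missing = PARAMETERS - geometry.keys()
    if missing:
        raise ValueError("cannot infer: " + ", ".join(sorted(missing)))
    return geometry
```

Applied to the two cache levels of the AIS example, this gives a first level of 8192 bytes in 512 lines and a second level with 1024 sets in 8192 lines.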
From this specification, exec2crl generates a set of global attributes. The keys of the attributes have one of the following forms:
If the level is not given:
<SCOPE>cache_<FEATURE>
If the level is given:
level<LEVEL><SCOPE>cache<FEATURE>
Here, SCOPE may be:
| i | if the instruction cache was specified | ||
| d | if the data cache was specified | ||
| '' | otherwise, i.e., unspecified, or specified as 'unified'. |
FEATURE may be:
| size | the total size of the cache in bytes -> the attribute value is an integer | ||
| line_size | the size of one cache line in bytes -> the attribute value is an integer | ||
| associativity | the level of associativity -> the attribute value is an integer | ||
| policy | the cache policy -> the attribute value is one of the following symbols: plruppc, lru | ||
| may | behaviour of may analysis: -> one of the following symbols: none, empty, chaos | ||
| persistence | behaviour of persistence analysis: -> one of the following symbols: none, empty, chaos | ||
| must | behaviour of must analysis: -> one of the following symbols: none, empty |
Example attribute:
CRL2:
cache=[
{ // level 1
data={
associativity=4,
line_size=0x100,
size=0x20000
policy='lru',
must='empty',
may='none',
pers='none',
}
instruction={
...
}
unified={
...
}
},
{ // level 2
data={
...
}
...
}
...
]

Note that all attributes may be missing if they are not specified. Also note that there is no default for the level: instead, there are different attributes for the case where no level was specified. Maybe this should be changed and the default should be level 1.
You should decide yourself when to issue error messages and tell the behaviour to the master of documentation.
The attributes 'size', 'line_size' and 'associativity' are always specified together, i.e., either exec2crl specifies all, or none of them.
| Generated by erwin-cgen | © AbsInt Angewandte Informatik GmbH |