
Basic Attributes


AIS
Extensions
Generic Attributes
  Operation & Extension
  Operation
  Instruction
  Block
  Routine
  Global
  Data & Bytes & Info & Meta
AbsInt aiT/StackAnalyzer Attributes
  Block
  Routine
  Global
AbsInt aiT/StackAnalyzer User Attributes
  Operation
  Block
  Routine
  Global
Flow
PAG
Cache

The CRL2 framework uses a number of standard attributes for exchanging information. This document is a first attempt to record which attributes exist, what they mean, which applications write them, and which ones use them.

Note that some attributes may be special, i.e., the CRL2 library takes special care of them. This may either be due to the need for interpretation in the library, or because there is special CRL file syntax for them (usually to improve readability), or because the attributes are a special conceptual part of the CRL structure (e.g. the address of an instruction).

For special attributes, the CRL2 library provides member functions for direct access to the attribute values. These should be preferred over the generic Structure::find_sym() interface (although that interface gives transparent access to them, too).

Please check the Items' documentation about which special attributes exist. Typically, there is a slot for storing the special attribute.

Note

When using the generic Structure::find_sym interface to access special attributes, you must use the result casting versions that output the correct type (e.g. find_sym_bool) and not those that return a generic CrlValue*. This is because the CRL2 library interface may change in the future without prior notice so that only the result casting versions return a value while the generic ones fail. This will be included in future library versions to improve overall performance and to avoid generating Values on the fly.

Further Note

In the following, attributes are often specified to be ranges. Do not rely on this exact type! For efficiency reasons, a plain scalar number (Unsigned, Signed or Float) may be used instead. So always use the generic interface ValueNumeric::minimum()/maximum() to retrieve the value instead of casting to a ValueRange!
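The rule above can be illustrated with a small stand-alone model. This is a Python sketch and not the CRL2 C++ interface: the `bounds` helper and the use of a tuple for a range are assumptions made only for illustration. It shows why code that asks for a minimum and a maximum keeps working when a scalar is stored in place of a range.

```python
from numbers import Real


def bounds(value):
    """Return (minimum, maximum) for a scalar or a (low, high) pair.

    Hypothetical helper: a tuple stands in for a range value and a plain
    number stands in for a scalar value.
    """
    if isinstance(value, Real):
        # A scalar is treated as the degenerate range [value, value].
        return value, value
    low, high = value
    if low > high:
        raise ValueError("range minimum exceeds maximum")
    return low, high


print(bounds(16))        # scalar stored for efficiency -> (16, 16)
print(bounds((4, 10)))   # explicit range -> (4, 10)
```

Callers that only ever go through such an accessor never need to know which of the two representations was stored.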


AIS

Many of the user attributes will have AIS annotations that lead to those attributes. These are meant to be examples. For a full AIS syntax specification, please refer to the AbsInt manuals.


Extensions

In CRL2, resource information at operations can be stored hierarchically to reflect the nature of many processors to have complex operands. Typically, each operand is assigned an extension. For further clean-up, extensions may be nested, so the full description structure of an operation is a tree.

Extensions are stored in the attribute 'ext', which is a vector, each element representing one extension.

Each extension in the vector can have attributes just like any other item in CRL2. It is therefore implemented as a map from Symbol keys to arbitrary CRL2 Values. Thus, the vector of extensions contains Maps as elements.
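As a rough illustration of this tree structure, the following Python sketch models an operation as a dictionary whose 'ext' entry is a list of further dictionaries. The dictionary representation, the `walk_extensions` helper and the sample operation are hypothetical; only the attribute names 'ext', 'src' and 'genname' come from this document.

```python
def walk_extensions(item, depth=0):
    """Yield (depth, attributes) for an item and all nested extensions."""
    yield depth, item
    for extension in item.get("ext", []):
        yield from walk_extensions(extension, depth + 1)


# Hypothetical operation with two operands, the second one nested.
operation = {
    "genname": "add",
    "ext": [
        {"src": ["r1"]},
        {"src": ["Mem"], "ext": [{"src": ["r2"]}, {"src": [8]}]},
    ],
}

for depth, attributes in walk_extensions(operation):
    # Indentation reflects the nesting level of the extension.
    print("  " * depth + str(attributes.get("src", [])))
```

A depth-first walk like this visits the operation first and then each operand subtree in vector order.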

In the following, attributes will be listed for every item. Those attributes that can occur at operations as well as in extensions will be listed in a separate section.


Generic Attributes

These attributes are used across architectures and analyses. They are present directly after decoding, i.e., exec2crl produces them.

Operation & Extension

These attributes may be found at operations and in extensions. Thus they constitute the hierarchical information stored at operations.

Name | Type(s) | Description
op_id | Unsigned | A unique numerical identification of the operation. It can be used to map semantics to the operation in an analysis. For the TF14Net framework, which is used by AbsInt to define machine decoders and to specify which attributes the decoders generate, this attribute has a special meaning: it represents the bits in the machine instruction. Together with a significance mask, this is used to decode operations. Insignificant bits are occasionally used to make the 'op_id' unique.
genname | Symbol | A symbolic identification of the operation. It need not be unique, but it is defined to identify a conceptual class of operations, like a family of 'add' instructions. This makes it easier for many analyses to quickly map semantics, taking the precise information from the other attributes. An analysis can choose freely whether the 'op_id' or the 'genname' is better suited for mapping semantics, and it may even choose to use a combined approach.
ext | Vector of Map of Symbol to Value | The list of extensions of an operation.
src | Vector of Symbol or Numeric | Source resources of an operation. If this is a Symbol, then it marks a register or an abstract resource like memory ('Mem'), stack ('Stack') or cache ('Cache'), or the whole register file, etc. If this attribute value is a numeric, it means the operation reads a constant. A 'src' resource that is also written will be found in the 'dst' Vector at the same index.
dst | Vector of Symbol | Destination resources of an operation. The values are like those of 'src', except that constants are, by their nature, not allowed here; only non-constant resources may be specified. A 'dst' resource that is also read will be found in the 'src' Vector at the same index.
op | Vector of Symbol or Numeric | Any operand resource of an operation. The indices correspond to the 'src' and 'dst' Vectors. Further, in rare cases, there can be operands that exist conceptually, or are listed for consistency reasons. In these cases it may happen that only the 'op' entry exists, but neither a 'src' nor a 'dst' resource.
mnemonic | String | A string to be shown to framework users. It gives a human-readable textual representation of the operation that should in no case be used to get information about the operation in analyses. It is solely for humans.
assembly | String | A template for making the mnemonic: all operands/extensions are represented as a dollar sign, possibly followed by a number in braces to give an explicit index. These dollars have to be replaced with the assembly string of the corresponding operand/extension to get the mnemonic. This is currently not used in the framework, but is thought to be useful when changing the CRL2 structure to provide the human user with an updated 'mnemonic' string.
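The 'assembly' template described above could be expanded along the following lines. This Python sketch rests on several assumptions that the document does not confirm: an explicit index is written as `${n}`, indices are zero-based, and each plain `$` takes the next operand in order, independently of explicit indices. The function name `expand_assembly` is hypothetical.

```python
import re


def expand_assembly(template, operand_strings):
    """Replace each '$' or '${n}' with the matching operand string."""
    next_implicit = 0

    def substitute(match):
        nonlocal next_implicit
        if match.group(1) is not None:
            # Explicit index given in braces (assumed zero-based).
            index = int(match.group(1))
        else:
            # Plain '$': take the next operand in sequence (assumption).
            index = next_implicit
            next_implicit += 1
        return operand_strings[index]

    return re.sub(r"\$(?:\{(\d+)\})?", substitute, template)


print(expand_assembly("add $, $, ${0}", ["r1", "r2"]))  # add r1, r2, r1
```

With nested extensions, the operand strings themselves would be produced by expanding each extension's own template first.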

Operation

Name | Type(s) | Description
cat | Map of Symbol to Boolean or Unsigned | A set of simple categorisations of the machine instruction. This is sometimes used by analyses to find out simple facts about the operation. The available keys are: 'boring', 'branch', 'call', 'return', which define the instruction class with respect to the CFG; 'call_conditional', 'immediate_return' for additional classification of a call; and 'taken', 'computed', 'predictable' for further classification of branches, calls and returns. These are all Booleans. Further, there are the Unsigned values 'mem_read' and 'mem_write', which count the number of read and write accesses of the operation. This is redundant information that exec2crl derives from the src/dst attributes.
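A consumer of the 'cat' map might query it as in the following Python sketch. The dictionary model and both helper names are hypothetical, and treating a missing key as false (or zero) is an assumption; the document does not say how absent keys are to be read.

```python
CFG_CLASSES = ("boring", "branch", "call", "return")


def cfg_class(cat):
    """Return the CFG class keys that are set to True in a 'cat' map."""
    return [key for key in CFG_CLASSES if cat.get(key, False)]


def accesses_memory(cat):
    """True if the operation counts at least one memory read or write."""
    return cat.get("mem_read", 0) + cat.get("mem_write", 0) > 0


# Hypothetical 'cat' map of a computed call that reads memory once.
cat = {"call": True, "computed": True, "mem_read": 1}
print(cfg_class(cat))        # ['call']
print(accesses_memory(cat))  # True
```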

Instruction

Name | Type(s) | Description
address | Address | The instruction address in a linear form. This is the bus address of the instruction before address translation by MMUs. I.e., for paged architectures, this is the address that combines page and base addresses in the way the manual states it. It is not necessarily the bit-wise concatenation of the two, but the address as seen on the (conceptual) bus before address translation. This address always counts in bytes, so for architectures that count in words (e.g. C33), this address is a multiple of the surface address. This address is called the 'linear' address throughout this document.
surface_address | String | A string for the human user that represents the surface address of this instruction when considering the programming level. For paged architectures, this contains both the page and the base part, separated by a colon. For multiple instruction set architectures, the instruction set of this instruction is prepended with a double colon. Never interpret the contents of the string in analyses! This is for the human user only!
page | Address | For paged architectures: the page part of the surface address.
base_address | Address | For paged architectures: the base part of the surface address.
instructions_set | Symbol | For multiple instruction set architectures: the instruction set of this instruction.
width | Unsigned | The instruction width in bytes.
bytes | Vector of Unsigned | The bytes the instruction is composed of. This is the raw machine code. Note that in the future, this attribute might use a specialised vector that can store the bytes more efficiently, so use nth_byte() to get them, instead of nth()->get_byte().
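The remark that the linear address counts in bytes even on word-counting architectures amounts to a simple scaling. The Python sketch below assumes an unpaged architecture and a fixed word size; the function name and the word size of 2 bytes in the example are hypothetical.

```python
def linear_from_word_address(word_address, bytes_per_word):
    """Convert a word-counting address into a byte-counting one."""
    if bytes_per_word < 1:
        raise ValueError("bytes_per_word must be positive")
    return word_address * bytes_per_word


# Word address 0x100 on a machine with 2-byte words.
print(hex(linear_from_word_address(0x100, 2)))  # 0x200
```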

Block

Name | Type(s) | Description
address | Address | Like for instructions. Note that empty blocks still have an address. It is assigned conceptually by closest correspondence and is likely (but not necessarily) equal to the address of some non-empty block.
surface_address | String | Like for instructions.
page | Address | For paged architectures: the page part of the surface address.
base_address | Address | For paged architectures: the base part of the surface address.
instructions_set | Symbol | For multiple instruction set architectures: the instruction set of this block.
block_type | Symbol | At some nodes, defines what kind of block this is. The following values may be found here:

zol_back

the back node of a zero-overhead loop

dummy_call

a fall-through to another routine: not really a call, but instead this block marks the traversal between two adjacent routines without any branch instruction, but by a fall-through edge.

exclusion

the block does contain code, but that code was excluded by the user. Such a block has to be handled as a black box and often, this needs special care.

user_dead_end

at this address, the analysis ends by the will of the user. There is no edge coming from this block.

delayed | Boolean | This marks a delayed branch instruction. The block does not contain the edges of the branch, but only a fall-through edge into the delay block.
no_calls | Boolean | Marks the presence of a call that has no targets.
more_calls | Boolean | Marks the presence of more, yet unknown call edges in the CFG. This is important information for most analyses, since it tells them that they do not see the whole reality here. Some can take special care, some will fail (e.g. a stack analysis).
more_branches | Boolean | Marks the presence of more, yet unknown branch edges in the CFG. This is severe: most analyses will have to fail, as almost anything might happen.
implicit_calls | Vector of Routine | Marks the presence of routine entries that are revealed in this block. This block does not invoke those routines itself, but somewhere else in the program, an unresolved call may call such a routine. A typical example of a call that has this attribute is a call to atexit(), whose only argument is a function pointer that is eventually invoked somewhere else in the program.
orig_calls | Vector of Routine | Contains an edge to a routine that is called by the block but is not the logical control flow. The logical control flow is represented by a standard edge. This situation occurs when the architecture uses a routine to implement a call, which is often the case on small architectures for implementing computed calls: inlined code would be too large, so a routine implements it. That routine is then found in this attribute, while the implemented target is found by traversing the standard call edge.
implicit_branches | Vector of Block | Like 'implicit_calls' for branches.
orig_branches | Vector of Routine | Like 'orig_calls' for branches. Will be filled in with the original branch targets when the user overrides them.
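An analysis that needs a complete CFG could inspect the flags above before it starts. The following Python sketch models a block as a dictionary; the helper name, the returned strings and the order in which the flags are checked are assumptions made for illustration.

```python
def cfg_completeness(block):
    """Classify how much of the control flow leaving a block is known."""
    if block.get("more_branches", False):
        # Unknown branch targets: almost anything might happen.
        return "unknown branches"
    if block.get("more_calls", False):
        return "unknown calls"
    if block.get("no_calls", False):
        return "call without targets"
    return "complete"


print(cfg_completeness({"more_calls": True}))  # unknown calls
print(cfg_completeness({}))                    # complete
```

A stack analysis, for example, might refuse to continue for any result other than "complete", while other analyses could tolerate unknown calls.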

Routine

Name | Type(s) | Description
address | Address | Like for instructions. Note that empty routines still have an address. It is assigned conceptually by closest correspondence and is likely (but not necessarily) equal to the address of some non-empty block.
surface_address | String | Like for instructions.
page | Address | For paged architectures: the page part of the surface address.
base_address | Address | For paged architectures: the base part of the surface address.
instructions_set | Symbol | For multiple instruction set architectures: the instruction set of this routine.
name | Symbol | The human-readable name of this routine as extracted from e.g. an executable's symbol table. Do not use this to identify routines in analyses, as the names need not be unique. Only if the user is to read or identify a routine should this attribute be used for reference. In CRL2, always use pointers to CRL2 structures for identification.
section | Symbol | The name of the section of the executable in which this routine was found.
external | Boolean | If set, the routine's body is not part of the analyses. Also see the 'reason' attribute.
reason | Symbol | For external routines: the reason why this is not included. 'EXCLUDED': the user has manually excluded the routine from the analysis. AIS:

    routine X is not analysed

'EXTERNAL': the routine is not part of the executable or the user has forced the routine to be assumed not to be part of the executable. AIS:

    routine X is external

no_return | Boolean | This routine does not return to the caller. exit() is a typical candidate that has this attribute.

Global

These attributes are found at the graph.

Name | Type(s) | Description
input_file_name | String | Name of the input file leading to this CRL2 file.
start | Vector of Routine | The start points selected for the coming analyses.

Data & Bytes & Info & Meta

Yet to be documented.


AbsInt aiT/StackAnalyzer Attributes

Block

Name | Type(s) | Description
infeasible | Symbol | This block is infeasible, i.e., it cannot be reached during program execution. This attribute may also be set by other analyses when new information is found, e.g. by a value analysis. exec2crl generates this from user annotations.

Routine

Name | Type(s) | Description
infeasible | Boolean | This routine is infeasible. This information may either be found automatically, or a user annotation might have been used.

Global

These attributes are found at the graph.

Name | Type(s) | Description
errors | Unsigned | The number of errors that occurred during decoding.
warnings | Unsigned | The number of warnings that occurred during decoding.
reader_name | String | Name of the reader module used, i.e., the file format.
decoder_name | String | Name of the decoder module, i.e., the architecture.
compiler_name | String | The name of the compiler that compiled the input file, in case exec2crl could find that out.
paging | Boolean | Whether the architecture uses paged addresses.


AbsInt aiT/StackAnalyzer User Attributes

Operation

Name | Type(s) | Description
mem_access | Vector of Ranges | List of address ranges the operation may access. AIS:

    instruction X accesses Y

mem_access_by_step | Vector of Vector of Ranges | Similar to 'mem_access', but distinguished by sub-step of the operation. This is interesting for some processors with several stages in the memory interface. AIS:

    instruction X accesses Y in step Z

spec_mem_access | Vector of Ranges | Same as 'mem_access' for speculative accesses. AIS:

    instruction X accesses Y speculatively

spec_mem_access_by_step | Vector of Vector of Ranges | Same as 'mem_access_by_step' for speculative accesses. AIS:

    instruction X accesses Y in step Z speculatively

user_reg_values | Map of Symbol to Range | Maps registers to their value when entering this operation. AIS:

    instruction X is entered with r1=Y, r2=Z, ...;

user_* | Value | User-defined additional attributes. The AIS names get a 'user_' prefix when moved into the CRL2 structure. AIS:

    instruction X features KEY=VAL, KEY=VAL, ...;

Block

Name | Type(s) | Description
loop_annotations | Map of Symbol to Value | Read by the loop conversion: a map of attributes to be moved to the routines resulting from loop conversion of this block. All blocks of a loop routine are searched for this attribute and all the found attributes are then copied to the routine's attribute map. The possible attributes that may occur here are: 'iteration_count', 'user_execution_time', and 'external'. See the documentation of routine attributes for an explanation. Note that the loop conversion is usually performed by exec2crl already. AIS:

    loop X iterates Y;

add_cycles | Unsigned | Number of cycles to add to the execution time of a given block. This is read by the path analysis. AIS:

    instruction X additionally takes Y;
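The copying of 'loop_annotations' from blocks to the loop routine, as described for the block attribute above, might look like the following Python sketch. Blocks are modelled as dictionaries; the helper name is hypothetical, and letting a later block overwrite an earlier one on a key clash is an assumption, since the document does not define conflict handling.

```python
def collect_loop_annotations(blocks):
    """Merge the 'loop_annotations' maps of all blocks of a loop routine."""
    routine_attributes = {}
    for block in blocks:
        for key, value in block.get("loop_annotations", {}).items():
            # Assumption: on a key clash the later block wins.
            routine_attributes[key] = value
    return routine_attributes


# Hypothetical loop routine with three blocks, two of them annotated.
blocks = [
    {"loop_annotations": {"iteration_count": (1, 10)}},
    {},
    {"loop_annotations": {"external": True}},
]
print(collect_loop_annotations(blocks))
```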

Routine

Name | Type(s) | Description
loop_annotations | Vector of Map of Symbol to Value | Like the block attribute 'loop_annotations' but applying to all loops found in the routine. Each element of the vector corresponds to one loop in the routine; they are sorted by linear address of the loop header (= entrance address inside the loop).
iteration_count | Range | The iteration count for a loop starting at this loop routine. The value is a range to be able to specify minimum and maximum. AIS:

    loop X iterates Y

user_execution_time | Range | A user-annotated execution time for a loop (i.e., a bound on the time, not the number of iterations). This is currently not implemented. This attribute is also used for excluded routines. AIS:

    routine X is not analysed and takes Y

default_read_access | Value of Range | Sets a default for accesses in this routine when no range of memory access can be found out by an analysis. AIS:

    read default X from Y

default_write_access | Value of Range | Same as 'default_read_access' for write accesses. AIS:

    write default X to Y

restrict_read_access | Value of Range | Sets a restriction for accesses in this routine. All access ranges found will be intersected with the ranges given here. AIS:

    read restrict X to Y

restrict_write_access | Value of Range | Same as 'restrict_read_access' for write accesses. AIS:

    write restrict X to Y

default_loop_iteration | Range | If no loop iteration is found for loops in this routine, then this value is used. AIS:

    loop iteration default X is Y

restrict_loop_iteration | Range | A restriction of loop iterations for all loops in this routine. AIS:

    loop iteration restrict X is Y

incarnation_count | Range | How many times the routine may be alive at the same time, i.e., how many stack frames of this routine may exist in parallel. AIS:

    routine X incarnates Y

stack_usage | Unsigned | Stack usage of the given function. AIS:

    routine X uses Y bytes of stack

stack_effect | Unsigned | Stack difference of the given function. AIS:

    routine X leaves behind Y bytes of stack

cc_violation | Boolean | Violation of calling conventions. AIS:

    routine X violates calling conventions

user_* | Value | User-defined additional attributes. The AIS names get a 'user_' prefix when moved into the CRL2 structure. AIS:

    routine X features KEY=VAL, KEY=VAL, ...;
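How the default and restrict access attributes interact with the ranges an analysis finds can be sketched as follows. This Python model uses closed (low, high) tuples for ranges; both function names are hypothetical, and it is an assumption that the restriction is a single range and that it also applies to the default.

```python
def intersect(a, b):
    """Intersect two closed ranges given as (low, high); None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


def effective_access(found, default=None, restrict=None):
    """Apply a default and a restriction to the access ranges found."""
    ranges = list(found)
    if not ranges and default is not None:
        # Nothing was found by the analysis: fall back to the default.
        ranges = [default]
    if restrict is None:
        return ranges
    clipped = (intersect(r, restrict) for r in ranges)
    return [r for r in clipped if r is not None]


# A found range is clipped to the restriction 0x1800..0x2FFF.
print(effective_access([(0x1000, 0x1FFF)], restrict=(0x1800, 0x2FFF)))
```

The same scheme would apply separately to read and write accesses, and analogously to the loop iteration default and restriction.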

Global

These attributes are found at the graph.

Name | Type(s) | Description
clock_rate | Range | The clock rate of the processor. AIS:

    clock X

default_read_access | Value of Range | Sets a default for accesses for the whole graph. Also see the routine attribute with the same name. AIS:

    global read default from Y

default_write_access | Value of Range | Same as 'default_read_access' for write accesses. AIS:

    global write default to Y

restrict_read_access | Value of Range | Sets a global restriction for accesses. Also see the routine attribute with the same name. AIS:

    global read restrict to Y

restrict_write_access | Value of Range | Same as 'restrict_read_access' for write accesses. AIS:

    global write restrict to Y

default_loop_iteration | Range | See the routine attribute with the same name. The global attribute works for the whole graph. AIS:

    global loop iteration default is Y

restrict_loop_iteration | Range | A restriction of loop iterations for all loops. Also see the routine attribute with the same name. AIS:

    global loop iteration default is Y

flow_relations | see "Flow" | Flow relations to include in the ILP in Pathan. The attribute structure is a bit complex, so there is a special section "Flow". AIS:

    flow X / Y is Z

cache | see "Cache" | The user-defined cache specification. This is quite complex, so there is a separate document for this.
mapping | see "PAG" | The mapping to be used for the analyses. This is a string representation that PAG can parse, so please refer to the section "PAG".


Flow

A flow relation is currently a relation between linear combinations of program points, i.e., sums of program point counts multiplied by constant coefficients, compared with =, <= or >=.

The linear combinations are simply represented as vectors. Each element is an Application with type INFIX and functor '*'. The relations are represented by an infix Application with one of the functors '>=', '<=' or '='.

If the relation holds for each context, not only for the sum over all contexts, then another prefix Application with the functor 'each' is applied.

These relations are arranged in a vector and are found in this form in the global 'flow_relations' attribute.

Here's an example:

CRL2:

 [
    ([ i8 ] '>=' [ ([ 5 ]   '*' [ i10 ]) ]),
    ([ i8 ] '<=' [ ([ 0xa ] '*' [ i10 ]) ]),
    ('each'[
        ([ i354 ] '>=' [ ([ 6 ] '*' [ i356 ]) ])
    ]),
    ('each'[
        ([ i354 ] '<=' [ ([ 0x16 ] '*' [ i356 ]) ])
    ])
 ]
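To make the meaning of such relations concrete, the following Python sketch checks the first two relations of the example against a set of execution counts. The tuple representation, the helper names and the counts are hypothetical; the 'each' functor is not modelled, so the check corresponds to the sum over all contexts.

```python
import operator

COMPARATORS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def evaluate(terms, counts):
    """Sum coefficient * execution count over a linear combination."""
    return sum(coefficient * counts[point] for coefficient, point in terms)


def holds(relation, counts):
    """Check one flow relation against a map of program point counts."""
    lhs, functor, rhs = relation
    return COMPARATORS[functor](evaluate(lhs, counts), evaluate(rhs, counts))


# i8 >= 5 * i10 and i8 <= 0xa * i10, as in the example above.
relations = [
    ([(1, "i8")], ">=", [(5, "i10")]),
    ([(1, "i8")], "<=", [(0xA, "i10")]),
]
counts = {"i8": 50, "i10": 10}  # hypothetical execution counts
print(all(holds(r, counts) for r in relations))  # True
```

In the path analysis these relations are constraints of the ILP rather than checks on known counts; the sketch only shows how a single relation is read.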


PAG

FIXME: add more docu.

Here's an example:

CRL2:

mapping="VIVU-4,len=inf,def_unroll=2"


Cache

(Copied from the internal document exec2crl/doc/spec_format.txt)

This annotation allows setting the cache parameters for different levels of a cache. Each level may distinguish instruction, data and unified cache.

An example in AIS looks as follows:

cache
   first level data
       set-count = 128, associativity = 4, line-size = 16
and second level data
        size = 262144, associativity = 8, line-size = 32;

The general syntax is:

'cache'

Cache specification (non-empty, 'and'-separated list):

    (

Cache level (optional); if this is missing, 'first level' is assumed:

        (
           ('first' | 'second' | 'third' | 'fourth' | 'fifth' ) 'level'
        |  'level' INT
        )?

Cache scope (optional); if this is missing, 'unified' is assumed:

        ( 'data' | 'instruction' | 'unified' )?

Feature list (non-empty, comma-separated list):

        ( 'set-count'     '=' INT
        | 'associativity' '=' INT
        | 'line-size'     '=' INT
        | 'line-count'    '=' INT
        | 'size'          '=' INT
        | 'policy'        '=' ( 'plruppc' | 'lru' )
        | 'may'           '=' ( 'none' | 'empty' | 'chaos' )
        | 'pers'          '=' ( 'none' | 'empty' | 'chaos' )
        | 'must'          '=' ( 'true' | 'false' )
        )','+

    )'and'+

The cache geometry must be specified by an arbitrary selection of the parameters 'set-count', 'associativity', 'line-size', 'line-count' and 'size' in such a way that the other parameters can be inferred.
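The inference of missing geometry parameters can be sketched with the usual relations for set-associative caches: line-count = set-count * associativity and size = line-count * line-size. The Python function below is an illustration under that assumption, not the algorithm exec2crl uses; its name and the keyword names are hypothetical, and it assumes the given numbers divide evenly.

```python
def complete_geometry(**given):
    """Infer missing cache geometry parameters from the ones given."""
    g = {"set_count": None, "associativity": None, "line_size": None,
         "line_count": None, "size": None}
    g.update(given)
    # Each rule reads: product = factor_a * factor_b.
    rules = [("line_count", "set_count", "associativity"),
             ("size", "line_count", "line_size")]
    changed = True
    while changed:
        changed = False
        for product, a, b in rules:
            p, x, y = g[product], g[a], g[b]
            if p is None and x is not None and y is not None:
                g[product] = x * y
            elif x is None and p is not None and y is not None:
                g[a] = p // y
            elif y is None and p is not None and x is not None:
                g[b] = p // x
            else:
                continue
            changed = True
    if None in g.values():
        raise ValueError("cache geometry is underspecified")
    return g


# The two levels of the AIS example above.
print(complete_geometry(set_count=128, associativity=4, line_size=16)["size"])
print(complete_geometry(size=262144, associativity=8, line_size=32)["set_count"])
```

For the first level this yields a size of 8192 bytes, and for the second level a set count of 1024.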

From this specification, exec2crl generates a set of global attributes. The keys of the attributes have one of the following forms:

If the level is not given:

<SCOPE>cache_<FEATURE>

If the level is given:

level<LEVEL><SCOPE>cache<FEATURE>

Here, SCOPE may be:

i | if the instruction cache was specified
d | if the data cache was specified
'' | otherwise, i.e., unspecified, or specified as 'unified'

FEATURE may be:

size | the total size of the cache in bytes -> the attribute value is an integer
line_size | the size of one cache line in bytes -> the attribute value is an integer
associativity | the level of associativity -> the attribute value is an integer
policy | the cache policy -> the attribute value is one of the following symbols: plruppc, lru
may | behaviour of may analysis -> one of the following symbols: none, empty, chaos
persistence | behaviour of persistence analysis -> one of the following symbols: none, empty, chaos
must | behaviour of must analysis -> one of the following symbols: none, empty

Example attribute:

CRL2:

cache=[
     { // level 1
         data={
             associativity=4,
             line_size=0x100,
             size=0x20000,
             policy='lru',
             must='empty',
             may='none',
             pers='none',
         }
         instruction={
             ...
         }
         unified={
             ...
         }
     },
     { // level 2
         data={
             ...
         }
         ...
     }
     ...
]

Note that all attributes may be missing if they are not specified. Also note that there is no default for the level: instead, there are different attributes for the case where no level was specified. Maybe this should be changed and the default should be level 1.

Decide for yourself when to issue error messages, and report the chosen behaviour to the maintainer of this documentation.

The attributes 'size', 'line_size' and 'associativity' are always specified together, i.e., either exec2crl specifies all, or none of them.


Generated by erwin-cgen © AbsInt Angewandte Informatik GmbH