|
| Copyright | AbsInt Angewandte Informatik GmbH | ||
|---|---|---|---|
| Author | Henrik Theiling | ||
| Description | CRL2 Library interface for C and C++.
Shall be easily usable with:
|
the cache for special attributes should be removed: we don't want to create Value * on the fly. The interface clearly states that e.g. find_sym_symbol should be used, and not find_sym->get_symbol. The private_attrs() stuff is bad!
implement the whole lot of casting access members for the special access functions of certain classes, e.g. for the list_key_t of lists.
block, edges: types: auto-update if possible. Especially for edges.
To allow for future extensions of maps, hash tables from symbol to value should use 'symbol'=... syntax. However, this would be inconsistent with the attribute defs at normal items. So better have the same special syntax for arbitrary hash tables.
Include a protocol number in the CRL2 file. Only this way we can handle syntax changes properly.
for distinguishing formats, introduce a proper version number for the CRL format, e.g., a vector:
Current:
crl version 2;
Should be:
crl version 2 1 8;
As with other formats: the first number should be the main version. For this library, this is always two. A higher number would mean that a totally different library is probably necessary.
The second entry would be a major, incompatible version update. A parser that cannot read that version should stop immediately with a fatal error, since there may be subtle changes that have different semantics. A library update is needed.
The third entry would mean a minor version update. A parser should try to read this, but issue an error. New features that are guaranteed to lead to parse errors in an old parser would be of this category: if they are not used in the file, an old parser will perform correctly. If they occur, the parser will fail as they are encountered, thereby ensuring safety.
Any further numbers should not lead to notices by the parser and should also not lead to parse problems: they are for mere distinction only.
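The version rules above could be checked roughly as in the following sketch. All names here (VersionCheck, check_crl_version) are hypothetical and not part of the library. The sketch also assumes that a parser reads exactly one major version and that missing entries count as 0; neither is stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/* Hypothetical outcome categories; not part of the CRL2 library. */
enum class VersionCheck { Ok, ReadWithDiagnostic, Fatal };

/* Classify the numbers that follow "crl version" in a file against the
 * version triple the parser was built for.
 * Assumption of this sketch: missing entries count as 0. */
inline VersionCheck check_crl_version(std::vector<unsigned> const &file,
                                      unsigned my_main,
                                      unsigned my_major,
                                      unsigned my_minor)
{
    unsigned f_main  = file.size() > 0 ? file[0] : 0;
    unsigned f_major = file.size() > 1 ? file[1] : 0;
    unsigned f_minor = file.size() > 2 ? file[2] : 0;

    if (f_main != my_main)   return VersionCheck::Fatal;  /* different library needed */
    if (f_major != my_major) return VersionCheck::Fatal;  /* incompatible update */
    if (f_minor > my_minor)  return VersionCheck::ReadWithDiagnostic;
    return VersionCheck::Ok;  /* entries beyond the third are ignored */
}
```

With a parser built for 2 1 8, a file declaring 2 1 9 would be read with a diagnostic, while 2 2 0 would be rejected.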
have ValueVectorByte/VectorInt for more efficient access to byte/int arrays.
Currently, each byte would occupy a pointer plus an object in a VectorValue. (The question is how to implement 'v_nth()' for this, since we would have to create a new Value for that, which is very inefficient and probably runs against the whole point of having such a data type. Maybe we simply do not have 'nth'? We could have a special v_nth_int().).
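A minimal sketch of the idea, assuming a hypothetical class ByteVectorSketch that stands in for the proposed ValueVectorByte: the bytes are stored contiguously, and the integer accessor returns the element by value, so no Value has to be created per access. The member names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Hypothetical stand-in for the proposed ValueVectorByte. */
class ByteVectorSketch {
    std::vector<std::uint8_t> bytes_;  /* one byte per element, no per-element object */
public:
    /* Store the low eight bits of v. */
    void append_int(unsigned v)
    {
        bytes_.push_back(static_cast<std::uint8_t>(v & 0xffu));
    }

    std::size_t size() const { return bytes_.size(); }

    /* Counterpart of the suggested v_nth_int(): returns the element by value. */
    unsigned nth_int(std::size_t i) const
    {
        assert(i < bytes_.size());
        return bytes_[i];
    }
};
```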
Implement expressions denoting srcX and dstX (or opX), e.g. for representing semantics. Maybe instead use a link to a value (e.g. a CrlValueReference should be a value as well. Or there should be a CrlReference (or CrlRefValue) and a CrlValueRef (or CrlValueRefValue))
Implement a ValuePath that works properly with Unix and DOS. The drives of DOS will make this a bit complicated since they cannot (easily) be generated under Linux. But maybe exec2crl (the primary application this feature is for) can handle that.
implement find_int() etc. for maps instead of using a generic wrapper -- the generic implementation is slower by some additional virtual calls.
implement find_uint() and find_raw32() according to find_int().
implement copy()
TypeOr:
attributes instruction
target:identifier || address
Problem: handling things like:
xyz:signed || unsigned
and then 'xyz=4'. We will have to define whether this becomes ValueSigned or ValueUnsigned.
have a ValueMap from int to value (this is faster for ints, of course, than a map from symbol to value) This should be an alternative implementation of vectors and tuples:
attributes instruction
pos:[string,unsigned,unsigned] (implementation=hash)
Extend the parser:
check attribute types
Should there be mandatory attributes? Assuming they exist: should the default be optional or mandatory attributes? What about entries in tuples, which could use the same predicate?
for numeric types, have a format specifier (similar to C: with width, precision and base):
attributes instruction
address: address (format="%08x"),
lines: unsigned (format="%d") [],
weight: float (format="%6.2f");
(Maybe even have two: one for CRL and one for ASM printing.)
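How such a format attribute might be applied when printing can be sketched with std::snprintf. The helper names format_address and format_float are hypothetical, and the sketch assumes that addresses fit into an unsigned int and that the caller guarantees the specifier matches the argument type.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

/* Hypothetical helper: print an address with a printf-style specifier
 * taken from the attribute declaration (e.g. "%08x").
 * The specifier must expect an unsigned int. */
inline std::string format_address(char const *fmt, unsigned value)
{
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, fmt, value);
    return n < 0 ? std::string() : std::string(buf);
}

/* Hypothetical helper: same for floating point values (e.g. "%6.2f").
 * The specifier must expect a double. */
inline std::string format_float(char const *fmt, double value)
{
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, fmt, value);
    return n < 0 ? std::string() : std::string(buf);
}
```

A real implementation would have to validate the specifier read from the file before passing it to a printf-style function.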
for numeric types, have value constraints:
attributes instruction
weight:float (range=0.0..1.0);
have TypeVector store an optional minimum and maximum length:
attributes instruction
mc:byte[1..4]
have TypeMap that recursively contains attribute declarations:
attributes instruction
cat:{ mem_read:bool, mem_write:bool }
here, make clear what is optional, e.g. by ...:
cat:{ mem_read:bool, mem_write:bool, ...}
Then, {} would be an empty map, while { ... } would be any map.
have a variant of TypeVector and TypeTuple that selects a ValueList as a base type.
attributes instruction
lines:unsigned[] (implementation=list)
Make the application that invokes crl_graph_read able to supersede the implementation given in the crl file. (It will probably need a CrlParser to do this.)
The array definition must be properly recursive:
xyz:unsigned[] (implementation=list) [] (implementation=vector)
type checking and influence on attribute reading (e.g. read 5 as signed if that is the way it is declared. Currently, only +5 is read as signed).
attributes instruction
xyz:signed
Then 'xyz=5' shall be read as a signed constant rather than unsigned.
Further: allow generation of vector or list by a type def. (See the comment about TypeVector,Tuple + ValueList above)
attributes instruction
pos:[string,unsigned,unsigned] (implementation=list)
Type checking: make it possible to forbid non-null values:
?value(), ?item(), ?string(), ...
(in parse_cast_value()).
Maybe make type checking for items correctly obey the class hierarchy. This only concerns RoutineItem, however.
(We should have erwin_cgen generate the corresponding functions. Also, some of the functions in to_string.cpp for id->name and name->id translation should be auto-generated.)
have optional tuple entries:
pos:[string, unsigned?, unsigned? ]
Possibly allow groups that are optional, but if the first occurs, the following must be given, too:
pos:[string, unsigned?, unsigned ]
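One possible reading of the two declarations above, as a sketch: entries are positional, so a tuple value is acceptable if it is complete or if the first omitted entry is marked with '?'. The function tuple_length_ok is hypothetical, and it checks only the number of entries, not their types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/* optional[i] is true if entry i of the tuple type is marked with '?'.
 * given is the number of entries present in the tuple value. */
inline bool tuple_length_ok(std::vector<bool> const &optional, std::size_t given)
{
    if (given > optional.size())  return false;  /* too many entries */
    if (given == optional.size()) return true;   /* everything present */
    return optional[given];                      /* first omitted entry must be optional */
}
```

Under this reading, [string, unsigned?, unsigned?] accepts one, two or three entries, while [string, unsigned?, unsigned] accepts one or three: once the optional entry is given, the following one is required.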
erasure of structures
clearing block structures
Both are easy on the graph structure library level, but unfortunately, attributes do not currently support callbacks for erasure of things they point to. Further, in the parser, we have to recursively clean up the with_id structure.
implement specialised integer indexed access for ValueString (read/write) and ValueSymbol (read-only). When ready, enable the char[] vector in the parser.
Allow "..." syntax for parsing vectors and maps: just split the given string. We might want to implement this in ValueVector::poke(). We also need to type-check, of course.
implement a CRL1 reader.
optimise the set/set_once/poke/poke_once/append etc. functions that take AnyValue: it is quite inefficient to first generate a new Value on the fly, then unpack the value inside and poke it into the given structure, and then deallocate the temporary. Especially poke() is highly inefficient, since the deallocation is almost always necessary.
One problem here is that v_poke has very many implementations, all of which would have to exist in several versions then, in order to really improve performance (at least Value *, unsigned_t, signed_t, VChar const &, char const *, and Item *). This is awfully complex to gain performance.
The biggest issue is surely the parser, which invokes poke() for most constants it reads.
For virtual functions that implement a default algorithm by invoking other virtual functions, remove one virtual call by implementing the algorithm in the second set of functions. Do this with priority in functions that are very simple.
- v_print_vchar implemented by crl_name().
Maybe it is possible to have generated implementations in many cases.
Memory usage: hash all values upon reading -> structure sharing
introduce banked allocation
preallocation of vectors and hashes (requires syntax change or duplicate reading of files).
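Structure sharing by hashing values upon reading can be sketched as an intern pool. The sketch uses std::string as a stand-in for immutable values; InternPoolSketch is a hypothetical name, and the real library would have to hash its own Value types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

/* Hypothetical intern pool: equal values read from a file share one object. */
class InternPoolSketch {
    std::unordered_map<std::string, std::shared_ptr<std::string const> > pool_;
public:
    /* Return the shared instance for s, creating it on first use. */
    std::shared_ptr<std::string const> intern(std::string const &s)
    {
        auto it = pool_.find(s);
        if (it != pool_.end())
            return it->second;
        std::shared_ptr<std::string const> p = std::make_shared<std::string const>(s);
        pool_.emplace(s, p);
        return p;
    }

    /* Number of distinct values stored. */
    std::size_t size() const { return pool_.size(); }
};
```

Sharing is only safe for values that are never modified in place, which is an assumption of this sketch.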
Graph::pag_import() is slow: it is O(n^2) in the number of invocations per function since for all calls, all returns to the corresponding routine are traversed. This is bad. (See FIXME in member.cpp.)
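A generic way to avoid the repeated traversal is to index the returns once and then look each call up directly. The sketch below does not reflect the actual data structures in member.cpp; ReturnEdgeSketch, its integer fields and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

/* Hypothetical: a return edge, identified by the call site it returns to. */
struct ReturnEdgeSketch {
    int call_site;
    int return_node;
};

typedef std::unordered_map<int, int> ReturnIndex;

/* One pass over all return edges of a routine. */
inline ReturnIndex index_returns(std::vector<ReturnEdgeSketch> const &edges)
{
    ReturnIndex idx;
    for (std::size_t i = 0; i < edges.size(); ++i)
        idx[edges[i].call_site] = edges[i].return_node;
    return idx;
}

/* Expected constant-time lookup per call; -1 means no return was found. */
inline int find_return(ReturnIndex const &idx, int call_site)
{
    ReturnIndex::const_iterator it = idx.find(call_site);
    return it == idx.end() ? -1 : it->second;
}
```

Building the index costs one pass over the returns, after which handling n calls takes expected O(n) lookups instead of n traversals.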
implement lists that contain the next and prev pointers in their contained objects
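Such a list is usually called an intrusive list. A minimal sketch, with hypothetical types NodeSketch and IntrusiveListSketch: the link pointers live in the element, so insertion and erasure need no separate list cell and no allocation.

```cpp
#include <cassert>
#include <cstddef>

/* Hypothetical element type that carries its own link pointers. */
struct NodeSketch {
    int payload;
    NodeSketch *next;
    NodeSketch *prev;
    explicit NodeSketch(int p): payload(p), next(NULL), prev(NULL) {}
};

/* The list only stores the two ends; it does not own the elements. */
struct IntrusiveListSketch {
    NodeSketch *head;
    NodeSketch *tail;
    IntrusiveListSketch(): head(NULL), tail(NULL) {}

    void push_back(NodeSketch &n)
    {
        n.prev = tail;
        n.next = NULL;
        if (tail) tail->next = &n; else head = &n;
        tail = &n;
    }

    /* O(1) erasure: the element itself knows its neighbours. */
    void erase(NodeSketch &n)
    {
        if (n.prev) n.prev->next = n.next; else head = n.next;
        if (n.next) n.next->prev = n.prev; else tail = n.prev;
        n.next = NULL;
        n.prev = NULL;
    }
};
```

The price is that an element can be a member of only one such list per pair of link pointers.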
with -DNDEBUG, include oneliners.cpp at the end of this file to allow for inlining. (Possibly move virtual functions from oneliners.cpp: the compiler will only be slower without any benefit. OTOH, these functions are really very, very small.)
This optimisation only helps C++.
also with -DNDEBUG: include gen-wrap-c.cpp in the same way as oneliners.cpp.
This optimisation helps both C and C++ (if the latter ever needs that file).
implement non-virtual versions of virtual functions to simulate optimisations like with Java's 'final'. (E.g. Structure::get() in all Values but ValueBox, etc.; similar: Structure::skip(). Probably others.) This isn't too important for the above functions, since it is unlikely that someone invokes get() on a ValueSymbol *, but who knows. (If we don't want them to do that, we should implement a protected version to prohibit invocation, but that's probably too fascist.)
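The pattern could look like the following sketch, in which StructureSketch, ValueSymbolSketch and get_direct() are hypothetical stand-ins for the real classes and members. A caller that statically knows the concrete type uses the non-virtual twin and skips the dispatch; the virtual function forwards to it so both paths agree.

```cpp
#include <cassert>

/* Hypothetical base class with a virtual accessor. */
struct StructureSketch {
    virtual ~StructureSketch() {}
    virtual int get() const { return 0; }
};

/* Hypothetical leaf class. */
struct ValueSymbolSketch: public StructureSketch {
    /* Non-virtual twin: no dispatch, can be inlined. */
    int get_direct() const { return 42; }

    /* The virtual version forwards to the non-virtual one. */
    virtual int get() const { return get_direct(); }
};
```

Where C++11 is available, marking the class or the member function `final` lets the compiler perform the same devirtualisation without a second function; whether that fits a header shared with C wrappers is not settled here.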
Also see crl2/decls.h for more (base) (type) definitions and some conceptual notes about naming conventions.
The method names in C++ follow the C naming conventions, not those of Qt, for example (insert_routine() instead of insertRoutine). This is in sync with Erwin.
There is an exception to this rule: the as_ and cast_ functions, which are generated by macros, use as_Object(), as_ValueVector(), ... Of course this is a bit unfortunate but much easier to define in announce_class(). Also, other libraries in the CVS do the same (tf14net).
This file is parsed by a perl script that generates the C wrapper! If you edit this file, use the conventions!
Things relied on in make-wrap-c.pl(.in):
Class definitions must start with either
#define THIS
#define SUPER
struct CRL_THIS: public CRL_SUPER
(Note that you must define SUPER even if it is the same as for the previous type) or with an explicit
struct TypeA: public TypeB.
Any other method of class definition is not handled (e.g. with more sophisticated macros).
Classes are always 'struct's because C wants to understand the pointers to them, too.
Only C comments, no C++ comments are allowed (this is a shared header file, so this is a must anyway, not only for make-gen-c.pl).
Data declarations that should be accessible for the user must be declared only using child- and xref-macros from classdef.h. These macros are partially known to the Perl script, so if you change them, you'd probably need to adjust the script, too. All other data declarations are ignored, regardless whether they are public or not. (C does not see the struct definitions, so it cannot access these members anyway).
BEFORE you declare constructors, the Perl script needs to know whether the class is abstract or not. Its default is to assume that a class is concrete. It cannot determine by the super class whether it is abstract, so you must either:
use the macro 'abstract' (that's the recommended way)
use the comment '! abstract' alone in a comment (CrlObject does this)
declare an abstract member (using = 0 after a function def.)
Any of these must be done before the first constructor definition.
Currently, the inheritance protection is not honoured. It is assumed that you inherit with 'public' access.
The namespace class Crl may only contain static members and local_* and foreign_* declarations.
Follow the conventions of type naming: TheType is the C++ type. The C-type the_type_t is an automatically generated #define (from the local_class() directive). If your derivation from C to C++ is non-standard, usually because there is a sequence of upper case letters in the name, use 'map-class-name' (CrlWithID -> crl_with_id_t, not crl_with_i_d_t) (Note: this special case cannot be automated: CrlVChar -> crl_v_char_t follows the regular mapping.)
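The regular mapping implied by the examples above can be sketched as follows: an underscore is inserted before every upper case letter except the first, everything is lowercased, and _t is appended. The function names and the override table, which stands in for 'map-class-name', are hypothetical; the Perl script may work differently in detail.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

/* Regular mapping from a C++ type name to the C type name. */
inline std::string regular_c_name(std::string const &cxx_name)
{
    std::string out;
    for (std::size_t i = 0; i < cxx_name.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(cxx_name[i]);
        if (std::isupper(c)) {
            if (i > 0)
                out += '_';  /* word boundary before each upper case letter */
            out += static_cast<char>(std::tolower(c));
        }
        else {
            out += static_cast<char>(c);
        }
    }
    return out + "_t";
}

/* Hypothetical stand-in for 'map-class-name': explicit overrides win. */
inline std::string c_name(std::string const &cxx_name,
                          std::map<std::string, std::string> const &overrides)
{
    std::map<std::string, std::string>::const_iterator it = overrides.find(cxx_name);
    return it != overrides.end() ? it->second : regular_c_name(cxx_name);
}
```

The regular mapping turns CrlWithID into crl_with_i_d_t, which is why that name needs an explicit override, while CrlVChar yields the intended crl_v_char_t.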
for comments to be assigned to the correct member function in the C wrapper, write them without empty lines around a member function.
Comments about the class go into the class without a newline before them:
struct ... {
/ * class comment
* ....... * /
/ * more class comment * /
/ * func comment 1 * /
void func();
/ * func comment 2 * /
/ * Unassigned comment currently erroneously assigned to the
* following function b * /
/ * comment for b * /
int b()
...
}
No usage of complex typedefs, e.g. function types, array types, enums, etc. Use a typedef in decls.h to give a name to such a type (e.g. like crl_string2symbol_t).
No local struct, enum, union definitions. Make a global type with a longer name in decls.h. The reason is that C wants to see your types, too. You may use a typedef local to a class for renaming types, though. This is e.g. done with crl_edge_type_t. It is called edge_type_t inside the name space and type_t in the class CrlEdge.
No #define, #if, #ifndef and other preprocessor directives at all inside class definitions. The Perl script recognizes one macro: CRL_IF_DEBUG(source code) and ignores the contents. You can use that instead of #ifndef NDEBUG ... #endif.
Only a few operators can be used:
= is converted into an 'assign' function
comparisons are ignored; instead, 'cmp' has to be used.
| Generated by erwin-cgen | © AbsInt Angewandte Informatik GmbH |