c++ - Which data structure is sorted by insertion and has fast "contains" check? -
i looking data structure preserves order in elements inserted , offers fast "contains" predicate. need iterator , random access. performance during insertion or deletion not relevant. willing accept overhead in terms of memory consumption.
background: need store list of objects. objects instances of class called neuron
, stored in layer
. layer
object has following public interface:
class layer { public: neuron *neuronat(const size_t &index) const; neuroniterator begin(); neuroniterator end(); bool contains(const neuron *const &neuron) const; void addneuron(neuron *const &neuron); };
the contains()
method called quite when software runs, i've asserted using callgrind. tried circumvent of calls contains()
, still hot spot. hope optimize method.
i thought of using std::set
, using template argument provide own comparator struct. neuron
class not give position in layer
away. additionally, i'd have *someneuroniterator = anotherneuron
work without screwing order.
another idea use plain old c array. since not care performance of adding new neuron
object, thought make sure neuron
objects stored linear in memory. invalidate pointer pass addneuron()
; @ least i'd have change point new copy created keep things linear aligned. right?
another idea use 2 data structures in layer
object: vector/list order, , map/hash lookup. contradict wish iterator allowed operator*
without const reference, wouldn't it?
i hope can hint idea data structure or concept satisfy needs, or @ least give me idea alternative. thanks!
if contains
check need fastest execution, , assuming can little intrusive source code, fastest way check if neuron
belongs in layer flag when insert layer (ex: bit flag).
you have guaranteed o(1) checks @ point see if neuron
belongs in layer , it's fast @ micro-level.
if there can numerous layer objects, can little trickier, you'll need separate bit each potential layer neuron can belong unless neuron
can belong in single layer @ once. reasonably manageable, however, if number of layers relatively fixed in size.
if latter case , neuron can belong 1 layer @ once, need backpointer layer*
. see if neuron belongs in layer, see if backpointer points layer object.
if neuron
can belong multiple layers @ once, not many @ 1 time, store little array of backpointers so:
struct neuron { ... layer* layers[4]; // use whatever small size fits common case layer* ptr; int num_layers; };
initialize ptr
point layers
if there 4 or fewer layers neuron
belongs. if there more, allocate on free store. in destructor, free memory if ptr != layers
. can optimize away num_layers
if common case 1 layer, in case null-terminated solution might work better. see if neuron belongs layer, linear search through ptr
. that's practically constant-time complexity respect number of neurons provided don't belong in mass number of layers @ once.
you can use vector
here might reduce cache hits on common case scenarios since it'll put contents in separate block, if neuron belongs 1 or 2 layers.
this might bit different looking general-purpose, non-intrusive data structure, if performance needs skewed towards these kinds of set operations, intrusive solution going fastest in general. it's not quite pretty , couples element container, hey, if need max performance...
another idea use plain old c array. since not care performance of adding new neuron object, thought make sure neuron objects stored linear in memory. invalidate pointer pass addneuron(); [...]
yes, won't invalidate indices. while not convenient use pointers, if you're working mass data vertices of mesh or particles of emitter, it's common use indices here avoid invalidation , possibly save 32-bits per entry on 64-bit systems.
update
given neurons exist in 1 layer @ time, i'd go pointer approach. seeing if neuron belongs layer becomes simple matter of checking if pointer points same layer.
since there's api involved, i'd suggest, because sounds you're pushing around lot of data , have profiled it, focus on interface revolves around aggregates (layers, e.g.) rather individual elements (neurons). it'll leave lot of room swap out underlying representations when clients aren't performing operations @ individual scalar element-type interface.
with o(1) contains
implementation , unordered requirement, i'd go simple contiguous structure std::vector
. however, expose potential invalidation on insertion.
because of that, if can, i'd suggest working indices here. however, become little unwieldy since requires clients store both pointer layer in neuron belongs in addition index (though if this, backpointer becomes unnecessary client tracking things belong).
one way mitigate use std::vector<neuron*>
or ptr_vector
if available. however, can expose cache misses , heap overhead, , if want optimize that, fixed allocator comes in handy. however, that's bit of pain alignment issues , bit of research topic, , far seems main goal not optimize insertion or sequential access quite as contains
check, i'd start std::vector<neuron*>
.
Comments
Post a Comment