c++ - Which data structure is sorted by insertion and has fast "contains" check? -


i looking data structure preserves order in elements inserted , offers fast "contains" predicate. need iterator , random access. performance during insertion or deletion not relevant. willing accept overhead in terms of memory consumption.

background: need store list of objects. objects instances of class called neuron , stored in layer. layer object has following public interface:

class layer { public:     neuron *neuronat(const size_t &index) const;     neuroniterator begin();     neuroniterator end();     bool contains(const neuron *const &neuron) const;     void addneuron(neuron *const &neuron); }; 

the contains() method called quite when software runs, i've asserted using callgrind. tried circumvent of calls contains(), still hot spot. hope optimize method.

i thought of using std::set, using template argument provide own comparator struct. neuron class not give position in layer away. additionally, i'd have *someneuroniterator = anotherneuron work without screwing order.

another idea use plain old c array. since not care performance of adding new neuron object, thought make sure neuron objects stored linear in memory. invalidate pointer pass addneuron(); @ least i'd have change point new copy created keep things linear aligned. right?

another idea use 2 data structures in layer object: vector/list order, , map/hash lookup. contradict wish iterator allowed operator* without const reference, wouldn't it?

i hope can hint idea data structure or concept satisfy needs, or @ least give me idea alternative. thanks!

if contains check need fastest execution, , assuming can little intrusive source code, fastest way check if neuron belongs in layer flag when insert layer (ex: bit flag).

you have guaranteed o(1) checks @ point see if neuron belongs in layer , it's fast @ micro-level.

if there can numerous layer objects, can little trickier, you'll need separate bit each potential layer neuron can belong unless neuron can belong in single layer @ once. reasonably manageable, however, if number of layers relatively fixed in size.

if latter case , neuron can belong 1 layer @ once, need backpointer layer*. see if neuron belongs in layer, see if backpointer points layer object.

if neuron can belong multiple layers @ once, not many @ 1 time, store little array of backpointers so:

struct neuron {     ...     layer* layers[4]; // use whatever small size fits common case     layer* ptr;     int num_layers; }; 

initialize ptr point layers if there 4 or fewer layers neuron belongs. if there more, allocate on free store. in destructor, free memory if ptr != layers. can optimize away num_layers if common case 1 layer, in case null-terminated solution might work better. see if neuron belongs layer, linear search through ptr. that's practically constant-time complexity respect number of neurons provided don't belong in mass number of layers @ once.

you can use vector here might reduce cache hits on common case scenarios since it'll put contents in separate block, if neuron belongs 1 or 2 layers.

this might bit different looking general-purpose, non-intrusive data structure, if performance needs skewed towards these kinds of set operations, intrusive solution going fastest in general. it's not quite pretty , couples element container, hey, if need max performance...

another idea use plain old c array. since not care performance of adding new neuron object, thought make sure neuron objects stored linear in memory. invalidate pointer pass addneuron(); [...]

yes, won't invalidate indices. while not convenient use pointers, if you're working mass data vertices of mesh or particles of emitter, it's common use indices here avoid invalidation , possibly save 32-bits per entry on 64-bit systems.

update

given neurons exist in 1 layer @ time, i'd go pointer approach. seeing if neuron belongs layer becomes simple matter of checking if pointer points same layer.

since there's api involved, i'd suggest, because sounds you're pushing around lot of data , have profiled it, focus on interface revolves around aggregates (layers, e.g.) rather individual elements (neurons). it'll leave lot of room swap out underlying representations when clients aren't performing operations @ individual scalar element-type interface.

with o(1) contains implementation , unordered requirement, i'd go simple contiguous structure std::vector. however, expose potential invalidation on insertion.

because of that, if can, i'd suggest working indices here. however, become little unwieldy since requires clients store both pointer layer in neuron belongs in addition index (though if this, backpointer becomes unnecessary client tracking things belong).

one way mitigate use std::vector<neuron*> or ptr_vector if available. however, can expose cache misses , heap overhead, , if want optimize that, fixed allocator comes in handy. however, that's bit of pain alignment issues , bit of research topic, , far seems main goal not optimize insertion or sequential access quite as contains check, i'd start std::vector<neuron*>.


Comments

Popular posts from this blog

android - MPAndroidChart - How to add Annotations or images to the chart -

javascript - Add class to another page attribute using URL id - Jquery -

firefox - Where is 'webgl.osmesalib' parameter? -