c++ - Polymorphic filter iterator range -

i have sequence of elements (usually std::vector<t>), , method overlap_range(iterator, iterator, t) obtaining subset of these elements overlaped t.

if sequence of elements satisfy criteria, result of overlap_range contiguous, , determinable in logarithmic time. otherwise there may multiple discontiguous such sub-ranges, can determined in linear time. takes linear time determine if criteria satisfied.

i following:

overlap_range polymorphic, in sense algorithm operates in logarithmic time if criteria known satisfied.
the returned sub-range polymorphic, able take advantage of known properties of sub-range if criteria satisfied.

my proposed solution use default valued flag parameter in overlap_range, takes care of (1). thought solve (2) using boost::filter_iterator<std::function<bool(t)>, iterator> std::function<bool(t)> returns true if criteria known satisfied. however, turned out approx. 50x slower using functor test overlaped in both cases. logic overlap not particularly complex, number of elements in range sufficiently large there significant advantage not having evaluate unnecessarily, if possible.

are there other approaches might me solve problem?

more details

although think above sufficient understand problem, here more background useful.

t object can reduced genomicregion, defines set of co-ordinates on single contiguous sequence (i.e. 2 co-ordinates). such t mappable<t> requires method get_region.

overlaps defined as

inline genomicregion::differencetype overlap_size(const genomicregion& lhs, const genomicregion& rhs) noexcept {     return static_cast< genomicregion::differencetype>(std::min(lhs.get_end(), rhs.get_end())) -             static_cast< genomicregion::differencetype>(std::max(lhs.get_begin(), rhs.get_begin())); }  inline bool overlaps(const genomicregion& lhs, const genomicregion& rhs) noexcept {     auto num_bases_overlaped = overlap_size(lhs, rhs);     return (num_bases_overlaped == 0) ? !are_adjacent(lhs, rhs) || empty(std::min(lhs, rhs)) : num_bases_overlaped > 0; }

any mappable class can sorted begin co-ordinate, , end co-ordinate (if begin co-ordinates equal).
the "criteria" logarithmic overlap searching (and contiguous resulting region), that: if a <= b end(a) <= end(b), true if example t same size.

the naive filter iterator defined as

template <typename mappabletype> class isoverlapped { public:     isoverlapped() = delete;     template <typename mappabletype_>     isoverlapped(const mappabletype_& mappable) : region_ {get_region(mappable)} {}     bool operator()(const mappabletype& mappable) { return overlaps(mappable, region_); } private:     genomicregion region_; };  template <typename iterator> using overlapiterator = boost::filter_iterator<isoverlapped<typename iterator::value_type>, iterator>;  template <typename iterator> using overlaprange = boost::iterator_range<overlapiterator<iterator>>;

overlap_range can declared as

template <typename forwarditerator, typename mappabletype> overlaprange<forwarditerator> overlap_range(forwarditerator first, forwarditerator last, const mappabletype& mappable, bool is_bidirectional=false)

of course, return range overlaprange<forwarditerator>(first, last), can lot better efficiently finding right bound, , left bound if range is_bidirectional.

the biggest slow down because of dynamic dispatch in std::function<>. try replace custom, non-type-erased predicate type (function object?).

i should asking means elements in range "be overlapped t". maps of numbers called quantifiers but, let me jump bit of conclusion because excuse show off boost interval container library.

note demo hinges on fact boost icl has builtin combining styles , comparers many things. docs map concept:

maps of sets called collectors and

maps of numbers called quantifiers

for sense of implemented out-of-the-box chose demo collectors, i.e. t in terminology happens std::set<int>.

live on coliru

#include <boost/icl/interval_map.hpp> #include <set>  namespace icl = boost::icl;  using container = icl::interval_map<     size_t,          // mirrors vector index     std::set<int> >; // set intersection models our "overlap"  container generate_random_data();  #include <iostream>  int main() {     auto haystack = generate_random_data();      // let's find ranges t overlaps -2:      using seive = container::value_type;      std::cout << "haystack: " << haystack << "\n";     std::cout << "overlaps -2: \n\t-> " << (haystack & seive {{ 0, 1024 }, { -2 }}) << "\n";     std::cout << "overlaps +7: \n\t-> " << (haystack & seive {{ 0, 1024 }, { +7 }}) << "\n";      // can "multi-sieve" in 1 pass:     std::cout << "overlaps of +7 or -2: \n\t-> " << (haystack & seive {{ 0, 1024 }, { -2, +7 }}) << "\n";      // , more interesting, can narrow of "overlap" targets input range:     container combined;     combined += seive {{ 128, 512 }, { -2 }};     combined += seive {{ 512, 768 }, { +7 }};     std::cout << "complex query " << combined << ": \n\t-> " << (haystack & combined) << "\n"; }  #include <random> #include <functional>  container generate_random_data() {     using namespace std;      mt19937 prng {random_device{}()};     // mt19937 prng {3236629322};     auto card     = bind(uniform_int_distribution<>(1,       5),    ref(prng));     auto domain   = bind(uniform_int_distribution<size_t>(0, 1024), ref(prng));     auto codomain = bind(uniform_int_distribution<>(-10,     10),   ref(prng));     auto gen_val  = [&] {          container::value_type::second_type v;         generate_n(inserter(v, end(v)), card(), codomain);         return v;     };      auto gen_range = [&]() -> container::value_type {          auto left = domain(), right = domain();         if (left>right) swap(left, right);          return {              container::interval_type::right_open ( left, right ),             gen_val()         };     };      {         container data;         generate_n(icl::inserter(data, end(data)), prng() % 1024, gen_range);         return data;     } }

which prints (for particular seed)

haystack: {([0,1)->{-10 3 })([1,5)->{-10 })([5,13)->{-9 -1 3 7 10 })([13,17)->{-4 1 6 10 })([17,69)->{0 2 4 8 })([69,81)->{-7 -4 9 })([81,97)->{1 8 10 })([97,102)->{6 })([102,701)->{-10 -2 1 3 7 })([701,987)->{3 })([987,1002)->{-9 6 7 8 10 })([1002,1003)->{2 3 4 5 9 })([1003,1004)->{-9 -7 3 9 })([1004,1024)->{-10 -2 2 5 })} overlaps -2:      -> {([102,701)->{-2 })([1004,1024)->{-2 })} overlaps +7:      -> {([5,13)->{7 })([102,701)->{7 })([987,1002)->{7 })} overlaps of +7 or -2:      -> {([5,13)->{7 })([102,701)->{-2 7 })([987,1002)->{7 })([1004,1024)->{-2 })} complex query {([128,512)->{-2 })([512,768)->{7 })}:      -> {([128,512)->{-2 })([512,701)->{7 })}

you can run sample more times see results other randomly generated data sets.

WIKI

Search This Blog

c++ - Polymorphic filter iterator range -

Comments

Post a Comment