sql - Relational data structure in python -


i'm looking sql-relational-table-like data structure in python, or hints implementing 1 if none exist. conceptually, data structure set of objects (any objects), supports efficient lookups/filtering (possibly using sql-like indexing).

for example, lets objects have properties a, b, , c, need filter by, hence define data should indexed them. objects may contain lots of other members, not used filtering. data structure should support operations equivalent select <obj> <datastructure> a=100 (same b , c). should possible filter more 1 field (where a=100 , b='bar').

the requirements are:

  1. should support large number of items (~200k). items must objects themselves, , not flattened version of them (which rules out sqlite , pandas).
  2. insertion should fast, should avoid reallocation of memory (which pretty rules out pandas)
  3. should support simple filtering (like example above), must more efficient o(len(data)), i.e. avoid "full table scans".

does such data structure exist?


please don't suggest using sqlite. i'd need repeatedly convert object->row , row->object, time consuming , cumbersome since objects not flat-ish.

also, please don't suggest using pandas because repeated insertions of rows slow may requires frequent reallocation.

so long don't have duplicates on (a,b,c) sub-class dict, enter objects indexed tuple(a,b,c), , define filter method (probably generator) return entries match criteria.

class mydict(dict):     def filter(self,a=none, b=none, c=none):         key,obj in enumerate(self):             if (a , (key[0] == a)) or not a:                 if (b , (key[1] == b)) or not b:                     if (c , (key[2] == c)) or not c:                         yield obj 

that ugly , inefficient example, idea. i'm sure there better implementation method in itertools, or something.

edit:

i kept thinking this. toyed around last night , came storing objects in list , storing dictionaries of indexes desired keyfields. retrieve objects taking intersection of indexes specified criteria. this:

objs = [] aindex = {} bindex = {} cindex = {}  def insertobj(a,b,c,obj):     idx = len(objs)     objs.append(obj)     if in aindex:         aindex[a].append(idx)     else:         aindex[a] = [idx]      if b in bindex:          bindex[b].append(idx)     else:         bindex[b] = [idx]      if c in cindex:         cindex[c].append(idx)     else :         cindex[c] = [idx]  def filterobjs(a=none,b=none,c=none):     if : aset = set(aindex[a])     if b : bset = set(bindex[b])     if c : cset = set(cindex[c])     result = set(range(len(objs)))     if , aset : result = result.intersection(aset)     if b , bset : result = result.intersection(bset)     if c , cset : result = result.intersection(cset)     idx in result:         yield objs[idx]  class testobj(object):     def __init__(self,a,b,c):         self.a =         self.b = b         self.c = c      def show(self):         print ('a=%i\tb=%i\tc=%s'%(self.a,self.b,self.c))  if __name__ == '__main__':     in range(20):         b in range(5):             c in ['one','two','three','four']:                 insertobj(a,b,c,testobj(a,b,c))      obj in filterobjs(a=5):         obj.show()     print()     obj in filterobjs(b=3):         obj.show()     print()     obj in filterobjs(a=8,c='one'):         obj.show() 

it should reasonably quick, although objects in list, accessed directly index. "searching" done on hashed dict.


Comments

Popular posts from this blog

css - SVG using textPath a symbol not rendering in Firefox -

Java 8 + Maven Javadoc plugin: Error fetching URL -

datatable - Matlab struct computations -