sql - Relational data structure in python -
i'm looking sql-relational-table-like data structure in python, or hints implementing 1 if none exist. conceptually, data structure set of objects (any objects), supports efficient lookups/filtering (possibly using sql-like indexing).
for example, lets objects have properties a
, b
, , c
, need filter by, hence define data should indexed them. objects may contain lots of other members, not used filtering. data structure should support operations equivalent select <obj> <datastructure> a=100
(same b
, c
). should possible filter more 1 field (where a=100 , b='bar'
).
the requirements are:
- should support large number of items (~200k). items must objects themselves, , not flattened version of them (which rules out
sqlite
,pandas
). - insertion should fast, should avoid reallocation of memory (which pretty rules out
pandas
) - should support simple filtering (like example above), must more efficient
o(len(data))
, i.e. avoid "full table scans".
does such data structure exist?
please don't suggest using sqlite. i'd need repeatedly convert object->row , row->object, time consuming , cumbersome since objects not flat-ish.
also, please don't suggest using pandas because repeated insertions of rows slow may requires frequent reallocation.
so long don't have duplicates on (a,b,c) sub-class dict, enter objects indexed tuple(a,b,c), , define filter method (probably generator) return entries match criteria.
class mydict(dict): def filter(self,a=none, b=none, c=none): key,obj in enumerate(self): if (a , (key[0] == a)) or not a: if (b , (key[1] == b)) or not b: if (c , (key[2] == c)) or not c: yield obj
that ugly , inefficient example, idea. i'm sure there better implementation method in itertools, or something.
edit:
i kept thinking this. toyed around last night , came storing objects in list , storing dictionaries of indexes desired keyfields. retrieve objects taking intersection of indexes specified criteria. this:
objs = [] aindex = {} bindex = {} cindex = {} def insertobj(a,b,c,obj): idx = len(objs) objs.append(obj) if in aindex: aindex[a].append(idx) else: aindex[a] = [idx] if b in bindex: bindex[b].append(idx) else: bindex[b] = [idx] if c in cindex: cindex[c].append(idx) else : cindex[c] = [idx] def filterobjs(a=none,b=none,c=none): if : aset = set(aindex[a]) if b : bset = set(bindex[b]) if c : cset = set(cindex[c]) result = set(range(len(objs))) if , aset : result = result.intersection(aset) if b , bset : result = result.intersection(bset) if c , cset : result = result.intersection(cset) idx in result: yield objs[idx] class testobj(object): def __init__(self,a,b,c): self.a = self.b = b self.c = c def show(self): print ('a=%i\tb=%i\tc=%s'%(self.a,self.b,self.c)) if __name__ == '__main__': in range(20): b in range(5): c in ['one','two','three','four']: insertobj(a,b,c,testobj(a,b,c)) obj in filterobjs(a=5): obj.show() print() obj in filterobjs(b=3): obj.show() print() obj in filterobjs(a=8,c='one'): obj.show()
it should reasonably quick, although objects in list, accessed directly index. "searching" done on hashed dict.
Comments
Post a Comment