python - Numpy apply function to group in structured array -

Starting off with a structured numpy array that has 4 fields, I am trying to return an array with just the latest dates, by id, containing the same 4 fields. I found a solution using itertools.groupby that almost works here: numpy mean structured array

The problem is that I don't understand how to adapt it when there are 4 fields instead of 2. I want the whole 'row' back, but only the rows with the latest date for each id. I understand that this kind of thing is simpler using pandas, but this is a small piece of a larger process, and I can't add pandas as a dependency.

    import numpy as np

    # datetime units are case-sensitive: 'D' means days
    data = np.array([('2005-02-01', 1, 3, 8),
                     ('2005-02-02', 1, 4, 9),
                     ('2005-02-01', 2, 5, 10),
                     ('2005-02-02', 2, 6, 11),
                     ('2005-02-03', 2, 7, 12)],
                    dtype=[('dt', 'datetime64[D]'), ('id', '<i4'),
                           ('f3', '<i4'), ('f4', '<i4')])

For this example array, the desired output would be:

    np.array([(datetime.date(2005, 2, 2), 1, 4, 9),
              (datetime.date(2005, 2, 3), 2, 7, 12)],
             dtype=[('dt', '<M8[D]'), ('id', '<i4'), ('f3', '<i4'), ('f4', '<i4')])

This is what I've tried:

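    # Builds (id, position of the latest date within the group) pairs,
    # then tries to force them into the 4-field dtype, which fails.
    latest = np.array([(k, np.array(list(g), dtype=data.dtype).view(np.recarray)
                        ['dt'].argmax()) for k, g in
                       groupby(np.sort(data, order='id').view(np.recarray),
                               itemgetter('id'))], dtype=data.dtype)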

I get this error:

    ValueError: size of tuple must match number of fields.

I think this is because the tuple has 2 fields but the array has 4. When I drop 'f3' and 'f4' from the array, it works correctly.

How can I get it to return all 4 fields?

Let's figure out where the error comes from by peeling off one layer:

    In [38]: from operator import itemgetter

    In [39]: from itertools import groupby

    In [41]: [(k, np.array(list(g), dtype=data.dtype).view(np.recarray)
                 ['dt'].argmax()) for k, g in
              groupby(np.sort(data, order='id').view(np.recarray),
                      itemgetter('id'))]
    Out[41]: [(1, 1), (2, 2)]

What is this list of tuples supposed to represent? It isn't rows from data. And since each tuple has only 2 items, it can't be mapped onto an array with data.dtype. Hence the ValueError.
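To make the failure concrete, here is a minimal sketch that reproduces it in isolation. The helper name try_build is hypothetical, and the exact wording of the error message differs between NumPy versions, so the sketch relies only on the exception type.

```python
import numpy as np

# Same 4-field layout as the data array in the question.
dtype = [('dt', 'datetime64[D]'), ('id', '<i4'), ('f3', '<i4'), ('f4', '<i4')]

def try_build(rows):
    """Hypothetical helper: return the array, or the ValueError text if rows do not fit."""
    try:
        return np.array(rows, dtype=dtype)
    except ValueError as exc:
        return "ValueError: " + str(exc)

bad = try_build([(1, 1), (2, 2)])            # 2-item tuples, 4-field dtype
good = try_build([('2005-02-02', 1, 4, 9)])  # 4-item tuple matches the dtype

print(bad)
print(good)
```

Each tuple has to supply one value per field, which is why the (id, position) pairs cannot be turned into rows directly.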

After playing around with it a bit, I think [(1, 1), (2, 2)] means: for id==1, use the [1] item of the group; for id==2, use the [2] item of the group.

    [(datetime.date(2005, 2, 2), 1, 4, 9),
     (datetime.date(2005, 2, 3), 2, 7, 12)]

You have found the positions of the maximum dates, but you still have to either translate them into indexes in data, or select those items from the groups.

    In [91]: groups = groupby(np.sort(data, order='id'), itemgetter('id'))  # don't need recarray

    In [92]: G = [(k, list(g)) for k, g in groups]

    In [93]: G
    Out[93]:
    [(1,
      [(datetime.date(2005, 2, 1), 1, 3, 8),
       (datetime.date(2005, 2, 2), 1, 4, 9)]),
     (2,
      [(datetime.date(2005, 2, 1), 2, 5, 10),
       (datetime.date(2005, 2, 2), 2, 6, 11),
       (datetime.date(2005, 2, 3), 2, 7, 12)])]

    In [107]: I = [(1, 1), (2, 2)]

    In [108]: [g[1][i[1]] for g, i in zip(G, I)]
    Out[108]:
    [(datetime.date(2005, 2, 2), 1, 4, 9), (datetime.date(2005, 2, 3), 2, 7, 12)]

OK, this selection from the groups is clumsy, but it's a start.
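The other option mentioned above, translating each per-group position into a row index, could look roughly like the sketch below. It indexes into the sorted copy rather than the original data, and the names sorted_data, row_indexes and offset are illustrative assumptions, not part of the original code.

```python
import numpy as np
from itertools import groupby
from operator import itemgetter

data = np.array([('2005-02-01', 1, 3, 8),
                 ('2005-02-02', 1, 4, 9),
                 ('2005-02-01', 2, 5, 10),
                 ('2005-02-02', 2, 6, 11),
                 ('2005-02-03', 2, 7, 12)],
                dtype=[('dt', 'datetime64[D]'), ('id', '<i4'),
                       ('f3', '<i4'), ('f4', '<i4')])

sorted_data = np.sort(data, order='id')

row_indexes = []
offset = 0  # index in sorted_data where the current group starts
for key, group in groupby(sorted_data, itemgetter('id')):
    records = np.array(list(group), dtype=sorted_data.dtype)
    # argmax is relative to the group, so shift it by the group's start
    row_indexes.append(offset + int(records['dt'].argmax()))
    offset += len(records)

# Fancy indexing with the row numbers returns whole records (all 4 fields).
latest = sorted_data[row_indexes]
print(latest)
```

Because the result is taken by indexing the structured array, it keeps the full dtype without any tuple-to-record conversion.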

If I define a simple function to pull the record with the latest date from a group, the processing is a lot simpler.

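    def maxdate_record(agroup):
        # collect the group's records into a structured array
        an_array = np.array(list(agroup))
        # position of the latest date within this group
        i = np.argmax(an_array['dt'])
        return an_array[i]

    groups = groupby(np.sort(data, order='id'), itemgetter('id'))
    np.array([maxdate_record(g) for k, g in groups])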

producing:

    array([(datetime.date(2005, 2, 2), 1, 4, 9),
           (datetime.date(2005, 2, 3), 2, 7, 12)],
          dtype=[('dt', '<M8[D]'), ('id', '<i4'), ('f3', '<i4'), ('f4', '<i4')])

I don't need to specify the dtype when I convert the list of records to an array, since the records carry their own dtype.
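For comparison, the same result can probably be reached without groupby at all. This is a different technique from the one above: sort by id and then by date, and keep the last row of each run of equal ids. The function name latest_per_id is made up for illustration, and the sketch assumes the array has at least one row.

```python
import numpy as np

def latest_per_id(arr):
    """Hypothetical helper: return the row with the latest 'dt' for each 'id'.

    Assumes arr is a non-empty structured array with 'id' and 'dt' fields.
    """
    # Sort by id first, then by date, so each id's latest row comes last in its run.
    sorted_arr = np.sort(arr, order=['id', 'dt'])
    ids = sorted_arr['id']
    # A row is the last of its run if the next id differs; the final row always is.
    is_last = np.append(ids[1:] != ids[:-1], True)
    return sorted_arr[is_last]

data = np.array([('2005-02-01', 1, 3, 8),
                 ('2005-02-02', 1, 4, 9),
                 ('2005-02-01', 2, 5, 10),
                 ('2005-02-02', 2, 6, 11),
                 ('2005-02-03', 2, 7, 12)],
                dtype=[('dt', 'datetime64[D]'), ('id', '<i4'),
                       ('f3', '<i4'), ('f4', '<i4')])

result = latest_per_id(data)
print(result)
```

The boolean mask selects whole records, so all 4 fields come back and no Python-level loop over groups is needed.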
