Python Multiprocessing: is locking appropriate for (large) disk writes?
I have multiprocessing code wherein each process does a disk write (pickling data), and the resulting pickle files can be upwards of 50 MB (and sometimes more than 1 GB, depending on what I'm doing). Also, the different processes are not writing to the same file; each process writes a separate file (or set of files).
Would it be a good idea to implement a lock around the disk writes so that only one process writes to disk at a time? Or is it best to just let the operating system sort it out, even if that means four processes may be trying to write 1 GB each to disk at the same time?
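For reference, a minimal sketch of the pattern I'm describing (the filenames and data are just placeholders for illustration):

```python
import multiprocessing as mp
import pickle


def worker(task_id, data):
    # Each process writes only its own file; no file is shared between processes.
    with open(f"result_{task_id}.pkl", "wb") as f:
        pickle.dump(data, f)


if __name__ == "__main__":
    datasets = [list(range(1_000_000)) for _ in range(4)]
    procs = [mp.Process(target=worker, args=(i, d)) for i, d in enumerate(datasets)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```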
As long as the processes aren't fighting over the same file, let the OS sort it out. That's its job.
Unless the processes try to dump their data in one big write, the OS is in a better position to schedule the disk writes. If you do use one big write, you might try to partition it into smaller chunks, as in the sketch below. That might give the OS a better chance of handling them.
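As an illustration of what "smaller chunks" could look like (the function name and chunk size here are arbitrary, not a recommendation):

```python
import pickle

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB per write() call; pick whatever suits you


def write_chunked(path, obj):
    # Serialize first, then hand the bytes to the OS in moderate-sized pieces
    # instead of one multi-gigabyte write() call.
    payload = pickle.dumps(obj)
    with open(path, "wb") as f:
        for offset in range(0, len(payload), CHUNK_SIZE):
            f.write(payload[offset:offset + CHUNK_SIZE])
```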
Of course you will hit a limit somewhere. Your program might be CPU-bound, memory-bound or disk-bound, and it might hit different limits depending on the input or load. But unless you've got evidence that you're constantly disk-bound and you've got an idea how to solve that, I'd say don't bother. Because the days when a write system call actually meant that the data was sent directly to disk are long gone.
Most operating systems these days use otherwise unallocated RAM as a disk cache, and HDDs have built-in caches as well. Unless you disable both of these (which would give you a huge performance hit), there is precious little connection between your program completing a write and the data actually hitting the platters or flash.
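If you do need the data to be on stable storage at a particular point, you have to ask for it explicitly; a rough sketch of what that looks like in Python (the helper name is made up):

```python
import os
import pickle


def write_durably(path, obj):
    with open(path, "wb") as f:
        pickle.dump(obj, f)
        f.flush()             # push Python's userspace buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device
```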
You might also consider using memmap (if your OS supports it) and letting the OS's virtual memory do the work for you. See e.g. the architect notes for the Varnish cache.
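A rough sketch of that idea using Python's mmap module (the helper name is made up and error handling is omitted); the OS decides when the dirty pages actually reach the disk:

```python
import mmap


def write_via_mmap(path, payload: bytes):
    # Pre-size the file, map it into memory, and copy the data into the mapping.
    with open(path, "wb") as f:
        f.truncate(len(payload))
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), len(payload)) as mm:
            mm[:] = payload
            mm.flush()  # optional: request writeback of the mapped pages
```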