Regex: using grep to find keywords, and then list the following characters until the next ; character
I have a long list of chemical conditions in the following form:
0.2m sodium acetate; 0.3m ammonium thiosulfate;
The molarities can be listed in various ways:
x.xm, x.x m, x m
where the number of x digits varies. I want to do 2 things: select the numbers using grep, and then list the following characters until the ;. So if I select 0.2m in the example above, I want to be able to list sodium acetate.
For selecting, I have tried the following:
grep '[0-9]*.[0-9]*[[:space:]]*m' file
So there is an arbitrary number of digits, a decimal point, more digits, any number of spaces, and it ends with m. The problem is, it also selects the following:
0.05mrbcl+mgcl2;
I am not quite sure why this is selected. Ideally, I want 0.05m to be selected, and then rbcl+mgcl2 to be listed. How can I achieve this?
(The system is OS X Yosemite.)
It matches because:
[0-9]* matches 0
. matches any character (in this case the ., though you meant to escape it as \.)
[0-9]* matches 05
[[:space:]]* matches the empty string between 05 and m
m matches m
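To see exactly which part of each line the pattern matches, rather than the whole line grep prints by default, something like the following sketch may help. It assumes a grep that supports -o and -E (both BSD grep on OS X and GNU grep do); the file name test.txt and the helper function molarities are hypothetical. The dot is escaped so it only matches a literal decimal point, and [0-9]+ requires at least one digit before the m.

```shell
#!/bin/sh
# Sample input modeled on the question (test.txt is a hypothetical file name).
printf '%s\n' '0.2m sodium acetate; 0.3m ammonium thiosulfate;' \
    '0.05mrbcl+mgcl2;' '45 m xenon quadroxide;' > test.txt

# Hypothetical helper: print only the molarity of each entry.
#   -o      print just the matched text, one match per output line
#   -E      extended regex, so ? and + work without backslashes
#   \.?     an optional literal decimal point (a bare . matches any character)
#   [0-9]+  at least one digit must come right before the spaces and m
molarities() {
    grep -oE '[0-9]*\.?[0-9]+[[:space:]]*m' "$1"
}

molarities test.txt
```

With the sample file this prints 0.2m, 0.3m, 0.05m and 45 m, one per line, which shows that the match on 0.05mrbcl+mgcl2; is only 0.05m.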
As for how to do what you want: I think that if you don't want the numbers printed in the output, you'd need either a lookbehind assertion or the ability to print a specific capture group, which it sounds like OS X's grep doesn't support. You can use a similar approach with a more powerful tool, though:
$ cat test.txt
0.2m sodium acetate;
0.3m ammonium thiosulfate;
0.05mrbcl+mgcl2;
1.23m dihydrogen monoxide;
45 m xenon quadroxide;
$ perl -ne 'while (/([0-9]*\.)?[0-9]+\s*m\s*([^;]+)/g) { print "$2\n"; }' test.txt
sodium acetate
ammonium thiosulfate
rbcl+mgcl2
dihydrogen monoxide
xenon quadroxide
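Written out, the regex is:
([0-9]*\.)?   optionally, some digits followed by a decimal point
[0-9]+   1 or more digits
\s*m\s*   the letter m, with optional spacing around it
([^;]+)   all characters up to the next semicolon (the thing you want to print); this is the second capture group, which is why the loop prints $2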
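If Perl is not an option, a two-step pipeline can get the same result with only grep and sed, without needing lookbehind or capture-group printing. This is a sketch, assuming grep -o/-E and sed -E are available (they are on both OS X and GNU systems); the file name test.txt and the function name names are hypothetical, and [[:space:]] stands in for Perl's \s because \s is not portable in POSIX extended regexes.

```shell
#!/bin/sh
# Sample input (test.txt is a hypothetical file name).
printf '%s\n' '0.2m sodium acetate; 0.3m ammonium thiosulfate;' \
    '0.05mrbcl+mgcl2;' '1.23m dihydrogen monoxide;' '45 m xenon quadroxide;' > test.txt

# The molarity part of the Perl regex, rewritten as a POSIX ERE.
pattern='([0-9]*\.)?[0-9]+[[:space:]]*m[[:space:]]*'

# Hypothetical helper:
#   step 1: grep -o prints each "molarity + name" match on its own line
#   step 2: sed deletes the leading molarity, leaving only the name
names() {
    grep -oE "${pattern}[^;]+" "$1" | sed -E "s/^${pattern}//"
}

names test.txt
```

On the sample file this should print the same five names as the Perl one-liner. The trade-off is that the pattern is applied twice, once to find each entry and once to strip its number, which is why it is kept in a single variable.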