regex - Using grep to find keywords, then list the following characters until the next ; character
I have a long list of chemical conditions in the following form:
    0.2m sodium acetate; 0.3m ammonium thiosulfate;
The molarities can be listed in various ways: x.xm, x.x m, or x m, where the number of x digits varies. I want to do two things: select those numbers using grep, and then list the characters that follow, up to the next ;. So if I select 0.2m in the example above, I want to be able to list sodium acetate.
For the selecting part, I have tried the following:
    grep '[0-9]*.[0-9]*[[:space:]]*m' file
The idea is that there is an arbitrary number of digits and spaces, and the match ends with m. The problem is that it also selects the following:
    0.05mrbcl+mgcl2;
I am not quite sure why this is selected. Ideally, I want 0.05m to be selected, and then to list rbcl+mgcl2. How can I achieve this?
(The system is OS X Yosemite.)
It matches because:
[0-9]* matches 0
. matches any character (it happens to be a . in this case, but you meant to escape it)
[0-9]* matches 05
[[:space:]]* matches the empty string between 05 and m
m matches m
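To make the effect of the unescaped dot concrete, here is a small sketch. The helper name count_matches and the sample inputs are made up for illustration; only the first pattern comes from the question. With the dot unescaped, even a word with no digits in it, such as ammonium, matches, because . is free to consume the a before the m.

```shell
# Hypothetical helper: count the lines on stdin that match a basic regex.
count_matches() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the helper usable under `set -e`.
    grep -c -- "$1" || true
}

loose='[0-9]*.[0-9]*[[:space:]]*m'     # pattern from the question: "." is any character
strict='[0-9]*\.[0-9]*[[:space:]]*m'   # same pattern with the dot escaped: a literal "."

printf 'ammonium\n' | count_matches "$loose"            # prints 1
printf 'ammonium\n' | count_matches "$strict"           # prints 0
printf '0.05mrbcl+mgcl2;\n' | count_matches "$strict"   # prints 1
```

Note that the escaped version still matches 0.05mrbcl+mgcl2; because that line really does contain 0.05m. Since grep prints whole matching lines by default, the line shows up either way.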
As for how to do what you want: I think that if you don't want the numbers printed in the output, you'll need either a lookbehind assertion or the ability to print a specific capture group, which it sounds like OS X's grep doesn't support. You can use a similar approach with a more powerful tool, though:
    $ cat test.txt
    0.2m sodium acetate; 0.3m ammonium thiosulfate;
    0.05mrbcl+mgcl2;
    1.23m dihydrogen monoxide;
    45 m xenon quadroxide;
    $ perl -ne 'while (/([0-9]*\.)?[0-9]+\s*m\s*([^;]+)/g) { print "$2\n"; }' test.txt
    sodium acetate
    ammonium thiosulfate
    rbcl+mgcl2
    dihydrogen monoxide
    xenon quadroxide
Written out, the regex is:
([0-9]*\.)? optionally, some digits and a decimal point
[0-9]+ one or more digits
\s*m\s* the letter m, with optional spacing around it
([^;]+) all characters until the next semicolon (the thing you want to print)
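If Perl is not an option, roughly the same pieces can be reused with tr and sed. This is only a sketch of a different technique, not the command above: it first splits the input on semicolons and then strips the molarity from the start of each entry, instead of scanning with a global match. It assumes every entry begins with its molarity and that sed accepts -E for extended regular expressions. The function name extract_names is made up for illustration.

```shell
# Hypothetical helper: read "molarity name;" entries on stdin and print
# one chemical name per line.
extract_names() {
    # 1. tr puts every semicolon-terminated entry on its own line.
    # 2. sed keeps only entries that start with a number followed by m,
    #    and replaces each with capture group 2 (the text after the m).
    tr ';' '\n' |
    sed -nE 's/^[[:space:]]*([0-9]*\.)?[0-9]+[[:space:]]*m[[:space:]]*(.+)$/\2/p'
}

# Sample input in the same form as test.txt above.
printf '0.2m sodium acetate; 0.3m ammonium thiosulfate;\n45 m xenon quadroxide;\n' |
    extract_names
# prints:
# sodium acetate
# ammonium thiosulfate
# xenon quadroxide
```

Because the pattern is anchored to the start of each entry, a number followed by m in the middle of a name is left alone, which the unanchored Perl loop does not guarantee.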