regex - Using grep to find keywords, and then list the following characters until the next ; character


I have a long list of chemical conditions in the following form:

0.2m sodium acetate; 0.3m ammonium thiosulfate; 

The molarities can be listed in various ways:

x.xm, x.x m, x m 

where the number of x digits varies. I want to do two things: select the numbers using grep, and then list the following characters until the ;. So if I select 0.2m in the example above, I want to be able to list sodium acetate.

For selecting, I have tried the following:

grep '[0-9]*.[0-9]*[[:space:]]*m' file 

so that there is an arbitrary number of digits and spaces, and it ends with m. The problem is that it also selects the following:

0.05mrbcl+mgcl2; 

I am not quite sure why this is selected. Ideally, I want 0.05m to be selected, and then to list rbcl+mgcl2. How can I achieve this?

(The system is OS X Yosemite.)

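It matches because:
[0-9]* matches 0
. matches any character (here it happens to be a literal ., but you probably meant to escape it as \.)
[0-9]* matches 05
[[:space:]]* matches the empty string between 05 and m
m matches m

Because grep prints every line that contains a match, the whole line is shown rather than just 0.05m.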
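To illustrate the point about the unescaped dot, here is a small sketch that counts matching lines for the original pattern and for a variant with the dot escaped. The helper name count_matches and the sample input ammonium are only assumptions made for the demonstration; they are not from the question.

```shell
#!/bin/sh
# Compare the pattern from the question with a variant that escapes the dot.
unescaped='[0-9]*.[0-9]*[[:space:]]*m'   # "." matches any single character
escaped='[0-9]*\.[0-9]*[[:space:]]*m'    # "\." matches only a literal dot

# count_matches PATTERN TEXT: print how many lines of TEXT match PATTERN.
# (Hypothetical helper; "|| true" hides grep's exit status 1 on zero matches.)
count_matches() {
    printf '%s\n' "$2" | grep -c -- "$1" || true
}

count_matches "$unescaped" 'ammonium'          # prints 1: "a" then "m" is enough
count_matches "$escaped"   'ammonium'          # prints 0: no literal dot present
count_matches "$escaped"   '0.05mrbcl+mgcl2;'  # prints 1: "0.05m" matches
```

Since every quantified part of the original pattern may match nothing, it effectively means "any character followed by m". Note that the escaped variant still requires a dot, so it would miss whole-number entries such as 45 m.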

As for how to get what you want: I think that if you don't want the numbers printed in the output, that would require either a lookbehind assertion or the ability to print a specific capture group, which it sounds like OS X's grep doesn't support. You can use a similar approach with a more powerful tool such as Perl, though:

$ cat test.txt
0.2m sodium acetate; 0.3m ammonium thiosulfate; 0.05mrbcl+mgcl2; 1.23m dihydrogen monoxide; 45 m xenon quadroxide;

$ perl -ne 'while (/([0-9]*\.)?[0-9]+\s*m\s*([^;]+)/g) { print "$2\n"; }' test.txt
sodium acetate
ammonium thiosulfate
rbcl+mgcl2
dihydrogen monoxide
xenon quadroxide

Written out, the regex is:
([0-9]*\.)? optionally, some digits and a decimal point
[0-9]+ one or more digits
\s*m\s* the letter m, with optional spacing around it
([^;]+) the characters until the next semicolon (the thing you want to print)
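If Perl is not an option, the same breakdown can be approximated with tr and sed. This is only a sketch under a few assumptions: the function name split_conditions is made up for illustration, every entry is terminated by a semicolon, and sed supports -E for extended regular expressions (both BSD and GNU sed do). It prints the number as well as the compound, separated by " | ".

```shell
#!/bin/sh
# split_conditions (hypothetical helper): read conditions on stdin and print
# "<number> | <compound>" for each semicolon-terminated entry.
split_conditions() {
    # Put each entry on its own line, then capture:
    #   \1 = the number: optional "digits." prefix plus one or more digits
    #   \3 = everything after the letter m and any spacing
    tr ';' '\n' |
    sed -E -n 's/^[[:space:]]*(([0-9]*\.)?[0-9]+)[[:space:]]*m[[:space:]]*([^[:space:]].*)$/\1 | \3/p'
}

printf '%s\n' '0.2m sodium acetate; 0.3m ammonium thiosulfate;' \
              '0.05mrbcl+mgcl2;' '45 m xenon quadroxide;' | split_conditions
# 0.2 | sodium acetate
# 0.3 | ammonium thiosulfate
# 0.05 | rbcl+mgcl2
# 45 | xenon quadroxide
```

Splitting on the semicolon first sidesteps the need for a global match loop, at the cost of assuming that a semicolon never appears inside a compound name.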

