bash - Sampling without replacement using awk

May 15, 2011

I have a lot of text files like this:

>algkaholaggataccatagatggcacgccct
>blgkaholaggataccatagatggcacgccct
>hlgkaholaggataccatagatggcacgccct
>dlgkaholaggataccatagatggcacgccct
>elgkaholaggataccatagatggcacgccct
>flgkaholaggataccatagatggcacgccct
>jggkaholaggataccatagatggcacgccct
>pogkaholaggataccatagatggcacgccct

Is there a way to do sampling without replacement using awk?

For example, I have 8 lines, and I want to sample 4 of them randomly into a new file, without replacement. The output should look like this:

>flgkaholaggataccatagatggcacgccct
>pogkaholaggataccatagatggcacgccct
>algkaholaggataccatagatggcacgccct
>blgkaholaggataccatagatggcacgccct

Thanks in advance.

How about random sampling of 10% of the lines?

awk 'rand()>0.9' yourfile1 yourfile2 anotherfile

I'm not sure what you mean by "replacement"... there is no replacement occurring here, just random selection.

Basically, it looks at each line of each file precisely once and generates a random number in the interval [0, 1). If the random number is greater than 0.9, the line is output. It's like rolling a 10-sided die for each line and printing the line only if the die comes up 10. There is no chance of a line being printed twice, unless it occurs twice in the files, of course.
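Because the one-liner decides for each line independently, the number of lines printed varies from run to run (about 10% on average), so it cannot guarantee exactly 4 lines as the question asks. If you need exactly k lines, a common technique is reservoir sampling (Algorithm R). The sketch below is not from the original answer; the function name `sample_lines` and its argument order are hypothetical. It samples across all input files combined and keeps only k lines in memory.

```shell
# sample_lines K [FILE...]  (hypothetical helper name, not a standard command)
# Prints exactly K lines chosen uniformly at random, without replacement, from
# all input combined (stdin if no FILE). Reservoir sampling, Algorithm R:
# one pass, only K lines kept in memory.
sample_lines() {
    k=$1
    shift
    awk -v k="$k" '
    BEGIN { srand(); k += 0 }          # seed from the time of day; make k numeric
    NR <= k { pool[NR] = $0; next }    # fill the reservoir with the first k lines
    {
        j = int(rand() * NR) + 1       # uniform random index in 1..NR
        if (j <= k) pool[j] = $0       # keeps line NR with probability k/NR
    }
    END {
        n = (NR < k) ? NR : k          # fewer than k lines: print them all
        for (i = 1; i <= n; i++) print pool[i]
    }' "$@"
}
```

For example, `sample_lines 4 yourfile1 > sample.txt`. Every set of 4 lines is equally likely, but the output order is not itself a random permutation, so shuffle the result afterwards if the order matters.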

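For added randomness (!) you can add srand() at the start, as suggested by @klashxx. Called with no argument, srand() seeds the generator from the time of day. In many awk implementations, including gawk, rand() otherwise starts from the same seed on every run, so you would get the same "random" lines each time.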

awk 'BEGIN{srand()} rand()>0.9' yourfile1 yourfile2 anotherfile
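If you want the opposite, a sample you can reproduce (for testing, say), pass a fixed seed to srand() instead. This is a minimal sketch; the function name `sample_fraction` and its argument order are hypothetical. The test `rand() < p` keeps each line with probability p, just as `rand()>0.9` does for p = 0.1. The same seed gives the same lines with the same awk implementation, but gawk, mawk and other awks use different generators, so don't expect identical samples across them.

```shell
# sample_fraction P SEED [FILE...]  (hypothetical helper name)
# Prints each input line independently with probability P. A fixed SEED makes
# the selection repeatable across runs of the same awk implementation.
sample_fraction() {
    p=$1
    seed=$2
    shift 2
    # srand(seed) replaces the time-of-day seed of a bare srand().
    awk -v p="$p" -v seed="$seed" 'BEGIN { srand(seed) } rand() < p' "$@"
}
```

For example, `sample_fraction 0.1 42 yourfile1 yourfile2` prints the same roughly 10% of lines every time it is run.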
