bash - Sampling without replacement using awk


I have a lot of text files like this:

>algkaholaggataccatagatggcacgccct
>blgkaholaggataccatagatggcacgccct
>hlgkaholaggataccatagatggcacgccct
>dlgkaholaggataccatagatggcacgccct
>elgkaholaggataccatagatggcacgccct
>flgkaholaggataccatagatggcacgccct
>jggkaholaggataccatagatggcacgccct
>pogkaholaggataccatagatggcacgccct

Is there a way to do sampling without replacement using awk?

For example, I have 8 lines, and I want to sample 4 of them at random into a new file, without replacement. The output should look like this:

>flgkaholaggataccatagatggcacgccct
>pogkaholaggataccatagatggcacgccct
>algkaholaggataccatagatggcacgccct
>blgkaholaggataccatagatggcacgccct

Thanks in advance.

How about random sampling of 10% of the lines?

awk 'rand()>0.9' yourfile1 yourfile2 anotherfile 

I'm not sure what you mean by "replacement"... there is no replacement occurring here, just random selection.

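Basically, it looks at each line of each file precisely once and generates a random number on the interval 0 to 1. If the random number is greater than 0.9, the line is output. It is like rolling a 10-sided die for each line and printing the line only if the die comes up 10. There is no chance of a line being printed twice, unless it occurs twice in your files, of course. Note that this selects about 10% of the lines on average, not an exact count.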
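As a rough sketch of the same per-line die roll with an adjustable probability: the helper name `sample_fraction` and the variable `p` are made up for illustration and are not part of the original answer. Writing the test as `rand() < p` keeps each line with probability `p`, so `p=0.1` behaves like `rand()>0.9`.

```shell
# Hypothetical helper: keep each input line independently with probability p.
# Usage: sample_fraction P [FILE...]   (reads stdin when no files are given)
sample_fraction() {
    p=$1
    shift
    # rand() returns a value in [0, 1), so "rand() < p" is true with
    # probability p. The seed is left at the awk default here, so some
    # implementations (gawk, for one) repeat the same selection on every run.
    awk -v p="$p" 'rand() < p' "$@"
}

# Demo: keep roughly half of 8 lines; the number printed varies with the seed.
printf '%s\n' 1 2 3 4 5 6 7 8 | sample_fraction 0.5
```

Each line is visited once, so nothing can be printed twice, but the number of lines kept is random: with 8 lines and `p=0.5` you may get 4, or 2, or 6.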

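For added randomness (!) you can add srand() at the start, as suggested by @klashxx. Without it, some awk implementations produce the same random sequence, and therefore the same selection, on every run: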

awk 'BEGIN{srand()} rand()>0.9' yourfile(s) 
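The rand() filter returns roughly 10% of the lines, not a fixed number of them. If you need exactly 4 of 8 lines, as in the question, one option is reservoir sampling (Algorithm R), which is a different technique from the one-liner above. The following is a minimal sketch, not a definitive implementation; the helper name `sample_lines` and the file names `demo.txt` and `sample.txt` are hypothetical.

```shell
# Hypothetical helper: print exactly k lines chosen uniformly at random,
# without replacement, in a single pass (reservoir sampling, Algorithm R).
# Usage: sample_lines K [FILE...]   (reads stdin when no files are given)
sample_lines() {
    k=$1
    shift
    awk -v k="$k" '
        BEGIN { srand() }                   # seed from the current time
        NR <= k { r[NR] = $0; next }        # fill the reservoir with the first k lines
        {
            j = int(rand() * NR) + 1        # uniform integer in 1..NR
            if (j <= k) r[j] = $0           # replace a slot with probability k/NR
        }
        END {
            n = (NR < k) ? NR : k           # fewer than k lines: print them all
            for (i = 1; i <= n; i++) print r[i]
        }
    ' "$@"
}

# Demo input: the 8 lines from the question, written to a scratch file.
printf '%s\n' \
    '>algkaholaggataccatagatggcacgccct' \
    '>blgkaholaggataccatagatggcacgccct' \
    '>hlgkaholaggataccatagatggcacgccct' \
    '>dlgkaholaggataccatagatggcacgccct' \
    '>elgkaholaggataccatagatggcacgccct' \
    '>flgkaholaggataccatagatggcacgccct' \
    '>jggkaholaggataccatagatggcacgccct' \
    '>pogkaholaggataccatagatggcacgccct' > demo.txt

# Sample 4 of the 8 lines into a new file.
sample_lines 4 demo.txt > sample.txt
```

Each input line can occupy at most one reservoir slot, so no line is output twice. The sampled lines are not necessarily in input order. Because srand() seeds from the clock, two runs within the same second may return the same sample.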
