bash - Sampling without replacement using awk
I have a lot of text files like this:
>algkaholaggataccatagatggcacgccct
>blgkaholaggataccatagatggcacgccct
>hlgkaholaggataccatagatggcacgccct
>dlgkaholaggataccatagatggcacgccct
>elgkaholaggataccatagatggcacgccct
>flgkaholaggataccatagatggcacgccct
>jggkaholaggataccatagatggcacgccct
>pogkaholaggataccatagatggcacgccct
Is there a way to do sampling without replacement using awk?
For example, I have 8 lines, and I want to sample 4 of these randomly into a new file, without replacement. The output should look like this:
>flgkaholaggataccatagatggcacgccct
>pogkaholaggataccatagatggcacgccct
>algkaholaggataccatagatggcacgccct
>blgkaholaggataccatagatggcacgccct
Thanks in advance.
How about random sampling of 10% of the lines?
awk 'rand()>0.9' yourfile1 yourfile2 anotherfile
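I'm not sure what you mean by "replacement"... there is no replacement occurring here, just random selection.
Basically, it looks at each line of each file precisely once and generates a random number on the interval 0 to 1. If the random number is greater than 0.9, the line is output. It is like rolling a 10-sided die for each line and printing the line only if the die comes up 10. There is no chance of a line being printed twice, unless it occurs twice in your files, of course. Note that this selects about 10% of the lines on average, not an exact count.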
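For added randomness (!) you can add srand() at the start, as suggested by @klashxx. Without it, many awk implementations produce the same random sequence, and so the same selection, on every run.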
awk 'BEGIN{srand()} rand()>0.9' yourfile(s)
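If you need exactly 4 lines, as in the question, the rand()>0.9 filter won't guarantee that count. Here is a minimal sketch of one way to do it in awk, using reservoir sampling (a different technique from the filter above). The function name sample_lines and the output file sample.txt are just names I made up for illustration.

```shell
# sample_lines K FILE...: print K distinct input lines chosen uniformly at random.
# Hypothetical helper name; uses reservoir sampling, so the input is read only once.
sample_lines() {
    k=$1
    shift
    awk -v k="$k" '
        BEGIN { srand() }                   # seed from the clock
        NR <= k { keep[NR] = $0; next }     # fill the reservoir with the first k lines
        {
            j = int(rand() * NR) + 1        # uniform integer in 1..NR
            if (j <= k) keep[j] = $0        # replace a kept line with probability k/NR
        }
        END {
            n = (NR < k) ? NR : k           # fewer than k input lines: print them all
            for (i = 1; i <= n; i++) print keep[i]
        }
    ' "$@"
}
```

You would call it like this: sample_lines 4 yourfile > sample.txt. Each input line is stored in at most one slot, so no line can be picked twice. The lines come out in slot order rather than shuffled, and because srand() seeds from the current time in seconds, two runs within the same second will probably give the same sample.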