perl - Collapse rows with multiple fields
I have this code:
    awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]", " : "\t") $2} END{for (i in a) print i a[i]}' inputfile
It works for collapsing rows with two fields, using the first field as the key, but I need it to handle more than two fields.
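For comparison, here is a rough Perl equivalent of the two-field one-liner, as a sketch only. It skips repeated (field 1, field 2) pairs the way `!seen[$1,$2]++` does and collects each new field 2 under its field-1 key. The sub name `collapse_two_fields` is hypothetical, and unlike awk's `for (i in a)`, whose iteration order is unspecified, it prints keys in first-seen order.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the awk one-liner: collapse field 2 by field 1.
sub collapse_two_fields {
    my @lines = @_;
    my (%seen, %joined, @order);
    for my $line (@lines) {
        my ($key, $value) = split /\s+/, $line;
        # Same role as awk's !seen[$1,$2]++: keep only the first (key, value) pair
        next if $seen{$key}{$value}++;
        # Remember first-seen key order (awk's "for (i in a)" order is unspecified)
        push @order, $key unless exists $joined{$key};
        push @{ $joined{$key} }, $value;
    }
    return map { "$_\t" . join(', ', @{ $joined{$_} }) } @order;
}

print "$_\n" for collapse_two_fields(
    "protein_1 membrane",
    "protein_1 intracellular",
    "protein_2 membrane",
    "protein_2 membrane",    # duplicate pair, dropped
);
```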
Input file (three columns, tab-delimited):
    protein_1	membrane	1e-4
    protein_1	intracellular	1e-5
    protein_2	membrane	1e-50
    protein_2	citosol	1e-40
Desired output (three columns, tab-delimited):
    protein_1	membrane, intracellular	1e-4, 1e-5
    protein_2	membrane, citosol	1e-50, 1e-40
Thanks!

I'm stuck here:
    awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]"\t" : "\t") $2};{a[$1]=(a[$1] ? a[$1]", " : "\t") $3} END{for (i in a) print a[i]}' 1 inputfile
I was hoping someone would post some awk wizardry, but I'll go ahead and throw out a longer-form Perl script for now:
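    use strict;
    use warnings;

    # Assumes rows for the same protein are adjacent (grouped/sorted input).
    my @cols = ();          # $cols[$i] holds the values seen in data column $i
    my $lastprotein = '';

    while (<DATA>) {
        chomp;
        my ($protein, @data) = split "\t";
        # New protein: print the collapsed row for the previous one and reset
        if ($protein ne $lastprotein && @cols) {
            print join("\t", $lastprotein, map { join ', ', @$_ } @cols), "\n";
            @cols = ();
        }
        push @{ $cols[$_] }, $data[$_] for 0 .. $#data;
        $lastprotein = $protein;
    }
    # Flush the row for the last protein
    print join("\t", $lastprotein, map { join ', ', @$_ } @cols), "\n";

    __DATA__
    protein_1	membrane	1e-4
    protein_1	intracellular	1e-5
    protein_2	membrane	1e-50
    protein_2	citosol	1e-40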
Output:
    protein_1	membrane, intracellular	1e-4, 1e-5
    protein_2	membrane, citosol	1e-50, 1e-40
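The script above prints a row whenever the protein name changes, so it relies on all rows for one protein being adjacent; with interleaved input the same protein would be printed more than once. A minimal sketch of an order-independent variant, assuming every row has the same number of tab-separated columns, collects each column in a hash of arrays keyed on the first field and prints keys in first-seen order. `collapse_rows` is a hypothetical name, and like the script above it does not drop duplicate rows the way the awk `!seen[$1,$2]++` filter does.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse any number of value columns by the first column.
# Input rows need not be sorted; output keeps first-seen key order.
sub collapse_rows {
    my @rows = @_;
    my (%cols, @order);
    for my $row (@rows) {
        my ($key, @fields) = split /\t/, $row;
        push @order, $key unless exists $cols{$key};
        # Column i of every row for this key is appended to list i
        push @{ $cols{$key}[$_] }, $fields[$_] for 0 .. $#fields;
    }
    return map {
        my $key = $_;
        join "\t", $key, map { join(', ', @$_) } @{ $cols{$key} };
    } @order;
}

my @input = (
    "protein_1\tmembrane\t1e-4",
    "protein_1\tintracellular\t1e-5",
    "protein_2\tmembrane\t1e-50",
    "protein_2\tcitosol\t1e-40",
);
print "$_\n" for collapse_rows(@input);
```

The trade-off is memory: this version holds every row until the end, whereas the streaming script only keeps the current protein's values.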