perl - Collapse rows with multiple fields


I have this code:

awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]", " : "\t") $2} END{for (i in a) print i a[i]}' inputfile

It works for collapsing rows with two fields, using the first field as the index, but I need to collapse rows that have more than two fields.

Input file (three columns, tab-delimited):

protein_1	membrane	1e-4
protein_1	intracellular	1e-5
protein_2	membrane	1e-50
protein_2	citosol	1e-40

Desired output (three columns, tab-delimited):

protein_1	membrane, intracellular	1e-4, 1e-5
protein_2	membrane, citosol	1e-50, 1e-40

Thanks!

I'm stuck here:

awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]"\t" : "\t") $2};{a[$1]=(a[$1] ? a[$1]", " : "\t") $3} END{for (i in a) print i a[i]}' inputfile

I was hoping someone would post some awk wizardry, but I'll go ahead and throw out a longer-form Perl script for now:

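use strict;
use warnings;

my @cols = ();          # one array ref per data column, holding that column's values
my $lastprotein = '';

# Assumes rows for the same protein are adjacent in the input.
while (<DATA>) {
    chomp;
    my ($protein, @data) = split "\t";

    # A new protein starts: flush the rows collected for the previous one.
    if ($protein ne $lastprotein && @cols) {
        print join("\t", $lastprotein, map {join ', ', @$_} @cols), "\n";
        @cols = ();
    }

    # Append each field to the list for its column.
    push @{$cols[$_]}, $data[$_] for (0..$#data);
    $lastprotein = $protein;
}

# Flush the last protein.
print join("\t", $lastprotein, map {join ', ', @$_} @cols), "\n";

__DATA__
protein_1	membrane	1e-4
protein_1	intracellular	1e-5
protein_2	membrane	1e-50
protein_2	citosol	1e-40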

Outputs:

protein_1	membrane, intracellular	1e-4, 1e-5
protein_2	membrane, citosol	1e-50, 1e-40
