perl - Compare lines in a file
I have a large dataset that looks like this:

    identifier,feature 1, feature 2, feature 3, ...
    29239999, 2,5,3,...
    29239999, 2,4,3,...
    29239999, 2,6,7,...
    17221882, 2,6,7,...
    17221882, 1,1,7,...
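I want to write a script that groups these lines by identifier (so the first 3 and the last 2 would be grouped) in order to compare them. So, for example, for the 3 lines with 29239999, I would take one of the two that have feature 3 equal to 3, and the last one, which has feature 3 equal to 7. In particular, I would take the one that has the largest feature 2 (it would be the third line for 29239999).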
My specific question: of the two options, (1) using hashes and (2) making each identifier an object and comparing them, which is best?
If you are working with a "large" data set and the data is already grouped by id, as in your example, I would suggest processing the groups as you go instead of building a huge hash.
    use strict;
    use warnings;

    # Skip the header row
    <DATA>;

    my @group;
    my $lastid = '';

    while (<DATA>) {
        # Split off the identifier; keep the rest of the line as one string
        my ($id, $data) = split /,\s*/, $_, 2;

        # A new identifier means the previous group is complete
        if ($id ne $lastid) {
            processdata($lastid, @group);
            @group = ();
        }

        push @group, $data;
        $lastid = $id;
    }

    # Flush the final group
    processdata($lastid, @group);

    sub processdata {
        my $id = shift;
        return if ! @_;

        print "$id " . scalar(@_) . "\n";

        # rest of code here
    }

    __DATA__
    identifier,feature 1, feature 2, feature 3, ...
    29239999, 2,5,3,...
    29239999, 2,4,3,...
    29239999, 2,6,7,...
    17221882, 2,6,7,...
    17221882, 1,1,7,...
Outputs:

    29239999 3
    17221882 2
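To fill in the "rest of code here" part for the rule in the question (keep the row with the largest feature 2), something along these lines might work. This is only a sketch: the names `pick_largest_feature2` and `best_per_group` are made up for illustration, the sample rows use just the three features shown (the elided columns are left out), ties go to the first row seen, and it assumes feature 2 is always numeric and that the input is already grouped by id.

```perl
use strict;
use warnings;

# Hypothetical helper: return the row whose second feature is numerically
# largest. Each row is the text after the identifier, e.g. "2,5,3".
sub pick_largest_feature2 {
    my @rows = @_;
    my ($best, $best_val);
    for my $row (@rows) {
        my @features = split /,\s*/, $row;
        my $val = $features[1];    # feature 2 is the second field
        if (!defined $best_val || $val > $best_val) {
            ($best, $best_val) = ($row, $val);
        }
    }
    return $best;
}

# Hypothetical driver: walk lines that are already grouped by identifier
# and return a list of [id, best row] pairs, one per group.
sub best_per_group {
    my @lines = @_;
    my (@result, @group);
    my $lastid = '';
    for my $line (@lines) {
        chomp $line;
        my ($id, $data) = split /,\s*/, $line, 2;
        # A new identifier closes the previous group
        if ($id ne $lastid && @group) {
            push @result, [$lastid, pick_largest_feature2(@group)];
            @group = ();
        }
        push @group, $data;
        $lastid = $id;
    }
    # Flush the final group
    push @result, [$lastid, pick_largest_feature2(@group)] if @group;
    return @result;
}

# Sample rows from the question, limited to the three features shown
my @sample = (
    "29239999, 2,5,3",
    "29239999, 2,4,3",
    "29239999, 2,6,7",
    "17221882, 2,6,7",
    "17221882, 1,1,7",
);

printf "%s => %s\n", @$_ for best_per_group(@sample);
```

Only one group is held in memory at a time, so this keeps the streaming advantage over a hash keyed by identifier. If the file were not sorted by id, a hash of arrays (or a sort beforehand) would be needed instead.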