python - Importing data with multiple IDs
I need to import some data onto a server. The problem is that the data isn't quite in the right format. Put simply, it looks like this:
items_direc

id | co-ordinate
1 | 648
2 | 25
2 | 305
2 | 307
2 | 569
3 | 354
3 | 450
3 | 573
4 | 293
4 | 449
5 | 25
5 | 73
and I want it to look more like this:
1 | 648
2 | 25, 305, 307, 569
3 | 354, 450, 573
4 | 293, 449
5 | 25, 73
This is the code I have to alter (it assumes each id is unique, with no multiples like above):
class item:
    def __init__(self, iid, name):
        self.iid = iid
        self.name = name

class data:
    def __str__(self):
        return "item[iid=%s,name=%s]" % (self.iid, self.name)

    def __init__(self):
        self._items = {}
        self._items_file = "%s/%s" % (data_direc, items_direc)

    def add_item(self, item):
        # keyed by id, so a later row with the same id overwrites the earlier one
        self._items[item.iid] = item

    def __init_items(self):
        f = open(self._items_file, 'r')
        for line in f:
            data = line.rstrip('\r\n').split("|")
            self.add_item(item(data[0], data[1]))
        f.close()
        print "%d items added" % len(self._items)
So my impression is that if I use this code on the original data, it won't treat multiple rows with the same id (like id 1) as belonging together. Not only that, the data set is quite large (100,000+ rows), and not every id has the same number of co-ordinates, so I can't simply construct a matrix and fill in the values.
Can anyone point me in the right direction? I'm not at all experienced with Python, and what I've tried so far has failed quite miserably.
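As a rough illustration of the grouping that is needed, here is a minimal sketch that maps each id to a list of its co-ordinates. It is not the original code; the helper name load_items, the header-skipping checks, and the path argument are assumptions.

from collections import defaultdict

def load_items(path):
    # Accumulate a list of co-ordinates per id instead of
    # overwriting the previous value the way add_item() above does.
    items = defaultdict(list)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if '|' not in line:
                continue  # skip the "items_direc" label and blank lines
            iid, coord = [part.strip() for part in line.split('|', 1)]
            if not iid.isdigit():
                continue  # skip the "id | co-ordinate" header row
            items[iid].append(coord)
    return items

With the sample data above, the returned dict would map id '2' to the list ['25', '305', '307', '569'].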
You can use csv.reader and csv.writer to handle the pipe-delimited data, and collections.defaultdict to accumulate the values keyed by id. You can use islice to conveniently skip the first few rows you don't require, and for the final output, sort the rows by id and write each one out followed by a comma-delimited list of its values. E.g.:
import csv
from itertools import islice
from collections import defaultdict

dd = defaultdict(list)
with open('input') as fin:
    pipe_in = csv.reader(fin, delimiter='|')
    for key, val in islice(pipe_in, 3, None):
        dd[key].append(val)

with open('output', 'wb') as fout:
    pipe_out = csv.writer(fout, delimiter='|')
    pipe_out.writerows([k, ', '.join(v)] for k, v in sorted(dd.iteritems()))

# 1|648
# 2|25, 305, 307, 569
# 3|354, 450, 573
# 4|293, 449
# 5|25, 73
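If you are on Python 3 rather than Python 2, a minimal sketch of the same approach would use dict.items() in place of iteritems() and open the output file in text mode with newline='' instead of 'wb'. The file names 'input' and 'output' and the three skipped header rows are carried over from the snippet above.

import csv
from itertools import islice
from collections import defaultdict

dd = defaultdict(list)
with open('input', newline='') as fin:
    pipe_in = csv.reader(fin, delimiter='|')
    # skip the header rows, as in the Python 2 version above
    for key, val in islice(pipe_in, 3, None):
        dd[key].append(val)

# csv.writer expects a text-mode file opened with newline='' in Python 3
with open('output', 'w', newline='') as fout:
    pipe_out = csv.writer(fout, delimiter='|')
    pipe_out.writerows([k, ', '.join(v)] for k, v in sorted(dd.items()))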