python - Importing data with multiple IDs


I need to import data onto a server. The problem I have is that the data isn't quite in the right format. Put simply, it looks like this:

    items_direc
    id | co-ordinate
    1  | 648
    2  | 25
    2  | 305
    2  | 307
    2  | 569
    3  | 354
    3  | 450
    3  | 573
    4  | 293
    4  | 449
    5  | 25
    5  | 73

I want it to look more like this:

    1  | 648
    2  | 25, 305, 307, 569
    3  | 354, 450, 573
    4  | 293, 449
    5  | 25, 73

This is the code I have to alter (it assumes each ID is unique, with no multiples as above):

    class item:
        def __init__(self, iid, name):
            self.iid = iid
            self.name = name

        def __str__(self):
            return "item[iid=%s,name=%s]" % (self.iid, self.name)

    class data:
        def __init__(self):
            self._items = {}
            # data_direc and items_direc are not defined in this snippet
            self._items_file = "%s/%s" % (data_direc, items_direc)

        def add_item(self, item):
            # keyed by iid, so a repeated id replaces the earlier item
            self._items[item.iid] = item

        def __init_items(self):
            f = open(self._items_file, 'r')
            for line in f:
                data = line.rstrip('\r\n').split("|")
                self.add_item(item(data[0], data[1]))
            f.close()
            print("%d items added" % len(self._items))

So my impression is that if I use this code on the original data, it won't treat the multiple rows for an ID as one and the same. Not only that, the data set is quite large (100,000+), and not every ID has the same number of co-ordinates, so I can't construct a matrix and fill in the values.

Can anyone point me in the right direction? I'm not at all experienced with Python, and everything I've tried so far has failed quite miserably.
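The concern about multiple IDs is well founded: `add_item` stores each item in a dictionary keyed by `iid`, and assigning to an existing key replaces whatever was there. A minimal sketch, using a few of the sample rows above as in-memory input, contrasts plain assignment with appending to a per-ID list:

```python
# A few (id, co-ordinate) pairs taken from the sample data above.
rows = [("1", "648"), ("2", "25"), ("2", "305"), ("2", "307")]

# Plain assignment, as in add_item: a repeated id overwrites the earlier value.
overwritten = {}
for iid, coord in rows:
    overwritten[iid] = coord

# Appending to a list per id keeps every co-ordinate.
grouped = {}
for iid, coord in rows:
    grouped.setdefault(iid, []).append(coord)

print(overwritten)  # {'1': '648', '2': '307'}
print(grouped)      # {'1': ['648'], '2': ['25', '305', '307']}
```

Accumulating into a list per key is the core idea; it needs no fixed number of co-ordinates per ID, so the varying counts are not a problem.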

You can use csv.reader and csv.writer to handle the pipe-delimited data, and collections.defaultdict to accumulate each value under its key (the ID). You can use islice to conveniently skip the first few rows you don't require. Then, for the final output, sort the rows by ID and write each one out followed by a comma-delimited list of its values. E.g.:

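    import csv
    from itertools import islice
    from collections import defaultdict

    dd = defaultdict(list)
    with open('input') as fin:
        pipe_in = csv.reader(fin, delimiter='|')
        # skip the first 3 rows (the header lines at the top of the file)
        for key, val in islice(pipe_in, 3, None):
            # strip the padding in rows like "1  | 648" and collect values per id
            dd[key.strip()].append(val.strip())

    # Python 3; on Python 2 use open('output', 'wb') and dd.iteritems()
    with open('output', 'w', newline='') as fout:
        pipe_out = csv.writer(fout, delimiter='|')
        # ids are strings here, so they sort as text ('10' comes before '2')
        pipe_out.writerows([k, ', '.join(v)] for k, v in sorted(dd.items()))

    # 1|648
    # 2|25, 305, 307, 569
    # 3|354, 450, 573
    # 4|293, 449
    # 5|25, 73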
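To check the grouping without touching real files, the same approach can be wrapped in a function that reads from and writes to any file-like object. This is a sketch under a few assumptions: `group_coordinates` is a hypothetical helper name, it takes the number of header rows to skip as a parameter, it ignores blank lines, and it sorts IDs numerically on the assumption that every ID is an integer (a plain string sort would put `"10"` before `"2"`). `io.StringIO` stands in for the input and output files, and the ID `10` in the sample is made up purely to show the numeric ordering.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def group_coordinates(fin, fout, skip_rows=0):
    """Read 'id | value' rows from fin; write one 'id|v1, v2, ...' row per id to fout.

    Hypothetical helper: assumes integer ids and exactly two fields per row.
    """
    dd = defaultdict(list)
    for row in islice(csv.reader(fin, delimiter='|'), skip_rows, None):
        if not row:
            continue  # csv.reader yields [] for a blank line
        key, val = row
        dd[key.strip()].append(val.strip())

    writer = csv.writer(fout, delimiter='|', lineterminator='\n')
    # Sort by the numeric value of each id, not its string form.
    for key in sorted(dd, key=int):
        writer.writerow([key, ', '.join(dd[key])])


sample = io.StringIO(
    "id | co-ordinate\n"
    "1  | 648\n"
    "2  | 25\n"
    "2  | 305\n"
    "10 | 7\n"
    "2  | 307\n"
)
out = io.StringIO()
group_coordinates(sample, out, skip_rows=1)
print(out.getvalue())
```

For the real data, pass the opened input and output files instead of the `StringIO` objects, and set `skip_rows` to however many header lines the file actually has.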
