python - Comparing a file containing a chromosomal region and another containing point coordinates

August 15, 2014

Could you please advise on the following problem? I have CSV files to compare. The first contains the coordinates of specific points in the genome (e.g. chr3: 987654 – 987654). The other CSV files contain the coordinates of genomic regions (e.g. chr3: 135596 – 123456789). I want to cross-compare the first file with the other files to see whether the point locations in the first file overlap the regional coordinates in the other files, and write the set of overlaps to a separate file. To keep things simple to start, I have drafted some simple code that cross-compares two CSV files. Strangely, the code runs and prints the coordinates, but it does not write the point coordinates to the separate file.

My first question is whether my approach (from the code) to comparing these two files is optimal, or is there a better way of doing this? Secondly, why is it not writing to the separate file?

import csv

region = open ('region_test1.csv', 'rt', newline = '')
reader_region = csv.reader (region, delimiter = ',')

dmc = open ('dmc_test.csv', 'rt', newline = '')
reader_dmc = csv.reader (dmc, delimiter = ',')

dmc_testpoint = open ('dmc_testpoint.csv', 'wt', newline ='')
writer_exon = csv.writer (dmc_testpoint, delimiter = ',')

for col in reader_region:
    chr_region = col[0]
    start_region = int(col[1])
    end_region = int(col [2])
    for col in reader_dmc:
        chr_point = col[0]
        start_point = int(col [1])
        end_point = int(col[2])
        if chr_region == chr_point and start_region <= start_point and end_region >= end_point:
            print (True, col)
        else:
            print (False, col)
            writer_exon.writerow(col)

region.close()
dmc.close()

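A couple of things are wrong, not least of which is that you never check whether the files opened successfully. The most glaring is that you never close the writer: dmc_testpoint is never closed, so rows written through writer_exon may stay in the output buffer and never reach the file. Note also that writer_exon.writerow(col) is only called in the else branch, so even once the file is flushed it would contain the points that did not overlap, not the overlaps.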
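
A minimal sketch of the file-handling fix, assuming Python 3: opening each file in a with block guarantees it is closed (and the writer's buffer flushed) when the block exits, even if an exception is raised. The helper names write_rows and read_rows are hypothetical, and the demo writes into a temporary directory so no real files are touched.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows to a CSV file; the file is closed when the block exits."""
    # Leaving the with-block closes the file (even if an exception is
    # raised), and closing flushes any rows still held in the buffer.
    with open(path, "w", newline="") as out:
        csv.writer(out).writerows(rows)


def read_rows(path):
    """Read a CSV file back as a list of rows (lists of strings)."""
    with open(path, newline="") as src:
        return list(csv.reader(src))


# Round trip in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "dmc_testpoint.csv")
    write_rows(out_path, [["chr3", "987654", "987654"]])
    print(read_rows(out_path))  # [['chr3', '987654', '987654']]
```

In the question's program, the same effect comes from replacing the bare open(...) calls with with open(...) as ...: blocks, or at minimum calling dmc_testpoint.close() at the end. If a file cannot be opened, open() raises an exception such as FileNotFoundError, so a try/except around the opens is one way to add the check mentioned above.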

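That said, this is an incredibly non-optimal way to go about the program. File I/O is slow, and you don't want to keep rereading a file once per row of the other, which costs (number of regions × number of points) reads. (As written, the inner loop does not actually reread dmc_test.csv: reader_dmc is exhausted after the first region row, so every later region is compared against nothing.) Given that the search requires all possible comparisons, you'll want to store at least one of the two files in memory, and potentially use a generator/iterator over the other if you don't wish to store both complete sets of data in memory.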
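
A sketch of that idea, assuming every row has exactly three comma-separated fields (chromosome, start, end) and no header line: the regions are loaded once into a dictionary keyed by chromosome, while the points are streamed through a generator so they are never all held in memory. The function names are hypothetical, and io.StringIO stands in for the real files.

```python
import csv
import io
from collections import defaultdict


def load_regions(region_file):
    """Read every region into memory, grouped by chromosome name."""
    regions = defaultdict(list)
    for chrom, start, end in csv.reader(region_file):
        regions[chrom].append((int(start), int(end)))
    return regions


def iter_points(point_file):
    """Yield (chrom, start, end) one point at a time, without storing them all."""
    for chrom, start, end in csv.reader(point_file):
        yield chrom, int(start), int(end)


def overlapping_points(regions, points):
    """Yield each point that lies entirely inside at least one region."""
    for chrom, start, end in points:
        # Only regions on the same chromosome can contain the point;
        # .get() does not add missing keys to the defaultdict.
        if any(r_start <= start and end <= r_end
               for r_start, r_end in regions.get(chrom, ())):
            yield chrom, start, end


region_csv = io.StringIO("chr3,135596,123456789\nchr1,100,200\n")
point_csv = io.StringIO("chr3,987654,987654\nchr1,300,300\nchr2,150,150\n")
regions = load_regions(region_csv)
print(list(overlapping_points(regions, iter_points(point_csv))))
# [('chr3', 987654, 987654)]
```

With real files, pass file objects opened with open("region_test1.csv", newline="") and open("dmc_test.csv", newline=""), and hand the generator's output to csv.writer(...).writerows(...) inside a with block. Each file is read only once, although each point is still compared with every region on its chromosome.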

Once you have both sets loaded, proceed with the intersection checks.
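
Once both sets are in memory, the checks need not compare every point with every region. The sketch below swaps in one common technique that the answer does not itself prescribe: sort each chromosome's points by start, then for each region use bisect.bisect_left to jump to the first point starting at or after the region start, and scan forward only while points still start inside the region. Containment uses the same condition as the question's code (region start <= point start and point end <= region end). Points and regions are assumed to be (chrom, start, end) tuples with integer coordinates; points_in_regions is a hypothetical name.

```python
import bisect
from collections import defaultdict


def points_in_regions(points, regions):
    """Return the set of points contained in at least one region.

    points and regions are iterables of (chrom, start, end) with int coordinates.
    """
    # Group points by chromosome and sort each group by start position.
    by_chrom = defaultdict(list)
    for chrom, start, end in points:
        by_chrom[chrom].append((start, end))
    starts = {}
    for chrom, plist in by_chrom.items():
        plist.sort()
        starts[chrom] = [s for s, _ in plist]

    hits = set()
    for chrom, r_start, r_end in regions:
        plist = by_chrom.get(chrom)
        if not plist:
            continue
        # Index of the first point whose start is >= the region's start.
        i = bisect.bisect_left(starts[chrom], r_start)
        # Points starting after r_end cannot lie inside this region.
        while i < len(plist) and plist[i][0] <= r_end:
            p_start, p_end = plist[i]
            if p_end <= r_end:
                hits.add((chrom, p_start, p_end))
            i += 1
    return hits


points = [("chr3", 987654, 987654), ("chr1", 300, 300), ("chr1", 150, 150)]
regions = [("chr3", 135596, 123456789), ("chr1", 100, 200), ("chr1", 120, 180)]
print(sorted(points_in_regions(points, regions)))
# [('chr1', 150, 150), ('chr3', 987654, 987654)]
```

Returning a set means a point that falls inside several overlapping regions is reported once. Sorting costs O(n log n) for n points, after which each region costs a binary search plus a scan over the points that start inside it.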

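I'd suggest taking a look at http://docs.python.org/2/library/csv.html for how to use the csv reader, because what you are doing doesn't appear to make any sense. csv.reader splits each line on the delimiter, so unless your files really have the chromosome, start and end in three separate comma-separated fields (your example chr3: 987654 – 987654 suggests they may not), col[0], col[1] and col[2] aren't going to be what you think they are.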
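
A quick way to see what the reader actually hands you is to feed it a line from memory. The first case shows a three-field row; the second shows that a coordinate written like the question's example, with no commas, comes back as a single field. The parse_coordinate helper is hypothetical and assumes exactly a "chrom: start – end" layout with either a hyphen or an en dash.

```python
import csv
import io
import re

# Three comma-separated fields: col[0], col[1], col[2] are chrom, start, end.
print(next(csv.reader(io.StringIO("chr3,987654,987654\n"))))
# ['chr3', '987654', '987654']

# "chr3: 987654 - 987654" has no commas, so it comes back as ONE field.
# (\u2013 is the en dash used in the question's example.)
row = next(csv.reader(io.StringIO("chr3: 987654 \u2013 987654\n")))
print(len(row))  # 1

# Hypothetical parser for that single-field layout; accepts "-" or an en dash.
COORD_RE = re.compile(r"^\s*(\w+)\s*:\s*(\d+)\s*[-\u2013]\s*(\d+)\s*$")


def parse_coordinate(text):
    """Turn 'chr3: 987654 - 987654' into ('chr3', 987654, 987654)."""
    match = COORD_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


print(parse_coordinate(row[0]))  # ('chr3', 987654, 987654)
```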

These are style and readability things, but: the names of the iteration variables seem a bit off. for col in ... should be for row in ..., because csv.reader gives you one row (a list of fields) at a time, not one column at a time.
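
As a small illustration of the naming point, assuming three-field rows: calling the loop variable row, and unpacking it straight into named fields, makes the loop say what it does. Unpacking also raises ValueError on a row with the wrong number of fields, which surfaces malformed lines early. io.StringIO stands in for region_test1.csv.

```python
import csv
import io

region_csv = io.StringIO("chr3,135596,123456789\nchr1,100,200\n")

regions = []
for row in csv.reader(region_csv):
    # Each row is a list of strings; unpack it into named fields.
    chrom, start, end = row
    regions.append((chrom, int(start), int(end)))

print(regions)
# [('chr3', 135596, 123456789), ('chr1', 100, 200)]
```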

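Additionally, it would be nice to pick a consistent naming convention for your variables and stick with it, for example either always start names with uppercase, or save uppercase for after an '_'.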

And putting a ' ' between some function names and their parentheses but not others, e.g. open ('region_test1.csv', ...) versus int(col[1]), is odd. Again, these don't change the functionality of the code.
