python - Find common elements in two files and unite them in one file


I have a tab-delimited file, file1:

    marker1 transcript0 scaff1  1   24
    marker2 transcript1 scaff2  1   53
    marker3 transcript1 scaff2  1   53
    marker4 transcript2 scaff3  1   89
    marker5 transcript2 scaff3  1   89
    marker6 transcript2 scaff3  1   89

and file2:

    contig1 transcript1 scaff2  1   53
    contig2 transcript1 scaff2  1   53
    contig3 transcript1 scaff2  1   53
    contig4 transcript2 scaff3  1   89

My desired output file is:

    transcript1 marker2 contig1 scaff2  1   53
    transcript1 marker3 contig2 scaff2  1   53
    transcript1 0       contig3 scaff2  1   53
    transcript2 marker4 contig4 scaff3  1   89
    transcript2 marker5 0       scaff3  1   89
    transcript2 marker6 0       scaff3  1   89

Basically, I need to unite the two files wherever they have transcripts in common. The two files have different lengths. I have tried using a dictionary and the join command-line tool, but the results are no good. Can anyone give me directions or ideas on how to do this in Python? I have tried join:

 join -1 2 -2 2 file1 file2 

and this code:

    f1 = open('file1', 'r')
    f2 = open('file2', 'r')
    output = open('common', 'w')

    # Index file1 by transcript (column 2). A later line with the same
    # transcript overwrites the earlier one, so only one marker survives.
    dicta = dict()
    for line1 in f1:
        lista = line1.rstrip('\n').split('\t')
        dicta[lista[1]] = lista

    # For every file2 line whose transcript was seen in file1, write one row.
    for line1 in f2:
        new_list = line1.rstrip('\n').split('\t')
        query = new_list[0]
        subject = new_list[1]
        scaff = new_list[2]
        chrom = new_list[3]
        cm = new_list[4]
        if subject in dicta:
            lista = dicta[subject]
            output.write(subject+'\t'+query+'\t'+str(lista[0])+'\t'+str(lista[1])+'\t'+str(lista[2])+'\t'+str(lista[3])+'\t'+chrom+'\t'+cm+'\t'+scaff+'\n')
    output.close()

How about this (Python 3):

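    from collections import defaultdict
    from itertools import zip_longest

    with open('file1', 'r') as f1, open('file2', 'r') as f2, \
            open('common', 'w') as fout:
        # remainder maps each transcript to its trailing columns
        # (scaffold and the two numeric fields).
        remainder = {}

        # Collect the markers listed for each transcript in file1.
        markers = defaultdict(list)
        for line in f1:
            fields = line.split()
            markers[fields[1]].append(fields[0])
            remainder[fields[1]] = fields[2:]

        # Collect the contigs listed for each transcript in file2.
        contigs = defaultdict(list)
        for line in f2:
            fields = line.split()
            contigs[fields[1]].append(fields[0])
            remainder[fields[1]] = fields[2:]

        # Union of transcripts from both files, in sorted order.
        transcripts = sorted(set(markers.keys()) | set(contigs.keys()))
        for transcript in transcripts:
            rest = remainder[transcript]
            # Pair markers with contigs, padding the shorter list with '0'.
            zipped = zip_longest(markers[transcript], contigs[transcript],
                                 fillvalue='0')
            for marker, contig in zipped:
                print(transcript, marker, contig, *rest, sep='\t', file=fout)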

Output:

    transcript0 marker1 0       scaff1  1   24
    transcript1 marker2 contig1 scaff2  1   53
    transcript1 marker3 contig2 scaff2  1   53
    transcript1 0       contig3 scaff2  1   53
    transcript2 marker4 contig4 scaff3  1   89
    transcript2 marker5 0       scaff3  1   89
    transcript2 marker6 0       scaff3  1   89
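
Note that the output above includes a transcript0 row, which is not in the desired output from the question, because the code takes the union (`|`) of the transcripts from both files. If only transcripts present in both files should be kept, one option is to take the intersection (`&`) instead. The following is a minimal sketch of that variant, not a definitive implementation: the function name `merge_common` and its `fill` parameter are made up for illustration, and it works on lists of lines rather than open files so it can be tried without any input files.

```python
from collections import defaultdict
from itertools import zip_longest


def merge_common(lines1, lines2, fill='0'):
    """Pair file1 markers with file2 contigs for shared transcripts only.

    Hypothetical helper: lines1 and lines2 are iterables of
    whitespace-delimited lines laid out like file1 and file2.
    """
    markers = defaultdict(list)
    contigs = defaultdict(list)
    remainder = {}  # transcript -> trailing columns (scaffold, numbers)

    for line in lines1:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        markers[fields[1]].append(fields[0])
        remainder[fields[1]] = fields[2:]

    for line in lines2:
        fields = line.split()
        if not fields:
            continue
        contigs[fields[1]].append(fields[0])
        remainder[fields[1]] = fields[2:]

    rows = []
    # '&' keeps transcripts found in both files; '|' would keep all of them.
    for transcript in sorted(markers.keys() & contigs.keys()):
        pairs = zip_longest(markers[transcript], contigs[transcript],
                            fillvalue=fill)
        for marker, contig in pairs:
            rows.append('\t'.join([transcript, marker, contig,
                                   *remainder[transcript]]))
    return rows


# Sample data copied from the question.
file1_lines = [
    "marker1\ttranscript0\tscaff1\t1\t24",
    "marker2\ttranscript1\tscaff2\t1\t53",
    "marker3\ttranscript1\tscaff2\t1\t53",
    "marker4\ttranscript2\tscaff3\t1\t89",
    "marker5\ttranscript2\tscaff3\t1\t89",
    "marker6\ttranscript2\tscaff3\t1\t89",
]
file2_lines = [
    "contig1\ttranscript1\tscaff2\t1\t53",
    "contig2\ttranscript1\tscaff2\t1\t53",
    "contig3\ttranscript1\tscaff2\t1\t53",
    "contig4\ttranscript2\tscaff3\t1\t89",
]

for row in merge_common(file1_lines, file2_lines):
    print(row)
```

With real files, the same function can be fed the open file objects directly, since iterating over a file yields its lines. As in the code above, transcripts are sorted as strings, so a name like transcript10 would sort before transcript2.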
