I am trying to build a matrix from a list and fill it with the values of a dict. It works with small data, but my computer crashes when bigger data is used (not enough RAM). My script is probably too heavy, but I don't see how to improve it (first time programming). Thanks.
import numpy as np

liste = ["a","b","c","d","e","f","g","h","i","j"]
dico = {"a/b": 4, "c/d" : 2, "f/g" : 5, "g/h" : 2}

# Now I'd like to build a square array (liste x liste) and fill it
# with the values of the dict.

def make_array(liste, dico):
    array1 = []
    liste_i = []  # each line of the array
    for i in liste:
        # store the previous row before starting a new one
        if liste_i:
            array1.append(liste_i)
            liste_i = []
        for j in liste:
            if dico.has_key(i+"/"+j):
                liste_i.append(dico[i+"/"+j])
            elif dico.has_key(j+"/"+i):
                liste_i.append(dico[j+"/"+i])
            else:
                liste_i.append(0)
    array1.append(liste_i)  # store the last row
    print array1
    matrix = np.array(array1)
    print matrix.shape  # shape is an attribute, not a method
    print matrix
    return matrix

make_array(liste, dico)

Thanks a lot for the answers. Using 'in dico' or list comprehensions does improve the speed of the script, and that is very helpful. But it seems that my problem is caused by the following function:
import collections
import scipy.spatial.distance
import scipy.cluster.hierarchy

def clustering(matrix, liste_globale_occurences, output2):
    most_common_groups = []
    # pairwise distances between rows, then average-linkage clustering
    y = scipy.spatial.distance.pdist(matrix)
    z = scipy.cluster.hierarchy.linkage(y, 'average', 'euclidean')
    scipy.cluster.hierarchy.dendrogram(z)
    clust_h = scipy.cluster.hierarchy.fcluster(z, t = 15, criterion='distance')
    print clust_h
    print len(clust_h)
    # keep the labels of the three largest clusters
    most_common = collections.Counter(clust_h).most_common(3)
    group1 = most_common[0][0]
    group2 = most_common[1][0]
    group3 = most_common[2][0]
    most_common_groups.append(group1)
    most_common_groups.append(group2)
    most_common_groups.append(group3)
    with open(output2, 'w') as results:
        # here is the beginning of the problem
        for group in most_common_groups:
            for i, val in enumerate(clust_h):
                if group == val:
                    mise_en_page = "{0:36s} groupe co-occurences = {1:5s} \n"
                    results.write(mise_en_page.format(str(liste_globale_occurences[i]), str(val)))

When a small file is used, I get correct results, for instance:
contact a = groupe 2
contact b = groupe 2
contact c = groupe 2
contact d = groupe 2
contact e = groupe 3
contact f = groupe 3
But when a heavy file is used, I get only one example per group:
contact a = groupe 2
contact a = groupe 2
contact a = groupe 2
contact a = groupe 2
contact e = groupe 3
contact e = groupe 3
You could create a matrix of zeros of size len(liste) x len(liste), then go through dico and split each key: the part before the '/' identifies the row and the part after the '/' identifies the column. That way you don't need to use the 'has_key' search function at all.
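A minimal sketch of that idea, written for Python 3 and under a few assumptions that are not in the question: the labels never contain a '/', every label used in a dico key also appears in liste, and each pair is stored only once (either "a/b" or "b/a", not both). The `index` dict is a helper introduced here for illustration; it maps each label to its position so that no search over liste is needed. The value is written to both [i, j] and [j, i] to mimic the original lookup of both key orders.

```python
import numpy as np

def make_array(liste, dico):
    # Hypothetical helper: label -> row/column position, built once.
    index = {name: k for k, name in enumerate(liste)}
    n = len(liste)
    # Start from zeros so that missing pairs need no work at all.
    # dtype=int is an assumption based on the integer values in the example.
    matrix = np.zeros((n, n), dtype=int)
    for key, value in dico.items():
        # Assumes exactly one '/' separates the two labels.
        row_name, col_name = key.split("/")
        i, j = index[row_name], index[col_name]
        matrix[i, j] = value
        matrix[j, i] = value  # same value for the reversed pair
    return matrix

liste = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
dico = {"a/b": 4, "c/d": 2, "f/g": 5, "g/h": 2}
matrix = make_array(liste, dico)
print(matrix.shape)
print(matrix)
```

This loops over the entries of dico only, instead of over every (i, j) pair of liste, and it never builds the intermediate list of lists. The final array still has len(liste) squared cells, so it may not be enough on its own if the matrix itself does not fit in RAM.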