Build an array from a list and a dictionary with Python


I am trying to build a matrix from a list and then fill it with the values of a dict. It works with small data, but the computer crashes when bigger data is used (not enough RAM). My script is clearly too heavy, but I don't see how to improve it (first time programming). Thanks.

import numpy as np

liste = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
dico = {"a/b": 4, "c/d": 2, "f/g": 5, "g/h": 2}

# Now I'd like to build a square array (liste x liste) and fill it
# with the values of the dict.

def make_array(liste, dico):
    array1 = []
    liste_i = []  # each line of the array
    for i in liste:
        # Store the previous row before starting a new one.
        if liste_i:
            array1.append(liste_i)
            liste_i = []
        for j in liste:
            if dico.has_key(i + "/" + j):
                liste_i.append(dico[i + "/" + j])
            elif dico.has_key(j + "/" + i):
                liste_i.append(dico[j + "/" + i])
            else:
                liste_i.append(0)
    array1.append(liste_i)  # store the last row
    print array1
    matrix = np.array(array1)
    print matrix.shape  # shape is an attribute, not a method
    print matrix
    return matrix

make_array(liste, dico)

Thanks a lot for your answers. Using "in dico" or list comprehensions to improve the speed of the script was very helpful. It seems, however, that my problem is caused by the following function:

import collections
import scipy.cluster.hierarchy
import scipy.spatial.distance

def clustering(matrix, liste_globale_occurences, output2):
    most_common_groups = []
    # Pairwise distances between the rows, then average-linkage clustering.
    y = scipy.spatial.distance.pdist(matrix)
    z = scipy.cluster.hierarchy.linkage(y, 'average', 'euclidean')
    scipy.cluster.hierarchy.dendrogram(z)
    clust_h = scipy.cluster.hierarchy.fcluster(z, t=15, criterion='distance')
    print clust_h
    print len(clust_h)
    # Keep the labels of the three largest clusters.
    most_common = collections.Counter(clust_h).most_common(3)
    group1 = most_common[0][0]
    group2 = most_common[1][0]
    group3 = most_common[2][0]
    most_common_groups.append(group1)
    most_common_groups.append(group2)
    most_common_groups.append(group3)
    with open(output2, 'w') as results:  # here is the beginning of the problem
        for group in most_common_groups:
            for i, val in enumerate(clust_h):
                if group == val:
                    mise_en_page = "{0:36s} groupe co-occurences = {1:5s} \n"
                    results.write(mise_en_page.format(str(liste_globale_occurences[i]), str(val)))

When a small file is used, I get the correct results, for instance:

contact a = groupe 2

contact b = groupe 2

contact c = groupe 2

contact d = groupe 2

contact e = groupe 3

contact f = groupe 3

But when a heavy file is used, I get only one example per group, repeated:

contact a = groupe 2

contact a = groupe 2

contact a = groupe 2

contact a = groupe 2

contact e = groupe 3

contact e = groupe 3

You can create a matrix of zeros of size len(liste) x len(liste), then go through your dico and split each key: the value before the '/' gives the row and the value after the '/' gives the column. That way you don't need to use the 'has_key' search function at all.
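The idea above could be sketched roughly as follows. This is only an illustration, not the asker's final code: the function name `make_array_from_keys` and the `index` lookup dict are hypothetical, and the sketch assumes that every key contains exactly one '/', that both labels appear in `liste`, and that each pair is stored in only one orientation ("a/b" or "b/a", not both with different values).

```python
import numpy as np

def make_array_from_keys(liste, dico):
    # Hypothetical helper: map each label to its row/column position.
    index = {label: pos for pos, label in enumerate(liste)}
    n = len(liste)
    # Start from zeros; only cells named in dico are overwritten.
    matrix = np.zeros((n, n), dtype=int)  # assumption: values are integers
    for key, value in dico.items():
        row_label, col_label = key.split("/")
        i, j = index[row_label], index[col_label]
        matrix[i, j] = value
        matrix[j, i] = value  # the original loop also fills the mirrored cell
    return matrix

liste = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
dico = {"a/b": 4, "c/d": 2, "f/g": 5, "g/h": 2}
matrix = make_array_from_keys(liste, dico)
```

The loop now runs once per dict entry instead of once per cell, and no nested Python lists are built. The dense matrix itself still holds len(liste) x len(liste) values, so for very large lists memory can remain the limiting factor.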

