C# - Getting a count of unique strings from a List<string[]> into a dictionary


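I want to take a List<string[]> as input.

The output should be a dictionary whose keys are the unique strings, used as an index, and whose values are arrays of floats, with each position in the array representing the count of the key in the corresponding string[] of the List<string[]>.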

So far, here is what I have attempted:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class ct
    {
        // counts terms in array
        public static Dictionary<string, float[]> termfreq(List<string[]> text)
        {
            List<string> unique = new List<string>();

            foreach (string[] s in text)
            {
                List<string> groups = s.Distinct().ToList();
                unique.AddRange(groups);
            }

            string[] index = unique.Distinct().ToArray();

            Dictionary<string, float[]> countset = new Dictionary<string, float[]>();

            return countset;
        }
    }

    class Program
    {
        static void Main()
        {
            /* local variable definition */
            List<string[]> doc = new List<string[]>();
            string[] a = { "that", "is", "a", "cat" };
            string[] b = { "that", "bat", "flew", "over", "the", "cat" };
            doc.Add(a);
            doc.Add(b);
            // Console.WriteLine(doc);

            Dictionary<string, float[]> ret = ct.termfreq(doc);

            foreach (KeyValuePair<string, float[]> kvp in ret)
            {
                Console.WriteLine("key = {0}, value = {1}", kvp.Key, kvp.Value);
            }

            Console.ReadLine();
        }
    }

I got stuck on the dictionary part. What is an effective way to implement this?

It sounds like you could use something like:

    var dictionary = doc
        .SelectMany(array => array)
        .Distinct()
        .ToDictionary(word => word,
                      word => doc.Select(array => array.Count(x => x == word))
                                 .ToArray());

In other words, first find the distinct set of words, then for each word, create a mapping.

To create the mapping, look at each array in the original document, and find the count of occurrences of the word in that array. (So each array maps to an int.) LINQ performs that mapping over the whole document, with ToArray creating an int[] for a particular word... and that's the value for that word's dictionary entry.

Note that this creates a Dictionary<string, int[]> rather than a Dictionary<string, float[]> - that seems more sensible to me, but you could always cast the result of Count to float if you really wanted to.
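
If you do need float[] values, a minimal sketch of the same query with the cast applied might look like the following. The names TermCounts, TermFreq and Demo are made up for illustration and are not from the question; the sample data is the two arrays from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TermCounts
{
    // Hypothetical helper: the same LINQ query as above, with each
    // per-array count cast to float so the values are float[].
    public static Dictionary<string, float[]> TermFreq(List<string[]> doc)
    {
        return doc
            .SelectMany(array => array)   // flatten all arrays into one sequence of words
            .Distinct()                   // keep each word once; these become the keys
            .ToDictionary(word => word,
                          word => doc.Select(array => (float)array.Count(x => x == word))
                                     .ToArray());
    }
}

static class Demo
{
    static void Main()
    {
        var doc = new List<string[]>
        {
            new[] { "that", "is", "a", "cat" },
            new[] { "that", "bat", "flew", "over", "the", "cat" }
        };

        foreach (var kvp in TermCounts.TermFreq(doc))
        {
            // Join the array elements; printing kvp.Value directly
            // would only show the type name (System.Single[]).
            Console.WriteLine("key = {0}, value = [{1}]",
                              kvp.Key, string.Join(", ", kvp.Value));
        }
    }
}
```

With that input, "that" maps to { 1, 1 } and "bat" maps to { 0, 1 }, since each array position corresponds to one string[] in the list. Be aware that this rescans every array once per distinct word, which is fine for small inputs but may be slow for large documents.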

