Saturday, October 11, 2014

Tutorial on distributional semantics (vector space models)

Python: encoding/decoding a string starting with \x


If a string starts with \x (e.g., '\xea \xe1\xeb\xe8\xe6\xed\xe5\xec\xf3' for cyrillic) decode it using 'string_escape' :

string=string.decode('string_escape')

Lowercase strings in file

 Lowercase strings in text file using sed(works for cyrillic):
 sed -e 's/\(.*\)/\L\1/' file