Thursday, March 19, 2015

Working with Giza++

1/ Edit file_spec.h so that the function body reads (note the 19-byte time_stmp buffer and the four-digit year):

     struct tm *local;
     time_t t;
     char *user;
     char time_stmp[19];
     char *file_spec = 0;

     t = time(NULL);
     local = localtime(&t);

     sprintf(time_stmp, "%04d-%02d-%02d.%02d%02d%02d.", 1900 +  local->tm_year,
         (local->tm_mon + 1), local->tm_mday, local->tm_hour,
         local->tm_min, local->tm_sec);
     user = getenv("USER");
     file_spec = (char *)malloc(sizeof(char) *
                    (strlen(time_stmp) + strlen(user) + 1));
     file_spec[0] = '\0';
     strcat(file_spec, time_stmp);
     strcat(file_spec, user);
     return file_spec;
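
The buffer size can be checked by formatting a date with the same pattern: the stamp is 18 characters long, so the C buffer needs 19 bytes including the terminating NUL. A minimal sketch in Python (the helper name giza_time_stamp is made up for this illustration and is not part of GIZA++):

```python
import time

def giza_time_stamp(t=None):
    """Mirror the sprintf pattern "%04d-%02d-%02d.%02d%02d%02d." from the C code."""
    local = time.localtime(t)
    return "%04d-%02d-%02d.%02d%02d%02d." % (
        local.tm_year,   # Python's tm_year is already the full year
        local.tm_mon,    # and tm_mon is already 1-based
        local.tm_mday, local.tm_hour, local.tm_min, local.tm_sec)

stamp = giza_time_stamp()
# 18 characters, so the C buffer needs 18 + 1 bytes
print(stamp, len(stamp))
```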

2/ Patch the Makefile and build:
$ cp Makefile Makefile.orig
$ sed -i 's/ -DBINARY_SEARCH_FOR_TTABLE//;s/mkdir/mkdir -p/g' Makefile
$ make
 
3/ Run the tools:
$ plain2snt.out sv-text.txt da-text.txt
$ GIZA++ -S sv-text.vcb -T da-text.vcb -C sv-text_da-text.snt
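
The file names passed to GIZA++ follow from the names of the two input files. A small sketch that builds both command lines for any pair of files, assuming the naming seen above (X.txt gives X.vcb, and the sentence file is X_Y.snt) holds in general; the function giza_commands is hypothetical:

```python
import os

def giza_commands(source_txt, target_txt):
    """Build the two command lines from step 3 for a pair of plain-text files."""
    src = os.path.splitext(source_txt)[0]   # "sv-text.txt" -> "sv-text"
    tgt = os.path.splitext(target_txt)[0]   # "da-text.txt" -> "da-text"
    prepare = ["plain2snt.out", source_txt, target_txt]
    align = ["GIZA++",
             "-S", src + ".vcb",
             "-T", tgt + ".vcb",
             "-C", src + "_" + tgt + ".snt"]
    return prepare, align

prepare, align = giza_commands("sv-text.txt", "da-text.txt")
print(" ".join(prepare))
print(" ".join(align))
```

Each list can be handed to subprocess.run() as is, provided both binaries are on the PATH.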

Saturday, October 11, 2014

Tutorial on distributional semantics (vector space models)

Python: encoding/decoding a string starting with \x


If a string contains literal \x escapes (e.g., '\xea \xe1\xeb\xe8\xe6\xed\xe5\xec\xf3' for Cyrillic text), decode it with the 'string_escape' codec (Python 2 only):

string = string.decode('string_escape')
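
The 'string_escape' codec does not exist in Python 3. A rough equivalent there, sketched under the assumption that the escaped bytes are Windows-1251 (cp1251) text, which the example string above appears to be; decode_escaped is a made-up helper name:

```python
def decode_escaped(text, encoding="cp1251"):
    r"""Turn literal \xNN escapes into bytes, then decode them with `encoding`.

    The cp1251 default is an assumption; pass the encoding your data uses.
    """
    # unicode_escape maps \xNN to the code point U+00NN, and latin-1
    # maps that code point back to the single byte 0xNN.
    raw = text.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode(encoding)

# Raw string literal: the backslashes stay as literal characters.
escaped = r"\xea \xe1\xeb\xe8\xe6\xed\xe5\xec\xf3"
print(decode_escaped(escaped))
```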

Lowercase strings in file

Lowercase every line of a text file using GNU sed (\L is a GNU extension; works for Cyrillic):
sed -e 's/\(.*\)/\L\1/' file
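
Where GNU sed is not available, the same thing can be done in Python 3, whose str.lower() is Unicode-aware. A minimal sketch, assuming the file is UTF-8 encoded (the function names are made up for illustration):

```python
import io

def lowercase_lines(lines):
    """Yield each line lowercased; str.lower() also handles Cyrillic."""
    for line in lines:
        yield line.lower()

def lowercase_file(path, encoding="utf-8"):
    """Return the whole file lowercased (the UTF-8 default is an assumption)."""
    with io.open(path, encoding=encoding) as handle:
        return "".join(lowercase_lines(handle))

print("".join(lowercase_lines(["ПРИВЕТ Мир\n", "Hello World\n"])), end="")
```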

Thursday, May 22, 2014

Increase heap size in weka

java -Xmx600M -jar /usr/share/java/weka.jar

Monday, November 25, 2013

Copy files from remote server

scp -P port user@server:/path_to_the_file/file  /path_to_local_dir/
Get the file size (allocated size, human-readable):
ls -sh file

Monday, November 11, 2013

Select from Many-to-Many with GroupBy

Three entities: 
Cue(id, name) 
Reaction(id, name) 
Freqs(id, cue_id FK, reaction_id FK, some_other_data)

Get the names of the 20 most frequent reactions (the result itself is not ordered by frequency):
SELECT name FROM rdb_reaction
WHERE id IN (SELECT reaction_id
  FROM rdb_formdata GROUP BY reaction_id ORDER BY count(*) DESC LIMIT 20);


Count the number of (cue, reaction) pairs, returned as (cue.name, reaction.name, count):
SELECT DISTINCT
  c.name cue_name,
  b.name reaction_name,
  (SELECT count(*)
   FROM rdb_freqs a1
   WHERE a1.cue_id = a.cue_id
     AND a1.reaction_id = a.reaction_id
  ) AS count
FROM
  rdb_freqs a
  INNER JOIN rdb_reaction b ON a.reaction_id = b.id
  INNER JOIN rdb_cue c ON a.cue_id = c.id
ORDER BY count DESC;
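
The same pair counts can also be obtained with a plain GROUP BY instead of DISTINCT plus a correlated subquery. A self-contained sketch using Python's sqlite3 module with an in-memory database; the table layout follows the three entities above, while the sample rows and the pair_counts helper are invented for illustration:

```python
import sqlite3

def pair_counts(conn):
    """Return (cue_name, reaction_name, pair_count) rows, most frequent first."""
    query = """
        SELECT c.name AS cue_name, r.name AS reaction_name, COUNT(*) AS pair_count
        FROM rdb_freqs f
          INNER JOIN rdb_reaction r ON f.reaction_id = r.id
          INNER JOIN rdb_cue c ON f.cue_id = c.id
        GROUP BY c.id, c.name, r.id, r.name
        ORDER BY pair_count DESC, cue_name, reaction_name
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rdb_cue (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rdb_reaction (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rdb_freqs (id INTEGER PRIMARY KEY,
                            cue_id INTEGER REFERENCES rdb_cue(id),
                            reaction_id INTEGER REFERENCES rdb_reaction(id));
    -- Invented sample data
    INSERT INTO rdb_cue VALUES (1, 'dog'), (2, 'sun');
    INSERT INTO rdb_reaction VALUES (1, 'cat'), (2, 'hot');
    INSERT INTO rdb_freqs (cue_id, reaction_id)
      VALUES (1, 1), (1, 1), (2, 2), (1, 2);
""")

for row in pair_counts(conn):
    print(row)
```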