Tuesday, January 24, 2012

Change file encoding in Linux

1. Get encoding:

file -bi test.txt
text/plain; charset=us-ascii

2. Change it:

iconv -f ASCII -t UTF-8 [filename] > [newfilename]
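
The same conversion can be sketched in Python when iconv is not at hand. This is a minimal sketch, not a replacement for iconv: the function name convert_encoding and its parameters are made up for illustration, and it reads the whole file into memory, so it assumes the file is small.

```python
def convert_encoding(src_path, dst_path, from_enc="ascii", to_enc="utf-8"):
    """Re-encode a text file, roughly like `iconv -f from_enc -t to_enc`."""
    # newline="" keeps the original line endings untouched
    with open(src_path, "r", encoding=from_enc, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=to_enc, newline="") as dst:
        dst.write(text)
```

Usage: convert_encoding("test.txt", "test_utf8.txt"). A byte that is not valid in from_enc raises UnicodeDecodeError instead of being silently converted.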

Sunday, January 15, 2012

R: non-linear regression with nls


# read.table already returns a data frame, so no as.data.frame() is needed
dir_degree<-read.table("/home/bliss/Study/Bauman/Magistr&Disser/Data/degrees_undir",sep='\t')
degrees<-dir_degree$V2
# calculate frequencies
distr<-tabulate(degrees)
# sort by frequency
distr<-sort(distr,decreasing=TRUE)
# calculate probabilities
prob<-distr/sum(distr)

# rank of each probability: 1, 2, ..., length(prob)
x<-seq_along(prob)

# non-linear regression: fit prob = alfa * x^(-gamma)
model<-nls(prob ~ alfa*x^(-gamma), start=list(alfa=-10, gamma=-1/4), algorithm="port", trace=TRUE)
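
A rough way to cross-check the fit, or to pick start values for nls, is to fit a straight line in log-log space, since log(prob) = log(alfa) - gamma*log(x). Note that this is ordinary least squares on the logs, not the non-linear least squares that nls does, so the estimates will differ somewhat. A minimal Python sketch; the function name fit_power_law is made up for illustration:

```python
import math

def fit_power_law(prob):
    """Estimate alfa and gamma in prob = alfa * x^(-gamma), x = 1, 2, ...

    Uses a least-squares line through (log x, log prob); zero
    probabilities are skipped because their logarithm is undefined.
    """
    pts = [(math.log(i), math.log(p))
           for i, p in enumerate(prob, start=1) if p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # slope = -gamma, intercept = log(alfa)
    return math.exp(intercept), -slope
```

Usage: alfa, gamma = fit_power_law(prob), where prob is the sorted list of probabilities.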

awk: add a column to a file

Add a column with the line number to a file:

awk '{print NR"\t"$0}' degrees_dir > degrees_dir_order

After:

head degrees_dir_order
1 669
2 667
3 600
4 596
5 580
6 549
7 490
8 460
9 454
10 447
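
For comparison, the same numbering can be sketched in Python. The function name add_line_numbers is made up for illustration; like the awk one-liner, it writes the line number, a tab, and then the original line.

```python
def add_line_numbers(src_path, dst_path):
    """Prefix every line with its 1-based number and a tab."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for number, line in enumerate(src, start=1):
            # normalize the line ending so the last line also ends with "\n"
            dst.write(f"{number}\t{line.rstrip(chr(10))}\n")
```

Usage: add_line_numbers("degrees_dir", "degrees_dir_order").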

Graphs in python

Using networkx (http://networkx.lanl.gov/)

Loading data from a file (line format: one\ttwo\t23\n):

ass_base={}
with open("avs_weights_utf8.txt",'r') as ass_file:
    for line in ass_file:
        parts=line.split('\t')
        # key: "source\ttarget", value: weight as a string without the newline
        ass_base[parts[0]+'\t'+parts[1]]=parts[2].replace('\n','')

import networkx as nx
graph=nx.DiGraph()

for key in ass_base.keys():
    parts=key.split('\t')
    graph.add_weighted_edges_from([(parts[0],parts[1],float(ass_base[key]))])


#average node degree
# dict(...) works whether graph.degree() returns a dict or a degree view
round( sum(dict(graph.degree()).values())/float(len(graph)), 4)

#get all degrees
degrees=sorted(dict(graph.degree()).values(), reverse=True)
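
From the list of degrees, the sorted probabilities used in the nls post can also be computed directly in Python. A minimal sketch with only the standard library; the function name degree_probabilities is made up for illustration. Unlike R's tabulate, Counter only counts degree values that actually occur, so the result contains no zero entries.

```python
from collections import Counter

def degree_probabilities(degrees):
    """Return the relative frequencies of degree values, largest first."""
    counts = Counter(degrees)                      # degree value -> number of nodes
    freqs = sorted(counts.values(), reverse=True)  # sort by frequency
    total = sum(freqs)
    return [f / total for f in freqs]
```

Usage: prob = degree_probabilities(degrees).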