Tuesday, February 17, 2009

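combine redundant rows into one

In exprs_133a (22283 rows, one per probe set), several probe sets can share the same Representative.Public.ID. The session below collapses them into one row per ID. The numeric columns are averaged, and the annotation columns keep the first value seen.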

> exprs_133a[1:4, ]
Representative.Public.ID Gene.Symbol Chromosomal.Location CHP_SKN1.CEL MDA.CEL ratios HGU133.IDs..selected.4..
1007_s_at U48705 DDR1 chr6p21.3 10.265215 10.554566 0.2893516 discoidin domain receptor tyrosine kinase 1
1053_at M87338 RFC2 chr7q11.23 9.305431 9.463867 0.1584354 replication factor C (activator 1) 2, 40kDa
117_at X51757 HSPA6 chr1q23 9.255379 9.053673 -0.2017056 heat shock 70kDa protein 6 (HSP70B)
121_at X69699 PAX8 chr2q12-q14 10.405100 10.522243 0.1171425 paired box 8
> geneID.133a <- as.character(exprs_133a[, 1])
> length(geneID.133a)
[1] 22283
> sum(geneID.plus2 %in% geneID.133a)
[1] 22442
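> # numeric columns (CHP_SKN1.CEL, MDA.CEL, ratios): average over rows sharing an ID
> tmp1 <- apply(exprs_133a[, 4:6], 2, function(v) tapply(v, factor(geneID.133a), mean))
> # annotation columns: keep the first value for each ID (apply() coerces to a character matrix)
> tmp2 <- apply(exprs_133a[, c(1, 2, 3, 7)], 2, function(v) tapply(v, factor(geneID.133a), function(v1) v1[1]))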

> tmp1[1:3, ]
CHP_SKN1.CEL MDA.CEL ratios
AA001552 9.414568 9.594727 0.18015902
AA004579 8.302277 8.746708 0.44443149
AA004757 9.328749 9.383631 0.05488192
> tmp2[1:3, ]
Representative.Public.ID Gene.Symbol Chromosomal.Location HGU133.IDs..selected.4..
AA001552 "AA001552" "C19orf54" "chr19q13.2" "chromosome 19 open reading frame 54"
AA004579 "AA004579" "TAF1B" "chr2p25" "TATA box binding protein (TBP)-associated factor, RNA polymerase I, B, 63kDa"
AA004757 "AA004757" "ZNF236" "chr18q22-q23" "zinc finger protein 236"
>
> exprs_133a.unique <- data.frame(tmp1, tmp2)
> exprs_133a.unique[1:3, ]
CHP_SKN1.CEL MDA.CEL ratios Representative.Public.ID Gene.Symbol Chromosomal.Location
AA001552 9.414568 9.594727 0.18015902 AA001552 C19orf54 chr19q13.2
AA004579 8.302277 8.746708 0.44443149 AA004579 TAF1B chr2p25
AA004757 9.328749 9.383631 0.05488192 AA004757 ZNF236 chr18q22-q23
HGU133.IDs..selected.4..
AA001552 chromosome 19 open reading frame 54
AA004579 TATA box binding protein (TBP)-associated factor, RNA polymerase I, B, 63kDa
AA004757 zinc finger protein 236
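The same collapse can also be sketched with base R's aggregate() and duplicated() instead of apply()/tapply(). The small data frame below is made up for illustration (the IDs, gene symbols and values are hypothetical). One practical difference: apply() turns the annotation columns into a character matrix, which is why tmp2 prints with quotes, while aggregate() and duplicated() keep an ordinary data frame.

```r
# Hypothetical toy stand-in for exprs_133a: two probe sets share ID "A".
probes <- data.frame(
  Representative.Public.ID = c("A", "B", "A", "C"),
  Gene.Symbol = c("g1", "g2", "g1", "g3"),
  CHP_SKN1.CEL = c(10, 9, 12, 8),
  MDA.CEL = c(11, 9.5, 12, 7),
  stringsAsFactors = FALSE
)
# As in the rows printed above, ratios = MDA.CEL - CHP_SKN1.CEL.
probes$ratios <- probes$MDA.CEL - probes$CHP_SKN1.CEL

id <- probes$Representative.Public.ID

# Numeric columns: average over rows that share an ID.
num <- aggregate(probes[, c("CHP_SKN1.CEL", "MDA.CEL", "ratios")],
                 by = list(Representative.Public.ID = id), FUN = mean)

# Annotation columns: keep the first row seen for each ID.
ann <- probes[!duplicated(id), c("Representative.Public.ID", "Gene.Symbol")]

# Join the two parts on the ID and use the ID as row names.
collapsed <- merge(num, ann, by = "Representative.Public.ID")
rownames(collapsed) <- collapsed$Representative.Public.ID
print(collapsed)
```

In the rows printed at the top, ratios is MDA.CEL minus CHP_SKN1.CEL. Because the mean is linear, averaging the ratios gives the same value as recomputing the ratio from the averaged columns (for "A" above: 0.5 = 11.5 - 11).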
