Sequence evaluation

When processing sequences obtained from a vendor, it is useful to have an idea of how well the sequencing reactions worked, both in an absolute sense and relative to other recently obtained sequences in the same project. What follows is a primary-sequence-independent method of evaluating a collection (i.e. an order from an outside vendor) of sequences.

The first step is to align the sequences by nucleotide index (ignoring the actual sequence content). Start by reading the sequences into lists. I use the list s.b to hold the forward (5’ to 3’) sequences, and the list s.f to hold the reverse sequences (also in the 5’ to 3’ orientation, as sequenced):

rm(list=ls(all=TRUE))
library(seqinr)

working.dir <- "B:/<my-working-dir>/"


# list the sequence files in the back/ and for/ subdirectories
back.files <- list.files( paste(working.dir, "back/", sep="") )
for.files <- list.files( paste(working.dir, "for/", sep="") )

> back.files[1:20]
[1] "MBC20120428a-A1-PXMF1.seq" "MBC20120428a-A10-PXMF1.seq"
[3] "MBC20120428a-A11-PXMF1.seq" "MBC20120428a-A12-PXMF1.seq"
[5] "MBC20120428a-A2-PXMF1.seq" "MBC20120428a-A3-PXMF1.seq"
[7] "MBC20120428a-A4-PXMF1.seq" "MBC20120428a-A5-PXMF1.seq"
[9] "MBC20120428a-A6-PXMF1.seq" "MBC20120428a-A7-PXMF1.seq"
[11] "MBC20120428a-A8-PXMF1.seq" "MBC20120428a-A9-PXMF1.seq"
[13] "MBC20120428a-B1-PXMF1.seq" "MBC20120428a-B10-PXMF1.seq"
[15] "MBC20120428a-B11-PXMF1.seq" "MBC20120428a-B12-PXMF1.seq"
[17] "MBC20120428a-B2-PXMF1.seq" "MBC20120428a-B3-PXMF1.seq"
[19] "MBC20120428a-B4-PXMF1.seq" "MBC20120428a-B5-PXMF1.seq"
>

Next, determine the number of files read and create lists of that length to hold the sequences. Then read the files in and inspect one sequence:

s.b <- list()
length(s.b) <- length(back.files)
s.f <- list()
length(s.f) <- length(for.files)

for(i in 1:length(back.files)){
  s.b[[i]] <- read.fasta(paste( working.dir, "back/", back.files[[i]], sep=""))
}

for(i in 1:length(for.files)){
  s.f[[i]] <- read.fasta(paste( working.dir, "for/", for.files[[i]], sep=""))
}

> getSequence(s.b[[2]])[[1]][1:200]
[1] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "c" "n" "n" "n"
[19] "g" "t" "c" "c" "a" "c" "t" "g" "c" "g" "g" "c" "c" "g" "c" "c" "a" "t"
[37] "g" "g" "g" "a" "t" "g" "g" "a" "g" "c" "t" "g" "t" "a" "t" "c" "a" "t"
[55] "c" "c" "t" "c" "t" "t" "c" "t" "t" "g" "g" "t" "a" "g" "c" "a" "a" "c"
[73] "a" "g" "c" "t" "a" "c" "a" "g" "g" "c" "g" "c" "g" "c" "a" "c" "t" "c"
[91] "c" "g" "a" "t" "a" "t" "t" "g" "t" "g" "a" "t" "g" "a" "c" "t" "c" "a"
[109] "g" "t" "c" "t" "c" "c" "a" "c" "t" "c" "t" "c" "c" "c" "t" "g" "c" "c"
[127] "c" "g" "t" "c" "a" "c" "c" "c" "c" "t" "g" "g" "c" "g" "a" "g" "c" "c"
[145] "g" "g" "c" "c" "g" "c" "c" "a" "t" "c" "t" "c" "c" "t" "g" "c" "a" "g"
[163] "g" "t" "c" "t" "a" "g" "t" "c" "a" "g" "a" "g" "c" "c" "t" "c" "c" "t"
[181] "a" "c" "a" "t" "a" "a" "t" "g" "g" "a" "t" "a" "c" "a" "a" "c" "t" "a"
[199] "t" "a"


Note that ambiguities are indicated with an “n”. The sequence evaluation will involve counting the number of ambiguities at each index position. The expectation is that the initial positions - the first 25 or so bases - will have a large number of ambiguities, falling to near zero by position 50. This is the run length required to get the primer annealed and incorporating nucleotides. Next will follow 800-1200 positions with a near zero ambiguity count; exactly how long is a function of the sequencing quality. Towards the end of the run the ambiguities begin to rise as the polymerase loses energy. Finally the ambiguity count falls as the reads terminate.

Create a vector nbsum to tally the count of ambiguities at each index. Then process each sequence and, at each index where an “n” occurs, increment nbsum at the corresponding position.

nbsum <- vector( mode="integer", length=2000)   # one tally slot per index position

for( i in 1:length(s.b)){
  for( j in 1:length( getSequence(s.b[[i]])[[1]])){
    if( getSequence(s.b[[i]])[[1]][j] == "n") nbsum[j] <- nbsum[j] + 1
  }
}

> nbsum[1:100]
[1] 167 168 168 168 168 166 163 164 160 153 149 142 131 150 135 125 120 111
[19] 99 93 79 80 59 51 61 52 48 38 26 20 17 17 22 20 18 14
[37] 16 15 13 11 14 16 23 21 13 12 6 7 5 6 9 5 3 3
[55] 1 4 3 1 1 3 3 3 2 0 1 0 0 0 0 1 1 1
[73] 5 5 12 14 21 25 24 28 29 31 21 20 8 8 7 10 4 2
[91] 2 3 5 1 3 0 3 1 1 0
>


Plot the ambiguity counts over the first 1200 positions:

x <- 1:1200
plot(nbsum[x])

Overlay the reverse reads in red.

nfsum <- vector( mode="integer", length=2000)

for( i in 1:length(s.f)){
  for( j in 1:length( getSequence(s.f[[i]])[[1]])){
    if( getSequence(s.f[[i]])[[1]][j] == "n") nfsum[j] <- nfsum[j] + 1
  }
}

points(nfsum[x], col="red")

I have created a shiny app that implements the above code. Download it here.
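For reference, here is a minimal sketch of what such an app might look like. This is an illustration only, not the downloadable app itself; the file-upload widget and the 2000-position tally length are assumptions.

library(shiny)
library(seqinr)

ui <- fluidPage(
  titlePanel("Ambiguity counts by position"),
  fileInput("seqs", "Upload .seq files", multiple = TRUE),
  plotOutput("ambPlot")
)

server <- function(input, output){
  output$ambPlot <- renderPlot({
    req(input$seqs)
    nsum <- integer(2000)                    # tally of "n" calls per position
    for(f in input$seqs$datapath){
      s <- getSequence(read.fasta(f))[[1]]   # first (only) sequence in the file
      idx <- which(s == "n")
      idx <- idx[idx <= length(nsum)]        # guard against unusually long reads
      nsum[idx] <- nsum[idx] + 1
    }
    plot(nsum[1:1200], xlab="position", ylab="ambiguity count")
  })
}

shinyApp(ui, server)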


Automated colony counting

In a previous post I discussed using a 96-channel pipettor and Omniplates for titering bacterial cultures. Here I show how to automate the counting process. Start with a two-dimensional serial dilution of a culture as shown on the plate below:

A 2D dilution simulation suggests that a small number of regions will be in the correct range for accurate titer determination (red cells):
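A rough simulation along these lines might look like the following; the 1:10 step in both directions, the starting load of 1e7 colonies per spot, and the 5-20 "countable" range are all assumptions used only for illustration.

# Simulate expected colony counts across an 8 x 12 plate, assuming a 1:10
# dilution step down the rows and across the columns and an assumed starting
# load of 1e7 colonies per spot. Flag wells expected to land in a countable
# range (taken here as 5-20 colonies).
start.load <- 1e7
expected  <- start.load * outer(10^-(0:7), 10^-(0:11))   # rows A-H, columns 1-12
countable <- expected >= 5 & expected <= 20
which(countable, arr.ind = TRUE)   # only a short diagonal band is countable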

Capture an image:

rm(list=ls(all=TRUE))
library(ggplot2)
library("EBImage")

file.prefix <- "3c"
f<- paste( getwd(), "/input/", file.prefix, ".JPG", sep="")

pic = readImage(f)
display(pic)

and convert to binary black/white:


pic.thresh <- pic > 0.4   # threshold the image to binary
#display(pic.thresh)

Next perform erode and dilate operations to sharpen the colonies:


pic2 <- erode(pic.thresh)
pic2 <- dilate(pic2)
display(pic2)

Then count. I will decompose the main image into 96 smaller images, each of which can then be processed individually. Identify the upper right corner origin and the segment.width, 103 pixels in this case. Print out the coordinates of the individual segments:

segment.width <- 103
origin <- c(35,20)

# A[ rows, cols ]
imgs <- rep(list(matrix(data=NA, nrow=segment.width+1, ncol=segment.width+1)), 96)

counts <- rep(NA, 96)

k <- 1
for(i in 1:8){
  for(j in 1:12){
    cat( "pic2[",(origin[1] + 1 + (j-1)*segment.width),":",(origin[1]+ (j)*segment.width )," , ")
    cat( (origin[2]+ (i-1)*segment.width),":",(origin[2]+ (i)*segment.width ),"]\n")

    imgs[[k]] <- imageData( pic2 )[ (origin[1] + 1 + (j-1)*segment.width):(origin[1]+ (j)*segment.width ),
                                    (origin[2]+ (i-1)*segment.width):(origin[2]+ (i)*segment.width ) ]
    img.label <- bwlabel(imgs[[k]])
    counts[k] <- max(img.label)   # number of connected objects in the segment
    k <- k + 1
    rm(img.label)
  }
}

pic2[ 139 : 241 , 20 : 123 ]
pic2[ 242 : 344 , 20 : 123 ]
pic2[ 345 : 447 , 20 : 123 ]
pic2[ 448 : 550 , 20 : 123 ]
...

#display a segment

display(imgs[[77]])


Now count the spots in the individual images algorithmically and also manually for those wells that can be counted. Plot and compare.

# reorder the counts by column (A1, B1, ..., H1, A2, ...)
counts.m <- matrix(counts, nrow=8, byrow=TRUE)
counts.bycol <- as.vector(counts.m)

# convert a well number to a row/column designation (or vice versa)
getWell96 <- function( id ){
  n <- 1:96
  row <- rep(LETTERS[1:8], 12)
  col <- sort(rep(1:12, 8))
  name <- I(paste(row, col, sep=""))   # I() is as.is, to prevent factorization
  well.ids <- data.frame(name, n)

  if( id %in% name ){
    well.ids[well.ids$name==id, "n"]
  }else{
    well.ids[well.ids$n==id, "name"]
  }
}

wells <- sapply( 1:96, getWell96)
manual <- NA   # to hold manual counts

results <- data.frame(wells, counts.bycol, manual)

#count a bunch manually
results[ results$wells=="A10", "manual"] <- 16
results[ results$wells=="A11", "manual"] <- 11
results[ results$wells=="A12", "manual"] <- 8
results[ results$wells=="B9",  "manual"] <- 13
...

The diagonal with a majority of counts >10 would be predicted to give the most reliable results.
Now compare algorithmic vs. manually determined counts:

# d is assumed here to be the subset of results with manual counts filled in
# (its original construction was not shown)
d <- results[ !is.na(results$manual), ]
names(d)[2:3] <- c("algorithm","manual")

x <- d$algorithm
y <- d$manual
#we will make y the response variable and x the predictor
#the response variable is usually on the y-axis
plot(x,y,pch=19)

Make an attempt to fit a couple of polynomial curves, for predictive purposes.


#fit a first degree polynomial:
fit <- lm(y~x)
#second and third degree
fit2 <- lm(y~poly(x,2,raw=TRUE))
fit3 <- lm(y~poly(x,3,raw=TRUE))

xx <- seq( 1, 25, length=50)
plot(x, y, pch=19, ylim=c(0,15), xlim=c(0,15))
lines(xx, predict(fit, data.frame(x=xx)), col="red")
lines(xx, predict(fit2, data.frame(x=xx)), col="blue")
#lines(xx, predict(fit3, data.frame(x=xx)), col="green")

abline(0,1)   # the y = x line for reference

#y = 1.00008x - 0.018513x^2 - 0.137933
b <- fit[[1]][1]   # intercept of the linear fit
m <- fit[[1]][2]   # slope

It looks like there is not much advantage to the polynomial fits, so I will fit a line and extrapolate.

Print out the algorithmically determined counts, the predicted counts (i.e. the estimates of the manual counts from the linear fit), and the difference.


results$predicted <- round( predict(fit, data.frame(x=results$counts.bycol)), digits = 0)
results$delta <- results$predicted - results$counts.bycol
predicted <- matrix(results$predicted, nrow=8, byrow=FALSE)

counts.m
predicted
predicted - counts.m


High throughput titering

Plating colonies on 10cm dishes, either for titering or for isolating clones, is a resource intensive process. Eliminate some of the tedium and expense by reducing the scale. Serial dilutions are performed in a 96-well plate filled with bacterial growth media, e.g. 270uL per well, transferring 30uL down the columns for a 1:10 dilution per row. Next, using a 96-channel Biomek NX, 10uL of dilute bacteria from each well are deposited on the surface of an Omni plate filled with 60mL LB agar + AMP at 100ug/mL. Dilute such that counts are in the range of 5 to 20 per spot; converting a countable spot back to a titer is sketched below.
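A hedged example of the titer calculation. The assumption that row A already holds a 1:10 dilution of the culture is mine; adjust the dilution term to match the actual series.

# Titer (CFU/mL) from a single countable spot, assuming 10 uL spotted per well
# and that row A holds a 1:10 dilution of the culture (both assumptions).
spot.volume.ml <- 0.010                  # 10 uL deposited per spot
row   <- "D"                             # row of the counted spot
count <- 12                              # colonies counted in that spot

dilution <- 10^match(row, LETTERS[1:8])  # 1:10 per row, so row D = 10^-4
titer <- count / spot.volume.ml * dilution
titer                                    # CFU/mL of the undiluted culture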

Nunc cat# 267060
Nunc cat# 242811 OmniTray Single Well w/lid 86x128mm (pictured below)

A single Omniplate can substitute for 96 10cm dishes, though in practice many of the Omniplate regions will not be countable. Below are some images of colony growth on the first three rows of an Omni plate.

Close up of 9 regions, with counts indicated in white:

Comparison of counts from a 10cm dish to counts from an Omniplate. Expected counts are an estimate based on OD600; the OD600 estimate is >100-fold off.
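For context, here is how an OD600-based expectation might be computed. The conversion factor, the measured OD, and the dilution are all assumptions, and the gap between total cells and viable CFU is one reason such an estimate can be far off.

# A sketch of an OD600-based expected count, assuming the commonly cited
# conversion of roughly 8e8 cells/mL per OD600 unit (all values here are assumptions).
od600 <- 0.5
cells.per.ml <- od600 * 8e8        # rough total cell density
dilution <- 1e-6                   # dilution of the spotted well
spot.volume.ml <- 0.010
expected.count <- cells.per.ml * dilution * spot.volume.ml
expected.count                     # compare with the observed colony count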

Calculated titers. Note that results are more consistent with the wells containing a higher number of counts - Row B:


Referencer

A couple of options for free, web-based bibliography managers are Mendeley and Zotero. Given that Mendeley is owned by Elsevier, it is not an option. I started with Zotero - some configuration notes are outlined below - but eventually dropped it for Referencer. I like the idea of having complete control over my library. I am not interested in the “social” features of Zotero, I found upgrades to be difficult, and I didn’t like the file structure it uses. Referencer is extremely simple, easy to install and configure, and gets the job done. I don’t anticipate many “upgrades” to the Referencer software - hopefully it will be maintained well into the future. I replicate the files to a cloud server for easy backup.

Referencer

#apt-get install referencer

Zotero

Do not export; back up the database directory to retain the metadata:

~/.zotero/zotero/<3xdb….>/zotero…

The “pipes” subdirectory will not copy; be sure to compensate.

pacman -S alsa-lib dbus-glib gcc-libs
pacman -S gtk2 nss #these were already installed

To configure the Zotero webdav server on Hostgator:

www.mysite.com/owncloud/remote.php/webdav ( /zotero automatically appended)
You will be prompted to create the directory.


Associate GPS coordinates with a street address

Obtain street addresses

We would like to associate trees with street addresses when possible. When collecting tree data, we could also make note of the nearest address. Since we are collecting longitude/latitude data anyway, it would be convenient to have a list of street addresses with associated longitude/latitude. This would allow us to auto-populate an address field based on longitude/latitude data. How does one obtain a list of all street addresses in town with longitude/latitude data? Let’s start with obtaining addresses alone. Addresses are available commercially, e.g. from the post office, but for a substantial fee. Since this is a volunteer effort, I want to see what I can get for free. Our town publishes property assessments on a regular basis, but this is hardcopy in the local paper. Though assessments are available online, it is through a query interface that provides one at a time, with no obvious way to download the entire list. I was able to find our town’s street sweeping schedule, which supplies a list of all streets in html format:

Street Name         Rte   Location
Aberdeen Road       16    Tanager St. to Dundee Rd.
Academy Street      25    734 Mass. Ave. to Irving St.
Acorn Park          n/a   30 Concord Tpk., 100' swly
Acton Street        20    21 Appleton St. to Appleton Pl.
Adamian Park        7     Baker Rd. to 100' S'LY OF Upland Rd.
Adams Street        32    319 Mass. Ave. to 216 Bdwy.
Addison Street      27    106 Pleasant St., 800' sely
Aerial Street       2     169 Forest St., 375' nely
Aerial Street       2     375' nely of Forest St. to Carl Rd.
Aerial Street       2     Carl Rd. to 288 Wash. St.
Albermarle Street   21    Walnut St. to Mt. Vernon St.
Alfred Road         28    97 Lake St. to Princeton Rd.
Allen Street        32    339 Mass. Ave. to 70 Warren St.
Alpine Street       15    26 Park Ave. Ext., 350' nly
Alpine Street       15    350' nly Park Ave. Ext. to 300'swly Branch Av
Alpine Street       15    300' swly of Branch Ave. to Summer St
Alpine Terrace      1     Huntington Rd., 286' swly
Alton Street        35    295 Bdwy. to 158 Warren St.
Amherst Street      11    14 River St. to Rawson Rd.
Amsden Street       34    107 Mass. Ave. to Waldo Rd.
Andrew Street       32    Foster St. to Allen St.
Apache Trail        3     Lantern Ln to 150' wly of Cntry Clb Dr

etc.




Copy-paste into a text editor and, with a little manipulation in R, I can get a vector of street names:

> rm(list=ls(all=TRUE))
> library(rjson)
> library(plyr)
>
> con <- file("streets-modified.csv", open="rt", encoding="latin1")
> all.streets <-unique( readLines(con))
> head(all.streets)
[1] "street" "Aberdeen Rd " "Academy St " "Acton St "
[5] "Adamian Park " "Adams St "
> length(all.streets)
[1] 561

Looks like about 561 streets total. From my hardcopy assessment I learn that:

  • Most streets don’t begin with 1 as the first address
  • Addresses aren’t consecutive - there can be many gaps in numbering
  • Most streets have fewer than 200 addresses
  • A few of the main streets have up to 1600 addresses

Query for GPS coordinates

To obtain street addresses with lng/lat data I will use the DataScienceToolkit Google-style geocoder. I need to submit an address that looks like:

1 Aberdeen Road, Arlington, MA

This will return a data set that looks like:

[{"geometry":{"viewport":{"northeast":{"lng":-71.188745,"lat":42.424604},"southwest":{"lng":-71.190745,"lat":42.422604}},"location_type":"ROOFTOP","location":{"lng":-71.189745,"lat":42.423604}},"types":["street_address"],"address_components":[{"types":["street_number"],"short_name":"1","long_name":"1"},{"types":["route"],"short_name":"Aberdeen Rd","long_name":"Aberdeen Rd"},{"types":["locality","political"],"short_name":"Arlington","long_name":"Arlington"},{"types":["administrative_area_level_1","political"],"short_name":"MA","long_name":"MA"},{"types":["country","political"],"short_name":"US","long_name":"United States"}],"formatted_address":"1 Aberdeen Road Arlington MA"}]

You can see that the geocoded location is lng = -71.189745 and lat = 42.423604. I want to submit requests algorithmically, via the URL. The request will look like:

http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=1+Aberdeen+Rd+Arlington+MA

I also want to count up the street addresses beginning at 1. This will give me a lng/lat for every address depending on how the geocoder handles addresses that don’t exist. Let’s find out.

> prefix <- 'http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address='
> suffix <- "Arlington+MA"
>
> arl.addresses <- data.frame( lng = numeric(), lat = numeric(), num=numeric(), street = character())
>
> for( i in 1:20){
+ middle <- gsub(" ","+",all.streets[10])
+ url <- paste( prefix, i, "+", middle, suffix, sep="") #replace the space with a + sign
+ doc <- fromJSON(file=url, method='C')
+ arow <- data.frame( lng=doc$results[[1]]$geometry$location$lng,
+ lat=doc$results[[1]]$geometry$location$lat,
+ num=doc$results[[1]]$address_components[[1]]$short_name,
+ street=doc$results[[1]]$address_components[[2]]$short_name)
+
+ arl.addresses <- rbind(arl.addresses, arow)
+
+ }
There were 20 warnings (use warnings() to see them)
>
> arl.addresses
lng lat num street
1 -71.14672 42.41162 1 Allen St
2 -71.14672 42.41162 2 Allen St
3 -71.14672 42.41151 3 Allen St
4 -71.14684 42.41160 4 Allen St
5 -71.14678 42.41145 5 Allen St
6 -71.14690 42.41153 6 Allen St
7 -71.14683 42.41139 7 Allen St
8 -71.14697 42.41146 8 Allen St
9 -71.14688 42.41132 9 Allen St
10 -71.14703 42.41139 10 Allen St
11 -71.14694 42.41126 11 Allen St
12 -71.14709 42.41132 12 Allen St
13 -71.14699 42.41120 13 Allen St
14 -71.14715 42.41125 14 Allen St
15 -71.14705 42.41114 15 Allen St
16 -71.14722 42.41118 16 Allen St
17 -71.14710 42.41108 17 Allen St
18 -71.14728 42.41110 18 Allen St
19 -71.14716 42.41102 19 Allen St
20 -71.14734 42.41103 20 Allen St
>

Allen Street addresses begin at 12, so it looks like we will assign lng/lat to nonexistent addresses. 1 and 2 share the same lng/lat. This also happens at the high end, e.g. if the highest address is 100, 1 through 100 will be assigned unique lng/lats, and then the lng/lats become constant. Modify the code to stop querying once both lng and lat stop changing, as sketched below.
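One way the modified loop might look, reusing prefix, suffix, all.streets, and arl.addresses from above. The choice of three consecutive identical results as the stopping rule is an assumption; a single repeat would stop too early, since addresses 1 and 2 above already share a lng/lat.

# Query house numbers for one street and stop once the returned lng/lat has
# not changed for several consecutive queries (the stopping rule is an assumption).
street <- gsub(" ", "+", all.streets[10])
prev <- NULL
repeats <- 0
i <- 1
while( repeats < 3 ){
  url <- paste( prefix, i, "+", street, suffix, sep="")
  doc <- fromJSON(file=url, method='C')
  loc <- doc$results[[1]]$geometry$location
  if( !is.null(prev) && loc$lng == prev$lng && loc$lat == prev$lat ){
    repeats <- repeats + 1
  } else {
    repeats <- 0
  }
  arl.addresses <- rbind( arl.addresses,
      data.frame( lng=loc$lng, lat=loc$lat, num=i,
                  street=doc$results[[1]]$address_components[[2]]$short_name))
  prev <- loc
  i <- i + 1
}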

BU 353 USB GPS unit

Handling GPS data

  1. Hardware
  2. Software
  3. Direct access to tty
  4. Using gpsd

Hardware

Be sure to purchase a unit that will broadcast NMEA compliant sentences. I chose the Globalsat BU-353 shown below. This is a USB device that obtains power through the USB connection, but it also has an internal battery for short-term memory. Plug the unit into your laptop and allow a few minutes to get a fix. The LED will start blinking once a fix has been obtained. You will not need any of the software provided with the unit.





Software

Guile 1.8, a dialect of Scheme/Lisp

gpsd, a GPS daemon. You will want to install the version packaged for your system, e.g. on Arch use:

>pacman -S gpsd gpsd-clients

Direct access to tty

By default your gps unit is assigned to /dev/ttyUSB0, but you should check to make sure:

>dmesg | grep -i USB
[11250.384023] usb 4-2: new full-speed USB device number 2 using uhci_hcd
[11250.669968] usbcore: registered new interface driver usbserial
[11250.669991] USB Serial support registered for generic
[11250.670030] usbcore: registered new interface driver usbserial_generic
[11250.670033] usbserial: USB Serial Driver core
[11250.673280] USB Serial support registered for pl2303
[11250.685151] usb 4-2: pl2303 converter now attached to ttyUSB0
[11250.685174] usbcore: registered new interface driver pl2303
[11250.685177] pl2303: Prolific PL2303 USB to serial adaptor driver

Set access, change the baud rate and put the GPS in NMEA mode. Look at some of the output:

>sudo chmod 666 /dev/ttyUSB0 && stty -F /dev/ttyUSB0 ispeed 4800
>cat /dev/ttyUSB0
$GPGSA,A,3,15,14,09,18,21,24,22,06,19,,,,1.9,1.2,1.4*36
$GPGSV,3,1,12,18,71,317,34,24,63,112,31,21,52,212,28,15,42,058,34*71
$GPGSV,3,2,12,22,37,301,28,14,21,250,24,09,07,035,21,19,06,316,13*73
$GPGSV,3,3,12,06,05,279,18,26,01,060,,03,01,294,,28,01,024,*7F
$GPRMC,224355.000,A,1825.0332,N,02111.3093,W,0.13,134.34,050513,,,A*75
$GPGGA,224532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGSA,A,3,15,14,09,18,21,24,22,06,19,,,,1.9,1.2,1.4*36
$GPRMC,224356.000,A,4225.0330,N,02111.3093,W,0.33,145.51,050513,,,A*73
$GPGGA,224357.000,1825.0328,N,07111.3092,W,1,09,1.2,81.8,M,-33.7,M,,0000*5F
$GPGSA,A,3,15,14,09,18,21,24,22,06,19,,,,1.9,1.2,1.4*36
$GPRMC,224357.000,A,1825.0328,N,02111.3092,W,0.28,151.99,050513,,,A*71

Use Ctrl-C to stop the process.

Pick a sentence format that contains the coordinate information you wish to extract. I chose GPGGA.

>cat /dev/ttyUSB0 | grep GPGGA
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25
$GPGGA,124532.000,1825.0301,N,02111.3093,W,1,08,1.3,39.8,M,-73.7,M,,0000*25

You can extract to a text file using:

cat /dev/ttyUSB0 | grep --line-buffered   GPGGA > output.txt

Now with further munging you can extract the coordinates. I’ll save that for the next section where I use gpsd.


Using gpsd

Plug in your gps unit and start up gpsd. You will need to know the tty to which it is connected, see above.

gpsd -b  /dev/ttyUSB0

This launches the daemon, which is now ready to accept requests. The “-b” switch means read-only, to prevent writes that might lock up the unit; use at your discretion. To verify the daemon is broadcasting, you can run xgps (GUI) or cgps/gpsmon (text) if you installed the client tools.

>gpsmon

If everything is working you should see your coordinates displayed in the interface. Here is an example of what the xgps interface looks like:



Next we want to algorithmically extract coordinates from the daemon. Here I am using Guile. First I import two modules, rdelim which provides the read-line method, and srfi-1 which allows me to extract elements of a list using first, second, third, etc. Create a socket, connect the socket to port 2947 on my local machine and send the message "?WATCH={\"enable\":true,\"nmea\":true}" to indicate I am waiting for NMEA sentences.

(use-modules (ice-9 rdelim))
(use-modules (srfi srfi-1))

(define coords '())

(let ((s (socket PF_INET SOCK_STREAM 0)))
     (connect s AF_INET (inet-pton AF_INET "127.0.0.1") 2947)
     (send s "?WATCH={\"enable\":true,\"nmea\":true}")
     (do ((line  (read-line s) (read-line s)))
         ((string= (substring line 1 6) "GPGGA")
         (set! coords (list (third (string-split line #\,)) (fourth (string-split line #\,)) (fifth (string-split line #\,)) (sixth (string-split line #\,)))))))
 

I keep resetting the variable ‘line’ until I find a line that starts with GPGGA. Once obtained, I extract the coordinates and form a list named ‘coords’. Run the method to get the output below.

guile> ("4225.0384" "N" "78411.3104" "E")
guile>

You now have a list of coordinates which can be converted into the desired format.
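The NMEA fields are in (d)ddmm.mmmm form, so the conversion to decimal degrees is a small calculation. Here is a sketch in R (R rather than Guile, to match the rest of this blog); the function name is my own.

# Convert an NMEA (d)ddmm.mmmm value to decimal degrees; S and W are negative.
nmea2dec <- function(value, hemisphere){
  deg <- floor(value / 100)       # degrees: everything before the minutes
  minutes <- value - deg * 100    # remaining minutes
  dec <- deg + minutes / 60
  if( hemisphere %in% c("S", "W") ) dec <- -dec
  dec
}

nmea2dec(4225.0384, "N")   # latitude from the guile output above: 42.41731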




Linux Distributions

I have used three Linux distributions: Ubuntu, Arch, and Debian. Here are my thoughts on the strengths and weaknesses of each.

Ubuntu

For a newcomer to Linux, Ubuntu is a natural choice. Canonical, the company behind Ubuntu, is a business, and as a business it is in their best interest to make Linux installation a no brainer. All drivers are included in the image and a live CD will allow you to experience Ubuntu without even installing the software. This allows you to determine if the distribution will work with your hardware, and test drive some of the applications. Once installed you will find all the features you expect from a mature desktop operating system. With a large user base, someone has probably already found a fix to problems you may experience.

Users migrating to Linux, myself included, are often migrating for privacy and control. I don’t want to pay $1000 for a computer and then have to get Microsoft’s permission to use it, while every keystroke is logged and my search queries are catalogued at headquarters. By signing up with Ubuntu, you are in for a similar experience. Desktop search pipes your queries to Amazon and the Ubuntu software manager will annoy you with advertisements for non-free crapware. Once you have become comfortable with Linux using Ubuntu, it is time to move on.

Arch

I thought I wanted a minimalist system, so I committed to Arch and the i3 window manager. Arch is a great system for someone who wants to learn Linux inside out. Nothing is preconfigured, forcing you to learn the intricacies of the configuration files. Most users will be hard-core programmers - you have to be to deal with installation and configuration - so all problems have been encountered and solutions explicitly provided. I never had an Arch problem I couldn’t solve from reading the wiki. One quirk (feature?) of Arch is the rolling releases. When installing packages, you almost always need to run a system upgrade. 99% of the time this completes without problems, but occasionally, for example on 2013-06-03 when the file system directory layout was changed, a significant effort has to be invested in completing the upgrade. “Always read the Arch news before upgrading” - Arch is thoroughly and expertly documented, but this requires religious devotion.

The problem with Arch is that it requires a lot of work and time. Everything needs to be configured. For example, if you plug in a USB stick, nothing happens unless you configured auto-mount. I quickly came to realize that I don’t want to be a Linux guru, I just want to use Linux to get things done. That is what led me to Debian.

Debian

Debian is the distribution on which many other distributions (including Ubuntu) are built. Debian has the reputation for stability, with major releases occurring about every 2 years. Releases make use of well tested applications, sometimes a version or two behind what is currently available, to guarantee stability. Debian’s package manager is field tested and shared by many other distributions. Most software written for Linux will have an installation package for Debian. As I usually buy older computers with less memory, I use the minimalist Xfce window manager. Xfce has a clean layout, doesn’t install a lot of unnecessary software, yet provides the essentials you would expect in a modern window manager. There is a Debian installer that defaults to using Xfce as the window manager. I have been using Debian for a couple of years now and can report that I have encountered no issues that would prevent the average user from switching from Windows or Apple to Linux. The one exception would be Microsoft Office file compatibility. LibreOffice does a good job, but complex documents may not render properly.


Population Genetics Chapter 5 End

Chapter 5 End Problems


Problem 2

condition <- letters[1:4]
N <- c(50,100, 50, 100)
Nm <- c(0.25, 0.5, 0.1, 0.2)
m <- Nm/N

d <- data.frame(condition,N,m)

plot(d$N, d$m, pch=as.character(d$condition))

Shifting balance favors smaller N and smaller m, so c is the favored condition.

Problem 6

Start by setting r = x/(ny) to get the break-even n. Anything larger is needed for altruistic behaviour to be favored by kin selection.

0.25 = 0.4/(0.2n), so n = 0.4/(0.2*0.25) = 8 or greater
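The same break-even arithmetic in R, with x and y as given in the problem:

# Break-even group size for altruism under kin selection: r = x/(n*y)  =>  n = x/(y*r)
r <- 0.25   # coefficient of relatedness
x <- 0.4    # value of x from the problem
y <- 0.2    # value of y from the problem
n <- x/(y*r)
n   # 8: altruism is favored for n of 8 or greater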

Problem 11

area <- c("dark","dark","light","light")
melanic <- c(T, F, T, F)
p.eaten <- c(0.26, 0.74, 0.86, 0.14)
p.survive <- 1-p.eaten

d <- data.frame(area, melanic, p.eaten, p.survive)
d2 <- split(d, area)

lapply(d2, function(x){ x[x$melanic==TRUE,"p.survive"]/x[x$melanic==FALSE,"p.survive"] }) #melanic/nonmelanic survival ratio
## $dark
## [1] 2.846154
##
## $light
## [1] 0.1627907

Problem 14

J.1 <- c(0.12, 0.8, 0.06, 0.02)

J.2 <- c(0.3,0.56, 0.1, 0.04)

J.11 <- sum(J.1^2)
J.11
## [1] 0.6584
J.22 <- sum(J.2^2)
J.22
## [1] 0.4152
J.12 <- sum(J.1*J.2)
J.12
## [1] 0.4908
I <- J.12/sqrt(J.11*J.22) #from p408
I
## [1] 0.938709
D <- -log(I)
D
## [1] 0.06324978

Problem 15

J.1 <- c(0.3, 0.5)

J.2 <- c(0.2, 0.3)

J.11 <- sum(J.1^2)
J.11
## [1] 0.34
J.22 <- sum(J.2^2)
J.22
## [1] 0.13
J.12 <- sum(J.1*J.2)
J.12
## [1] 0.21
I <- J.12/sqrt(J.11*J.22) #from p408
I
## [1] 0.9988681
D <- -log(I)
D
## [1] 0.001132503