Hit Identification

Hit identification can be accomplished using one of three different methods:

1. Select an algorithm during data import

Select an algorithm from the drop-down during data import. Hits will be identified, and the Hit List name and description text fields will be enabled so you can register the hit list.

2. In the scatterplot/replot view, view and generate a hit list

When plotting or replotting, the tool icon provides the option to view a hit list. The hit list view provides the option to register and save the hit list.

3. Export data and use external data analysis to identify hits. Import the hit list.

After data import, annotated data can be exported for visualization in other software. A hit list can be compiled outside LIMS*Nucleus and then imported.

Scroll down to the hit list and import under the tools icon:

hitlist

  • A list of samples of interest
  • Must have a header row named “name”
  • One sample per line, no separator
  • Primarily used to cherry pick samples from plate to plate
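
As an illustration of the file format described above, the following shell sketch writes a small hit list file and checks it against the stated rules. The file name hitlist.txt, the sample names, and the check_hitlist helper are hypothetical and chosen only for this example; use the sample names that LIMS*Nucleus assigned to your samples.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file has a header row "name"
# followed by at least one sample, one per line, with no separator characters.
check_hitlist() {
    awk '
        NR == 1 { if ($0 != "name") bad = 1; next }
        $0 == "" || /[ \t,;]/ { bad = 1 }
        END { if (bad || NR < 2) exit 1 }
    ' "$1"
}

# Sample names below are placeholders, not real identifiers.
printf '%s\n' name SPL-101 SPL-102 SPL-205 > hitlist.txt

if check_hitlist hitlist.txt; then
    echo "hitlist.txt looks valid"
fi
```

A check of this kind is only a convenience before import; it does not confirm that the listed samples exist in the database.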

Layouts

LIMS*Nucleus makes use of the following definitions:

Sample: Item of interest being tracked by LIMS*Nucleus, i.e. the item in wells. Examples would be compounds, antibodies, bacterial clones, DNA fragments, siRNAs.

Target: the item with which the sample interacts, usually coated on the bottom of the microwell plate, e.g. the antigen for an antibody or the enzyme (target) of a compound.

When creating layouts there are three attributes that need to be defined:

Entity    Attribute
Sample    type, replication
Target    replication

LIMS*Nucleus supports five sample types:

Type                ID
unknown             1
positive control    2
negative control    3
blank               4
edge                5

LIMS*Nucleus has twenty pre-defined layouts installed at the time of system installation. Custom sample layouts can be defined and imported by administrators. A sample layout import file that defines four control wells at the bottom of column 7 looks like:

well	type
1 1
2 1
3 1
4 1
5 1
...
51 1
52 1
53 2
54 2
55 3
56 4
57 1
58 1
...

92 1
93 1
94 1
95 1
96 1

When viewed in the layout viewer, the above file would provide the following sample layout:

For every sample layout imported, an additional 5 layouts are created that define sample and target replication. These layouts are discussed in detail on the replication page.
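
The 96 well import file shown earlier can be generated rather than typed by hand. The sketch below assumes, as that example implies, that wells are numbered down the columns (so column 7 holds wells 49 to 56) and that the two columns are tab separated like the header; the output name layout96.txt is arbitrary.

```shell
#!/bin/sh
# Write a 96 well sample layout: every well is "unknown" (type 1) except
# the four wells at the bottom of column 7.
# Assumption: wells are numbered column by column, 8 wells per column.
awk 'BEGIN {
    OFS = "\t"
    print "well", "type"
    for (well = 1; well <= 96; well++) {
        kind = 1                                  # unknown
        if (well == 53 || well == 54) kind = 2    # positive control
        else if (well == 55) kind = 3             # negative control
        else if (well == 56) kind = 4             # blank
        print well, kind
    }
}' > layout96.txt

# Show wells 52 to 57 (file lines 53 to 58, because line 1 is the header).
sed -n '53,58p' layout96.txt
```

Changing the well numbers in the conditions is enough to move the controls; the type IDs are the ones listed in the sample type table.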

Here is a sample layout import file that defines 8 controls in a 384 well plate, randomly scattered, excluding edge wells

When reformatted into 1536, the layout will look like:

LIMS*Nucleus - Multi-Well Plate Management Software

LIMS*Nucleus is a software program used to manage multi-well plates in an academic or industrial environment. Functionality includes:

  • Generate 96, 384 or 1536 well plates with or without samples
  • Collect plates into plate sets
  • Group or split plate sets
  • Reformat plates - four 96 well plates into a 384 well plate; four 384 well plates into a 1536 well plate
  • Associate assay data with plate sets
  • Identify hits scoring in assays using included algorithms - or write your own
  • Export annotated data
  • Generate worklists for liquid handling robots
  • Rearray hits into a smaller collection of plates
  • Prototype algorithms, visualization with R/Shiny
  • Evaluate an online instance
  • Video overviews of features and capabilities

LIMS*Nucleus has a restricted set of features (multi-well plate management, hit identification, rearraying) and serves as the core of a larger system. Source code is available for modification. The architecture is simple client/server with no middleware or ORM. The client utilizes Bootstrap/Datatables and the database is PostgreSQL. The software is packaged as a Guix pack for easy installation/configuration. R/Shiny dashboards can be used to extend functionality.


Mutation Visualization

Compare parental and mutant sequences

After performing error-prone PCR (random) or oligonucleotide (directed) mutagenesis you will want to visualize your sequences and determine the rate of mutation incorporation. A typical visualization is the stacked bar chart as in this figure from Finlay et al. JMB (2009) 388, 541-558:

To decode this graphic you must:

  • estimate the percentage of each amino acid by comparison to the Y axis
  • compare relative amino acid abundance by comparing the area of boxes
  • correlate color with amino acid identity
  • compare to the reference sequence at the bottom of the graph

An easier-to-interpret graphic would be a scatter plot of sequence index (i.e. amino acid position in the alignment) on the X axis vs frequency on the Y axis. The data points are the single letter amino acid codes. Highlight the reference sequence with a red letter.

The first step is to align all sequences. Start with a multi-fasta file of all sequences:


$cat ./myseqs.fasta

>ref
GLVQXGGSXRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYY
ADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDHRRPKGAFDIWGQGTMVTVSS
GGGGSGGGGSGGGGSGQSALTQPASVSGSPGQSITISCTGTSSDVGAYNYVSWYQQYPGK
APKLMIYEVTNRPSGVSDRFSGSKSGNTASLTISGLQTGDEADYYCGTWDSSLSAVV
>BSA130618a-A01
glvxxggxxrlscasgftfssyamswvrqapgklewvsaisgsggstyysdsvkgrftissdnskntlylqmnslraedt
avyycakdhrrpkgafdiwgqgtmvtvssggggsggggsggggsgqsaltqprsvsgtpgqsviisctgtssdvggskyv
swyqqhpgnapkliiydvserpsgvsnrfsgsksgtsaslaitglqaedeadyycqsydsslvvf
>BSA130618a-A02
glvqpggxxrlscasgftfssyamswvrqapgkglewvsaisgsggstyyadsvkgrftisrdnskntlylqmnslraed
tavyycakdhrrpngafdiwgqgtmvtvssggggsggggsggggsgqsvvtqppsmsaapgqkvtiscsgsssnignnyv
swyqqlpgtapklliydnnkrpsxipdrfsgsksgtsatlitglqtgdeadyycgtwdsslsagvf
>BSA130618a-A03
glvqxggxxrlscasgftfssyamswvrqapgkglewvsaisgsggstyyadsvkgrftisrdnskntlylqmnslraed
tavyycakdhrrpkgafdiwgqgtmvtvssggggsggggsggggsgsyeltqppsvsvspgqtasitcsgsssniginyv
swyqqvpgtapklliyddtnrpsgisdrfsgsksgtsatlgitglqtgdeadyycgtwdsslsvvvf

Above I have labeled my parental reference sequence “ref”. Use clustalo to perform the alignment and request the output in “clustal” format. The clustalo command can be run from within R using the system command. Read the alignment file into a matrix:

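  library(seqinr)   # provides read.alignment()
  # Input is the multi-fasta file shown above; output is the clustal alignment
  input.file <- file.path(getwd(), "myseqs.fasta")
  output.file <- file.path(getwd(), "out.aln")
  # Clustal Omega options: --infile= (or -i), -o for the output file, --outfmt= for the format
  system( paste0("c:/progra~1/clustalo/clustalo.exe --infile=", input.file, " -o ", output.file, " --outfmt=clustal"))

  # Read the alignment back in from the same path that clustalo wrote to
  seqs.aln <- as.matrix(read.alignment(file = output.file, format="clustal"))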

At each position determine the frequency of each amino acid. Set up a second matrix with one column per alignment position and one row per residue symbol: the 20 amino acids plus the stop codon and the unknown residue X. This is the matrix that will hold the amino acid frequencies.

The R package “seqinr” provides a constant containing all single character amino acids as well as asterisk for the stop codon. Use this to name the rows of the frequency matrix.

library(seqinr)
levels(SEQINR.UTIL$CODON.AA$L)

[1] "*" "A" "C" "D" "E" "F" "G" "H" "I" "K" "L" "M" "N" "P" "Q" "R" "S" "T" "V"
[20] "W" "Y"

aas <- c(levels(SEQINR.UTIL$CODON.AA$L), 'X')
freqs <- matrix(  ncol=dim(seqs.aln)[2], nrow=length(aas))
rownames(freqs) <- aas

#Process through the matrix, calculating the frequency for each amino acid.
for( col in 1:dim(seqs.aln)[2]){
     for( row in 1:length(aas)){
          freqs[row, col] <- length(which(toupper(seqs.aln[,col])==aas[row]))/dim(seqs.aln)[1]
      }
}
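
The per-column tally can also be cross-checked from the shell on a small example. The sketch below is not part of the R workflow; it uses three made-up aligned sequences written one per line (not clustal format) and the same denominator as the R loop, the total number of sequences. File names are arbitrary.

```shell
#!/bin/sh
# Toy alignment: one aligned sequence per line, "-" marks a gap.
# These three sequences are made up for illustration.
cat > toy.aln.txt <<'EOF'
RFSGS
RFTGS
R-SGS
EOF

# For each column, print position, residue and frequency.
# Gaps are not counted as residues, so a column's frequencies
# can sum to less than 1.
awk '{
    n = NR
    s = toupper($0)
    for (i = 1; i <= length(s); i++) {
        aa = substr(s, i, 1)
        if (aa != "-") count[i, aa]++
    }
}
END {
    for (key in count) {
        split(key, part, SUBSEP)
        printf "%d\t%s\t%.3f\n", part[1], part[2], count[key] / n
    }
}' toy.aln.txt | sort -k1,1n -k2,2 > toy.freqs.txt

cat toy.freqs.txt
```

In this toy data, position 2 contains a gap in one of the three sequences, so F is reported at 0.667 and nothing else appears at that position.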

Set up an empty plot of frequency (Y axis) vs sequence index (X axis). The Y range is 0 to 1; the X range is 1 to the length of the alignment, i.e. the number of columns in the frequency matrix. Plot frequencies >0 in black, using the single letter amino acid code as the plot character.

plot(1, type="n", xlab="Sequence Index", ylab="Frequency", xlim=c(1, dim(freqs)[2]), ylim=c(0, 1))
for( i in 1:length(aas)){
     points( which(freqs[i,]>0), freqs[i, freqs[i,]>0], pch=rownames(freqs)[i], cex=0.5)
}

Overlay the reference sequence in red.

# Upper-case the reference row once so it can index the frequency matrix by name
ref <- toupper(seqs.aln[rownames(seqs.aln)=="ref",])
for( i in seq_along(ref)){
     # Skip gaps and any symbol that has no row in the frequency matrix
     if( ref[i] %in% rownames(freqs) && freqs[ref[i], i] > 0){
          points( i, freqs[ref[i], i], pch=ref[i], cex=0.5, col="red")
     }
}

This is what it looks like (open in a new tab to see detail):

It’s easy to see which amino acid is parental, and its relative abundance to other amino acids is clear.
Consider position 61: N is the parental amino acid but T is now more abundant in the panel of mutants. K and S are the next most abundant amino acids.

Should multiple amino acids have the same or close to the same frequency, the graph can get cluttered and difficult to interpret. Adjusting the Y axis can help clarify amino acid identity. At each position percentages may not add up to 100 depending on the number of gaps. Consider the sequence “RFSGS” at positions 69-73 which is in a region containing gaps for some of the clones:

Installation

Edit your channels.scm file to include the labsolns channel

Once edited:


$guix pull
$guix package -i mutvis
$source $HOME/.guix-profile/etc/profile

##run the bash script

$mutvis.sh

Monoliths

LIMS (Laboratory Information Management Systems) can be broadly divided into two groups: monoliths and systems. The difference is less about functionality and more about architecture. A monolith is a large, all-inclusive application that maximizes automation and minimizes user intervention. Monoliths are very efficient when a process is standardized and unchanging.

Advantages

  • Full automation, maximum reduction in FTE requirements
  • Consistent, reproducible processing
  • Enhancements, upgrades, and training outsourced to the vendor
  • User groups provide resources for problem solving (bug fixes, add on components, help with problems)

Disadvantages

  • Cost
  • Many moving parts (database, ORM, web server, interface)
  • Complex - requires extensive training
  • Feature creep
  • Brittle - difficult to change in response to a changing process
  • Dependent on vendor for bug fixes and upgrades
  • Off-the-shelf solutions may not satisfy all requirements
  • May depend on obscure components (old programming languages, object database, image)
  • Custom solutions may be obsolete on delivery
  • Resistance to use


General navigation

LIMS*Nucleus works with a nested hierarchy of entities. The object hierarchy can be navigated by clicking hyperlinks in the data tables. The left-hand menu items allow for global navigation. Since users are often concerned with only one project at a time, LIMS*Nucleus tracks the current (default) project, which is visible in the menu area. The default project can be changed by listing all projects (first menu item) and clicking into a project. The tools icon presents workflows associated with the visible entity, which often require selecting one or more rows in the data table.

Install LIMS*Nucleus using a Guix pack

A Guix pack installation is a simple installation suitable for users not interested in the Guix package manager. The detailed instructions below follow the same process automated by the install script found on the evaluate page, but provide additional diagnostic information.

The first step in the install script is setting up the database. Manually install the database using instructions on the postgres page.

If you are using the install script and it is not executable, change permissions:

chmod +x install-limsn-pack.sh

Download and unzip the archive

wget https://github.com/labsolns/labsolns/releases/download/v0.1.0p/limsn-0.1.0-pack.tar.gz
tar xf ./limsn-0.1.0-pack.tar.gz

Look at the contents of the bin directory

$ls ./bin
art dropuser guile-config initdb oid2name pg_controldata pg_receivewal pg_standby pg_waldump reindexdb
clusterdb ecpg guile-snarf install-pg-aws-ec2.sh pg_archivecleanup pg_ctl pg_recvlogical pg_test_fsync pgbench start-limsn.sh
createdb gnuplot guile-tools install-pg-aws-rds.sh pg_basebackup pg_dump pg_resetwal pg_test_timing postgres vacuumdb
createuser guild init-limsn-channel.sh lnpg.sh pg_checksums pg_dumpall pg_restore pg_upgrade postmaster vacuumlo
dropdb guile init-limsn-pack.sh load-pg.sh pg_config pg_isready pg_rewind pg_verifybackup psql

Various *limsn*.sh scripts are needed to configure and start up LIMS*Nucleus. You can also use psql to diagnose the database.

Place $HOME/bin on $PATH

$export PATH="$HOME/bin${PATH:+:}$PATH"
$echo $PATH
/home/admin/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

Note that in the above example I am using the admin account on AWS so $HOME == /home/admin
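
Re-running the export above prepends $HOME/bin again each time (the .bashrc listing further down this page shows /home/admin/bin twice). This is harmless, but a minimal sketch of a variant that adds the directory only when it is missing is given here; path_prepend is a hypothetical helper name, not something shipped with LIMS*Nucleus.

```shell
#!/bin/sh
# Prepend a directory to PATH only if it is not already present.
# path_prepend is a hypothetical helper name, not part of LIMS*Nucleus.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already on PATH, do nothing
        *) PATH="$1${PATH:+:}$PATH" ;;
    esac
}

path_prepend "$HOME/bin"
path_prepend "$HOME/bin"    # second call is a no-op
export PATH
echo "$PATH"
```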

Initialize LIMS*Nucleus

Initialize by executing ./bin/init-limsn-pack.sh. The function of the various scripts is described on the scripts page. Check that the $HOME/.config/limsn directory has been created and artanis.conf has been copied into the directory:

$ ls $HOME/.config/limsn
artanis.conf

Check that $HOME/.bashrc has been modified to include exports for LC_ALL, PATH, GUILE_LOAD_PATH, and GUILE_DBD_PATH.

$HOME/.bashrc
...
$cat $HOME/.bashrc
# sources /etc/bash.bashrc).
if ! shopt -oq posix; then
if [ -f /usr/share/bash-completion/bash_completion ]; then
. /usr/share/bash-completion/bash_completion
elif [ -f /etc/bash_completion ]; then
. /etc/bash_completion
fi
fi
export LC_ALL="C"
export PATH=/home/admin/bin:/gnu/store/2rl49lcanmqn26s660dd85lv7pfn0ykb-limsn-0.1.0/bin:/home/admin/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
export GUILE_LOAD_PATH=/home/admin/gnu/store/rj0pzbki1m5hpcshs614mhkrgs2b3i9d-artanis-0.5.2/share/guile/site/3.0:/home/admin/gnu/store/780bll8lp0xvj7rnazb2qdnrnb329lbw-guile-json-3.5.0/share/guile/site/3.0:/home/admin/gnu/store/jmn100gjcpqbfpxrhrna6gzab8hxkc86-guile-redis-2.1.1/share/guile/site/3.0:/home/admin/gnu/store/3f0lv3m4vlzqc86750025arbskfrq05p-guile-dbi-2.1.8/share/guile/site/2.2
export GUILE_DBD_PATH=/home/admin/gnu/store/z5kilafxayw2kdvn3anw1shkqij17dqb-guile-dbd-postgresql-2.1.8/lib

Source .bashrc to make certain that all environment variables have been properly set:

$source $HOME/.bashrc

Modify the artanis.conf configuration file

Critical parameters are described on the configuration page. You must have in hand IP addresses for the database and client.

sudo nano $HOME/.config/limsn/artanis.conf

Make sure the database is available and loaded

The psql command below will work on a local database - modify accordingly. You should find 10 preloaded projects. Projects 4-9 are empty and can be used for experimentation:

$psql -U ln_admin -h 127.0.0.1 lndb
psql (13.4, server 11.14 (Debian 11.14-0+deb10u1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

lndb=> select * from project;

id | project_sys_name | descr | project_name | sessions_id | updated
----+------------------+------------------------------------------+----------------------+-------------+-------------------------------
1 | PRJ-1 | 3 plate sets with 2 96 well plates each | With AR, HL | 9999999999 | 2022-04-06 11:07:05.854606+00
2 | PRJ-2 | 1 plate set with 2 384 well plates each | With AR | 9999999999 | 2022-04-06 11:07:06.825439+00
3 | PRJ-3 | 1 plate set with 1 1536 well plate | With AR | 9999999999 | 2022-04-06 11:07:08.32809+00
4 | PRJ-4 | description 4 | MyTestProj4 | 9999999999 | 2022-04-06 11:07:13.364018+00
5 | PRJ-5 | description 5 | MyTestProj5 | 9999999999 | 2022-04-06 11:07:13.365426+00
6 | PRJ-6 | description 6 | MyTestProj6 | 9999999999 | 2022-04-06 11:07:13.367082+00
7 | PRJ-7 | description 7 | MyTestProj7 | 9999999999 | 2022-04-06 11:07:13.368542+00
8 | PRJ-8 | description 8 | MyTestProj8 | 9999999999 | 2022-04-06 11:07:13.370008+00
9 | PRJ-9 | description 9 | MyTestProj9 | 9999999999 | 2022-04-06 11:07:13.371613+00
10 | PRJ-10 | 2 plate sets with 10 96 well plates each | Plates only, no data | 9999999999 | 2022-04-06 11:07:13.372956+00
(10 rows)

lndb=>

Start LIMS*Nucleus

Start in detached mode so you can close the terminal. To stop a detached process you will need to look up its PID and kill it. Start in regular mode to monitor any error messages.

$nohup start-limsn.sh &

To stop the server, press Ctrl-C in interactive mode. In detached mode, look up the PID and kill it:

$ ps aux | grep artanis
admin 12479 2.9 6.0 154628 60944 pts/0 Sl 13:22 0:00 /gnu/store/cnfsv9ywaacyafkqdqsv2ry8f01yr7a9-guile-3.0.7/bin/guile \ /gnu/store/dfa7p2zvk4xlhaq1y3hsqkzpqd73ggni-artanis-0.5.2/bin/.art-real work -h0.0.0.0 --config=/home/admin/.config/limsn/artanis.conf
admin 12494 0.0 0.0 3084 880 pts/0 S+ 13:22 0:00 grep artanis

$ kill -15 12479
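
An alternative to searching the ps output is to record the PID at start time. The sketch below shows the pattern with sleep standing in for start-limsn.sh so that it can be run anywhere; the limsn.pid file name and location are assumptions. It also assumes the recorded PID belongs to the process that needs to be stopped. If start-limsn.sh launches the server as a separate child process, the ps lookup above is still required.

```shell
#!/bin/sh
# Pattern: remember the PID of a detached process, then stop it later.
# "sleep 300" is a stand-in; on a real install the command would be start-limsn.sh.
PIDFILE="${TMPDIR:-/tmp}/limsn.pid"      # hypothetical location

nohup sleep 300 > /dev/null 2>&1 &
echo $! > "$PIDFILE"

# ... later, possibly from another terminal ...
pid=$(cat "$PIDFILE")
kill -15 "$pid"                          # SIGTERM, as in the example above
wait "$pid" 2>/dev/null || true          # reap it (only works in the starting shell)
rm -f "$PIDFILE"
```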

Plate

  • Plates are one of three formats: 96, 384, or 1536 well
  • The plate system name, in the format PLT-NNN, is automatically assigned at creation
  • All plates are part of plate sets

Plates can be assigned a variety of types. Depending on the type, a plate may not contain samples. For example, assay plates are transient and discarded after data collection, so could not serve as the source for rearraying or replica plating.

Installed types are:

Type         Description                                         Contain samples?
assay        contain associated data                             no
rearray      created during a reformat operation                 yes
archive      designated for storage                              yes
master       original plate of samples                           yes
daughter     result of replica plating or grouping operations    yes
replicate    result of replica plating                           yes

Plates are of various types (assay, rearray, glycerol, etc.). Plate types exist to provide clarity to the user; no convention is enforced.

PlateSet

  • Composed of plates
  • Specific to a project
  • All plates within a plate set must be of the same format (e.g. 96 well)
  • Plate sets can be merged together (different plate types OK)
  • When created, all plates in a plate set will be of the same plate type