PBITV3

Pipeline Builder for Identification of Targets

1. Installation

  • For Windows:

 

Install Perl from https://www.perl.org
Test the installation by typing perl -v in a cmd (Command Prompt) window.

Install BioPerl by following the instructions at bioperl.org, or follow these steps:

I) Open a cmd window

II) Type cpan to enter the CPAN shell. If cpan is not recognized, you may have to add the folder that contains the cpan command to PATH (for Strawberry Perl this is typically C:\Strawberry\perl\bin).

III) At the cpan> prompt, type install CPAN to upgrade to the latest version.
IV) Quit (by typing q) and reload CPAN. You may be asked some configuration questions; accepting the default answers should work fine.

V) At the cpan> prompt, type o conf prefer_installer MB to tell CPAN to prefer Build.PL scripts for installation, and then type o conf commit to save the choice.

VI) At the cpan> prompt, type
install Module::Build

VII) At the cpan> prompt, type
install Test::Harness

VIII) At the cpan> prompt, type
install Test::Most

Finish the installation with the BioPerl release archive
Install BioPerl 1.007001 manually using the release archive (a .tar.gz file) from CPAN:
https://cpan.metacpan.org/authors/id/C/CJ/CJFIELDS/BioPerl-1.007001.tar.gz

I) Extract the archive using 7zip or WinRAR.

II) In a cmd window, go to the directory where you extracted the archive. For example, if you extracted it to C:\Downloads\bioperl, type cd followed by the path of that folder.

III) At the prompt, type:
perl Build.PL
A few questions will be asked; answer them according to your requirements.

IV) Type

perl Build test

All the tests should pass. If some do not, your usage of BioPerl may or may not be affected by the failures, so you can choose to continue anyway.

V) To install BioPerl, type:

perl Build install

Install R on Windows

  • Add the R bin folder and Rscript.exe to PATH

Open the Control Panel, go to System and Security, click Advanced system settings, then click “Environment Variables”.

Or

Open Start, type “View advanced system settings”, and click “Environment Variables”.
Under “System variables”, select Path → click Edit → click New → click Browse → browse to C:/Program Files/R/R(version)/bin → OK
Also add the executable itself: browse up to the bin folder again, add that location, and manually append /Rscript.exe.

  • Install packages: open the R GUI and type install.packages("igraph") to install igraph. Add the argument type="source" to install from source instead.

Users can also download the binaries of a package and unzip them into the R library folder.

  • Download and install other dependent packages (if any).

 

2. Supported Platforms

Currently tested on Windows 10 (x64). Because Perl is platform independent, PBIT should also work on Linux, Unix, and Mac, although test runs on those platforms still need to be performed.

3. Dependencies

For PBIT, the following must be installed:
I) Perl v5.8.3 and above
II) BioPerl and its libraries (https://metacpan.org/pod/release/CJFIELDS/BioPerl-1.007001/BioPerl.pm)
III) BLAST+ v2.10 (https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/)
IV) Python v3.7 and above (https://www.python.org/downloads/)
  i) pip v19.0 and above (https://pypi.org/project/pip/)
  ii) COBRA 0.13.3 and above (https://pypi.org/project/cobra/)
  iii) Libraries: sbml, pandas, xlwt, xlsxwriter, xlrd, iedb, statistics, csv, click, and selenium (install using pip)
V) R and its packages (https://cran.r-project.org/bin/windows/base/)
  i) igraph (https://igraph.org/r/#downloads)
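A quick way to see which of the dependencies listed above are visible on a machine is sketched below. It is only an illustration, not part of PBIT: the executable names (perl, python, blastp, Rscript) and the Python import names are assumptions, since a pip package name and its import name can differ, and the list covers only some of the libraries.

```python
import importlib.util
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["perl", "python", "blastp", "Rscript"]

# Import names are assumptions; a pip package and its import name may differ.
PYTHON_MODULES = ["cobra", "pandas", "xlwt", "xlsxwriter", "xlrd", "click", "selenium"]


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_python_modules(modules):
    """Return the top-level modules the current interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Tools not on PATH:", missing_tools(REQUIRED_TOOLS) or "none")
    print("Python modules not found:", missing_python_modules(PYTHON_MODULES) or "none")
```

An empty result only shows that a tool or module can be located; it does not confirm that the installed version meets the minimum listed above.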

IMPORTANT:
Once Perl, BioPerl, Python, and BLAST+ have been installed and are working, open a command prompt and type cpan to enter the CPAN shell.
Enter the following commands to install the modules needed by PBIT. Install the modules one at a time.

  • install Inline
  • install Inline::Python
  • install Data::Table
  • install HTML::TokeParser
  • install Parse::RecDescent
  • install Devel::Leak::Module
  • install Regexp::Common
  • install Number::Format
  • install GD::Graph
  • install Statistics::Basic
  • install Statistics::Descriptive
  • install Math::Utils

4. List of available modules and submodules


5. Selection of modules

As observed in the flow chart there are 4 main modules.

  • Screening & Characterization:

This module has 8 submodules. The submodules can be executed all at once or individually. They can also be linked together by entering the module and submodule numbers as comma-separated values. The Screening & Characterization module can also be linked to the Druggability module and to the Immuno-informatics module (Antigenicity Prediction).
Input: FASTA-formatted protein sequence files from UniProtKB
Parameters: E-value (entered as a float or an integer) and percent identity (entered as a float or an integer)

Output files: A queryname.bls and a queryname.fasta file are generated for each sequence. A folder named after the selected module is created; in it, “Ouputformodule_.fasta” is the main .fasta file, containing all the sequences generated for that module. The main folder may also contain .txt files with the results for the query.
Output files: A queryname.txt and a queryname.html file are generated for each query sequence. A folder for the selected module is created; it contains an .xls file with the information for the query sequence.
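The inputs described above (comma-separated module numbers, an E-value, and a percent identity) can be checked before a run. The following is a minimal sketch of such a check, not PBIT's own code: the function names are hypothetical, and the 0 to 100 range for percent identity is an assumption.

```python
import math


def parse_selection(text):
    """Turn a string such as '1, 3,5' into [1, 3, 5]."""
    numbers = []
    for part in text.split(","):
        part = part.strip()
        if not part.isdecimal():
            raise ValueError(f"not a module number: {part!r}")
        numbers.append(int(part))
    return numbers


def parse_evalue(text):
    """Accept an integer or float such as '10', '0.001' or '1e-5'."""
    value = float(text)
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"E-value must be a non-negative number: {text!r}")
    return value


def parse_percent_identity(text):
    """Accept an integer or float between 0 and 100 (assumed range)."""
    value = float(text)
    if not 0 <= value <= 100:
        raise ValueError(f"percent identity must be within 0-100: {text!r}")
    return value


if __name__ == "__main__":
    print(parse_selection("1, 3,5"))
    print(parse_evalue("1e-5"))
    print(parse_percent_identity("35"))
```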

  • Druggability Analysis

Druggability Analysis can be executed as a standalone module, and it can also be run in a pipeline with the Screening & Characterization module and its submodules.
Input: FASTA-formatted protein sequence files from UniProtKB
Parameters: E-value (entered as a float or an integer) and percent identity (entered as a float or an integer)

Output files: A queryname.bls and a queryname.fasta file are generated for each sequence. A folder named after the module is created; in it, “Ouputformodule_.fasta” is the main .fasta file, containing all the sequences generated for each module.
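To illustrate how E-value and percent identity cut-offs act on BLAST results, the sketch below filters hits in BLAST+ tabular format (the default column order of -outfmt 6, where column 3 is percent identity and column 11 is the E-value). This is an assumption made for illustration: the layout of PBIT's .bls files is not described here, and whether a given module keeps or discards the matching hits is not stated either.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    evalue: float


def parse_tabular_line(line):
    """Parse one line of BLAST+ tabular output (default -outfmt 6 columns)."""
    fields = line.rstrip("\n").split("\t")
    return Hit(fields[0], fields[1], float(fields[2]), float(fields[10]))


def filter_hits(lines, max_evalue, min_identity):
    """Return hits with evalue <= max_evalue and identity >= min_identity."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = parse_tabular_line(line)
        if hit.evalue <= max_evalue and hit.percent_identity >= min_identity:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    # Made-up example rows, not real BLAST results.
    example = [
        "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t400",
        "q1\tsubjB\t30.0\t80\t50\t2\t1\t80\t5\t85\t0.5\t40",
    ]
    for hit in filter_hits(example, max_evalue=1e-5, min_identity=35):
        print(hit.query, hit.subject, hit.percent_identity, hit.evalue)
```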

 

  • Immuno-informatics

For B and T cell analysis:
Input: FASTA formatted protein sequence files from UniProtKB
Parameters: Alleles (for B and T cell) (to be entered in float or integer) and window size for B cell (entered as an integer)
Output files: Each “method_name.csv” file is generated by the B or T cell analysis and contains the list of peptides for that method. Each “method_name_filtered.csv” file (B cell only) lists the peptides for that method that are equal to or above the threshold value. The “common_filtered_final.csv” file (B cell only) lists the peptides above the threshold value that are present in (common to) all 5 methods.
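The relationship between the per-method files, the filtered files, and the common file described above can be sketched as a threshold filter followed by a set intersection. This is an illustration under assumptions: the method names, scores, and thresholds below are made up, and the actual column layout of PBIT's .csv files is not shown here.

```python
def filter_by_threshold(scores, threshold):
    """Keep peptides whose score is equal to or above the threshold."""
    return {pep: score for pep, score in scores.items() if score >= threshold}


def common_peptides(filtered_by_method):
    """Return, sorted, the peptides present in every method's filtered set."""
    peptide_sets = [set(peptides) for peptides in filtered_by_method.values()]
    if not peptide_sets:
        return []
    return sorted(set.intersection(*peptide_sets))


if __name__ == "__main__":
    # Hypothetical method names, peptides, scores and thresholds.
    raw = {
        "method_a": {"PEPTIDEA": 0.9, "PEPTIDEB": 0.4, "PEPTIDEC": 0.7},
        "method_b": {"PEPTIDEA": 1.2, "PEPTIDEB": 1.5, "PEPTIDEC": 0.8},
    }
    thresholds = {"method_a": 0.5, "method_b": 1.0}
    filtered = {
        name: filter_by_threshold(scores, thresholds[name])
        for name, scores in raw.items()
    }
    print(common_peptides(filtered))
```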

For Antigenicity Prediction:
Input: FASTA-formatted protein sequence files from UniProtKB (this module is also connected to, and forms a pipeline with, the Screening & Characterization module).
Parameters: E-value (entered as a float or an integer) and percent alignment length cut-off (entered as a float or an integer)
Output files: A queryname.bls and a queryname.fasta file are generated for each sequence. A folder named after the selected module is created; in it, “Homologous_sequences.fasta” is the main .fasta file, containing all the sequences generated by the alignment-based method. For the alignment-free method, the “Antigenicity3.csv” file lists the UniProt IDs together with a conclusion on whether each is an antigen and the associated probability.

 

  • Systems Biology

Systems Biology has 3 submodules. They are independent of each other, and no pipeline connects them.

For Essential Reaction and Gene-Knockout Analysis:
Input: SBML 2 genome-scale metabolic model in JSON format
Parameters: reaction IDs (for Essential Reaction), gene IDs in .txt format (for individual gene knockouts), and reaction constraints. The reaction and gene IDs are to be obtained from the generated model_annotation.csv.
Output files: ‘Model_Associations.csv’ contains the gene-reaction associations of the metabolic model; ‘FVA_Model.xls’ contains the flux variability analysis results; ‘Essential_Rxn.xls’ contains the reaction IDs with equal minimum and maximum flux values; ‘Essential_Rxn_genes.xls’ contains the gene IDs associated with the filtered reactions; ‘genes.csv’ contains the gene names associated with the model; ‘growth_sgd.xls’ contains the gene IDs and their respective growth rates.
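The criterion stated above for ‘Essential_Rxn.xls’ (reactions whose minimum and maximum flux are equal) can be sketched as a simple filter over flux variability analysis ranges. The snippet below works on a plain dictionary of made-up values so that it runs without COBRA; the function name and the numerical tolerance are assumptions, not PBIT's implementation.

```python
def fixed_flux_reactions(fva_ranges, tolerance=1e-9):
    """Return, sorted, the reaction ids whose FVA minimum equals the maximum.

    fva_ranges maps a reaction id to a (minimum, maximum) flux pair.
    The tolerance absorbs floating-point noise and is an assumed value.
    """
    return sorted(
        reaction_id
        for reaction_id, (minimum, maximum) in fva_ranges.items()
        if abs(maximum - minimum) <= tolerance
    )


if __name__ == "__main__":
    # Made-up flux ranges for three hypothetical reactions.
    example = {
        "R1": (2.0, 2.0),
        "R2": (0.0, 10.0),
        "R3": (-1.5, -1.5),
    }
    print(fixed_flux_reactions(example))
```

In COBRApy, flux_variability_analysis returns a table with "minimum" and "maximum" columns indexed by reaction id, which could be converted to such a dictionary. Note that a reaction fixed at zero flux also satisfies this criterion as written; whether PBIT treats such reactions differently is not stated here.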

For Topological Network Analysis:
Input: Proteome (edge list, comma separated file)
Output files: Text file with list of shortlisted proteins
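As an illustration of working with a comma-separated edge list, the sketch below counts how many edges touch each protein (its degree) and ranks proteins by that count. Degree is used here only as a stand-in: the topological measure PBIT actually uses to shortlist proteins is not specified above. The sketch also assumes the file has no header row and one edge per line.

```python
import csv
import io
from collections import Counter


def degree_from_edge_list(handle):
    """Count edges per node from rows of the form 'nodeA,nodeB'."""
    degree = Counter()
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        a, b = row[0].strip(), row[1].strip()
        if not a or not b:
            continue
        degree[a] += 1
        if b != a:  # a self-loop is counted once
            degree[b] += 1
    return degree


def top_by_degree(degree, n):
    """Return the n nodes with the highest degree; ties are broken by name."""
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:n]]


if __name__ == "__main__":
    # Made-up edge list with hypothetical protein identifiers.
    edges = io.StringIO("P1,P2\nP1,P3\nP2,P3\nP1,P4\n")
    print(top_by_degree(degree_from_edge_list(edges), 2))
```

Duplicate rows are counted as separate edges in this sketch, so an edge list with repeated pairs should be de-duplicated first.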

 

Note:
After the tool has been executed, the next time the user runs PBIT the files from the previous run are moved to a new folder. The folder is named using the date (D/M/Y) and time, following Linux conventions, so that earlier results are preserved and results from different runs do not overwrite each other.
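The archiving behaviour described in the note can be sketched as follows. This is not PBIT's own code: the folder-name pattern (day-month-year_hour-minute-second, with hyphens because "/" and ":" are not allowed in Windows folder names) and the function names are assumptions.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def archive_name(now):
    """Build a folder name from the date (D-M-Y) and time; assumed pattern."""
    return now.strftime("%d-%m-%Y_%H-%M-%S")


def archive_previous_results(work_dir, result_names, now=None):
    """Move existing result files into a new timestamped folder.

    Returns the new folder, or None if none of the files exist.
    """
    work_dir = Path(work_dir)
    existing = [work_dir / name for name in result_names if (work_dir / name).exists()]
    if not existing:
        return None
    target = work_dir / archive_name(now or datetime.now())
    target.mkdir(exist_ok=False)
    for path in existing:
        shutil.move(str(path), str(target / path.name))
    return target


if __name__ == "__main__":
    # Demonstration in a temporary directory with a made-up result file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "query1.bls").write_text("demo")
        moved_to = archive_previous_results(tmp, ["query1.bls"])
        print("previous results moved to", moved_to.name)
```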

 

6. Warnings

  • An incorrect installation of Perl, BioPerl, BLAST+, Python, or their respective libraries can cause the tool to fail.
  • After installation, the user should do a few test runs to make sure that the installed tools work without errors.
  • Execution time depends on the system configuration; it usually ranges from a few seconds to a few minutes.
  • All input files must be kept in the working directory along with the PBIT executables.
  • The user must execute PBIT as an administrator, because permissions are needed to move or create files.
  • The user must have an active internet connection to execute the Annotation module.
  • The user must enter the choices and syntax as instructed.