PhosphoHunter is a software tool for phosphopeptide identification from tandem mass spectrometry data.
It can be downloaded free of charge by non-profit institutions from http://aimed11.unipv.it/PhosphoHunter. Other organizations must request written authorization from mspi@aimed11.unipv.it before using the software. The procedure implemented in version 1.0 is described in the paper "PhosphoHunter: an efficient phosphopeptide identification software tool" by Tiengo et al.

In the distribution directory there are the following files:

- README (this file) contains the main instructions for installing the software tool and for using Perl scripts and associated ASCII files.

- PhosphoHunter_x.x.zip files contain version x.x of the software tool.

The Perl scripts and text files are briefly described below.


How to install PhosphoHunter
----------------------------

1) If Perl is not already installed, download the ActivePerl distribution from http://www.activestate.com/Products/activeperl (or use your preferred Perl distribution) and install it.

2) Download the desired version of the tool (PhosphoHunter_x.x.zip file) and extract all the files into the PhosphoHunter root directory with your preferred software.

3) Install the required Perl modules (see the following section) if they are not yet installed.


Required Perl modules
---------------------

- Cwd -> This module provides some functions for determining the path name of the current working directory.

- LWP -> This module provides the program "lwp-download" to download files from a specified FTP address.

- IO::Zlib -> This module provides some functions to read and write gzip/zlib compressed files.


Configuration file
------------------

- settings.ini -> This file contains the main parameters of the PhosphoHunter tool.
It is stored in the PhosphoHunter_x.x/src directory.
Each line ends with a newline character.
The parameters are:
* the platform used (e.g. WIN or other)
* the name of the FASTA database stored in the PhosphoHunter_x.x/ directory
* the organism of interest
* the maximum number of consecutive missed cleavages (MCs) allowed (used when creating the composite database, e.g. 2)
* the lower bound of the acquisition mass range (e.g. 800)
* the upper bound of the acquisition mass range (e.g. 5000)
* the relative tolerance window on the electrophoretic MW (a fraction from 0 to 1)
* the tolerance window on the electrophoretic pI (e.g. 1)
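
The layout above can be illustrated with a small reader. This is only a sketch in Python (it is not part of the distribution, whose scripts are written in Perl). It assumes that settings.ini holds one bare value per line, in the order listed above; the key names and the database name in the usage note are hypothetical.

```python
# Hypothetical key names, one per line of settings.ini, in the order listed above.
SETTING_NAMES = [
    "platform", "fasta_database", "organism", "max_missed_cleavages",
    "mass_lower_bound", "mass_upper_bound", "mw_tolerance", "pi_tolerance",
]


def parse_settings(text):
    """Map the non-empty lines of a settings file onto named parameters."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < len(SETTING_NAMES):
        raise ValueError("expected %d settings, found %d"
                         % (len(SETTING_NAMES), len(lines)))
    settings = dict(zip(SETTING_NAMES, lines))
    settings["max_missed_cleavages"] = int(settings["max_missed_cleavages"])
    for key in ("mass_lower_bound", "mass_upper_bound",
                "mw_tolerance", "pi_tolerance"):
        settings[key] = float(settings[key])
    if not 0.0 <= settings["mw_tolerance"] <= 1.0:
        raise ValueError("the MW tolerance must lie between 0 and 1")
    return settings
```

For instance, parse_settings("WIN\nexample.fasta\nMus musculus\n2\n800\n5000\n0.2\n1\n") returns a dictionary whose "platform" entry is "WIN" and whose "mass_upper_bound" entry is 5000.0.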

Perl scripts
------------

They are stored in the PhosphoHunter_x.x/src directory.

- create_database.pl -> This script creates the composite (reference and random) peptide database starting from the FASTA database stored in the PhosphoHunter_x.x/tmp directory.
It uses the files table_mw.txt and table_pi.txt to compute the molecular weight (MW) and the isoelectric point (pI) of each protein. It performs the in silico digestion of each protein within the specified acquisition mass range, taking into account missed cleavages (MCs) and post-translational modifications (PTMs).

- phosphopeptide_ID.pl -> This script performs the phosphopeptide identification. An input file with the main parameters (see the "Example of input file" section) must be given as a command-line argument.
If the file "compare" or "compare_win.exe" is present in the folder, it is used to compare the experimental and theoretical MS/MS peaks, which speeds up the search step.
If neither file is present, the search step is still performed, using a Perl routine.
The results of the identification are written to the file chosen by the user in the PhosphoHunter_x.x/results directory.

- merge.pl -> This script merges all the .DTA files into a single ASCII file. It has to be stored and run in the same directory as the .DTA files. The name of the output file must be given as a command-line argument.

- compareC.c -> This C file performs the comparison between experimental and theoretical MS/MS spectra. It has to be compiled from the command line. The compiled file must be named "compare" on Linux platforms or "compare_win.exe" on Microsoft operating systems.
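
The in silico digestion with missed cleavages performed by create_database.pl can be pictured with the following sketch. It is written in Python for illustration only and is not the code of the script. The enzyme is not named in this file, so the cleavage rule used here (cut after K or R, except before P) is an assumption, and the function names are hypothetical.

```python
def cleave(protein):
    """Split a protein after K or R, except when the next residue is P.

    The cleavage rule is an assumption made for this sketch.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing fragment that does not end with a cleavage site.
        fragments.append(protein[start:])
    return fragments


def digest(protein, max_missed_cleavages):
    """List the peptides obtained by joining up to max_missed_cleavages + 1
    consecutive fragments."""
    fragments = cleave(protein)
    peptides = []
    for i in range(len(fragments)):
        for missed in range(max_missed_cleavages + 1):
            if i + missed < len(fragments):
                peptides.append("".join(fragments[i:i + missed + 1]))
    return peptides
```

With this rule, digest("AKRPGK", 1) yields the peptides AK, AKRPGK and RPGK. A complete implementation would also discard the peptides whose mass falls outside the acquisition range.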

Text files
----------

They are stored in the PhosphoHunter_x.x/src directory.
Each line of these files ends with a new line.

- table_mw.txt -> It contains the list of amino acids with their monoisotopic and average weights. Each amino acid is represented by its one-letter code. The fields on each line are separated by a tab character:
*A	71.03711	71.0788
*C	103.00919	103.1388
*...

- table_pi.txt -> It contains the amino acids and the groups that influence the pI of a protein. Each line reports the amino acid letter, the pKr and the charge polarity, separated by a tab character:
*Y	10.07	-
*H	6.0	+
*C_term	3.1	-
*N_term	8	+
*...

- table_ptm.txt -> It contains the PTMs that can be included in the reference protein database. Each line reports the name, the amino acid involved (one-letter code), the monoisotopic weight, the average weight and the type (fixed F or variable V), separated by a tab character:
*CAM	C	57.021464	57.0513	F
*...
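
As an illustration of how these tables can be combined, the sketch below computes the monoisotopic mass of a peptide from rows in the table_mw.txt format and adds the fixed PTMs read from rows in the table_ptm.txt format. It is written in Python for illustration and is not taken from the Perl scripts. It assumes that the leading "*" shown in the examples above is only a list marker, and the function names are hypothetical.

```python
WATER_MONO = 18.010565  # monoisotopic mass of H2O, added once per peptide


def parse_table(text):
    """Split a tab-separated table into rows of fields.

    A leading '*' (assumed to be a list marker) and '...' lines are ignored.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip().lstrip("*")
        if not line or line.startswith("..."):
            continue
        rows.append(line.split("\t"))
    return rows


def peptide_mono_mass(sequence, mw_rows, ptm_rows):
    """Sum the monoisotopic residue masses, the fixed PTM shifts and one water."""
    residue_mass = {row[0]: float(row[1]) for row in mw_rows}
    # Keep only the fixed (F) modifications; variable ones are not applied here.
    fixed_shift = {row[1]: float(row[2]) for row in ptm_rows if row[4] == "F"}
    total = WATER_MONO
    for amino_acid in sequence:
        total += residue_mass[amino_acid] + fixed_shift.get(amino_acid, 0.0)
    return total
```

With the rows shown above for A, C and CAM, the peptide AC has a mass of 18.010565 + 71.03711 + 103.00919 + 57.021464 = 249.078329.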


Example of input file
---------------------

This section describes the input file required to perform a phosphopeptide identification with the phosphopeptide_ID.pl script.
It must be stored in the PhosphoHunter_x.x/data directory.

Each line of this file ends with a new line.

* peak list file (e.g. band.txt)
* results file (e.g. results.txt)
* electrophoretic MW (0 if unknown)
* electrophoretic pI (0 if unknown)
* peptide mass tolerance (e.g. P 100 or D 1)
* fragment mass tolerance (e.g. P 100 or D 1)
* lower bound of the acquisition mass range
* upper bound of the acquisition mass range
* p-value threshold (if 0 no statistical validation will be performed)
* intensity threshold (for the neutral loss detection, e.g. 40)
* maximum number of consecutive MCs allowed in the search (e.g. 2)
* number of PTMs allowed in the search (e.g. 2)
* specify whether the experimental mass values are average or monoisotopic (1 for monoisotopic and 2 for average)

If the p-value threshold is set to 0, the statistical validation of the results is not performed.
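
The structure of the input file can be summarised by the following reader, a sketch in Python given for illustration only (phosphopeptide_ID.pl itself is a Perl script). It assumes one value per line in the order listed above. The key names are hypothetical, and the letter of each tolerance (P or D) is kept as written, without interpreting it.

```python
# Hypothetical key names, one per line of the input file, in the order listed above.
INPUT_FIELDS = [
    "peak_list_file", "results_file", "electrophoretic_mw", "electrophoretic_pi",
    "peptide_tolerance", "fragment_tolerance", "mass_lower_bound",
    "mass_upper_bound", "p_value_threshold", "intensity_threshold",
    "max_missed_cleavages", "max_ptms", "mass_type",
]


def parse_tolerance(value):
    """Split a tolerance such as 'P 100' into its letter and its number."""
    unit, amount = value.split()
    return unit, float(amount)


def parse_input_file(text):
    """Map the non-empty lines of an input file onto named parameters."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != len(INPUT_FIELDS):
        raise ValueError("expected %d lines, found %d"
                         % (len(INPUT_FIELDS), len(lines)))
    params = dict(zip(INPUT_FIELDS, lines))
    for key in ("peptide_tolerance", "fragment_tolerance"):
        params[key] = parse_tolerance(params[key])
    for key in ("electrophoretic_mw", "electrophoretic_pi", "mass_lower_bound",
                "mass_upper_bound", "p_value_threshold", "intensity_threshold"):
        params[key] = float(params[key])
    for key in ("max_missed_cleavages", "max_ptms", "mass_type"):
        params[key] = int(params[key])
    # A p-value threshold of 0 disables the statistical validation.
    params["statistical_validation"] = params["p_value_threshold"] > 0
    return params
```

A file containing the lines band.txt, results.txt, 0, 0, P 100, D 1, 800, 5000, 0, 40, 2, 2 and 1 is read with "statistical_validation" set to False, because its p-value threshold is 0.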


Output file
-----------

The output file reports the list of candidate phosphopeptides grouped by protein.
It is stored in the PhosphoHunter_x.x/results directory.
For each protein, the following information is reported: the accession number, the ID, the organism, a brief description, the MW, the pI and the list of identified phosphopeptides.
For each phosphopeptide in the list, the following information is reported: the sequence, the experimental mass, the charge state, the scan, the theoretical mass, the delta mass, the fraction of the most intense ions matched (out of the 10 most intense ions), the number of theoretical ions produced, the start and stop positions on the protein, the number of matched ions, the score obtained and the list of matched ions.
Every matched ion is labelled with a tag, followed by its experimental mass, its theoretical mass and its intensity.

For example:
1) Q9JHR4	Q9JHR4_MOUSE	Mus musculus	Myosin heavy chain IIB (Fragment) 	61030.60	5.19	

Peptide IDQEKsELQASLEEAEASL	Mass: 2167.6473	Charge: 2	Scan: 5190_5190	Mass db: 2168.9725	Delta mass: -1.3252	Ratio: 0.7	Ions: 664	Position: 121	140
	Ions matched: 32	Score: 31.98
		y1h2o_9_2_0p	465.9897	466.99	12.11
		y1h2o_10_2_0p	501.5290	501.37	13.06
		b1h2o_12_2_0p	646.6369	647.52	28.50
		b_13_2	712.2242	713.35	31.35
		...

Each ion tag has the format AB_C_D_E, where:
* A: the ion type (b or y)
* B: the number of water or ammonia losses (1h2o, 2h2o, 1nh3...). If no loss occurs, B is not present
* C: the number of residues in the fragment
* D: the charge
* E: the number of neutral losses of phosphoric acid. If no loss of phosphoric acid occurs, E is not present
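
The tag format can be decoded as in the sketch below, given in Python for illustration only. The field names are hypothetical, and the trailing "p" of the E field is inferred from the example tags above (such as y1h2o_9_2_0p) rather than from a formal specification.

```python
import re

# A: ion type, B: optional loss, C: residues, D: charge, E: optional count + 'p'.
TAG_PATTERN = re.compile(
    r"^(?P<ion>[by])"
    r"(?P<loss>\d+(?:h2o|nh3))?"
    r"_(?P<residues>\d+)"
    r"_(?P<charge>\d+)"
    r"(?:_(?P<phospho>\d+)p)?$"
)


def parse_ion_tag(tag):
    """Return the fields of an ion tag; absent B or E fields are None."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError("unrecognised ion tag: %r" % tag)
    phospho = match.group("phospho")
    return {
        "ion": match.group("ion"),
        "loss": match.group("loss"),
        "residues": int(match.group("residues")),
        "charge": int(match.group("charge")),
        "phospho_losses": None if phospho is None else int(phospho),
    }
```

For the tag b_13_2 the function reports a b ion of 13 residues with charge 2, with both the loss and the phosphoric acid fields absent.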

A file containing a summary of the results is also produced.
The summary reports the list of candidate phosphopeptides, grouped by scan and ranked by score.
Each line contains one candidate phosphopeptide and its information.


How to run PhosphoHunter
------------------------

1) Create the input file (see the "Example of input file" section).

2) Open the settings.ini file in the PhosphoHunter_x.x/src directory to configure some parameters. In particular, set:

 - the platform used
 - the name of the FASTA database stored in the tmp directory
 - the organism of interest
 - the maximum number of consecutive MCs allowed
 - the lower bound of the acquisition mass range 
 - the upper bound of the acquisition mass range 
 - the MW % tolerance window on the electrophoretic MW 
 - the pI tolerance window on the electrophoretic pI 

3) Open the table_ptm.txt file in the PhosphoHunter_x.x/src directory to set the PTMs
to be included in the composite database.

4) Run the Perl routines as reported here:

 - to create the file of input MS/MS spectra, copy the script merge.pl into the directory containing the .DTA files and run: perl merge.pl output_file_name
 - to create the database: perl create_database.pl
 - to perform phosphopeptide identification: perl phosphopeptide_ID.pl input_file 
 
 - to compile the C source file:
     on Linux:   gcc compareC.c -o compare
     on Windows: gcc compareC.c -o compare_win.exe
 			    
 
 