I’ve always preferred file formats with richer meta information, but that doesn’t diminish the fact that there are a lot of MGF data files floating around. Simply converting them back to mzML doesn’t restore the lost metainformation. This fact turned out to be problematic for a lab that had only MGF files for DDA scan events, and did not have the original raw files containing the precursor scans. Kojak did not work after converting the MGF files to mzML format, because it was assumed the user could provide the precursor scans in the data file. As a result, it would appear MGF files were not supported.
The solution was to add a new parameter to version 1.5.5: precursor_refinement
This parameter toggles the Kojak precursor analysis routines. These routines must be disabled when there are no precursor scans in the data file. But it is not limited to MGF files. The parameter can also be used with mzML and mzXML files. When skipping precursor refinement, Kojak will use the instrument-predicted precursor mass to define peptide search boundaries. If a scan does not have a predicted precursor mass, the selected ion m/z and predicted charge states will be used. Optimal performance is most frequently achieved when using the most accurate precursor mass possible, and so it is recommended to keep precursor refinement turned ON. But in cases where the precursor scans are no longer available, this option just isn’t possible. By default, precursor refinement is turned on. Set precursor_refinement = 0 in your configuration file to disable it.
A consequence of this update is better MGF support. Hopefully few, if any, MGF files will fail with Kojak. Please notify me if you have any such cases. Otherwise, search away on your MGF collections - no conversion to other formats necessary.
It all started with a simple request on the code repo message board: could the variable modifications on peptide c- and n-termini be restored? To give this post a little context, early versions of Kojak allowed for specification of modifications on the c- and n-termini of peptides using $ and @, respectively, as amino acid wildcards. Admittedly, this was not well thought out. The issue of modifications is much broader. Fixed or differential modifications? To the peptide termini or protein termini? Single or multiple modifications to the same amino acid? At the time, there was conflict in the code on how to resolve some of these considerations, so the peptide termini modifications were quietly shelved in favor of the typically necessary (in XL experiments) protein termini modifications that were causing the conflict.
This latest update (1.5.4) restores the peptide termini modifications. More importantly to users, there have been some parameter and syntax changes designed to clarify and facilitate use of modifications in Kojak. The changes in the Kojak code were not trivial, and this is really a discussion for a different time. However, the user interface need not reflect that complexity. So here is a brief summary of the new user interface, which can be explored in more detail in the parameter documentation.
All differential modifications to peptides are specified using the modification parameter. The parameter accepts a single uppercase amino acid letter and the differential mass. A lowercase ‘c’ or ‘n’ is used to specify the modification is on the peptide c-terminus or peptide n-terminus, respectively. If more than one differential modification is required, specify a unique modification parameter line for each modification. As many as you want. You can even list the same amino acid in multiple lines with a different modification mass each time. This can be used, for example, to identify singly, doubly, and triply methylated lysines.
Differential modifications to protein termini are specified with their own modification_protC and modification_protN parameters. These parameters need only the differential mass as values. It is possible to specify more than one differential protein modification with multiple instances of these parameters.
Fixed modifications are changes to the mass values that are applied to all instances of the specified amino acids and termini. The syntax for fixed modifications to peptides is identical to the syntax for differential modifications, except the parameter is named fixed_modification. The amino acid is specified in upper case. The peptide c-terminus or n-terminus is specified with lowercase ‘c’ or ‘n’, respectively. Add multiple fixed_modification lines to the Kojak configuration file to indicate multiple mass differences in the analysis.
Like the differential modifications to protein termini, fixed modifications can also be specified to the protein c-terminus or n-terminus. Use fixed_modification_protC and fixed_modification_protN to specify these mass differences. These parameters need only the mass as values.
To summarize, parameters to indicate modifications have been expanded from two to six. Under the new rules, there are specific ways to indicate the mass differences be applied to the peptides and peptide termini, or the protein termini. All mass values are in addition to the existing default amino acid values. There are no special characters (e.g. $ and @) to specify termini. Use lowercase ‘c’ and ‘n’ to specify peptide termini modifications, or the appropriate protein termini modification parameters.
Chemical cross-linkers pose a special case. They frequently target multiple sites at the protein level (multiple amino acids and the protein termini). Cross-linker that binds on only one side (hydrolyzing on the other side) can therefore create a diverse set of differential modifications to search for. Rather than list all possibilities as a large set of modification parameters, Kojak has a simple shortcut: the mono_link parameter. This parameter accepts a set of amino acid characters. It also accepts ‘c’ or ‘n’ to specify the protein C-terminus or N-terminus, respectively. The final value is the differential mass to apply. This is a very convenient shortcut. For example, by specifying “mono_link = cDE -0.9837153”, the necessary parameters to define an EDC mono-link has been reduced from three differential modification parameters to a single mono_link parameter.
This month’s release rolls out additional bugfixes.
A buffer overrun was intermittently causing crashes with using the turbo_button in searches. The fix improves the stability of the turbo_button searches. Again, this feature is still in the experimental stages, thus the ability to disable it is in the parameters file. An additional bugfix corrected cases where the n- and c-terminal modifications of loop-links were being reported elsewhere in the peptide.
A small bug was identified that causes some data corruption with a memory buffer shared by multiple threads. It was tough to replicate because it typically required a lot of modifications on a small dataset with a lot of threads to produce, but it was there nonetheless. The bug has been fixed and Kojak operation returned to normal.