Thank you to everyone who said hello during ASMS. It was great to talk to so many users and get feedback. I appreciate all the support for Kojak and look forward to implementing new features and suggestions from everyone’s comments. In particular, there were a lot of requests for an early release of Kojak’s newest feature, 15N-labeled analysis for identification of protein homomultimers.
Cross-links between homomultimers are difficult to decipher, because it may be impossible to tell whether they are intra-protein or inter-protein cross-links. Labeling one of the multimers with 15N and linking it to its 14N counterpart makes the inter-protein cross-links distinguishable. The newest feature in Kojak allows this type of analysis.
To perform 15N analysis, two steps must be taken. First, the labeled protein must be duplicated in the sequence database, with a unique identifier. Here is an example:
>protein1
MASTHAKEEILSVNAQWKADRGHLSELED...
>n15_protein1
MASTHAKEEILSVNAQWKADRGHLSELED...
Notice in the example above that the sequences are identical, but the labeled protein now has “n15” as a unique identifier. The second step is to add the following parameter to your Kojak configuration file:
n15_filter = n15
Here, the new n15_filter parameter indicates that any protein whose name is prefaced with “n15” will have the mass of 15N incorporated into all its amino acids. Note also that any unlabeled proteins will be analyzed using the normal 14N masses, so long as their protein names do not begin with “n15”.
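The database step described above can be scripted. The following is a minimal sketch, not part of Kojak: the function name add_n15_copies, the "n15_" prefix default, and the toy sequences are all assumptions made for illustration. It appends a prefixed copy of each labeled protein to a FASTA text and leaves the original entries untouched.

```python
def add_n15_copies(fasta_text, labeled_names, prefix="n15_"):
    """Return FASTA text with a prefixed duplicate appended for each labeled protein.

    labeled_names holds the identifiers (first word after '>') of the proteins
    that were 15N-labeled. The helper name and prefix are illustrative only.
    """
    original = []
    copies = []
    copying = False  # True while we are inside a labeled protein's entry
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            copying = name in labeled_names
            if copying:
                # Same header, but with the unique prefix in front of the name
                copies.append(">" + prefix + line[1:])
        elif copying:
            # Sequence lines are duplicated unchanged
            copies.append(line)
        original.append(line)
    return "\n".join(original + copies) + "\n"


# Toy database; the sequences are made up for the example.
fasta = ">protein1 toy\nPEPTIDEK\n>protein2\nSAMPLER\n"
print(add_n15_copies(fasta, {"protein1"}), end="")
```

With n15_filter = n15 in the configuration file, only the appended copy would be treated as labeled; protein2 and the original protein1 keep their names and therefore their 14N masses.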
This newest feature has not been fully tested yet, so bugs may still exist. Despite this, there seemed to be overwhelming support from users to start using this new feature now, so I am providing my development version of Kojak (1.6.0-dev). Please note that this is outside my usual release cycle, and should be considered alpha software that may change before the official release. Also, I can only provide it for Windows 64-bit at this time. I will be out of the lab for most of June, but please email me any feedback and I will respond as soon as I am able.
I’ve always preferred file formats with richer meta information, but that doesn’t diminish the fact that there are a lot of MGF data files floating around. Simply converting them back to mzML doesn’t restore the lost meta information. This turned out to be problematic for a lab that had only MGF files for DDA scan events, and did not have the original raw files containing the precursor scans. Kojak did not work after converting the MGF files to mzML format, because it assumed the user could provide the precursor scans in the data file. As a result, it appeared that MGF files were not supported.
The solution was to add a new parameter to version 1.5.5: precursor_refinement
This parameter toggles the Kojak precursor analysis routines. These routines must be disabled when there are no precursor scans in the data file. The parameter is not limited to MGF files; it can also be used with mzML and mzXML files. When skipping precursor refinement, Kojak will use the instrument-predicted precursor mass to define peptide search boundaries. If a scan does not have a predicted precursor mass, the selected ion m/z and predicted charge states will be used. Optimal performance is most frequently achieved when using the most accurate precursor mass possible, and so it is recommended to keep precursor refinement turned ON. But when the precursor scans are no longer available, refinement simply isn’t possible. By default, precursor refinement is turned on. Set precursor_refinement = 0 in your configuration file to disable it.
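To make the fallback concrete, here is a small sketch of how a selected ion m/z and a set of predicted charge states translate into candidate neutral precursor masses. It uses the standard relation M = (m/z - proton mass) × z; it is not taken from the Kojak source, and the function names and the example values are assumptions for illustration.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass implied by an observed m/z at a given charge."""
    return (mz - PROTON_MASS) * charge


def candidate_masses(mz, charges):
    """One candidate neutral mass per predicted charge state."""
    return {z: neutral_mass(mz, z) for z in charges}


# Hypothetical scan: selected ion at m/z 500.0, charge predicted as 2+ or 3+.
for z, mass in candidate_masses(500.0, [2, 3]).items():
    print(f"{z}+ -> {mass:.4f} Da")
```

Each candidate mass would then define its own search window, which is why an unrefined precursor with several possible charge states means more candidates to consider than a refined one.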
A consequence of this update is better MGF support. Hopefully few, if any, MGF files will fail with Kojak. Please notify me if you have any such cases. Otherwise, search away on your MGF collections - no conversion to other formats necessary.
It all started with a simple request on the code repo message board: could the variable modifications on peptide c- and n-termini be restored? To give this post a little context, early versions of Kojak allowed for specification of modifications on the c- and n-termini of peptides using $ and @, respectively, as amino acid wildcards. Admittedly, this was not well thought out. The issue of modifications is much broader. Fixed or differential modifications? To the peptide termini or protein termini? Single or multiple modifications to the same amino acid? At the time, there was conflict in the code on how to resolve some of these considerations, so the peptide termini modifications were quietly shelved in favor of the typically necessary (in XL experiments) protein termini modifications that were causing the conflict.
This latest update (1.5.4) restores the peptide termini modifications. More importantly to users, there have been some parameter and syntax changes designed to clarify and facilitate use of modifications in Kojak. The changes in the Kojak code were not trivial, and this is really a discussion for a different time. However, the user interface need not reflect that complexity. So here is a brief summary of the new user interface, which can be explored in more detail in the parameter documentation.
All differential modifications to peptides are specified using the modification parameter. The parameter accepts a single uppercase amino acid letter and the differential mass. A lowercase ‘c’ or ‘n’ is used to specify that the modification is on the peptide c-terminus or peptide n-terminus, respectively. If more than one differential modification is required, specify a separate modification parameter line for each one; you can add as many as you want. You can even list the same amino acid in multiple lines with a different modification mass each time. This can be used, for example, to identify singly, doubly, and triply methylated lysines.
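As an illustration of the methylated lysine case, the sketch below writes out one modification line per methylation state. The "name = residue mass" layout is inferred from the mono_link example later in this post and should be checked against the parameter documentation; the helper name modification_lines is hypothetical. The mass of 14.01565 Da is the monoisotopic mass added by one methyl group (CH2).

```python
METHYL = 14.01565  # monoisotopic mass added by one methylation (CH2), in Da


def modification_lines(residue, masses):
    """Build one differential modification line per mass for a single residue.

    The line layout is an assumption based on the mono_link example.
    """
    return [f"modification = {residue} {mass:.5f}" for mass in masses]


# Mono-, di-, and tri-methylation of lysine
for line in modification_lines("K", [METHYL * n for n in (1, 2, 3)]):
    print(line)
```

The three printed lines could be pasted into a Kojak configuration file as three separate modification entries for the same amino acid.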
Differential modifications to protein termini are specified with their own modification_protC and modification_protN parameters. These parameters need only the differential mass as values. It is possible to specify more than one differential protein modification with multiple instances of these parameters.
Fixed modifications are changes to the mass values that are applied to all instances of the specified amino acids and termini. The syntax for fixed modifications to peptides is identical to the syntax for differential modifications, except the parameter is named fixed_modification. The amino acid is specified in upper case. The peptide c-terminus or n-terminus is specified with lowercase ‘c’ or ‘n’, respectively. Add multiple fixed_modification lines to the Kojak configuration file to indicate multiple mass differences in the analysis.
Like the differential modifications to protein termini, fixed modifications can also be specified to the protein c-terminus or n-terminus. Use fixed_modification_protC and fixed_modification_protN to specify these mass differences. These parameters need only the mass as values.
To summarize, the parameters that indicate modifications have been expanded from two to six. Under the new rules, there are specific ways to indicate whether a mass difference applies to peptide residues, peptide termini, or protein termini. All mass values are added to the existing default amino acid values. There are no special characters (e.g. $ and @) to specify termini. Use lowercase ‘c’ and ‘n’ to specify peptide termini modifications, or the appropriate protein termini modification parameters.
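The six parameters and their targets can be summarized in a few lines of code. This is only a sketch of the rules described in this post, not Kojak's parser: the function describe and the "name = value" line layout are assumptions, and comment handling is left out. The example masses are carbamidomethylation (57.02146 Da) and acetylation (42.01056 Da).

```python
PEPTIDE_PARAMS = {"modification": "differential", "fixed_modification": "fixed"}
PROTEIN_PARAMS = {
    "modification_protC": ("differential", "protein C-terminus"),
    "modification_protN": ("differential", "protein N-terminus"),
    "fixed_modification_protC": ("fixed", "protein C-terminus"),
    "fixed_modification_protN": ("fixed", "protein N-terminus"),
}


def describe(line):
    """Return (kind, target, mass) for one modification line."""
    name, _, value = line.partition("=")
    name = name.strip()
    fields = value.split()
    if name in PROTEIN_PARAMS:
        # Protein terminus parameters take only a mass
        kind, target = PROTEIN_PARAMS[name]
        return kind, target, float(fields[0])
    if name in PEPTIDE_PARAMS:
        # Peptide parameters take a site (residue, 'c', or 'n') and a mass
        site, mass = fields
        if site == "c":
            target = "peptide C-terminus"
        elif site == "n":
            target = "peptide N-terminus"
        elif len(site) == 1 and site.isupper():
            target = "residue " + site
        else:
            raise ValueError(f"unrecognized site: {site!r}")
        return PEPTIDE_PARAMS[name], target, float(mass)
    raise ValueError(f"not a modification parameter: {name!r}")


print(describe("fixed_modification = C 57.02146"))
print(describe("modification = n 42.01056"))
print(describe("modification_protN = 42.01056"))
```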
Chemical cross-linkers pose a special case. They frequently target multiple sites at the protein level (multiple amino acids and the protein termini). A cross-linker that binds on only one side (hydrolyzing on the other side) can therefore create a diverse set of differential modifications to search for. Rather than list all possibilities as a large set of modification parameters, Kojak has a simple shortcut: the mono_link parameter. This parameter accepts a set of amino acid characters. It also accepts ‘c’ or ‘n’ to specify the protein C-terminus or N-terminus, respectively. The final value is the differential mass to apply. For example, by specifying “mono_link = cDE -0.9837153”, the parameters needed to define an EDC mono-link have been reduced from three differential modification parameters to a single mono_link parameter.
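The equivalence in the EDC example can be spelled out in code. The sketch below expands a mono_link value into the three differential modification lines it stands in for, following the rules described above. It illustrates the shortcut only and says nothing about how Kojak handles mono-links internally; the function name expand_mono_link and the line layout are assumptions.

```python
def expand_mono_link(value):
    """Expand a mono_link value such as 'cDE -0.9837153' into the
    equivalent differential modification lines, one per site."""
    sites, mass = value.split()
    lines = []
    for site in sites:
        if site == "c":
            # lowercase 'c' means the protein C-terminus for mono_link
            lines.append(f"modification_protC = {mass}")
        elif site == "n":
            # lowercase 'n' means the protein N-terminus for mono_link
            lines.append(f"modification_protN = {mass}")
        elif site.isupper():
            lines.append(f"modification = {site} {mass}")
        else:
            raise ValueError(f"unrecognized site: {site!r}")
    return lines


for line in expand_mono_link("cDE -0.9837153"):
    print(line)
```

The mass is kept as text so that it is written back exactly as given, without any floating-point reformatting.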
This month’s release rolls out additional bugfixes.
A buffer overrun was intermittently causing crashes when using the turbo_button in searches. The fix improves the stability of turbo_button searches. Again, this feature is still in the experimental stages, so it can be disabled in the parameters file. An additional bugfix corrected cases where the n- and c-terminal modifications of loop-links were being reported elsewhere in the peptide.