The Next Alpha Release
December 20 2019
This update adds a major new feature: decoy database generation. Previously, it was up to the user to supply decoys of their choice - and this is still possible. However, many users just want to supply target protein sequences and let the algorithm worry about decoys. This approach has some advantages, such as giving Kojak complete control over those decoys. The strategy Kojak uses for decoy generation is to reverse the amino acids between enzymatic cleavage points. The result is a set of peptides identical in number, length, and mass to the target set of peptides, but with a unique fragment pattern. There are peculiarities: short peptides are more likely to be palindromic and therefore appear in both the target and decoy sets; also, the leading methionine remains fixed, which can create an edge case in peptide counts. Importantly, though, each target protein has an equivalent decoy protein with the same number of peptides of equal mass. This has huge implications for downstream validation, specifically the error estimation of inter- and intra-protein cross-links. This new decoy database feature requires updating the decoy_filter parameter to support the new format. Also, if you use Kojak to generate your decoy sequences, a new FASTA file is written to your output that contains the exact decoy (and target) sequences used in the analysis, for further evaluation in any downstream tools.
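To make the reversal idea concrete, here is a minimal Python sketch of pseudo-reversed decoy generation. It is not Kojak's code: the function name decoy_protein is hypothetical, and it assumes trypsin-like cleavage after K or R (ignoring the proline rule), that each cleavage residue stays in place, and that a leading methionine stays fixed. Because only the order of residues within each stretch changes, every decoy peptide keeps the length and mass of its target counterpart.

```python
def decoy_protein(seq: str, cut_after: str = "KR") -> str:
    """Hypothetical sketch of pseudo-reversed decoy generation.

    Assumptions (not taken from Kojak): cleavage occurs after any residue
    in `cut_after` (trypsin-like, ignoring the proline rule), each cleavage
    residue stays in place, and a leading methionine stays fixed.
    """
    start = 1 if seq.startswith("M") else 0
    pieces = [seq[:start]]          # fixed N-terminal Met, or empty string
    segment = []                    # residues since the last cleavage point
    for aa in seq[start:]:
        if aa in cut_after:
            pieces.append("".join(reversed(segment)))  # reverse the stretch
            pieces.append(aa)                          # keep cleavage residue
            segment = []
        else:
            segment.append(aa)
    pieces.append("".join(reversed(segment)))          # C-terminal remainder
    return "".join(pieces)


print(decoy_protein("MPEPTIDEKAAGRST"))  # MEDITPEPKGAARTS
print(decoy_protein("GAGK"))             # GAGK (palindrome: target == decoy)
```

The second call shows the palindrome peculiarity mentioned above: a short peptide such as GAGK reverses to itself, so it ends up in both the target and decoy sets.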
November 27 2019
I’m releasing Kojak 2.0.0 for Windows and Linux (64-bit). This is an alpha release, meaning this version is in active development, but I think it is worth using. There are many new features, such as improved scoring (e-values), a better troubleshooting interface, and performance improvements. More file formats are now supported, such as mzIdentML, to conform with the Proteomics Standards Initiative (PSI). However, as the new features are being developed, some bugs may exist. I suspect many of them are specifically related to mzIdentML, so if you don’t use that format, you’ll probably appreciate this new version. If you use the PeptideProphet/iProphet validation of the Trans-Proteomic Pipeline, you might notice a much-needed boost in performance. I have several more additions planned before this release is finalized, so check back soon. Also, because these will be rapid releases, I’m not distributing the MSI installer for Windows or a package with KojakViewer included. Please simply use the ZIP files provided on the Download page.
Site Refresh and More To Come
November 7 2019
There are major upcoming changes to Kojak and the website. You might have noticed that the usual release schedule had slowed down. This was to make time for the next major release of Kojak. As I’m in the process of putting it all together, it isn’t ready today, but I hope to have it ready soon. Part of the changes meant refreshing this site, as it had become a bit stale. This is also an ongoing process. So far I’ve updated the site structure and style, and I’m currently updating the tutorials and instructions. Stay tuned, there’s a lot more to come.
Update for TPP Users
December 1 2017
This update fixes an issue with validating PSMs that use 15N with PeptideProphet in TPP when using the browser front end. Many thanks for the user reports that found this edge case. Essentially, 15N-labeled proteins were being grouped with their 14N counterparts, rather than being treated as independent proteins. The solution has two parts: a quick fix to the Kojak pepXML output in version 1.6.1, AND, when using TPP, adding ‘-nR’ to the “Enter additional options to pass directly to the command-line (expert use only!)” field of the PeptideProphet options on the Analyze Peptides page. That ‘-nR’ parameter instructs TPP not to group 15N and 14N proteins together based on sequence homology. Future versions of TPP will be able to figure this out on their own, but TPP is on a different release schedule.
To put this latest version of Kojak in the TPP, back up your existing Kojak.exe file in the C:\TPP\bin folder by renaming it. Then copy the new Kojak.exe into that folder. It will then work natively in the TPP environment.
The next update to this page will include a series of new demos for using Kojak in the TPP.
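If you prefer to script the swap, the rename-then-copy steps can be sketched in Python as below. This is only an illustration, not a tool shipped with Kojak or TPP: the function name install_kojak and the backup name Kojak.exe.bak are hypothetical choices, and C:\TPP\bin is just the default folder mentioned above.

```python
import shutil
from pathlib import Path


def install_kojak(new_exe: str, tpp_bin: str = r"C:\TPP\bin") -> Path:
    """Hypothetical helper: back up the old Kojak.exe, then copy in the new one."""
    bin_dir = Path(tpp_bin)
    current = bin_dir / "Kojak.exe"
    if current.exists():
        # Backup name is an assumption; Path.replace overwrites any earlier
        # backup with the same name, so rename it first if you want to keep it.
        current.replace(bin_dir / "Kojak.exe.bak")
    shutil.copy2(new_exe, current)  # copy the new binary, keeping its timestamps
    return current
```

Renaming instead of deleting keeps the previous version one step away if you need to roll back.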
Big upgrades in 1.6.0
November 27 2017
Today marks the official 1.6.0 release. There are some pretty hefty changes beyond the usual minor feature additions and bugfixes. As mentioned in the June alpha release, one new analysis type incorporates 15N for homomultimer cross-linking. Briefly, to perform 15N analysis, the labeled protein must be duplicated in the sequence database, with a unique identifier:
>protein1
MASTHAKEEILSVNAQWKADRGHLSELED...
>15N_protein1
MASTHAKEEILSVNAQWKADRGHLSELED...
Then add the following parameter to your Kojak configuration file:
15N_filter = 15N
More details are provided in the June news entry, but also note that the parameter name has changed slightly for clarity.
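For databases with many proteins, duplicating each entry by hand gets tedious. The Python sketch below appends a labeled copy of every protein to a FASTA file's contents. The function name add_15n_copies is hypothetical, and the "15N_" prefix simply mirrors the example identifiers above; it assumes the 15N_filter value (here 15N) is what Kojak matches against the protein identifier, so keep the two consistent.

```python
def add_15n_copies(fasta_text: str, prefix: str = "15N_") -> str:
    """Hypothetical helper: return FASTA text with a prefixed copy of each protein."""
    entries = []                      # (header, sequence) pairs
    header, seq_lines = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue                  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)    # sequences may span several lines
    if header is not None:
        entries.append((header, "".join(seq_lines)))

    out = [f">{h}\n{s}" for h, s in entries]            # original proteins
    out += [f">{prefix}{h}\n{s}" for h, s in entries]   # 15N-labeled copies
    return "\n".join(out) + "\n"
```

Each sequence is written on a single line in the output; that is a simplification, not a requirement of the FASTA format.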
The second big change alters the search method for cross-linked peptides. Originally, all peptides were searched individually, and a list of the top hits (user defined with the top_count parameter) was kept to find pairs of peptides that explained the spectra. This was a fast and smart way to reduce the large search space of all possible peptide combinations to just the most relevant combinations for an observed spectrum. However, it had a caveat: a cross-link in which one of the peptides had little or no fragmentation could not be reported as the best possible hit, because that peptide never made the list of top hits. Furthermore, there was no clear definition of how large top_count should be set (the answer is appropriately large for your data set; 250 is just a recommended starting point). Users often set a small value, which is appropriate for targeting large peptides with few modifications, not small peptides with many modifications. So this particular design was revisited and revised.
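The caveat of the original approach can be illustrated with a toy Python sketch. Everything here is hypothetical: the peptide names, masses, per-peptide scores, linker mass, and tolerance are invented, and a real search scores peptides against a spectrum rather than looking them up in a dictionary. The point is only the control flow: pairs are formed exclusively from the top_count best individual hits, so a poorly fragmenting partner is never considered.

```python
import heapq
from itertools import combinations_with_replacement


def old_style_pairs(peptides, score, precursor_mass, linker_mass, tol, top_count):
    """Toy sketch of the original strategy: pair only among the top hits.

    peptides: name -> neutral mass; score: name -> individual score (invented).
    """
    top = heapq.nlargest(top_count, peptides, key=lambda p: score[p])
    pairs = []
    for a, b in combinations_with_replacement(top, 2):
        # Keep pairs whose combined mass (plus linker) matches the precursor.
        if abs(peptides[a] + peptides[b] + linker_mass - precursor_mass) <= tol:
            pairs.append((a, b))
    return pairs


# Invented example: SMALL fragments poorly, so it scores low on its own.
peptides = {"BIG": 1500.0, "SMALL": 400.0, "OTHER": 900.0}
scores = {"BIG": 50.0, "SMALL": 1.0, "OTHER": 20.0}
print(old_style_pairs(peptides, scores, 2038.068, 138.068, 0.01, 2))  # []
print(old_style_pairs(peptides, scores, 2038.068, 138.068, 0.01, 3))  # [('BIG', 'SMALL')]
```

With top_count = 2 the true BIG + SMALL pair is missed entirely; only a larger top_count recovers it, which is why the right setting depended so heavily on the data set.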
The new method searches the upper half of the peptides (those with a mass of at least half the precursor mass) and keeps the top hits. Then, for each of those top hits, all remaining peptides in the database that can be paired with it are searched and scored. This has the added benefit of testing and scoring even the smallest peptides, which would not have made it onto the list under the previous method. In some ways this is equivalent to searching more peptide combinations. At the same time, it is no longer necessary to maintain a large top_count for each spectrum, which reduces the number of peptide combinations - BUT the combinations that are dropped are among the least likely candidates.
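Using the same invented data, the two-stage idea can be sketched as follows. Again, this is an illustration under stated assumptions, not Kojak's implementation: the function name is hypothetical, the stage-one cutoff follows the "half the precursor mass" rule described above, partners are looked up by mass with a binary search over a sorted list, and the summed score is only a stand-in for rescoring the pair against the spectrum.

```python
import bisect
import heapq


def two_stage_pairs(peptides, score, precursor_mass, linker_mass, tol, top_count):
    """Toy sketch of the revised strategy (invented scores, not Kojak's scoring)."""
    # Stage 1: score only peptides with mass >= precursor / 2, keep top_count.
    heavy = [p for p in peptides if peptides[p] >= precursor_mass / 2]
    top = heapq.nlargest(top_count, heavy, key=lambda p: score[p])

    # Stage 2: for each top hit, find every database peptide of the
    # complementary mass, however poorly it would score on its own.
    by_mass = sorted(peptides, key=lambda p: peptides[p])
    masses = [peptides[p] for p in by_mass]
    pairs = []
    for a in top:
        target = precursor_mass - linker_mass - peptides[a]
        lo = bisect.bisect_left(masses, target - tol)
        hi = bisect.bisect_right(masses, target + tol)
        for b in by_mass[lo:hi]:
            pairs.append((a, b, score[a] + score[b]))  # stand-in for rescoring
    return sorted(pairs, key=lambda t: t[2], reverse=True)


peptides = {"BIG": 1500.0, "SMALL": 400.0, "OTHER": 900.0}
scores = {"BIG": 50.0, "SMALL": 1.0, "OTHER": 20.0}
print(two_stage_pairs(peptides, scores, 2038.068, 138.068, 0.01, 1))
```

Here even top_count = 1 finds the BIG + SMALL pair, because the small partner is located by mass rather than by its own score. It also shows why such large-to-small pairs deserve extra scrutiny during validation: the small peptide contributes almost nothing to the evidence.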
A few additional clarifications should be made. First, there is an assumption that the larger peptide in a cross-link will make the largest contribution to the final score for the PSM. If it does not, then hopefully it is at least large enough to make the top hits anyway. Second, many more PSMs with one good-scoring peptide will make it into the results, often paired with a very small counterpart peptide. These large-to-small PSMs are the most highly suspect, so keep this in mind when validating your PSMs with your method of choice. Third, this method also means that the top_count parameter has been repurposed. A more appropriate value might now be 25 instead of 500. If you use a large value such as 500, there may be performance issues on systems with insufficient memory, and there is likely no benefit on any system.
Finally, there are a lot of new diagnostic features and reports. As these are intended for advanced users, I will spend more time explaining them in a future post.
Thanks everyone for your patience while this release was being prepared. I know the website documentation is now slightly out of date, but updating it is on my radar. Also, I can no longer recommend Percolator for validation. The latest Percolator version (3.1) makes standard, single-peptide assumptions that prevent it from working with Kojak (or perhaps any cross-linking) output. I recommend switching to PeptideProphet in the Trans-Proteomic Pipeline.