Extreme genetic diversity in the type VII secretion system of Listeria monocytogenes suggests a role in bacterial antagonism

Received 01 October 2020; Accepted 26 January 2021; Published 18 February 2021. Author affiliations: Microbes in Health and Disease Theme, Newcastle University Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK. *Correspondence: Tracy Palmer, tracy.palmer@newcastle.ac.uk


INTRODUCTION
Protein secretion systems are found in almost all bacteria and are critical for processes such as nutrient capture and niche adaptation. To date, 10 distinct protein secretion systems (named types I-X) have been identified [1][2][3]. While most of these are exclusive to Gram-negative bacteria, the type VII secretion system (T7SS) is encoded by many Gram-positive Actinobacteria and Firmicutes [4,5]. The T7SS has been heavily studied in pathogenic mycobacteria, where it is essential for virulence [6][7][8]. It is also linked to virulence in Staphylococcus aureus, supporting abscess formation during persistent infection [9,10].
The T7SS comprises two subtypes, T7a found in Actinobacteria and T7b found in Firmicutes, which, although related, have distinct differences [11,12]. A hexameric membrane-bound ATPase, termed EccC in Actinobacteria and EssC in Firmicutes, is common to both systems and is a critical component of the T7SS [13,14]. It lies at the centre of a multisubunit membrane complex that mediates protein transport across the cytoplasmic membrane [15][16][17]. The protein has four ATP-binding domains in its cytoplasmic C-terminus, each of which is essential for secretion activity [13,14][16][17][18][19], and the most C-terminal of these domains also plays a role in substrate recognition [13,20,21]. Substrates are recognized through a short C-terminal signal sequence, and secretion appears to be post-translational because at least some substrates are exported in a folded form [22][23][24].
The S. aureus T7SS is encoded at the ess locus. Although part of the core genome, it was noted that there is significant genetic diversity at the ess region across S. aureus strains [33]. Sequence divergence occurs part way through essC, giving rise to four EssC variants that differ in their C-terminal domains. Each essC variant is associated with a specific set of genes encoding T7 substrate proteins, and the variable C-terminal domain of EssC has been implicated in the recognition of strain-specific substrates [21,33].
The most commonly studied S. aureus strains encode the EssC1 variant (Fig. 1c). These strains produce a T7-secreted nuclease, EsaD, a member of the YeeF protein family, and are protected from self-intoxication by an immunity protein, EsaG [34,35]. A chaperone protein, EsaE, interacts with both EsaD and EssC and likely targets the nuclease to the secretion machinery [34,36]. Strains harbouring essC1 account for approximately 50 % of sequenced S. aureus [33]. Analysis of EsaD sequences across these strains shows that the nuclease domain is polymorphic, suggestive of an anti-bacterial toxin [34]. All S. aureus strains, including essC2, essC3 and essC4 strains that do not encode EsaD, have multiple genes coding for EsaG homologues at their ess loci, suggesting that EsaD targets staphylococci (Fig. 1c) [33,34]. An anti-staphylococcal role for EsaD has been confirmed in competition experiments, where strains were sensitized to killing by the nuclease when esaG genes were deleted [34,37].
Since the identification of EsaD, a growing number of T7-secreted antibacterial toxins have been described. TspA, a second polymorphic toxin found in S. aureus, has membrane-depolarizing activity that is neutralized by the TsaI immunity protein [37]. Streptococcus intermedius produces at least three antibacterial toxins, of which TelB is an NADase and TelC a lipid II phosphatase [30]. Moreover, recent reports indicate that the T7b systems of Bacillus subtilis and Enterococcus faecalis also have antibacterial activity [38,39], indicating that bacterial antagonism is a common feature of several Firmicutes T7SSs.
Listeria monocytogenes is a Gram-positive firmicute and human foodborne pathogen that is able to invade and replicate intracellularly [40]. A T7SS is encoded in the genome and has been partially characterized in strains EGDe and 10403s, where it was shown to be dispensable for virulence and host cell invasion [32,41]. Here we have analysed the genetic diversity at the T7 gene clusters of sequenced L. monocytogenes strains. We show that seven variants of EssC, which differ in their C-terminal region, are encoded across L. monocytogenes. Each variant is associated with specific genes encoding likely substrate proteins. We identify polymorphic toxins that fall into the YeeF (EsaD) and LXG families and clusters of immunity proteins for self and nonself toxins. Our observations point to a major role for the T7SS in interbacterial competition.

METHODS
L. monocytogenes EGDe serovar 1/2a (NC_003210.1) was used as the reference strain in this analysis. EssC, encoded by lmo0061, was used as the query for Basic Local Alignment Search Tool Protein (blastp) analysis using the National Center for Biotechnology Information's (NCBI's) Reference Sequence (RefSeq) database of non-redundant protein sequences [42]. All identified L. monocytogenes EssC homologues were assigned a percentage identity to Lmo0061 using the BLOSUM62 scoring matrix and an expect threshold of 10. EssC sequence variants were aligned using the clustalw algorithm of the European Bioinformatic Institute (EBI) via the Clustal omega software (v.1.2.4) [43], and protein sequence alignments were viewed in Jalview [v.2.11.1.0] [44]. EssC sequences were further analysed through the construction of maximum-likelihood phylogenetic trees using the megax software package (v.10.1.8) [45,46]. The Pasteur Institute's Listeriomics tool [47] was used to acquire sequences of EssCs across 60 annotated L. monocytogenes genomes with information on lineage, clonal clusters and serotypes. The presence of sigma 70 promoters and terminator regions within the T7b locus was evaluated using bprom and FindTerm, respectively [48]. Candidate Rho-independent terminator regions were analysed using ARNold [49].
Protein accession numbers were subjected to flanking genes (FlaGs) analysis [50] to establish the conservation of T7b genes across multiple L. monocytogenes strains. Gene products were analysed using blastp analysis, and the presence of transmembrane regions was predicted using TMHMM [51]. Homology modelling was achieved using the Protein Homology/analogy Recognition Engine (PHYRE [52]).
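The percentage identity values used to compare EssC homologues can be illustrated with a minimal sketch. This is not the blastp calculation itself: it assumes two sequences that have already been aligned (gaps written as "-") and counts identical residues over columns where neither sequence has a gap. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over columns without a gap.

    Assumption: both inputs come from the same pairwise or multiple
    alignment, so they have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy example (hypothetical sequences, not real EssC fragments)
print(percent_identity("MKT-A", "MKSLA"))
```

Other conventions exist, for example dividing by the full alignment length including gapped columns, so values from this sketch need not match those reported by a given tool.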

The T7SS gene cluster of L. monocytogenes
The functional components of the L. monocytogenes T7SS are encoded contiguously from lmo0056 (esxA) through to lmo0061 (essC; Fig. 1a; all gene numbers relate to type strain EGDe), with the gene order mirroring that of the S. aureus ess locus. In S. aureus, a peptidoglycan hydrolase, EssH, is encoded divergently to esxA and is essential for T7 substrate secretion across the cell wall [53]. Analysis of the genomic region immediately upstream of esxA in L. monocytogenes did not identify a homologue of essH at this locus, or elsewhere on the chromosome. A stem-loop structure is present in the S. aureus esxA-esaA intergenic region [25]. A similar stem-loop, with an estimated Gibbs free energy of −13.6 kcal mol⁻¹ and sharing 62 % identity with that of S. aureus, is conserved across the sequenced L. monocytogenes genomes. In S. aureus the stem-loop plays a role in regulating the expression level of the genes immediately downstream of esxA, which are transcribed 10-100 fold lower than esxA [25,54]. It likely has a similar role in L. monocytogenes, as it has been reported that essC and essB are much more weakly expressed than esxA in strain EGDe [41].

Seven distinct EssC variants are encoded across L. monocytogenes strains
Analysis of the amino acid sequence of the T7 core components indicates that five of them (EsxA, EsaA, EsaB, EssA and EssB) have high sequence conservation (approximately 97 % identity across strains present in RefSeq). By contrast, EssC sequences are much more variable, with less than 80 % identity between strains. To further analyse the sequence variability of EssC, all L. monocytogenes full-length EssC sequences (274 in total) were extracted from NCBI RefSeq and aligned. It was observed that the proteins fall into seven different groups, here named EssC1-7. The sequences of EssC1 and EssC3 diverge from the other EssC groups at around the start of domain D2 (Figs 1a and S1, available in the online version of this article). The five other EssC variants are almost invariant in their proximal D2 sequences, starting to diverge from one another approximately 180 amino acids into this domain. As shown in Table 1, EssC1 is the most common variant, being encoded in 26 % of sequenced L. monocytogenes strains, and EssC6 the least common, with only 3 % occurrence.
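The point at which two EssC variants start to diverge can be located, in a rough way, by sliding a window along a pairwise alignment and reporting the first window whose identity drops below a cut-off. The sketch below is only an illustration of that idea; the window size, the threshold and the function names are assumptions, not the procedure used in this study.

```python
def window_identity(a: str, b: str, start: int, window: int) -> float:
    """Fraction of identical, non-gap columns in a[start:start+window]."""
    columns = list(zip(a[start:start + window], b[start:start + window]))
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return matches / len(columns)


def first_divergent_column(a: str, b: str, window: int = 50,
                           threshold: float = 0.8):
    """Return the 0-based start of the first window below `threshold`.

    Returns None if no window falls below the threshold. The inputs are
    assumed to be two rows of the same alignment (equal length).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    for start in range(0, len(a) - window + 1):
        if window_identity(a, b, start, window) < threshold:
            return start
    return None


# Toy alignment: identical for 100 columns, then completely different
seq_a = "A" * 100 + "C" * 100
seq_b = "A" * 100 + "G" * 100
print(first_divergent_column(seq_a, seq_b, window=20, threshold=0.5))
```

The returned column is the start of the first low-identity window, so the true divergence point lies somewhere inside that window rather than exactly at the reported position.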
To assess the distribution of essC variants across clonal complexes and evolutionary lineages [55,56], the 65 complete L. monocytogenes EssC sequences were extracted from the Pasteur Institute's publicly available annotated collection and used to construct a maximum-likelihood phylogenetic tree (Fig. 2). While essC6 strains are notably absent from the phylogenetic tree, the six other EssC variants are represented. The tree divides into two main branches, with the first comprising EssC2, EssC4, EssC5 and EssC7, and the second solely comprising EssC1. As with the NCBI RefSeq database, the majority of the Pasteur Institute sequences encode EssC1 or EssC2, which both show distribution across evolutionary lineages I and II. The remaining essC variants are confined to a single lineage, with the exception of L. monocytogenes FSL J1-208, which is the only essC7 strain to belong to lineage IV out of those analysed. The four essC3 strains belonging to CC69 of lineage III cluster outside of the main two branches away from the other variants.

Two variable genomic regions are found at the T7 gene cluster of L. monocytogenes
It was noted for S. aureus that genetic diversity at the ess locus starts within the essC gene and continues downstream [33]. Inspection of the genomic region 3′ of L. monocytogenes essC reveals that it is exceptionally diverse across different strains, in agreement with prior studies that identified this region as a hypervariable hotspot [57,58]. We identified two closely spaced regions of high variability (Figs 3a and S2). The first of these variable regions is bounded by essC at the 5′ side and at the 3′ side by a cluster of housekeeping genes (lmo0075-lmot01; encoding a predicted phosphoenolpyruvate mutase, 6-O-methylguanine DNA methyltransferase, a protein of the YjbI superfamily, a predicted d-isomer-specific 2-hydroxyacid dehydrogenase and a tRNA, respectively). The second variable region lies between lmot01 at the 5′ end and lmo0082 (encoding a hypothetical membrane protein) at the 3′ end (Fig. 3a).

YeeF family toxins are encoded in variable region 1 of L. monocytogenes essC1 strains
Although variable region 1 is highly diverse across L. monocytogenes, strains with the same essC variant show some common features. For example, a shared set of genes are encoded immediately downstream of essC in all essC1 strains (Figs 3b and S2). The first three of these genes, lmo0062, lmo0063 and lmo0064, are syntenous with the three genes that are adjacent to essC1 in S. aureus strains (Fig. 1c), and encode homologues of the substrate proteins EsxC (Lmo0062) and EsxB (Lmo0063), and the chaperone protein EsaE (Lmo0064). Lmo0065 is also common to all essC1 strains and encodes a small helical protein with some limited similarity to S. aureus EsxD.
The essC1 sequences start to diverge at lmo0066. In each essC1 strain this gene codes for a protein of the YeeF superfamily, to which the S. aureus toxin EsaD also belongs. However, we note that the encoded proteins have a highly variable C-terminal toxin domain across variant 1 strains. In total, 10 different toxin domains could be identified (Fig. S3, Table 2). In each case, the N-terminal 400 amino acids (the 'YeeF domain') are highly conserved and this region is predicted to be almost completely α-helical. In S. aureus variant 1 strains, the EsaE chaperone recognizes the YeeF domain of EsaD, and it is likely that this highly conserved domain in L. monocytogenes essC1 toxins is also recognized by EsaE. The toxin domains are predicted to have functions such as ADP ribosyltransferase, ribonuclease or NADase activity (Table 2), and each toxin is always paired with the same candidate immunity protein encoded directly downstream. Some essC1 strains encode two YeeF domain proteins within this region, and some also encode a protein with a tuberculosis necrotizing toxin (TNT) domain, although this lacks a detectable YeeF domain (Figs 3b and S2). Other genes found within this region of essC1 strains encode toxin fragments and orphan immunity proteins.

A conserved set of genes are found downstream of essC2
All essC2 strains have a common set of genes immediately downstream of essC (Fig. 3c; numbered 21-28 for variant 2 genes 1-8). Most of these genes encode small proteins (<150 amino acids) of unknown function. The largest protein encoded in this cluster, represented by B0647_RS15290 (gene 26 in Fig. 3c), is approximately 320 amino acids in size and is strongly predicted to be a d-isomer-specific 2-hydroxyacid dehydrogenase family protein. Curiously, this is the same predicted function as that for the protein encoded by lmo0078 in the invariant housekeeping cluster, although the two proteins have no detectable sequence identity. Two of the genes in the essC2 variable region 1 cluster encode related proteins (24 in Fig. 3c). This duplication is seen in all strains, and there is always an intergenic region of approximately 500 bp between the last two genes in the cluster.
Genes 24-28 in this cluster seem to form a module of six genes (always in the order 24-25-26-27-28-24, with an extended intergenic region between genes 28 and 24). This module is often but not always found in variable region 1 of other essC variants, and sometimes additional copies of gene 24 also flank this cluster (Fig. S2). Within this module genes 24 and 25 may form a pair, as they occasionally co-occur away from the other genes of the module. Many orphan copies of 24 are also found in the T7 gene clusters of most essC variants, suggesting that it may encode an immunity protein.
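Checking whether a strain carries the six-gene module amounts to looking for a fixed, ordered run of gene families within the ordered list of genes at a locus. The following minimal sketch assumes each gene has already been assigned a family number as in Fig. 3c; the function name and the example gene order are hypothetical.

```python
def find_module(gene_order, module):
    """Return the start indices at which `module` occurs as a contiguous,
    ordered run within `gene_order` (both are lists of family labels)."""
    n = len(module)
    return [
        i
        for i in range(len(gene_order) - n + 1)
        if gene_order[i:i + n] == module
    ]


# Six-gene module in the order given in the text
SIX_GENE_MODULE = [24, 25, 26, 27, 28, 24]

# Hypothetical variable region 1 of an essC2 strain (genes 21-28, then 24)
locus = [21, 22, 23, 24, 25, 26, 27, 28, 24]
print(find_module(locus, SIX_GENE_MODULE))
```

This sketch only matches genes on one strand in one orientation; a locus read from the opposite strand would need its gene list reversed before the search.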
Downstream of these nine conserved genes the essC2 strains are much more variable, but many strains encode multiple predicted antitoxins, including to the YeeF and LXG domain toxins present in other L. monocytogenes strain variants.

essC3 strains may encode a novel toxin
Many essC3 strains analysed have a very short variable region 1, comprising two genes of unknown function (Fig. 4a). The second gene of the pair encodes a protein of approximately 660 amino acids, which is a similar size to other T7b toxins, although it is unrelated to either the LXG or YeeF domain proteins. It may therefore represent a novel toxin. A few essC3 strains have larger gene clusters in variable region 1, with a few carrying the six-gene module (24-25-26-27-28-24) that is present in all essC2 strains. Orphan immunity proteins to other T7b toxins are also found in this region in some strains.

Seven conserved genes are found downstream of essC4
The first seven genes downstream of essC4 are conserved in all essC4 strains (41-47 in Fig. 4b). Most of the encoded proteins are <200 amino acids; the largest protein encoded by the cluster, at 433 amino acids, is the product of gene 44. Most of the proteins lack any identifiable domains, with the exception of 41, which is a member of the SMC_prok_A  region. Several essC4 strains also encode a copy of the essC2 six-gene module at this locus (Fig. 4b).

essC5, essC6 and essC7 strains encode an LXG toxin in variable region 1
Variable region 1 of essC5, essC6 and essC7 strains shares some superficial similarity (Fig. 5). An LXG family toxin is always encoded by the third gene downstream of essC, followed by a probable immunity protein. The three toxins (toxin A in essC5 strains, toxin B in essC6 strains and toxin C in essC7 strains) share some limited sequence identity (approximately 15-20 %; Table 3, Fig. S4) in the first ~270 amino acids, but are less similar in the C-terminal toxin domains. Nonetheless, all three proteins can be structurally modelled on colicin Ia. Recently it has been shown that TspA, a T7-secreted protein from S. aureus that can also be modelled on colicin Ia, is a membrane-depolarizing toxin [37], suggesting that these three L. monocytogenes toxins may have a similar mode of action. The candidate immunity proteins for these toxins fall into three unrelated protein families (Figs 5 and S2, Table 3).
The essC5 strains always encode a pair of small proteins of unknown function in-between essC5 and the toxin A gene. Downstream of the candidate immunity gene the essC5 strains start to diverge (Fig. 5a), with some strains encoding the six-gene module found in essC2 strains, while others have the five-gene module found in essC4.
All essC6 strains sequenced to date have an identical set of genes in variable region 1 (Fig. 5b). The first gene in this region encodes a 367 amino acid product that appears to be structurally related to a metallopeptidase (match to the crystal structure of the m16b metallopeptidase subunit from Sphingomonas sp. a1 with 96 % confidence; PHYRE [52]). The second gene encodes a 93 amino acid peptide that is a member of the DUF3130 family. Following the toxin and immunity genes is a cluster of five genes encoding proteins of between 136 and 151 amino acids. The final four genes in this cluster form a module (61-64) that is also found in variable region 1 of some essC5 and essC7 strains (Figs 5c and S2). The first two genes of the module are also found in some essC3 strains (Fig. 4).
The majority of essC7 strains have a relatively short variable region 1, comprising genes encoding two small proteins (115 and 113 amino acids, respectively) alongside the toxin- and immunity-encoding genes (Fig. 5c). A few strains also have the essC6 four-gene module at this region (Fig. 5c).

Some essC variant 1 strains encode truncated EssC2, EssC3 and EssC4 variants
During our analysis we observed that some essC1 strains encode the conserved sets of genes found in variable region 1 of other essC variant strains. For example, strain N11-1255 encodes the complete eight-gene module (21-28) found in essC2 strains, strain FSL L7-0763 encodes the candidate toxin found in essC3 strains and strain CFSAN026581 encodes the full seven-gene module from essC4 strains (Fig. 6a). In each of these cases, a truncated EssC protein comprising the C-terminal ~728 amino acids is also encoded immediately preceding these 'foreign' genes. The truncated EssCs (which are annotated as pseudogenes on the FlaGs output in Fig. S2) are much shorter than the full-length EssC1 protein encoded by these strains; each one aligns to the canonical EssC sequence at residue 771 (Fig. 1d) and has a potential initiator methionine a few codons away. Strikingly, these truncated EssCs do not have the EssC1 variable region; instead, the variable region is from the same essC variant as the foreign genes they harbour (i.e. essC2 for N11-1255, essC3 for FSL L7-0763 and essC4 for CFSAN026581).
We also noted that some essC1 strains only encode part of the essC2 and essC4 conserved modules (Fig. 6b). For example, strain PNUSAL000361 harbours genes 24-28 of the essC2 module but lacks genes 21-23, and strain 25B09 encodes the final four genes (44-47) of the essC4 conserved cluster, including the shorter version of gene 44 (Fig. S2). In these instances, no truncated EssC is encoded at the gene cluster. These findings suggest that the first three genes at these conserved clusters may be involved in mediating interaction with the variable region of the cognate EssC, potentially to target substrates or to promote functional assembly of the secretion system.

An LXG toxin is encoded at variable region 2 of many strains
The second variable region lies between the tRNA gene lmot01 and lmo0082, which encodes a small membrane protein of unknown function (Fig. 3a). We noted that while approximately 25 % of strains have no 'insertion' in this region, many other strains have a small locus of two or three genes relating to the T7SS (Figs 7 and S2). Commonly the locus comprises a gene encoding an LXG toxin, along with one or two genes encoding probable immunity proteins (Fig. 7). Four different toxins were identified in variable region 2 across strains (Table 3; toxins D-G; Fig. S5). All four toxins have a highly conserved N-terminal region: the first 330 or so amino acids are almost invariant, but the sequences diverge substantially in the toxin domain (Fig. S5).
A small number of strains (approximately 4 % of those we analysed) have two genes at this locus, encoding a protein of 262 amino acids with three predicted transmembrane domains at its N-terminus and one of 207 amino acids with an N-terminal Tir domain (Fig. S2). We also observed that approximately 5 % of strains had undergone a recombination event next to the tRNA gene to introduce a recombinase gene at this locus (Fig. S2).

LXG proteins are encoded at other loci on the L. monocytogenes chromosome
To determine whether there may be LXG proteins encoded outside of the L. monocytogenes T7 gene cluster, we selected 14 strains (2 of each essC variant) from those listed in Fig. S2 and searched the genome annotations for the term 'LXG'. This yielded 44 protein sequences containing this term, 17 of which we had not previously identified. We next entered the multiple protein accessions collectively into a single round of analysis using the FlaGs tool to see whether any of the encoding genes shared a common genomic location. The output (shown in Fig. S6 and summarized in Table 3) indicates that there are common genomic regions where toxins are likely to be encoded.
To ensure we had covered the diversity of LXG proteins in L. monocytogenes, we next searched RefSeq under the term 'Listeria monocytogenes LXG protein' which returned 569 protein accession numbers. Following manual inspection to see whether they differed from previously identified toxins, we found a further 10 unique protein sequences.
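The keyword search followed by removal of repeats can be approximated computationally. The sketch below filters annotation records by a term in the description and then collapses records with identical sequences. It is a simplification of the manual inspection described above, since it treats only exact sequence matches as duplicates; the record type, function name and accessions are hypothetical.

```python
from typing import NamedTuple


class ProteinRecord(NamedTuple):
    accession: str    # hypothetical identifier, not a real RefSeq accession
    description: str  # annotation text
    sequence: str     # amino acid sequence


def select_unique_by_keyword(records, keyword="LXG"):
    """Keep records whose description mentions `keyword` (case-insensitive)
    and return one record per distinct sequence, in first-seen order."""
    unique = {}
    for rec in records:
        if keyword.lower() not in rec.description.lower():
            continue
        # Keep the first accession encountered for each distinct sequence
        unique.setdefault(rec.sequence, rec)
    return list(unique.values())


records = [
    ProteinRecord("acc1", "LXG domain-containing protein", "MKTA"),
    ProteinRecord("acc2", "hypothetical protein", "MKTA"),
    ProteinRecord("acc3", "lxg family toxin", "MKTA"),
    ProteinRecord("acc4", "LXG domain-containing protein", "MSSL"),
]
print([rec.accession for rec in select_unique_by_keyword(records)])
```

Near-identical sequences that differ by a few residues would be kept as separate entries here, so a clustering step or manual review would still be needed to decide which represent distinct toxins.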
To identify the chromosomal loci where these additional LXG proteins were encoded, we used two accessions for each toxin that we identified that is encoded outside of the T7 locus and collectively submitted them for flanking gene analysis (shown in Fig. S7 and summarized in Table 3).
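Once flanking genes are known, toxins can be grouped by the pair of conserved genes that bound their insertion site. The sketch below shows one way to do this, assuming each toxin has already been assigned an upstream and a downstream conserved neighbour. It is not the FlaGs tool itself, and the toxin labels and their locus assignments are invented for illustration.

```python
from collections import defaultdict


def group_by_locus(toxin_neighbours):
    """Group toxins by their flanking conserved gene pair.

    `toxin_neighbours` maps a toxin label to a (neighbour, neighbour)
    tuple. The pair is sorted so that a locus read from either strand
    is treated as the same locus.
    """
    loci = defaultdict(set)
    for toxin, (gene_a, gene_b) in toxin_neighbours.items():
        locus = tuple(sorted((gene_a, gene_b)))
        loci[locus].add(toxin)
    return {locus: sorted(toxins) for locus, toxins in loci.items()}


# Hypothetical assignments; the toxin labels are placeholders
example = {
    "toxin_1": ("lmo0135", "lmo0152"),
    "toxin_2": ("lmo0152", "lmo0135"),  # same locus, opposite orientation
    "toxin_3": ("lmo0334", "lmo0342"),
}
for locus, toxins in group_by_locus(example).items():
    print(locus, len(toxins), toxins)
```

Counting the toxins in each group gives a simple measure of how many unique sequences each insertion site carries across strains.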
It is clear that there are preferred chromosomal locations where LXG proteins are encoded, shown on a map in Fig. 8. We noted that a cluster of ABC transporter genes (lmo0135/lmo0152) is a hotspot for LXG protein-encoding genes, and across L. monocytogenes strains we detected different candidate toxins (toxins H-R in Table 3) encoded at that region. Seven unique LXG protein-encoding sequences were also identified at an insertion site between DNA-directed RNA polymerase- and transketolase (lmo0334/lmo0342)-encoding genes (toxins S-Y in Table 3). As far as we could determine, each individual LXG protein is encoded at the same chromosomal locus across the strains. Other hotspots were also identified, although these generally encode fewer unique sequences.
The loci share a number of common features; insertions generally encode a single LXG protein, with the exception of the insertion between the glutamine-hydrolysing GMP synthase and α/β hydrolase-encoding genes, where two LXG protein-encoding genes are arranged in a head-to-head organization (Fig. S7). Also encoded at the LXG loci are the likely cognate immunity proteins and often additional orphan immunities and toxin fragments. Interestingly, in almost all instances the LXG-encoding gene is preceded by a pair of genes encoding the same two small proteins (numbered 4 and 3 in Fig. S6 and 1 and 2 in Fig. S7). Although neither of these is classified as being in the WXG100 superfamily, structural prediction suggests that they share the same WXG100 protein fold, and gene number 4 in Fig. S6/2 in Fig. S7 encodes a protein of the DUF3130/SACOL2603 family that has been linked to the T7SS by phylogenetic profiling.
It is striking to note that all of the LXG proteins encoded next to this gene pair share a similar LXG domain sequence (Fig. S8), although proteins encoded at the same loci are clearly more closely related. Whitney et al. [30] have previously shown that WXG proteins bind to their cognate LXG proteins to promote export by the T7SS. It is very likely that one or both of these small proteins plays a similar role, presumably interacting with the highly conserved N-terminal region. We also observed that three LXG proteins, toxins θ, μ and ϕ, are quite divergent in their N-terminal sequence (Fig. S8). In the case of toxins θ and ϕ, they are preceded by a different pair of small genes (346/349 and 319/324, respectively, in Fig. S7) that may encode cognate export partners. No candidate small genes immediately precede toxins σ and μ, and they therefore may rely on partner proteins encoded elsewhere in the genome. Finally, when we analysed the bspA/oxrB (lmo0460/lmo0476) locus in the genome of EGDe where we had observed LXG protein-encoding genes in other strains, we observed that this strain encodes an unusual truncated LXG domain protein at this locus (Fig. S7). The protein, Lmo0473, lacks the first 130 amino acids of a canonical LXG domain protein and is preceded by a gene encoding a protein with a bacteriophage abortive infection AbiH domain. A second AbiH domain protein is also encoded nearby, and it appears that EGDe and a few other strains may have undergone a genomic rearrangement or insertion event at the locus. A manual search using the sequence of Lmo0473 revealed that several L. monocytogenes strains encode an otherwise sequence-identical protein at this locus, but it is longer and is a bona fide LXG protein (toxin α on the sequence alignment shown in Fig. S8). Curiously, these longer proteins are not found in the RefSeq database, which is used as input for the FlaGs program (and are therefore missing from Fig. S7).

DISCUSSION
In this study we have undertaken a genomic survey of the T7SS of L. monocytogenes. Our results show that there is exceptional diversity across strains in both the identity of the core component EssC and the likely secreted substrates.
We identified seven variants of EssC based on the sequence of the C-terminal 500-600 amino acids. The most common variant is EssC1, which is encoded in just over a quarter of sequenced L. monocytogenes strains, including the reference strain EGDe. The genetic organization of essC1 to some extent mirrors that of S. aureus essC1 strains, with genes encoding the secreted substrates EsxC and EsxB and the chaperone protein EsaE common between the two, and sharing conserved positioning. A toxin with an N-terminal YeeF domain is found in all L. monocytogenes essC1 strains. In S. aureus EsaE interacts with the YeeF domain of EsaD and with EssC1 and appears to target the toxin to the secretion machinery [34,36]. It is likely that the YeeF domain toxins are similarly targeted in L. monocytogenes. Since we failed to identify an EsaE-like protein encoded in any of the other essC variants, we speculate that other substrate proteins, such as the LXG domain proteins, do not require an EsaE-like protein for targeting to the T7SS.
Strains encoding the EssC5, EssC6 and EssC7 variants are the least common. These strains share some similarity at the essC locus; all three encode an LXG domain protein and likely immunity protein downstream of essC. In each case the LXG protein-encoding gene is separated from essC by two intervening genes. Based on the positioning of these genes and the fact that one or both encode small proteins, we speculate that they may be LXG partners required to mediate export. For the other three essC variants the identity of substrate protein(s) encoded immediately downstream of essC is not obvious. Strains harbouring essC3 variants always encode the same two proteins downstream, the larger of which may represent a novel secretion substrate. The essC2 and essC4 variants have clusters of genes in the vicinity of essC that mainly encode proteins of unknown function. Some or all of these genes are frequently found at the essC locus of other variant strains. Often a smaller module of five genes (essC2) or four genes (essC4) is found across other strain variants, including some essC1 strains. Curiously, however, some other essC1 strains contain the full complement of 'foreign' genes. In this case the strains always adjacently encode an 'orphan' truncated EssC covering the C-terminal 728 amino acids, which shares the same sequence as the essC variant from where the genes were acquired. It is not clear whether these shorter EssC proteins (which lack transmembrane domains) are functional and can integrate into the T7SS. Alternatively, they may serve as a site of recombination with the full-length essC gene, which would splice out the intervening genes and convert an essC1 variant into an essC2/essC4 strain.
We identified a second variable locus at the T7 gene cluster that is present in most, but not all, strains. One of four different proteins, each sharing a highly conserved LXG domain, was found to be encoded at this region. Outside of the T7 cluster we also identified additional loci where LXG proteins are encoded. In total, across all strains we identified 40 different LXG proteins. Although none of these LXG proteins have yet been characterized, some of them could be structurally modelled on nuclease, NAD+ glycohydrolase and membrane-depolarizing toxins. Taken together, our findings imply an important role for the T7SS in bacterial antagonism and kin discrimination in L. monocytogenes. We note that prior studies failed to demonstrate a function for the T7SS in virulence and host cell invasion, if anything finding that it had a negative impact on L. monocytogenes infection. Intriguingly, genes encoding some of the T7SS components and the YeeF domain toxin are upregulated when strain EGDe is used to infect germ-free mice that have been colonized with Lactobacillus [59], which would be consistent with a role in interspecies competition. Our findings provide a framework for the functional analysis of the T7SS across the species.

Funding information
This study was supported by the Wellcome Trust (through Investigator Award 10183/Z/15/Z to T. P.).