AIDS, caused by the retrovirus HIV, is predicted to become the single largest cause of morbidity worldwide by 2030, as measured by disability-adjusted life-years. African countries currently have the highest disease burden of HIV, with 9.2% prevalence in Addis Ababa in Ethiopia and over 10% in Dar-es-Salaam in Tanzania, yet almost all genetic studies have focused on cohorts from Western countries. The genetic architecture of HIV susceptibility in Africans is likely to differ from that in Europeans, yet genome-wide association studies of host susceptibility to HIV have not yielded any significant results. These studies miss regions that show copy number variation, particularly structurally complex regions that are not correlated with alleles at flanking SNP markers.
Copy number variation (CNV) is defined as variation in the copy number of a given DNA sequence in a diploid genome. CNV is common in the genome, affects gene expression, and involves immune response genes [5–7], suggesting that it may affect susceptibility of the host to infectious disease. CNV of the killer cell immunoglobulin-like receptor (KIR) genes has been shown to affect host control of HIV infection, as determined by the viral load (VL) at setpoint, and we have recently shown association of β-defensin CNV both with HIV viral load at initiation of highly active anti-retroviral therapy (HAART) and with subsequent immune reconstitution.
The genes CCL3L1/CCL4L1 encode the chemokines MIP-1α and MIP-1β, which are both ligands for the chemokine receptor CCR5, used as a co-receptor by R5 strains of HIV. These genes show CNV, and this has been shown to affect HIV acquisition, progression to AIDS, and immune reconstitution following HAART [10–12]. An attractive model is that these chemokines and HIV compete for the same receptor, CCR5, and that increasing copy number increases the levels of chemokine, thereby increasing competition with HIV for the receptor. A gene dosage effect linking gene copy number and protein levels is needed to support this hypothesis, and the evidence has been contradictory. Early studies supported a gene dosage effect [10, 11], but recent studies have suggested that the influence of extra gene copies on total protein levels is low [14, 15]. A problem in these experiments is that the protein products of CCL3 (called MIP1α-LD78α) and CCL3L1 (MIP1α-LD78β) cannot be discriminated using standard antibodies. Thus analyses using antibody-based detection of protein products may not detect a gene dosage effect, particularly given the higher levels of CCL3 transcription, and presumably of MIP1α-LD78α, in the blood. Although both protein isoforms signal through CCR5, only the LD78β isoform can be cleaved by dipeptidyl peptidase IV to generate a monocyte attractant and CCR1 agonist [16, 17]. Indeed, functional evidence remains supportive: measurements of the chemotactic response of cells to supernatants from lipopolysaccharide-stimulated monocytes from different individuals support an effect of CCL3L1 gene copy number. However, other mechanisms for an effect of CCL3L1 copy number can be envisaged, acting either directly or indirectly through other immunological phenotypes such as the CD4+ cell count.
Attempts at replicating the genetic association of CCL3L1 copy number and HIV susceptibility have yielded contrasting results. A meta-analysis of nine studies has supported an association of lower CCL3L1 copy number with susceptibility to HIV, but this study did not critically analyse the quality of the published data used in the meta-analysis. For example, the use of quantitative PCR (qPCR) to determine CCL3L1 copy number may generate false-positive associations [19–21]. It may be that CCL3L1 and CCL4L1 do not always vary in copy number as a block, which might explain at least some of the heterogeneity in results when different methods are used to determine copy number. However, when more robust and reliable methods are applied to large European cohorts there is no evidence of such discordance, suggesting that, when measured with sufficient precision and accuracy, CCL3L1 and CCL4L1 covary as a block [22, 23]. In common with most of the literature, we refer to this copy number variation as CCL3L1 copy number variation, but it should be remembered that it also involves CCL4L1 and possibly TBC1D3.
CCL3L1 CNV has also been associated with a variety of other infectious diseases, including tuberculosis, hepatitis B, hepatitis C and Kawasaki disease. Such association studies are almost always small, use qPCR to type copy number, and are not necessarily replicated, and in some cases the reported association is seen only on a background of a particular genotype at another locus. While such studies are based on reasonable hypotheses concerning the function and interaction of proteins and pathogens, their marginal significance levels and limited power mean that drawing definitive conclusions regarding the role of genetic variation remains difficult. In the most technically and genetically thorough study to date, a weak suggestive association with protection from anemia in malarial infection was found, but this family-based study also lacked power to detect anything but strong effects.
Evidence from other African studies of CCL3L1 and HIV has been contradictory. In a small Zimbabwean longitudinal cohort, no association of CCL3L1 copy number with HIV status or progression was found. However, analysis of mother-to-child transmission in South Africa suggested that higher copy number was protective against HIV transmission. In this context, we decided to analyse our previously described cohort of HIV patients from Ethiopia and Tanzania for association of CCL3L1 copy number with viral load immediately prior to HAART and with immune reconstitution during HAART. African populations are known to have a higher average copy number than European populations [11, 31], due either to natural selection or to genetic drift. This has the advantage, in an association study context, of providing a wider range of copy number and therefore a potentially larger gene dosage effect. However, there are significant technical challenges in accurately typing multiallelic copy numbers at this, or indeed other, loci. We decided to use the paralogue ratio test (PRT) to determine copy number, which is the most robust technique available for typing this locus in large cohorts [19, 21].
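As an illustration of the principle behind ratio-based copy number typing such as PRT, the sketch below converts a measured test-to-reference amplicon ratio into an integer diploid copy number. It is a minimal sketch under stated assumptions, not the calling procedure used in this study: it assumes that control samples of known copy number are run alongside the test samples, that the ratio scales linearly with copy number, and that rounding to the nearest integer is an acceptable final call. The function names and all numerical values are hypothetical.

```python
# Hypothetical sketch: calibrating ratio measurements against control
# samples of known copy number, then calling integer copy numbers.
# All values below are illustrative, not data from this study.

def fit_calibration(control_ratios, control_copy_numbers):
    """Least-squares line mapping measured ratio -> copy number."""
    n = len(control_ratios)
    mean_x = sum(control_ratios) / n
    mean_y = sum(control_copy_numbers) / n
    sxx = sum((x - mean_x) ** 2 for x in control_ratios)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(control_ratios, control_copy_numbers))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def call_copy_number(ratio, slope, intercept):
    """Return (raw estimate, integer call) for one measured ratio."""
    raw = slope * ratio + intercept
    # Copy number cannot be negative; round to the nearest integer.
    return raw, max(0, round(raw))


# Assumed controls: ratios measured for samples with 1 to 4 copies.
slope, intercept = fit_calibration([0.5, 1.0, 1.5, 2.0], [1, 2, 3, 4])
raw, call = call_copy_number(1.26, slope, intercept)
print(f"raw estimate {raw:.2f}, integer call {call}")
```

Keeping the raw estimate alongside the integer call makes it possible to flag samples that fall midway between two integers, which is where ratio-based typing of multiallelic loci is least reliable.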