Introduction

Low back pain is a major health problem in Western countries [1, 2]. A variety of pathologies can cause low back pain, one of which is degenerative disc disease (DDD) [3]. It has been hypothesised that through disc dehydration, annular tears, and loss of disc height or collapse, DDD can result in abnormal motion of the segment and biomechanical instability, causing pain [4–7].

When conservative treatment fails, patients and health care providers may consider other treatment options such as surgery. Although the rationale for surgery is often not clear and despite the lack of convincing evidence in the literature regarding the effectiveness of surgery in the treatment of symptomatic DDD, the number of surgical procedures performed is continually increasing [8, 9]. For a long time, lumbar fusion (arthrodesis) has been the “gold standard” surgical treatment for DDD. However, long-term results are poor and complications common [4, 10].

An alternative surgical procedure, total disc replacement (TDR), has increased in popularity. The purpose of this technique is to restore and maintain spinal segment motion, which is assumed to prevent degeneration at levels adjacent to the operated level, while relieving pain [4, 11–13].

Replacing a degenerated joint instead of fusing it was considered for the spine due to the success of total knee and hip arthroplasty [5, 14, 15]. The first described total disc replacement was the Fernström steel ball endoprosthesis in the late 1950s [16]. Since that time, multiple disc replacement prostheses have been designed for use in the lumbar spine, although the large majority were never implanted in humans [4, 10, 17]. The first prosthesis designed to be commercially distributed as an artificial disc was initiated in 1982 by Schellnack and Büttner-Janz. Currently, many different lumbar total disc prostheses are available and approved for the European market. In the United States, Investigational Device Exemption (IDE) trials have led to FDA approval for the Charité and ProDisc prostheses.

In this article, we systematically review the available literature on the effectiveness and safety of currently available prostheses for TDR in patients with symptomatic DDD.

Materials and methods

Objective

The objective of this systematic review was to assess the effectiveness and safety of total disc replacement surgery in patients with chronic low back pain due to DDD. The main research questions were:

  1. What is the course of DDD complaints and/or symptoms following total disc replacement surgery?
  2. What is the effectiveness of total disc replacement surgery compared to other treatments?
  3. What is the safety of total disc replacement surgery?

For this systematic review, we used the method guidelines for systematic reviews as recommended by the Cochrane Back Review Group [18]. Below, the search strategy, selection of studies, data extraction, risk of bias assessment, and data analysis are described in more detail. All these steps were performed independently by two reviewers (KvdE and RO), and potential disagreements between the two reviewers were discussed during consensus meetings. If disagreements were not resolved, a third reviewer (MvT) was consulted.

Search strategy

An experienced librarian performed a comprehensive systematic literature search. The MEDLINE, EMBASE and COCHRANE LIBRARY databases were searched for relevant studies from 1973 to October 2008. The search strategy consisted of a combination of keywords concerning the technical procedure (e.g. disc replacement, prosthesis, implantation, discectomy, arthroplasty) and keywords regarding the anatomical features and pathology (e.g. intervertebral disc degeneration, discitis, low back pain, lumbosacral region, lumbar vertebrae). These keywords were used as MeSH headings and as free-text words. In addition, a search was performed using the specific names of the prostheses. The full search strategy is available upon request.

Selection of studies

The search was limited to studies published in English, German, and Dutch, because these are the languages that the review authors are able to read and understand. Two review authors independently examined all titles and abstracts that met our search terms and reviewed full publications when necessary. The reference sections of all primary studies were inspected for additional references. For the assessment of the course of complaints and/or symptoms (research question 1), we included prospective cohort studies reporting on at least 20 cases and having a follow-up period of more than 6 weeks. By definition, cohort studies do not provide information about effectiveness, so for assessment of the effectiveness (research question 2), we only included randomized controlled trials. When multiple articles were identified on the same study but described different follow-up measurements, they were included. However, articles describing only one arm of the trial, or describing the results of only one centre of a multicentre trial, were excluded. For assessing the safety (research question 3), we extracted data on all reported complications from the prospective cohort studies and randomized controlled trials we included for research questions 1 and 2. Furthermore, we included overview studies on complications. Case reports were excluded.

Data extraction

Two review authors independently extracted relevant data from the included studies regarding design, population (e.g. age, gender, duration of complaints), type of total disc replacement surgery, type of control intervention (e.g. no treatment, lumbar fusion), vertebral level(s) operated on, follow-up period, and outcomes. Primary outcomes that were considered relevant were pain intensity [e.g. visual analogue scale (VAS)], functional status [e.g. Roland Morris Disability Scale, Oswestry Scale (ODI)], global improvement, and return to work. All ODI scores and VAS scores were converted to a 0–100 scale. Other outcome measures, such as physiological outcome, radiological outcomes, and patient satisfaction, were considered secondary outcome measures.
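The conversion of VAS and ODI scores to a common 0–100 scale can be illustrated with a short sketch. The review does not state how the conversion was done; the code below assumes a simple linear rescaling from each instrument's native range, and the function name `to_0_100` and the example scores are hypothetical.

```python
def to_0_100(score, scale_min, scale_max):
    """Linearly rescale a score from [scale_min, scale_max] to 0-100.

    Assumption: a linear rescaling; the review does not specify the method.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    if not scale_min <= score <= scale_max:
        raise ValueError("score lies outside the stated scale")
    return 100.0 * (score - scale_min) / (scale_max - scale_min)


# Hypothetical examples, not data from the included studies.
vas_points = to_0_100(6.5, 0, 10)  # VAS recorded on a 0-10 scale
odi_points = to_0_100(21, 0, 50)   # ODI recorded as a raw sum out of 50
print(vas_points, odi_points)
```

Expressing both instruments on the same range is what allows improvements to be reported in "points" across studies that used different native scales.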

Risk of bias (RoB) assessment

Two review authors independently assessed the risk of bias (RoB) of the included studies. Controlled trials were assessed using a criteria list recommended by the Cochrane Back Review Group [18]. The following criteria were scored yes, no or unsure: (1) Was the method of randomization adequate? (2) Was the treatment allocation concealed? (3) Was the patient blinded to the intervention? (4) Was the care provider blinded to the intervention? (5) Was the outcome assessor blinded to the intervention? (6) Was the dropout rate described and acceptable? (7) Were all randomized participants analysed in the group to which they were allocated? (8) Are reports of the study free of suggestion of selective outcome reporting? (9) Were the groups similar at baseline regarding the most important prognostic indicators? (10) Were co-interventions avoided or similar? (11) Was compliance acceptable in all groups? (12) Was the timing of the outcome assessment similar in all groups? Criterion 11 was scored not applicable because we consider compliance not relevant for surgical interventions. If studies met at least 6 of the 12 items, the RoB was considered low. Disagreements were resolved in a consensus meeting and a third review author was consulted when necessary. The full assessment is available upon request. The overall grading of the evidence was based on the GRADE approach [19].
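The scoring rule described above can be sketched as follows. This is a minimal illustration, assuming that only criteria scored "yes" count as met and that "unsure" and "not applicable" do not; the function name `rob_rating` and the example score sheet are hypothetical and do not reproduce the assessment of any included trial.

```python
LOW_ROB_THRESHOLD = 6  # from the text: "at least 6 of the 12 items"


def rob_rating(answers):
    """Count criteria scored 'yes' and classify the overall risk of bias.

    answers maps criterion number -> 'yes', 'no', 'unsure' or 'n/a'.
    Assumption: only 'yes' counts as a met criterion.
    """
    met = sum(1 for answer in answers.values() if answer == "yes")
    rating = "low" if met >= LOW_ROB_THRESHOLD else "high"
    return met, rating


# Hypothetical score sheet: WHICH items are met is invented for illustration.
example_sheet = {
    1: "unsure", 2: "unsure", 3: "no", 4: "no", 5: "no", 6: "yes",
    7: "no", 8: "yes", 9: "yes", 10: "unsure", 11: "n/a", 12: "yes",
}
met, rating = rob_rating(example_sheet)
print(met, rating)
```

With criterion 11 scored not applicable, a trial can meet at most 11 items, while the threshold for a low risk of bias stays at 6.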

Results

Search and selection

A total of 1,962 references were identified from MEDLINE, EMBASE and the COCHRANE LIBRARY that were potentially relevant for this review on total disc replacement surgery. After checking titles and abstracts, a total of 112 full-text articles were retrieved that were potentially eligible for answering the research questions. After reading the full text, 21 articles reporting on 16 studies were relevant for answering research question 1, and 16 articles reporting on 3 studies were relevant for research question 2. Seven overview articles were relevant for answering research question 3. Figure 1 shows the search and selection process in a flow diagram. Reviewing the reference lists of these articles resulted in no additional studies.

Fig. 1 Flow diagram

Type of studies

For assessing the course of DDD complaints and/or symptoms (research question 1), 16 prospective cohort studies were included, describing four different devices: 6 for Charité [20–26], 8 for ProDisc [27–36], 1 for Maverick [37–39], and 1 for Acroflex [40]. For assessing the effectiveness of total disc replacement (research question 2), we identified three randomized controlled trials, all conducted in the USA in order to obtain FDA approval. Each trial examined a different prosthesis for TDR (Charité, ProDisc and Flexicore) and all used fusion (although different types) as the control intervention. No studies comparing TDR surgery to other treatments were found. Two trials (the Charité and ProDisc trials) were described in multiple articles [6, 10, 13, 41–53]. Given our inclusion and exclusion criteria, we finally included two articles describing the Charité trial: one reporting the 24-month follow-up [10] and the other the 5-year follow-up [51]. However, 5-year follow-up results were only available for 57% of the originally randomized population. We included one article for the ProDisc trial reporting the 24-month follow-up results [52]. The Flexicore trial was described in one article, which should be considered as preliminary results as the final results of this trial have not yet been published [53] (Table 1).

Table 1 Prospective controlled studies

Assessment of risk of bias

After assessing the risk of bias of the controlled trials, there was 20% disagreement between the two review authors. Consultation of the third reviewer was not necessary because disagreements were resolved in a consensus meeting. For the 24-month follow-up, the reporting on the Charité trial was considered to have a low risk of bias. However, the reporting on the 5-year follow-up was considered to have a high risk of bias. The reporting on the ProDisc trial was considered to have a high risk of bias as it only met 4 out of the 11 risk of bias criteria. The Flexicore study was also considered to have a high risk of bias as it only met 2 out of 11 risk of bias criteria (Table 2). By design, the prospective cohort studies were not included in the effectiveness analysis, but were only used to describe the course of DDD complaints and/or symptoms after TDR surgery.

Table 2 Methodological quality prospective controlled studies

Outcomes

(1) What is the course of DDD complaints and/or symptoms following total disc replacement surgery?

Charité® (Table 3)

The Charité prosthesis was the first total disc prosthesis, developed by Büttner-Janz and Schellnack at the Charité clinic in former East Germany. The Charité III became commercially available for the first time in the late 1980s [54, 55]. Lemaire et al. published two articles on the same population, with 51 months of follow-up in 1997 [20] and 11.3 years of follow-up in 2005 [23]. These articles report a good or excellent clinical result in 85 and 90% of patients, respectively. Several other prospective cohort studies also report positive results on VAS improvement (range 16–66 points), ODI improvement (range 14–51%) and patient satisfaction (range 69–92%) [21–26].

Table 3 Prospective cohort studies

ProDisc® (Table 3)

The ProDisc I was developed in France by Marnay, who operated on 64 patients and performed single- or multi-level total disc replacement at the beginning of the 1990s [29, 54, 55]. Fifty-five patients were available for follow-up after an average of 8.7 years. Of these patients, 82.6% were “completely satisfied” or “satisfied” with the results [29]. The ProDisc II, the second-generation ProDisc, is reported on in several publications with follow-up ranging from 3 months to 2 years [27, 28, 30–36]. Primary outcome results appear positive, with VAS improvement (range 40–62 points) and ODI improvement (range 21–48%). Moreover, the majority of patients seem to be satisfied with the results (range 79–100%).

Maverick® (Table 3)

Le Huec et al. published several studies on the Maverick device [37–39]. At 2-year follow-up, improvement was reported in VAS for low back pain and leg pain, which decreased by 44 and 18 points, respectively. Functional status improved as the ODI score decreased by 20.7%, and an overall improvement in functional status of 25% occurred in 75% of the patients. Since 2003, a prospective controlled trial has been ongoing in the USA [54].

Acroflex® (Table 3)

Fraser et al. [40] conducted two pilot studies and combined the results. The endplates were changed for the second study because of device failure. For the whole group, the functional impairment (ODI) improved from 14.9% and the Low Back Pain Score (LBPS) improved from 17.7 to 33 at 24 months follow-up. 50% of the patients were not working because of their back condition. Due to detection of mechanical failure, the randomized controlled trial has not been carried out.

In conclusion, many studies suggest pain relief, improvement in functional status and patient satisfaction after TDR surgery. The overall outcome is positive, with a reduction of pain intensity (range 16–66 points) and an improvement of functional impairment (range 14–51%). Moreover, the majority of patients seem to be satisfied with the results (range 69–100%). Unfortunately, detailed information on how outcomes were measured was often lacking. Although outcome results from observational studies suggest a positive course after TDR surgery, drawbacks are that a substantial number of complications was reported as well (described later) and that a control group was lacking in these studies.

(2) What is the effectiveness of total disc replacement surgery compared to other treatments?

Charité® trial (Table 1)

The Charité trial [10, 51], which was designed as a non-inferiority trial, randomized 304 patients to either TDR with the Charité III disc (n = 205) or anterior interbody fusion with BAK cage (n = 99), with a follow-up of 2 and 5 years. The primary outcomes were pain (VAS), functional impairment (ODI), and overall clinical success (defined by using four criteria: ≥25% improvement in ODI, device failure, major complications, and neurological deterioration). As a secondary outcome, patient satisfaction was measured. The improvements in pain intensity (−40.6 vs. −34.1) and functional impairment (24.3 vs. 21.6%), for the TDR and the BAK groups, respectively, did not differ significantly at 2-year follow-up. The overall clinical success (indeed statistically tested on non-inferiority) revealed that the Charité group was non-inferior to the lumbar fusion group (57.1 vs. 46.5%; P < 0.0001, based on Blackwelder’s test for equivalence). Patient satisfaction was significantly better in the Charité group (73.7%) compared to the control group (53.1%) (P < 0.002). The 5-year results, based on only 57% of the randomized patients and with a high risk of bias, were broadly in line with the 2-year results. At 5-year follow-up, outcomes on the composite score of clinical success showed that the Charité was non-inferior to the lumbar fusion group (57.8 vs. 51.2%; P < 0.04, based on Blackwelder’s test for equivalence) [51]. There were no statistically significant differences in functional impairment and pain intensity. In conclusion, there is low quality evidence (based on one study only with a low risk of bias) that there are no clinically relevant differences on the primary outcome measures between the Charité group and the BAK cage at the 2-year follow-up, and there is very low quality evidence (based on one study only with a high risk of bias) that there are no clinically relevant differences on the primary outcome measures at the 5-year follow-up.
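The logic of a non-inferiority comparison of two success proportions can be sketched as follows. This is an illustration in the spirit of Blackwelder's approach using a normal approximation, not the trial's actual analysis: the non-inferiority margin of 0.15 and the use of the randomized group sizes as denominators are assumptions that are not stated in the text above, and the function name `noninferiority_p` is hypothetical.

```python
import math


def noninferiority_p(p_new, n_new, p_ctrl, n_ctrl, margin):
    """One-sided p-value for H0: p_new - p_ctrl <= -margin.

    Uses a normal approximation with an unpooled standard error.
    A small p-value supports non-inferiority of the new treatment.
    """
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = (p_new - p_ctrl + margin) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


# Success rates are taken from the text; the margin (0.15) and the
# denominators (205 and 99) are ASSUMPTIONS made for illustration only.
p_noninferiority = noninferiority_p(0.571, 205, 0.465, 99, margin=0.15)
p_superiority = noninferiority_p(0.571, 205, 0.465, 99, margin=0.0)
print(p_noninferiority, p_superiority)
```

Setting the margin to zero turns the same calculation into a one-sided test of superiority, which shows why a very small non-inferiority P value does not by itself imply that one treatment is clearly better than the other.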

ProDisc® trial (Table 1)

The ProDisc trial [52], which had a high risk of bias, randomized 236 patients to either TDR with the ProDisc device (n = 161) or to anterior lumbar circumferential fusion (using femoral ring allograft and posterolateral fusion with autogenous iliac crest bone graft in combination with pedicle screws) (n = 75). Outcomes were reported with 2-year follow-up. Clinical success was defined using a combination of 10 outcomes as required by the FDA (Oswestry ≥ 15 points, SF-36 improvement, device success, neurologic success and six radiographic outcomes: no migration, no subsidence, no radiolucency, no loss of disc height, fusion status and ROM). Clinical success was statistically significantly better in the ProDisc group (54.3%) than in the fusion group (40.8%) (P < 0.05). Although this trial was designed as a non-inferiority study, it is unclear what statistical testing was applied. However, there were no significant differences between the groups in mean functional impairment (−28.9 vs. −22.9%) and pain intensity scores (−39 vs. −32). In conclusion, there is very low quality evidence (based on one study only with a high risk of bias and inconsistent findings) for contradictory results on the primary outcome measures at the 2-year follow-up for the ProDisc when compared with anterior lumbar circumferential fusion.

Flexicore® trial (Table 1)

The Flexicore trial [53], with a high risk of bias, reported the initial results of 76 patients from two clinics involved in a randomized multicentre controlled trial comparing the Flexicore device (n = 44) with anterior lumbar circumferential fusion (n = 23) at 2-year follow-up. These 76 patients are only a small proportion of all randomized patients (n = 401) included in the complete trial. Overall, the dropout rate was high: 33 patients (75%) in the intervention group and 16 patients (70%) in the control group after two years. Improvement in pain intensity (VAS −70 vs. −62) and functional impairment (Oswestry −56 vs. −46%) was slightly better in the Flexicore group than in the fusion group, but the authors did not report whether this difference was statistically significant. Because these are preliminary results, in addition to the high risk of bias, we refrain from drawing conclusions based on this study. In general, these results suggest no clinically relevant differences between TDR surgery and fusion techniques and a modest overall success rate in both groups (approximately 50%).

(3) What is the safety of total disc replacement surgery?

Although some studies reported no major complications, other cohort studies describe a wide range (1.0–91.0%) of complication rates following TDR. The majority of these studies reported complication rates ranging from 10 to 40%. Complications can be separated into those related to the surgical approach (e.g. vascular injury, nerve root damage, retrograde ejaculation), ranging from 2.1 to 18.7%; those related to the prosthesis (e.g. subsidence, migration, implant displacement, implant failure, end plate fracture), ranging from 2.0 to 39.3%; and those related to the treatment (e.g. wound, pain, neuromusculoskeletal), ranging from 1.9 to 62.0%. General surgery-related complications ranged from 1.0 to 14.0%. Reoperation at the index level was seen in 1.0–28.6% (Table 4). These reported complication rates and reoperation rates have to be interpreted carefully, because they have been described poorly.

Table 4 Overview complication prospective cohort studies

Below we describe the complication rates and reoperation rates as found in the three trials. The Flexicore trial [53] reported complications in 22.7% of the TDR group and 43.5% of the fusion group. Reoperations were reported in both groups: 11.4% for TDR and 26.1% for fusion (Table 5).

Table 5 Overview complications trials

In the Charité trial, overall complication rates published by Blumenthal et al. [10] were 29.1% for TDR and 50.2% for fusion at 2-year follow-up. Device failures necessitating reoperation were reported in 5.4% of patients in the TDR group and 9.1% of patients in the fusion group at 2-year follow-up (Table 5). However, in the FDA report on the Charité trial, much higher rates of adverse events (TDR group 181.9% and fusion group 189.6%) were reported [56]. In an article by McAfee et al. [57] analysing the incidence of reoperations, higher reoperation rates in the Charité trial are reported (6.3% in the TDR group and 10.1% in the fusion group).

In the ProDisc trial, there was a similar discrepancy between the article and the FDA report. The overall complication rates as reported by Zigler et al. [52] were 7.3% and 6.3% for TDR and fusion, respectively, but in the FDA report on the ProDisc trial, much higher rates of adverse events were reported (TDR group 255.5% and fusion group 270.7%) [58]. Reoperation was necessary for 3.7% of TDR patients and 5.4% of fusion patients according to Zigler et al. The number of patients needing a reoperation was similar in the FDA report; however, the included number of patients in the trial was higher so the percentage of reoperation in the FDA report was slightly higher (Table 5).

Geisler et al. [59] analysed only the neurological complications in the Charité trial. The incidence was no higher in patients with the Charité (16.6%) than in patients with BAK fusion (17.2%) (P > 0.3). Major neurologic complications were reported in 4.9% of the Charité group (e.g. burning or dysesthetic leg pain, motor deficit at the index level, nerve root injury) and in 4% of the fusion group (e.g. burning or dysesthetic leg pain, motor deficit at the index level). One device-related major complication, nerve root injury, was reported in the TDR group.

Leary et al. [60] reported on 18 patients requiring an anterior revision procedure for repositioning or removal of the Charité prosthesis because of complications. Three patients required revision of two levels. One patient had both levels revised in a single procedure, whereas two patients required staged procedures in order to revise both implants. Therefore, 21 implants were revised via 20 anterior procedures in 18 patients. Six revision cases were performed within the early postoperative period (7–14 days), all as a result of implant migration or dislocation. Late revision cases were required in 14 cases (range 3 weeks–4 years) due to implant migration, dislocation, end plate fractures, subsidence or persistent low back pain.
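The counts in the preceding paragraph can be checked with a few lines of arithmetic. All numbers are taken from the text; the variable names are only illustrative.

```python
patients = 18
two_level_patients = 3            # patients who had two levels revised
two_levels_in_one_procedure = 1   # both levels revised in a single procedure
two_levels_staged = 2             # two separate procedures each

single_level_patients = patients - two_level_patients

# Each single-level patient contributes one implant, each two-level patient two.
implants_revised = single_level_patients + 2 * two_level_patients

# Staged patients contribute two procedures each, all others one.
anterior_procedures = (
    single_level_patients + two_levels_in_one_procedure + 2 * two_levels_staged
)

early_cases, late_cases = 6, 14   # revisions within 7-14 days vs. later

print(implants_revised, anterior_procedures)
```

The early and late revision cases (6 and 14) add up to the 20 anterior procedures, so they appear to be counted per procedure rather than per patient or per implant.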

Van Ooij et al. [61–65] reported in several publications on patients who experienced complications following implantation of the Charité prosthesis. Over the last 10 years, 75 patients with persisting back and leg pain who were unsatisfied with their clinical condition have been seen and analysed. Late complications after TDR comprised subsidence (n = 39), prosthesis too small (n = 24), adjacent disc degeneration (n = 36), degenerative scoliosis (n = 11), facet joint degeneration on CT scan (n = 25), anterior migration (n = 6), posterior migration (n = 2), breakage of the metal wire (n = 10), wear (n = 5), severe osteolysis (n = 1), and subluxation of the PE core (n = 1). Forty-six of these 75 patients needed one or more salvage operations after their TDR. Fifteen patients received posterior fusion without removal of the prosthesis. Because of persisting pain, four of these patients subsequently had their prosthesis removed in an additional operation. In 22 patients, 26 prostheses were removed and an anterior and posterior fusion was performed. In addition, seven patients received posterior fusion elsewhere, and in two patients, the disc prosthesis was removed elsewhere. Intraoperatively, vessel damage was encountered three times.

In conclusion, a wide range of complication rates following TDR (1–91.0%) was found in the cohort studies. The majority of the studies reported complication rates ranging from 10 to 40%. Reoperation at the index level was reported in 1.0–28.6%. In the three randomized controlled trials, overall complication rates ranged from 7.3 to 29.1% in the TDR group and from 6.3 to 50.2% in the fusion group. The overall reoperation rate at the index level ranged from 3.7 to 11.4% in the TDR group and from 5.4 to 26.1% in the fusion group. However, much higher rates were reported in FDA reports on the Charité and ProDisc trials.

Discussion

In this article, we systematically reviewed the available literature on the clinical course, effectiveness, cost-effectiveness, and safety of TDR in patients with symptomatic DDD. Sixteen prospective cohort studies were identified that assessed the course of complaints and symptoms. These studies suggest pain relief, improvement in functional status and patient satisfaction after TDR. However, the quality of reporting on outcomes was often poor, hampering an adequate interpretation. In addition, a substantial number of complications was reported. These cohort studies lacked a control group, which is necessary to evaluate the effectiveness of TDR. Only three randomized controlled multicentre trials were identified that had assessed the effectiveness of TDR. The results show that there is low quality evidence (based on one study only with a low risk of bias) that there are no clinically relevant differences on the primary outcome measures between the Charité group and the BAK cage at 2-year follow-up, and there is very low quality evidence (based on one study only with a high risk of bias) that there are no clinically relevant differences on the primary outcome measures at 5-year follow-up. Furthermore, there is very low quality evidence (based on one study only with a high risk of bias) for contradictory results on the primary outcome measures for the ProDisc when compared with anterior lumbar circumferential fusion at the 2-year follow-up. There is insufficient evidence on the Flexicore, because this trial had a high risk of bias and should be considered a preliminary report, as it only reported on a small proportion of all included patients who participated in this multicentre trial.

For assessing the complication rate, all reported complications were extracted from the cohort studies and randomized controlled trials included in this review, as well as from overview studies on complication rates. A wide range of complication rates following TDR (1–91.0%) was found in the cohort studies. The majority of the studies reported complication rates ranging from 10 to 40%. Reoperation at the index level was reported in 1.0–28.6%. In the three randomized controlled trials, overall complication rates ranged from 7.3 to 29.1% in the TDR group and from 6.3 to 50.2% in the fusion group. The overall reoperation rate at the index level ranged from 3.7 to 11.4% in the TDR group and from 5.4 to 26.1% in the fusion group. However, much higher rates were reported in FDA reports on the Charité and ProDisc trials. No full economic evaluation was identified, so there is no evidence regarding the cost-effectiveness of TDR.

The course of DDD complaints and/or symptoms following total disc replacement surgery

We identified 16 prospective cohort studies to evaluate the course of DDD complaints and/or symptoms. The outcome results suggested a positive course after TDR, with a high proportion of patients satisfied with the result. However, these studies were of poor methodological quality, and detailed information on how outcomes were measured was often lacking. For example, it was often unclear which criteria were used for clinical success and how return to work was measured. A further drawback is that a substantial number of complications was reported as well.

Moreover, these results have to be interpreted in light of controversy and limited literature regarding the causal relationship between DDD and chronic low back pain [4]. Boden et al. [66] reported on 67 asymptomatic individuals assessed for DDD with MRI. DDD was seen in 34% of the individuals between 20 and 39 years of age; 59% of individuals between 40 and 59 years of age, and in all but one (93%) between 60 and 80 years of age. Jensen et al. [67] reported on 98 asymptomatic people after MRI and concluded that 64% of these people had an intervertebral disc abnormality. This challenges the rationale of surgery for DDD in the absence of convincing pathological pathways of DDD.

The effectiveness of total disc replacement surgery compared to other treatments

The Flexicore trial should be interpreted with great caution because of the high risk of bias. Of the three randomized controlled trials, only the 2-year follow-up of the Charité trial was considered to have a low risk of bias. However, the fusion technique with BAK cages and the iliac crest bone autograft used in this trial are techniques that are no longer commonly used because of poor outcomes [68–70]. A better comparator would be the circumferential fusion technique, which was used in the ProDisc and Flexicore trials. The use of autograft in all three studies may also be criticized, as many surgeons now use recombinant BMP-2 and/or percutaneous pedicle screw fixation when performing lumbar fusion [55]. Use of an inadequate control intervention calls into question the clinical relevance of the results of the three trials. An additional concern is the fact that the literature is still controversial about the superiority of fusion compared to conservative treatment [5, 15, 71, 72]. For this reason, it would be of interest to compare the effectiveness of TDR to conservative treatment. At present, no studies comparing total disc replacement surgery to other treatments have been published.

The three randomized controlled trials selected patients carefully, scrutinizing various contraindications for TDR. Because of this careful selection, the published trials do not provide evidence for the widespread use of TDR in all patients with DDD. The relevance of the clinical outcomes in the Charité and the ProDisc trials can also be challenged. First, modest success rates were observed in both the TDR and the fusion groups. In the Charité trial, only 57.1% of patients with TDR met all 4 criteria for success, when compared with 46.5% in the fusion group (P < 0.0001). In the ProDisc trial, only 53.4% of patients with TDR met all 10 FDA criteria for success, when compared with 40.8% in the fusion group (P = 0.0438). Second, in the ProDisc trial, 69.1% of TDR subjects improved by more than 25% on the Oswestry, when compared with 54.9% in the fusion group. In the Charité trial, 63.9% of TDR subjects improved by more than 25% on the Oswestry, compared to 50.5% in the fusion group. The use of the 25% benchmark for improvement should be interpreted against the background of a recently published consensus statement that advocates a 30% improvement in Oswestry as a benchmark for clinically relevant improvement. This recommendation focussed primarily on conservative interventions in a primary care setting. It was suggested that it might be more appropriate to use larger change scores as benchmarks for expensive and risky procedures [73]. Third, one of the purposes of the device implantation is to reduce low back pain, whereas the definition of success did not consider pain relief or opioid use. Finally, Oswestry and VAS cannot discriminate between pain that is residual from the iliac crest after fusion surgery and pain from the lumbar spine. Therefore, Oswestry and VAS scores may be artificially higher in the fusion group compared with TDR.
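The effect of choosing a 25% rather than a 30% Oswestry benchmark, as discussed above, can be illustrated with a small sketch. It assumes that improvement is expressed relative to the baseline score; the function names and the example patient are hypothetical and are not data from the trials.

```python
def percent_improvement(baseline, follow_up):
    """Improvement relative to baseline, in percent (lower ODI is better)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def is_responder(baseline, follow_up, threshold_pct):
    """True if the relative improvement reaches the chosen benchmark."""
    return percent_improvement(baseline, follow_up) >= threshold_pct


# Hypothetical patient: ODI falls from 50 to 36, a 28% relative improvement.
improvement = percent_improvement(50, 36)
meets_25 = is_responder(50, 36, 25)
meets_30 = is_responder(50, 36, 30)
print(improvement, meets_25, meets_30)
```

A patient like this one counts as a success under the 25% criterion used in the trials but not under the 30% benchmark, so the reported responder proportions would be expected to be lower under the stricter benchmark.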

The ODI was used in all included RCTs, but different versions of the ODI were used. Sasso used ODIv2.0 [53]. Blumenthal used the ODIv1.0 and Zigler used the ODI (chiropractic revised version [74]) [75]. Because different versions of the ODI were used, a direct comparison between studies is hampered. Zigler, however, holds the opinion that the differences between the various ODI versions are subtle and inconsequential [77].

Davidson [77] and Fairbank [75] hold the opinion that the amendments in this ‘chiropractic revised version’ are major and that this version therefore cannot be compared with the official versions of the ODI.

The safety of total disc replacement surgery

Complications have been poorly described in the prospective cohort studies and the randomized controlled trials. It is interesting that the complication rates and reoperation rates are lower in the published articles than in the FDA reports [56, 58]. This illustrates the complexity of reporting on adverse effects. Apparently the FDA, unlike the journals in which the papers were published, requires exhaustive and detailed reporting of “adverse events”, most of which have no relationship to the success or failure of the prosthesis. Complications associated with lumbar fusion include incomplete relief of pain, loss of motion, loss of sagittal balance, pseudoarthrosis, adjacent segment degeneration, and bone graft donor site complications. However, a separate set of concerns exists in TDR. Wear debris leading to osteolysis and systemic effects, vertebral body damage, and posterior migration or extrusion may lead to device failure and serious vascular complications. Prostheses that fail to adequately replicate the physiologic kinematics of the lumbar spine may predispose the patient to facet joint degeneration. Without true motion preservation, the devices will merely act as interbody spacers with no potential to prevent adjacent level degeneration [78]. Finally, reported complications of TDR can be severe and even life-threatening, e.g. major vascular injury, major nerve root damage and device failure. However, the rates of these complications are low [9, 17, 79, 80].

Furthermore, in the two studies with a low risk of bias [10, 52], the reoperation rates in the TDR group are slightly higher than in the fusion group. Moreover, this has to be weighed against the fact that reoperation procedures for TDR are more complex.

The use of intervertebral disc prostheses as an alternative to spinal fusion has been advocated to preserve segmental motion and to prevent adjacent level degeneration. However, there is no consensus on this subject in the literature. Some studies suggest that adjacent level degeneration is prevented after TDR [6, 12], whereas other studies show adjacent disc degeneration after TDR [61, 81]. This could be the result of the DDD itself spreading to multiple levels of the spine, and/or a consequence of stresses on adjacent levels generated by unphysiological motion and functioning of the disc prosthesis [61]. Moreover, little is known about complications in the long term. Putzier et al. [81] published a retrospective study with 17 years of follow-up in which reoperation was necessary in 11% of patients. It is important to know more about long-term complications because most operated patients are relatively young, between 30 and 50 years of age. A disc prosthesis used for TDR should therefore survive for at least 40 years. It is very questionable whether the lifetime of the designs now available will be that long, as little is known about the long-term behaviour of biomaterials in the spine. We do know that revision surgery can be dangerous because of adherence to the great vessels and the nerve plexus. Studies that review long-term complications and the longevity of the prostheses are highly recommended.

Conclusion

There is low quality evidence that there are no clinically relevant differences on the primary outcome measures between the Charité disc and the BAK cage at 2 years of follow-up, and there is very low quality evidence that there are no clinically relevant differences on the primary outcome measures at 5 years of follow-up. For the ProDisc device, there is very low quality evidence for contradictory results on the primary outcome measures when compared with anterior lumbar circumferential fusion. Furthermore, reported complication rates varied from 1.0 to 91.0% in cohort studies and from 7.3 to 29.1% in randomized controlled trials. Still lacking are high quality, prospective, controlled, long-term follow-up studies with relevant control groups, including a full economic evaluation that weighs all relevant costs against the clinical benefit, to establish the efficiency and the longevity of the devices. The existing evidence, specifically regarding long-term effectiveness and/or safety, is considered insufficient to justify the widespread use of TDR over fusion for single-level degenerative disc disease. It is recommended that, at this time, disc replacement surgery be performed only within prospective scientific studies until further documentation of its efficiency is provided.