New database of more than 83,000 surgical outcomes aimed at advancing research and training artificial intelligence algorithms now online

The dataset is expected to help the research community develop new algorithms and predictive tools to improve the care of surgical patients globally

A team of researchers from UCLA and UC Irvine has created a unique repository of electronic health record data and high-fidelity physiological waveform data from tens of thousands of surgeries, which researchers can use to develop artificial intelligence tools that improve patient outcomes.

The project is led by Dr. Maxime Cannesson, professor and chair of anesthesiology and perioperative medicine at the David Geffen School of Medicine at UCLA; Dr. Pierre Baldi, Distinguished Professor of information and computer sciences at UC Irvine; and Dr. Joe Rinehart, clinical professor of anesthesiology at UC Irvine. The repository is freely available to legitimate researchers who sign a data use agreement (DUA).

All data in the repository, called the Medical Informatics Operating Room Vitals and Events Repository (MOVER), has been stripped of patient identifiers in accordance with patient privacy laws. It can be downloaded at

The team has published a paper in JAMIA Open describing the database and its uses.

“We expect it to help the research community to develop new algorithms, new predictive tools, to improve the care of surgical patients basically globally,” Cannesson said. “It’s the first time a surgical database like this has been released. It’s a very wide spectrum of surgeries.”

The repository, which had been in the works since 2012, fills a gap in publicly accessible databases that researchers can use to train and test AI algorithms. It is intended to advance a wide variety of healthcare research and serve as a resource to evaluate new clinical decision support and monitoring algorithms for patients undergoing surgery and anesthesia.

It contains seven years of data from hospital visits by patients undergoing surgery at UCI Medical Center, consisting of comprehensive electronic health record data and high-fidelity physiological waveforms. Waveforms are data from monitors, such as EKGs, that measure the patient's physiology either minute by minute or, sometimes, in real time, for instance during a high-risk surgical procedure.

Specifically, the dataset contains general information about each patient and their medical history, including details about the surgical procedure, the medicines given, the lines or drains used during the procedure, and any postoperative complications. In all, it now contains data from nearly 59,000 patients who underwent about 83,500 surgeries.

“This information is truly information that physicians and the care team use to make clinical decisions in the acute care setting,” Cannesson said. “Before this there was no single repository where a very, very large volume of data that includes the physiological waveforms are accessible to researchers.”

The MOVER team took the project through a rigorous process to ensure that patient privacy is preserved.

“Patient privacy has been at the forefront of the development of MOVER,” Cannesson said. “It’s been through a lot of de-identification process. There is no patient identifier, no date of surgery. Patients above 90 years old, their age is not available. So it’s been through a lot of de-identification to make sure that no patient identifier is available.”

There is precedent for sharing similar datasets for patients in the intensive care unit, he noted. The largest and best known is MIMIC, which also includes de-identified electronic health record information and waveforms. “Our main innovation was to start more than 10 years ago recording these waveforms during surgery,” he said. “This could be helpful to the whole perioperative surgical community.”

For now, the focus is on sharing the UC Irvine data with qualified physicians and researchers, he said. But a National Institutes of Health initiative called “Bridge to AI”, of which UCLA is a part, aims to standardize this kind of data across multiple institutions and eventually create a single repository with a shared vocabulary and data architecture.

The repository is designed so that its data can be thoroughly checked, which makes the resulting research transparent. “The goal is eventually to increase the trust that clinicians and patients have with what you are going to see in the near future – the development of more and more artificial intelligence-based models, especially for the surgical setting,” Cannesson said.

The work was supported by the National Institutes of Health (NIH) through the National Institute of Biomedical Imaging and Bioengineering (R01EB029751).