Promoting and Expanding the Structured Reporting of Cancer (SRC)

NLP Colorectal Cancer reports

Page last updated: 24 June 2013

A contract with Professor Jon Patrick of Health Language Laboratories (HLL) was signed in January 2012. The objectives of the project were to:

  1. Create an automatic processor that populates structured reports for colorectal cancer from sample anatomical pathology reports.
  2. Assess the extent to which current colorectal pathology reports already use the elements of structured reports. The rate of use should be reported as the percentage of the report(s) presented in a structured format, i.e. a question-and-answer format.
  3. Assess the level of completeness of structured reporting for colorectal cancer. Completeness should be reported as a quantitative measure of adherence to the (1) standards only and (2) guidelines in the Colorectal Cancer Structured Pathology Reporting protocols 1st edition.
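
The two measures in objectives 2 and 3 can be illustrated with a minimal sketch. It is not the method used by HLL. The element names, the `Label: value` pattern used to detect question-and-answer lines, and the function names `structured_rate` and `completeness` are all assumptions made for illustration. The actual standards and guidelines are those defined in the Colorectal Cancer Structured Pathology Reporting protocol.

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical element names, for illustration only; the real lists come
# from the structured reporting protocol.
STANDARDS = ["tumour_site", "tumour_type", "histological_grade", "margin_status"]
GUIDELINES = ["tumour_budding", "perineural_invasion"]

# Assumption: a "question and answer" line looks like "Label: value".
QA_LINE = re.compile(r"^\s*[^:]{1,60}:\s*\S")


def structured_rate(report_text: str) -> float:
    """Percentage of non-blank lines written in a 'Label: value' form."""
    lines = [line for line in report_text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    structured = sum(1 for line in lines if QA_LINE.match(line))
    return 100.0 * structured / len(lines)


def completeness(report: Dict[str, Optional[str]], elements: Iterable[str]) -> float:
    """Percentage of the listed elements that have a non-empty value."""
    elements = list(elements)
    if not elements:
        return 0.0
    present = sum(1 for name in elements if (report.get(name) or "").strip())
    return 100.0 * present / len(elements)


# Invented sample data, not taken from any real report.
example_text = (
    "Tumour site: sigmoid colon\n"
    "The specimen consists of a segment of bowel.\n"
    "Margins: clear\n"
    "\n"
    "Histological grade: moderately differentiated\n"
)
example_fields = {
    "tumour_site": "sigmoid colon",
    "tumour_type": "adenocarcinoma",
    "histological_grade": "",
    "margin_status": "clear",
    "perineural_invasion": "absent",
}

rate = structured_rate(example_text)                          # objective 2
standards_score = completeness(example_fields, STANDARDS)     # objective 3, standards
guidelines_score = completeness(example_fields, GUIDELINES)   # objective 3, guidelines
```

Scoring standards and guidelines separately is one reading of objective 3. A combined score could be obtained by passing both lists together.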
The stages in this project are:
  1. Acquire a corpus of 400 colorectal cancer reports.
  2. Design a Tagset for annotating the reports.
  3. Build a mapping between the Tagset and the Structured report fields.
  4. Train staff in the use of the tagset.
  5. Annotate the corpus using the tagset.
  6. Build a machine learning model for computing the annotation tags.
  7. Run a series of evaluations on different models.
  8. Develop the program code to extract the descriptive statistics.
  9. Write the project report.
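
Stage 3, the mapping between the tagset and the structured report fields, can be pictured as a simple lookup applied to annotated text spans. The sketch below is illustrative only. The tag names, field names and the function `populate_report` are hypothetical and are not taken from the project's actual tagset.

```python
from typing import Dict, List, Tuple

# Hypothetical tagset-to-field mapping; the real tagset was designed in stage 2.
TAG_TO_FIELD = {
    "SITE": "Tumour site",
    "HIST_TYPE": "Histological type",
    "GRADE": "Histological grade",
}


def populate_report(annotations: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Place each annotated text span into its mapped structured report field."""
    report: Dict[str, List[str]] = {field: [] for field in TAG_TO_FIELD.values()}
    for tag, text in annotations:
        field = TAG_TO_FIELD.get(tag)
        if field is not None:  # tags with no mapped field are ignored
            report[field].append(text)
    return report


# Invented annotations of the form (tag, annotated text).
annotations = [
    ("SITE", "sigmoid colon"),
    ("GRADE", "moderately differentiated"),
    ("OTHER", "received in formalin"),
]
structured = populate_report(annotations)
```

Fields that receive no annotation stay empty, which is what a completeness measure would then count as missing.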
Thirty-eight reports (7 Word documents and 31 PDF documents) were submitted by pathologists in response to a request by QAP and through the Pathology Today and SPR newsletters. These were supplied to Prof Patrick in January 2012 to initiate the project.

Large numbers of colorectal cancer reports were requested from Cancer Institute NSW (CI NSW) and Cancer Council Victoria. Unfortunately, CI NSW had a problem supplying de-identified reports, so the SPR Project Manager volunteered to undertake the de-identification process. CI NSW required ethics approval before this could be undertaken; the ethics applications and supporting documentation were therefore written and submitted in December 2011 in anticipation of the signing of the contract with Prof Patrick.

In early February, Prof Patrick reported that he could not use scanned reports and required the reports in a text (txt) or Excel format. A specific format had not been included in the contract or in any pre-contract conversations. Fortunately, Cancer Council Victoria was able to deliver 270 reports in Excel format so that the project could commence. Another 74 reports sent to Prof Patrick by Cancer Council Victoria were PDF or JPEG documents. It was agreed that Prof Patrick would contract for these and the initial 31 reports to be converted by optical character recognition (OCR) so that they could be used.

Cancer Institute NSW was contacted and asked to clarify whether it could send any reports in an electronic format. It replied that it stores only the extracted details it needs in an electronic format and that the whole report is available only as a scanned version. Therefore no reports could be obtained from Cancer Institute NSW without considerable additional cost.

Cancer Council WA was also contacted in the hope that, as it has a more advanced cancer registry system, it might be able to supply additional reports, but it was also unable to do so in the required format. Cancer Control NZ was also contacted and was unable to supply reports in the required format. Cancer Council Victoria was asked to supply additional reports to make up the shortfall, and a further 362 reports have been supplied to Prof Patrick.

Prof Patrick’s first progress report is attached as part of this report. Stages 1-6 are complete.
  1. Provide a report on the project with the Health Language Laboratories of the University of Sydney for Prof Jon Patrick to build a clinical language processor engine for colorectal cancer and the audit of approximately 400 cases.
    See above section 11.4(b) NLP Colorectal Cancer reports
  2. Provide a summary of feedback on the uptake and use of the protocols where available
    There are 9 stages to the contracted audit process with Prof Patrick. Stages 1-6 are complete. The report on uptake and use of structured reporting is not available at this time – this will be completed as stage 9.
    Prof Patrick’s first progress report is attached as Appendix X.
  3. Provide a review of all tasks and their current stage of development in this project.
    See section 11.4(e) (i)