This preliminary study investigates the degree of concordance between radiologists' annotations on mammographic images, the factors that affect it, and its limitations. Annotated data is key to the development of artificial intelligence (AI) tools, and annotation errors can reduce the accuracy of these tools. Two highly experienced radiologists (>20 years’ experience) read 856 mammographic images with known cancer signs and marked the location of each lesion with a rectangular region of interest. All images were resized to the same resolution of 1664 × 768 pixels using bilinear interpolation. For overlapping annotations, we calculated Lin’s concordance correlation coefficient (CCC) between the two radiologists' x- and y-coordinates of the four rectangle corners. Agreement between the radiologists was evaluated separately for the two views, cranio-caudal (CC) and medio-lateral oblique (MLO). The values of Lin’s CCC were classified into four interpretation levels, ‘almost perfect’, ‘substantial’, ‘moderate’ and ‘poor’, according to McBride's guide (2015). Concordance was ‘almost perfect’, ‘substantial’, ‘moderate’ and ‘poor’ in 50.1%, 29.8%, 9.5% and 10.6% of the overlapping annotations in the MLO view, and in 93.1%, 5.6%, 0.3% and 1.0% of the overlapping annotations in the CC view, respectively. Overall, the radiologists showed stronger concordance when annotating the CC view than the MLO view. Breast density (BD) also affected concordance: agreement was higher for breasts with 0-50% BD and lower for breasts with 50-100% BD. These findings have implications for AI, where the delineation of lesions is often the starting point for training data.
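As an illustration of the agreement measure described above, the following is a minimal sketch of Lin's CCC for one corner coordinate (for example, the top-left x-coordinate) as marked by the two radiologists across a set of overlapping annotations. The function names `lins_ccc` and `classify_ccc` are hypothetical, and the cut-offs (> 0.99, 0.95 to 0.99, 0.90 to 0.95, < 0.90) are the commonly cited McBride thresholds, assumed here because the abstract does not state them. The sample coordinates are invented for illustration only and are not study data.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two paired samples.

    Uses population (1/n) variances and covariance, as in Lin's original
    definition: 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both samples are one identical constant")
    return 2 * cov_xy / denom


def classify_ccc(ccc):
    """Map a CCC value to an interpretation level.

    ASSUMPTION: these cut-offs are the commonly cited McBride thresholds;
    the abstract itself does not list them.
    """
    if ccc > 0.99:
        return "almost perfect"
    if ccc >= 0.95:
        return "substantial"
    if ccc >= 0.90:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Illustrative (made-up) top-left x-coordinates in pixels from two readers.
    reader_a = [120, 340, 455, 610, 702]
    reader_b = [118, 345, 450, 615, 700]
    ccc = lins_ccc(reader_a, reader_b)
    print(f"CCC = {ccc:.4f} ({classify_ccc(ccc)})")
```

Because the CCC penalises both scatter and systematic offset, a reader who consistently draws boxes shifted by a fixed number of pixels scores lower than the Pearson correlation alone would suggest, which is why it suits the comparison of annotation coordinates.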