A computer-implemented method, apparatus, and computer-usable program code for automatically selecting an optimal control cohort. Attributes are selected based on patient data. Treatment cohort records are clustered to form clustered treatment cohorts. Control cohort records are scored to form potential control cohort members. The optimal control cohort is selected by minimizing the differences between the potential control cohort members and the clustered treatment cohorts.
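The four steps can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on assumptions the abstract does not state. Records are dicts of numeric attributes. Attribute selection keeps any attribute that varies across the pooled records. Clustering is plain k-means seeded with the first k treatment records. Scoring uses Euclidean distance to each cluster centroid. Selection is a greedy matching that fills each cluster with as many controls as it has treatment members. All function names and the sample data are hypothetical.

```python
import math
from statistics import pvariance


def select_attributes(records, names, min_variance=1e-9):
    """Hypothetical criterion: keep attributes that vary across the pooled records."""
    return [n for n in names if pvariance([r[n] for r in records]) > min_variance]


def to_vector(record, attrs):
    return [float(record[a]) for a in attrs]


def kmeans(points, k, iterations=20):
    """Plain k-means, deterministically seeded with the first k points (needs len(points) >= k)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (empty clusters keep their centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def select_control_cohort(treatment, controls, names, k=2):
    """Return the indices of the chosen control records (greedy, approximate)."""
    attrs = select_attributes(treatment + controls, names)
    t_vecs = [to_vector(r, attrs) for r in treatment]
    c_vecs = [to_vector(r, attrs) for r in controls]
    centroids, labels = kmeans(t_vecs, k)
    # Each treatment cluster needs as many controls as it has members.
    needed = [labels.count(c) for c in range(k)]
    # Score every (control, cluster) pair by distance to that cluster's centroid.
    scored = sorted(
        (math.dist(v, centroids[c]), i, c)
        for i, v in enumerate(c_vecs)
        for c in range(k)
    )
    # Take the cheapest pairs first, using each control at most once.
    chosen = set()
    for _dist, i, c in scored:
        if needed[c] > 0 and i not in chosen:
            chosen.add(i)
            needed[c] -= 1
    return sorted(chosen)


treatment = [
    {"age": 30, "bmi": 22, "site": 1},
    {"age": 32, "bmi": 23, "site": 1},
    {"age": 60, "bmi": 30, "site": 1},
    {"age": 62, "bmi": 31, "site": 1},
]
controls = [
    {"age": 31, "bmi": 22, "site": 1},
    {"age": 45, "bmi": 26, "site": 1},
    {"age": 61, "bmi": 30, "site": 1},
    {"age": 29, "bmi": 21, "site": 1},
    {"age": 80, "bmi": 40, "site": 1},
    {"age": 63, "bmi": 32, "site": 1},
]

print(select_control_cohort(treatment, controls, ["age", "bmi", "site"]))  # [0, 2, 3, 5]
```

Here the constant `site` attribute is dropped, the treatment records fall into a younger and an older cluster, and the two closest controls to each cluster are kept. The greedy pass only approximates a minimum-difference selection. An exact minimum-cost matching could instead be computed with an assignment solver such as SciPy's `linear_sum_assignment` over the same distance matrix.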