A method for managing data. A datum regarding a first patient is received. A first set of relationships is established. The first set of relationships comprises at least one relationship of the datum to at least one additional datum existing in at least one database. A plurality of cohorts to which the first patient belongs is established based on the first set of relationships. Ones of the plurality of cohorts contain corresponding first data regarding the first patient and corresponding second data regarding a corresponding set of additional information. The corresponding set of additional information is related to the corresponding first data. The plurality of cohorts is clustered according to at least one parameter, thereby forming a cluster of cohorts. A determination is made as to which at least two cohorts in the cluster are closest to each other. The at least two cohorts can be stored.
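The steps above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and everything specific in it is an assumption made for illustration: the datum is modeled as a small record of attributes, a relationship is a shared attribute value with an existing database record, the clustering parameter is a hypothetical `CATEGORY` mapping from attribute to category, and "closest" is measured by Jaccard distance between the sets of related patients. The abstract prescribes none of these choices.

```python
from itertools import combinations

# Hypothetical clustering parameter: the category each attribute belongs to.
CATEGORY = {"diagnosis": "clinical", "medication": "clinical",
            "allergy": "clinical", "region": "demographic"}


def establish_relationships(datum, database):
    """Relate the datum to every existing record that shares an attribute value."""
    return [(key, value, record["patient"])
            for record in database
            for key, value in datum.items()
            if key != "patient" and record.get(key) == value]


def establish_cohorts(datum, relationships):
    """Build one cohort per shared attribute value (an assumed cohort definition)."""
    cohorts = {}
    for key, value, other_patient in relationships:
        cohort = cohorts.setdefault((key, value), {
            "first_data": (datum["patient"], key, value),  # data on the first patient
            "second_data": set(),                          # related additional information
        })
        cohort["second_data"].add(other_patient)
    return cohorts


def cluster_cohorts(cohorts, parameter):
    """Group cohorts by the category of the attribute that defines them."""
    clusters = {}
    for name, cohort in cohorts.items():
        clusters.setdefault(parameter[name[0]], {})[name] = cohort
    return clusters


def jaccard_distance(a, b):
    """Distance between two non-empty sets: 0 if identical, 1 if disjoint."""
    return 1 - len(a & b) / len(a | b)


def closest_cohorts(cluster):
    """Return the pair of cohort names with the smallest distance, or None."""
    return min(combinations(sorted(cluster), 2),
               key=lambda pair: jaccard_distance(cluster[pair[0]]["second_data"],
                                                 cluster[pair[1]]["second_data"]),
               default=None)


# Illustrative data only; patient ids and attribute values are made up.
database = [
    {"patient": "P2", "diagnosis": "diabetes", "medication": "metformin", "region": "north"},
    {"patient": "P3", "diagnosis": "diabetes", "medication": "metformin", "region": "south"},
    {"patient": "P4", "diagnosis": "asthma", "medication": "metformin",
     "region": "north", "allergy": "penicillin"},
]
datum = {"patient": "P1", "diagnosis": "diabetes", "medication": "metformin",
         "region": "north", "allergy": "penicillin"}

relationships = establish_relationships(datum, database)
cohorts = establish_cohorts(datum, relationships)
clusters = cluster_cohorts(cohorts, CATEGORY)
pair = closest_cohorts(clusters["clinical"])
stored = {name: cohorts[name] for name in pair}  # the closest cohorts can be stored
```

In this sketch the "clinical" cluster holds three cohorts, and the diagnosis and medication cohorts come out closest because their related-patient sets overlap the most. A different distance measure or clustering parameter would fit the abstract equally well.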