In a non-volatile memory, the initiation of program verification is adaptively set to reduce programming time. In one approach, non-volatile storage elements are programmed based on a lower page of data to have a threshold voltage (VTH) that falls within a first VTH distribution or a higher, intermediate VTH distribution. Subsequently, based on an upper page of data, the non-volatile storage elements with the first VTH distribution either remain there or are programmed to a second VTH distribution, and the non-volatile storage elements with the intermediate VTH distribution are programmed to a third or a fourth VTH distribution. The non-volatile storage elements being programmed to the third VTH distribution are specifically identified and tracked. Verification of the non-volatile storage elements being programmed to the fourth VTH distribution is initiated only after one of the identified non-volatile storage elements transitions from the intermediate VTH distribution to the third VTH distribution.
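
The adaptive start of verification for the fourth VTH distribution can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Cell`, `verify`, `program_upper_page`, and `make_cells`, the verify levels, the per-pulse VTH steps, and the choice to begin fourth-state verification on the pulse immediately after the first tracked element passes are all hypothetical and chosen only for illustration. It models only the elements that start in the intermediate distribution.

```python
from dataclasses import dataclass

# Assumed verify levels in millivolts (illustrative values only).
VERIFY_MV = {"third": 2000, "fourth": 3000}


@dataclass
class Cell:
    vth_mv: int           # current threshold voltage, in mV
    target: str           # "third" or "fourth" (upper-page target state)
    step_mv: int          # hypothetical VTH gain per program pulse
    locked: bool = False  # True once the cell has passed its verify level


def pending(cells, state):
    """True if any cell targeting `state` has not yet passed verify."""
    return any(c.target == state and not c.locked for c in cells)


def verify(cells, state):
    """Lock out cells of `state` at or above the verify level; return the count newly locked."""
    passed = 0
    for c in cells:
        if c.target == state and not c.locked and c.vth_mv >= VERIFY_MV[state]:
            c.locked = True
            passed += 1
    return passed


def program_upper_page(cells, adaptive=True, max_pulses=50):
    """Return (pulses_used, verify_ops, pulse_at_which_fourth_verify_started).

    With adaptive=True, fourth-state verify is skipped until a tracked
    third-state cell passes its verify level. With adaptive=False, both
    states are verified from the first pulse (baseline for comparison).
    Note: in adaptive mode this sketch assumes at least one third-state cell.
    """
    fourth_enabled = not adaptive
    fourth_start = None if adaptive else 1
    verify_ops = 0
    for pulse in range(1, max_pulses + 1):
        # Program pulse: every cell that is not locked out moves up.
        for c in cells:
            if not c.locked:
                c.vth_mv += c.step_mv
        # The tracked third-state cells are verified from the start.
        third_passed = 0
        if pending(cells, "third"):
            verify_ops += 1
            third_passed = verify(cells, "third")
        # Fourth-state verify runs only once it has been enabled.
        if fourth_enabled and pending(cells, "fourth"):
            verify_ops += 1
            verify(cells, "fourth")
        # First tracked cell reached the third distribution:
        # start fourth-state verify on the next pulse.
        if not fourth_enabled and third_passed:
            fourth_enabled = True
            fourth_start = pulse + 1
        if all(c.locked for c in cells):
            return pulse, verify_ops, fourth_start
    raise RuntimeError("programming did not complete within max_pulses")


def make_cells():
    """Four cells starting in an assumed intermediate distribution."""
    return [
        Cell(1000, "third", 300),
        Cell(1200, "third", 150),
        Cell(1100, "fourth", 300),
        Cell(900, "fourth", 250),
    ]


print(program_upper_page(make_cells(), adaptive=True))   # (9, 11, 5)
print(program_upper_page(make_cells(), adaptive=False))  # (9, 15, 1)
```

In this toy run both modes finish in nine pulses, but the adaptive mode performs 11 verify operations instead of 15, because the fourth-state verify is skipped during the first four pulses, when no element could plausibly have reached the fourth distribution.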