An apparatus generates data in processor-usable form from natural language input whose data units fall into different unit categories. The input data is categorized to produce data units, each having unit identification data and corresponding unit category data, for input to a cascaded plurality of matching processing stages. Each matching processing stage of the cascade except the first uses any unmatched unit category data and the group category data from previous matching processing stages, in place of the matched category data, to match the unit and/or group category data against at least one predetermined pattern of unit and/or group category data. New group category data is output for any unit and/or group category data matching each predetermined pattern. At least one of the matching processing stages outputs unit data corresponding to matched unit category data as a plurality of variables, at least one of which is indexed by another of the variables.
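The cascade described above can be illustrated with a minimal Python sketch. It is not the claimed apparatus. The lexicon, the category names (DET, ADJ, NOUN, VERB, NP, CLAUSE), the three stages and their patterns, and the variable names "head" and "mod" are all assumptions chosen for illustration. Each stage replaces a matched run of categories with new group category data and passes unmatched categories through unchanged. The first stage also outputs unit data as variables, where "mod" is indexed by the unit identifiers stored in "head".

```python
from dataclasses import dataclass

# Hypothetical lexicon mapping each word to a unit category.
LEXICON = {"the": "DET", "red": "ADJ", "printer": "NOUN",
           "prints": "VERB", "pages": "NOUN"}


@dataclass
class Item:
    category: str  # unit category data or group category data
    units: list    # unit identification data (token positions) covered


def bind_modified_head(window, words, variables):
    """Record the noun as a head and index its modifier by the head's unit id."""
    head_id = window[1].units[0]
    variables["head"].append(head_id)
    variables["mod"].setdefault(head_id, []).append(words[window[0].units[0]])


def bind_bare_head(window, words, variables):
    """Record an unmodified noun as a head."""
    variables["head"].append(window[0].units[0])


# Each stage is a list of (pattern, new group category, optional binder).
STAGES = [
    [(("ADJ", "NOUN"), "NP", bind_modified_head),
     (("NOUN",), "NP", bind_bare_head)],
    [(("DET", "NP"), "NP", None)],
    [(("NP", "VERB", "NP"), "CLAUSE", None)],
]


def run_stage(items, patterns, words, variables):
    """One matching stage: matched runs become a single group item."""
    out, i = [], 0
    while i < len(items):
        for pattern, group, binder in patterns:
            window = items[i:i + len(pattern)]
            if (len(window) == len(pattern)
                    and all(it.category == c for it, c in zip(window, pattern))):
                if binder is not None:
                    binder(window, words, variables)
                units = [u for it in window for u in it.units]
                out.append(Item(group, units))
                i += len(pattern)
                break
        else:
            # No pattern matched here: pass the category through unchanged.
            out.append(items[i])
            i += 1
    return out


def parse(text):
    """Categorize the input, then run it through the cascade of stages."""
    words = text.lower().split()
    items = [Item(LEXICON.get(w, "UNK"), [i]) for i, w in enumerate(words)]
    variables = {"head": [], "mod": {}}
    for patterns in STAGES:
        items = run_stage(items, patterns, words, variables)
    return items, variables


if __name__ == "__main__":
    items, variables = parse("the red printer prints pages")
    print([it.category for it in items])
    print(variables)
```

For the input "the red printer prints pages", the three stages reduce the categories to a single CLAUSE group, and the variables record heads at positions 2 and 4, with the modifier "red" stored under head 2. Keying "mod" by the head's unit identifier is one simple way to realise a variable indexed by another variable. Other indexing schemes would serve equally well.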