A data-flow architecture and software environment for high-performance signal and data processing. The programming environment allows applications to be coded in a functional high-level language 20, which a compiler 30 converts to a data-flow graph form 40; a global allocator 50 then automatically partitions the graph and distributes it to multiple processing elements 80. Alternatively, for smaller problems, applications may be coded in a data-flow graph assembly language, in which case an assembler 15 operates directly on an input data-flow graph file 13 and produces an output that is sent to a local allocator 17 for partitioning and distribution. In the former case a data-flow processor description file 45 is read into the global allocator 50; in the latter case a data-flow processor description file 14 is read into the assembler 15. The data-flow processor 70 consists of multiple processing elements 80 connected in a three-dimensional bussed packet routing network. Data enters and leaves the processor 70 via input/output devices 90 connected to the processor. The processing elements are designed for implementation in VLSI (very large scale integration) to provide real-time processing with very high throughput. The modular nature of the computer allows more processing elements to be added to meet a range of throughput and reliability requirements. Simulation results have demonstrated high-performance operation, with over 64 million operations per second attainable using only 64 processing elements.
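
As a rough illustration of the allocation step described above, the following Python sketch places the nodes of a toy data-flow graph onto 64 processing elements addressed in three dimensions. It is not the allocator of the disclosure: the 4 x 4 x 4 arrangement, the round-robin placement, the one-bus-transfer-per-differing-coordinate cost model, and the names `allocate` and `bus_hops` are all assumptions made for the example.

```python
from itertools import product

# Assumed layout: 64 processing elements in a 4 x 4 x 4 arrangement,
# each addressed by an (x, y, z) coordinate. The abstract gives only the
# count (64) and the three-dimensional bussed topology, not the shape.
DIM = 4
PES = list(product(range(DIM), repeat=3))


def allocate(nodes, pes=PES):
    """Assign graph nodes to processing elements round-robin.

    Hypothetical placement policy, used only to make the example concrete.
    """
    return {node: pes[i % len(pes)] for i, node in enumerate(nodes)}


def bus_hops(src, dst):
    """Assumed routing cost: one bus transfer per coordinate that differs."""
    return sum(a != b for a, b in zip(src, dst))


# Toy data-flow graph computing (a * b) + c; each arc carries a result
# packet from a producer node to a consumer node.
arcs = [
    ("in_a", "mul"),
    ("in_b", "mul"),
    ("mul", "add"),
    ("in_c", "add"),
    ("add", "out"),
]
nodes = sorted({n for arc in arcs for n in arc})

placement = allocate(nodes)
total_hops = sum(bus_hops(placement[s], placement[d]) for s, d in arcs)

for node in nodes:
    print(f"{node:>5} -> PE {placement[node]}")
print("total bus transfers for one evaluation:", total_hops)
```

An allocator of practical interest would presumably try to reduce `total_hops` and balance load across elements; the round-robin policy here does neither and serves only to show the shape of the problem.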