Currently, the quality and performance of multimedia systems are often evaluated by hand, using a small number of images and videos, few software and hardware platforms, and a small set of system parameters. This restricts the completeness and meaningfulness of the evaluation and requires a large amount of time and resources. To overcome these limitations, we create a framework for the generation of test sets for modern multimedia workflows (MuTeSys). Based on an abstract description of each test case, its transformation to the designated target platforms as well as the operations and parameters to be processed within the evaluation are defined. The workflow is controlled by a combination of Python and Apache Ant, which makes it possible to use various tools in a flexible and purpose-dependent way. To demonstrate the usefulness of the framework, we create a test set of only 900 test cases and use them to detect artefacts resulting from the encoding processes of widely distributed video encoding software. The results are compared with the inputs and the difference in quality is measured. Additionally, a visual inspection of the results is performed to find noticeable as well as new artefacts.
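The abstract does not name the specific quality metric used when comparing encoded results with their inputs; as an illustrative sketch only, the following assumes PSNR, a common full-reference measure for encoding quality loss. The `psnr` helper and the toy frames below are hypothetical, not part of MuTeSys.

```python
import math

def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel
    sequences (higher means the degraded frame is closer to the input)."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no measurable encoding loss
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: a flat 8x8 grayscale frame and a copy with one pixel of
# simulated encoding error.
ref = [128] * 64
enc = ref.copy()
enc[0] = 120
print(round(psnr(ref, enc), 2))
```

In a pipeline such as the one described, a metric like this would be computed per frame over every decoded output and aggregated per test case, so drops in quality can be traced back to a particular encoder, platform, or parameter combination.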