Methods for assessing on-farm animal welfare must be valid, feasible and reliable. Reliability comprises inter-observer reliability, intra-observer reliability and consistency over time. The aim of the present study was to evaluate the consistency over time of several measures and aggregated scores of dairy cattle welfare. Data were gathered on 30 dairy farms at different locations in Austria. The Welfare Quality® Assessment protocol for dairy cows was applied, and rising behaviour and selected indices of stall use were also assessed. Each of the two observers assessed 15 farms twice at an interval of four days. Nine of the 22 measures of the Welfare Quality® Assessment protocol showed a Spearman rank correlation coefficient higher than 0.7. Some other measures revealed low consistency over time. In the qualitative assessment of rising behaviour, three out of five scores correlated almost satisfactorily (rS 0.69). The indices of stall use showed only low correlations (‘cow comfort index’: 0.47; ‘stall use index’: 0.42). Possible reasons for the low consistency over time are the small sample size, a low prevalence in the sample, the response of measures to small changes in the environment and low intra-observer reliability. Consistency improved when single parameters were aggregated. At criterion level, scores for 9 of 11 criteria were considered consistent, and at principle level this was the case for three of four principles. The combination of two or more measures and the calculation using threshold values and decision trees are potential reasons for the higher correlations compared with the single measures. To obtain higher consistency over time, more intensive observer training and a redefinition of selected measures might be advisable.
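As an illustration of how the consistency of a single measure might be checked, the following minimal sketch computes a Spearman rank correlation between two visits and compares it with a cut-off of 0.7. It assumes that the reported coefficients relate the first and second farm visit and that 0.7 serves as the cut-off for "consistent"; both are readings of the abstract, not confirmed details of the study. The function names and the farm values are hypothetical and are not study data.

```python
from math import sqrt

# Assumed cut-off, taken from the "higher than 0.7" wording of the abstract.
CONSISTENCY_THRESHOLD = 0.7


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


def is_consistent(visit_1, visit_2, threshold=CONSISTENCY_THRESHOLD):
    """True if the correlation between the two visits exceeds the cut-off."""
    return spearman(visit_1, visit_2) > threshold


# Hypothetical farm-level values of one measure (e.g. a prevalence in %)
# for six farms at the first and second visit. Illustrative only.
visit_1 = [12.0, 5.0, 30.0, 8.0, 21.0, 15.0]
visit_2 = [10.0, 7.0, 28.0, 6.0, 25.0, 14.0]

r_s = spearman(visit_1, visit_2)
print(f"r_S = {r_s:.2f}, consistent: {is_consistent(visit_1, visit_2)}")
```

With only a few farms per observer, a swap of two neighbouring ranks between visits already shifts the coefficient noticeably, which is in line with the small sample size named as one possible reason for low consistency.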