Reliably assessing the comfort of running footwear is challenging. The purpose of this study was to compare intra-rater reliability between different assessment types, to calculate intra-individual reliability scores, and to evaluate the effect of selecting raters based on individual reliability scores on group-level reliability. Three assessment types, ranking, Visual Analogue Scale (VAS), and Likert Scale (LS), were each administered twice across six separate sessions to 30 participants, who assessed the comfort of five shoes after treadmill running. Spearman's rho served as a measure of inter-session relative reliability and typical error as a measure of absolute reliability for each assessment type. Ranking (r = 0.70, 95% confidence interval [CI] 0.61–0.78) yielded the highest relative reliability for overall comfort, followed by VAS (r = 0.67, 95% CI 0.56–0.75) and LS (r = 0.63, 95% CI 0.52–0.72), with substantial overlap of CIs between assessment types. The same order of assessment types was found for the percentage of reliable raters (r ≥ 0.7): 60% for ranking, 47% for VAS, and 37% for LS. Forming subgroups according to intra-individual reliability substantially increased group-level reliability. Based on measures of relative reliability, the extreme reduction in resolution provided by ranking from pairwise comparisons appears to be a valuable tool in footwear comfort assessments when assessment time is of minor importance. No preference can be given for either of the two investigated rating scales. Besides the assessment type, selecting the best raters through additional reliability checks appears to be a prerequisite for further comfort-related studies.
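The two reliability measures named above can be sketched in code. The following is a minimal illustration, not the authors' analysis pipeline: Spearman's rho is computed as the Pearson correlation of average ranks, and typical error is taken as the standard deviation of the inter-session differences divided by √2, a common formulation for test–retest data. The example session scores are hypothetical.

```python
import math

def _average_ranks(values):
    # Assign 1-based ranks, averaging over ties.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    # Relative reliability: Pearson correlation of the rank-transformed scores.
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def typical_error(x, y):
    # Absolute reliability: SD of inter-session differences divided by sqrt(2).
    d = [b - a for a, b in zip(x, y)]
    md = sum(d) / len(d)
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (len(d) - 1))
    return sd / math.sqrt(2)

# Hypothetical VAS comfort scores (mm) for five shoes in two sessions.
session1 = [10.0, 20.0, 30.0, 40.0, 50.0]
session2 = [12.0, 22.0, 32.0, 42.0, 52.0]
print(spearman_rho(session1, session2))   # identical ordering -> rho = 1.0
print(typical_error(session1, session2))  # constant shift -> typical error = 0.0
```

In this toy case the second session preserves the shoe ordering exactly, so relative reliability is perfect (rho = 1.0), while the constant 2 mm offset contributes nothing to the typical error because only the variability of the differences enters the calculation.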