The axiom of reducibility was introduced by Bertrand Russell in the early 20th century as part of his ramified theory of types. Russell devised and introduced the axiom in an attempt to manage the contradictions he had discovered in his analysis of set theory.
With Russell's discovery (1901, 1902) of a paradox in Gottlob Frege's 1879 Begriffsschrift and Frege's acknowledgment of the same (1902), Russell tentatively introduced his solution as "Appendix B: Doctrine of Types" in his 1903 The Principles of Mathematics. This contradiction can be stated as "the class of all classes that do not contain themselves as elements". At the end of this appendix Russell asserts that his "doctrine" would solve the immediate problem posed by Frege, but "there is at least one closely analogous contradiction which is probably not soluble by this doctrine. The totality of all logical objects, or of all propositions, involves, it would seem a fundamental logical difficulty. What the complete solution of the difficulty may be, I have not succeeded in discovering; but as it affects the very foundations of reasoning..."
By the time of his 1908 Mathematical logic as based on the theory of types Russell had studied "the contradictions" (among them the Epimenides paradox, the Burali-Forti paradox, and Richard's paradox) and concluded that "In all the contradictions there is a common characteristic, which we may describe as self-reference or reflexiveness".
In 1903, Russell defined predicative functions as those whose order is one more than the highest-order function occurring in the expression of the function. While these were fine for the situation, impredicative functions had to be disallowed:
A function whose argument is an individual and whose value is always a first-order proposition will be called a first-order function. A function involving a first-order function or proposition as apparent variable will be called a second-order function, and so on. A function of one variable which is of the order next above that of its argument will be called a predicative function; the same name will be given to a function of several variables [etc].
He repeats this definition in a slightly different way later in the paper (together with a subtle prohibition that they would express more clearly in 1913):