Structured prediction requires large training sets, but crowdsourcing is ineffective— so, existing models ignore visual relationships without sufficient labels.
Our method uses 10 relationship labels to generate training data for any scene graph model!