According to #77, FiLM conditioning is used for the time embeddings instead of a simple shift. May I ask why 1 is added to the scale?
x = x * (scale + 1) + shift
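For context, here is a minimal NumPy sketch of the modulation I am asking about. The function name `film` and the tensor shapes are my own assumptions for illustration, not the repository's actual code. The one thing I can see is that when `scale` and `shift` are both zero, the `+ 1` form leaves `x` unchanged.

```python
import numpy as np

def film(x, scale, shift):
    # Assumed shapes: x is (batch, channels, height, width),
    # scale and shift are (batch, channels), one value per channel.
    scale = scale[:, :, None, None]
    shift = shift[:, :, None, None]
    # Same formula as the line above: the +1 offsets the predicted scale.
    return x * (scale + 1) + shift

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))
zeros = np.zeros((2, 4))

# With zero scale and zero shift the modulation is the identity.
print(np.allclose(film(x, zeros, zeros), x))
```

Without the `+ 1`, a zero `scale` would multiply the feature map by zero instead.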
On a related note, if I wanted to implement classifier-free guidance for conditional generation, would you recommend implementing the class-conditioning embedding with FiLM in the same way as the time embedding? I would appreciate it if anyone could point me to a GitHub implementation of classifier-free guidance for DDPM, since it looks like the original paper does not have open-source code.
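To make the question concrete, here is a rough NumPy sketch of what I have in mind. Everything in it is an assumption on my part: the names `class_table`, `proj`, `cond_embedding`, and `guided_eps` are hypothetical, the class embedding is simply added to the time embedding before the FiLM projection, and an extra "null" class stands in for the unconditional case. The guidance combination follows the form in the classifier-free guidance paper, `(1 + w) * eps_cond - w * eps_uncond`.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, emb_dim, channels = 10, 16, 4

# Hypothetical parameters; index num_classes is the "null" (unconditional) class.
class_table = rng.normal(size=(num_classes + 1, emb_dim))
proj = rng.normal(size=(emb_dim, 2 * channels)) * 0.02

def cond_embedding(time_emb, labels, drop_prob, rng):
    # During training, randomly replace labels with the null class so the
    # same network also learns the unconditional prediction.
    drop = rng.random(labels.shape[0]) < drop_prob
    labels = np.where(drop, num_classes, labels)
    return time_emb + class_table[labels]

def film(x, emb):
    # Project the combined embedding to a per-channel scale and shift.
    scale, shift = np.split(emb @ proj, 2, axis=-1)
    return x * (scale[:, :, None, None] + 1) + shift[:, :, None, None]

def guided_eps(eps_cond, eps_uncond, w):
    # Sampling-time combination of conditional and unconditional predictions.
    return (1 + w) * eps_cond - w * eps_uncond

x = rng.normal(size=(2, channels, 8, 8))
time_emb = rng.normal(size=(2, emb_dim))
labels = np.array([3, 7])
out = film(x, cond_embedding(time_emb, labels, drop_prob=0.1, rng=rng))
print(out.shape)
```

Is this roughly the right structure, or is the class embedding usually injected some other way?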
Thank you!