
🔆 This post covers RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes, presented at ICCV 2019. The model can change only the desired regions of a face, which carries many fine-grained features, while preserving the identity of the face.

⭐️ Summary

  • RelGAN uses relative target attributes so that only the attributes you want to change (eyes, hair, mouth shape, and so on) are translated, several at a time if needed.
  • To teach the generator relative attributes, it proposes a match-aware discriminator that decides whether an original image and a generated image are consistent with the relative attributes.
  • There is no need to specify every attribute in full; the focus is on how each attribute should change.
  • It proposes an interpolation discriminator to improve interpolation quality.

🏭 Method

  • The model has a single generator and three discriminators.
  • Five losses are used so that the generated image is realistic, nothing other than the target attributes changes from the original image, and applying the opposite of the target change to the generated image gives back the original image.

🦄 Future works

  • adversarial learning
  • use of mask mechanisms

 

Problems with existing multi-domain image-to-image translation models

  • Attributes are binary, so interpolation quality is poor.
  • Why interpolation matters: it allows fine-grained control over the strength of an attribute (the ratio of brown to blond hair, the degree of a smile or of happiness).
  • What should change must change, while existing characteristics must stay as they are → fine-grained control is needed.

 

How to overcome the problems of existing models

Previous models take the input pair \((x, \hat{a})\), where x is the original image and \(\hat{a}\) is the target attribute vector. RelGAN instead takes \((x, v)\), where v is the relative attributes. The relative attributes are the vector obtained by subtracting the original attributes \(a\) from the target attributes \(\hat{a}\).

$$ v \stackrel{\Delta}{=} \hat{a}-a $$

In the figure above, the target attributes set hair color, the attribute that must change, to 1, and also set smile to 1 so that the translated image keeps the smile. The relative attributes mark only what has to change: only hair color is set to 1, and smile is mapped to 0 so that it is preserved as is. A value of 1 means turn on, -1 means turn off, and 0 means no change. The example can be read as turning brown hair off and turning blond hair on.
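
As a minimal sketch of this example, assume a hypothetical three-attribute layout `[brown_hair, blond_hair, smile]` (the names and their order are illustrative and not taken from the paper's code). The relative attribute vector is then just the element-wise difference between target and source:

```python
# Hypothetical attribute layout, for illustration only.
ATTRIBUTE_NAMES = ["brown_hair", "blond_hair", "smile"]


def relative_attributes(source, target):
    """Return v = target - source, element-wise."""
    if len(source) != len(target):
        raise ValueError("attribute vectors must have the same length")
    return [t - s for s, t in zip(source, target)]


def describe(v):
    """Map each entry of v to a readable action: 1 on, -1 off, 0 unchanged."""
    actions = {1: "turn on", -1: "turn off", 0: "unchanged"}
    return {name: actions[value] for name, value in zip(ATTRIBUTE_NAMES, v)}


a = [1, 0, 1]      # source image: brown hair, smiling
a_hat = [0, 1, 1]  # target: blond hair, still smiling
v = relative_attributes(a, a_hat)
print(v)            # [-1, 1, 0]
print(describe(v))
```

The smile entry is 1 in both `a` and `a_hat`, so it becomes 0 in `v`: the user never has to state the smile at all, only that it should not change.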

RelGAN uses interpolation to adjust the degree of an attribute. All other attributes are mapped to 0 and left untranslated; in the figure above, the degree of smiling and of age controls whether the attribute is applied weakly or strongly, and the change still looks natural.

The paper therefore proposes RelGAN, a relative-attribute-based method. RelGAN has a single generator G and three discriminators \(D_{Real}, D_{Match}, D_{Interp}\). The discriminators guide G to learn, respectively, to produce realistic images, to translate accurately with respect to the relative attributes, and to produce realistic interpolated images.

 

Related work

This part focuses on conditional image generation and facial attribute transfer, the areas most closely related to RelGAN.

  • GAN: unsupervised generative model
  • cGAN: text-to-image synthesis and image-to-image translation
  • facial attribute transfer: IcGAN, StarGAN, AttGAN (StarGAN plus an encoder-decoder structure), ModularGAN (proposes a modular structure), GANimation

 

Method

An n-dimensional attribute vector is defined as \(a=[a^{(1)}, a^{(2)}, \ldots, a^{(n)}]^{T}\), where each \(a^{(i)}\) is an attribute of the face image such as gender, age, or hair color. The main goal of the model is to translate the original image x into an image y that has the target attributes changed, looks realistic, and is otherwise unchanged from the original image. In other words, the model learns to output y given (x, v). The overall structure is shown below.

Relative Attributes

v is the vector obtained by subtracting the attributes \(a\) of the original image from the attributes \(\hat{a}\) to be reached, that is, \(v = \hat{a} - a\). Image attributes take the values 0 and 1, so relative attributes take the values -1, 0, and 1: 1 means turn on, -1 means turn off, and 0 means no change. Multiplying v by a coefficient \(\alpha\) between 0 and 1 controls the strength of the attribute change, which amounts to interpolation.

$$ G(x,\alpha v) $$
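
To make the role of \(\alpha\) concrete, here is a small sketch that builds the conditioning vectors \(\alpha v\) for an interpolation sequence. The helper names are my own, and the generator itself is left out; each vector is what would be passed to it as the second argument.

```python
def scale_relative_attributes(v, alpha):
    """Return alpha * v for an interpolation coefficient alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * value for value in v]


def interpolation_schedule(v, steps):
    """Evenly spaced conditioning vectors from 0 * v up to 1 * v."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [scale_relative_attributes(v, i / (steps - 1)) for i in range(steps)]


v = [-1, 1, 0]  # e.g. brown hair off, blond hair on, smile unchanged
for condition in interpolation_schedule(v, steps=5):
    # Each vector would be fed to the generator as G(x, alpha * v).
    print(condition)
```

Entries of v that are 0 stay 0 for every \(\alpha\), so attributes that should not change are untouched along the whole sequence.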

Adversarial Loss

D is the usual GAN discriminator that tells real images from fake ones, and E denotes the expectation.
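
For reference, a standard GAN objective consistent with this description can be written as below. This is a sketch in the notation of this post: the form is the usual minimax formulation with the discriminator named \(D_{Real}\), which is an assumption on my part rather than a transcription of the paper's equation.

$$ \min_{G}\max_{D_{Real}} L_{Real} = \mathbb{E}_{x}\left[\log D_{Real}(x)\right] + \mathbb{E}_{x,v}\left[\log\left(1 - D_{Real}(G(x,v))\right)\right] $$

The first expectation rewards the discriminator for scoring real images x highly, and the second for scoring translated images \(G(x,v)\) low, while the generator tries to do the opposite.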

Conditional Adversarial Loss

The paper requires not only that the output image \(G(x, v)\) look realistic, but also that the difference between the original image x and \(G(x,v)\) match the relative attributes v. To this end it uses a conditional GAN with a conditional discriminator \(D_{Match}\).

  • x, x': two real images → x and x' have different identities
  • v: relative attribute vector
  • The \(D_{Match}\) loss is computed over a real triplet \((x, v, x')\) and a fake triplet \((x, v, G(x, v))\)

Below is pseudo-code that implements the conditional adversarial loss.

Reconstruction Loss

The adversarial loss and the conditional adversarial loss do not guarantee that everything from low-level content such as the background up to high-level content such as the facial identity is preserved, so a cycle-reconstruction loss and a self-reconstruction loss are added.

cycle-reconstruction loss

The generator is trained to recreate the original image so that the difference between the two images becomes small. The image G(x, v), generated from the original image with v, is translated back toward the original image with the opposite vector -v; in symbols, G(G(x, v), -v). The difference between this reconstruction G(G(x, v), -v) and the original image x is measured with the L1 norm, whose formula is given below. In other words, the reconstruction is subtracted from the original image, the absolute value of each element is taken, and the results are summed.

$$ \|x\|_1 = \sum_{i=1}^{n} |x_i| $$

Self-reconstruction loss

When the relative attribute vector is 0 (no attribute changes), the output image is G(x, 0) and should be close to the original image x. The self-reconstruction loss is defined from this requirement.
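
Both reconstruction losses can be sketched for a single sample as follows. The generator here is a toy function of my own (it adds v and clips to [0, 1]) that stands in for the real network; the actual losses are expectations over images and relative attributes.

```python
def l1_distance(a, b):
    """Sum of absolute element-wise differences, i.e. ||a - b||_1."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def cycle_reconstruction_loss(generator, x, v):
    """|| G(G(x, v), -v) - x ||_1 for one sample."""
    translated = generator(x, v)
    restored = generator(translated, [-vi for vi in v])
    return l1_distance(restored, x)


def self_reconstruction_loss(generator, x):
    """|| G(x, 0) - x ||_1 for one sample."""
    zero = [0.0] * len(x)
    return l1_distance(generator(x, zero), x)


# Toy stand-in (assumption): adds v and clips to [0, 1], so information is
# lost whenever clipping happens and the cycle is not exact.
def toy_generator(x, v):
    return [min(1.0, max(0.0, xi + vi)) for xi, vi in zip(x, v)]


x = [0.8, 0.2, 1.0]
v = [-1.0, 1.0, 0.0]
print(cycle_reconstruction_loss(toy_generator, x, v))  # > 0: clipping broke the cycle
print(self_reconstruction_loss(toy_generator, x))      # 0.0: G(x, 0) returns x
```

The cycle loss penalizes any information the generator destroys on the way to G(x, v), while the self loss penalizes changes made when nothing was asked to change.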

Interpolation Loss

For high-quality interpolation, the image \(G(x,\alpha v)\) interpolated with a coefficient \(\alpha\) must look realistic. To this end, \(D_{Interp}\) predicts \(\hat{\alpha}\), the degree of interpolation, where \(\hat{\alpha} = \min(\alpha, 1-\alpha)\). \(\hat{\alpha}=0\) means no interpolation and \(\hat{\alpha}=0.5\) means maximum interpolation. The maximum is 0.5 because for \(\alpha\) between 0 and 0.5 the degree is measured from the original image, while for \(\alpha\) between 0.5 and 1 it is measured from the image that fully reflects the attribute change.

The second term predicts the degree of interpolation for the image that is not interpolated at all, \(G(x, 0)\), and the third term does so for the image that fully reflects the relative attributes, \(G(x, v)\). In both cases \(\hat{\alpha}\) should be 0, so no \(\hat{\alpha}\) is subtracted in these terms.

This expression can be rewritten with the same meaning in the form below. It has one term fewer because the second and third terms are merged. \(\mathbb{I}[\cdot]\) is the indicator function, which returns 1 when its argument is true (here, when \(\alpha\) is greater than 0.5) and 0 otherwise.
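
The two forms of the discriminator-side loss can be sketched as below. Everything here is a simplification under stated assumptions: `d_interp` returns a single scalar, the loss is computed for one sample instead of an expectation, and `toy_generator` and `toy_d_interp` are invented stand-ins so the code runs.

```python
def interpolation_degree(alpha):
    """alpha_hat = min(alpha, 1 - alpha): 0 at either endpoint, 0.5 at the midpoint."""
    return min(alpha, 1.0 - alpha)


def d_interp_loss_three_terms(d_interp, generator, x, v, alpha):
    zero = [0.0] * len(v)
    scaled = [alpha * vi for vi in v]
    term_interp = (d_interp(generator(x, scaled)) - interpolation_degree(alpha)) ** 2
    term_source = d_interp(generator(x, zero)) ** 2  # G(x, 0): target degree 0
    term_target = d_interp(generator(x, v)) ** 2     # G(x, v): target degree 0
    return term_interp + term_source + term_target


def d_interp_loss_two_terms(d_interp, generator, x, v, alpha):
    indicator = 1.0 if alpha > 0.5 else 0.0
    scaled = [alpha * vi for vi in v]
    endpoint = [indicator * vi for vi in v]  # 0 -> G(x, 0), 1 -> G(x, v)
    term_interp = (d_interp(generator(x, scaled)) - interpolation_degree(alpha)) ** 2
    term_endpoint = d_interp(generator(x, endpoint)) ** 2
    return term_interp + term_endpoint


# Toy stand-ins (assumptions): images store attribute intensities directly.
def toy_generator(x, v):
    return [xi + vi for xi, vi in zip(x, v)]


def toy_d_interp(image):
    # Average distance of each intensity from the nearest whole number.
    return sum(min(p % 1.0, 1.0 - p % 1.0) for p in image) / len(image)


x = [1.0, 0.0, 1.0]
v = [-1.0, 1.0, 0.0]
print(d_interp_loss_three_terms(toy_d_interp, toy_generator, x, v, alpha=0.25))
print(d_interp_loss_two_terms(toy_d_interp, toy_generator, x, v, alpha=0.25))
```

The two-term version evaluates only one endpoint per sample, the one nearer to the sampled \(\alpha\), so it needs one fewer generator pass while still covering both endpoints over the course of training.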

Finally, the full objective can be written as below, where \(\lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5\) are hyperparameters.

 

Experiments

The experiments use CelebA, CelebA-HQ, and FFHQ. Images are center-cropped around the face and resized to 256x256. The generator network is adapted from StarGAN: two convolution layers with stride 2 for down-sampling, six residual blocks, and two transposed convolution layers with stride 2 for up-sampling. The generator uses switchable normalization.

The discriminator has three sub-networks, \(D_{Real}\), \(D_{Match}\), and \(D_{Interp}\), and the sub-networks are built from six convolution layers with stride 2. LSGANs-GP (Least Squares Generative Adversarial Networks with gradient penalty) is used to stabilize training.
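
To get a feel for these layer counts, the sketch below traces the spatial resolution of a 256x256 input through the stride-2 layers. The kernel size 4 and padding 1 are assumptions on my part (a common choice for stride-2 layers); the text above only fixes the stride and the number of layers.

```python
def conv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial size after a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial size after a transposed convolution: (size - 1) * s - 2p + k."""
    return (size - 1) * stride - 2 * padding + kernel


def trace(size, layer_fn, num_layers):
    """Return the list of sizes seen while applying layer_fn num_layers times."""
    sizes = [size]
    for _ in range(num_layers):
        sizes.append(layer_fn(sizes[-1]))
    return sizes


down = trace(256, conv_output_size, 2)                # generator down-sampling
up = trace(down[-1], transposed_conv_output_size, 2)  # generator up-sampling
disc = trace(256, conv_output_size, 6)                # six stride-2 layers
print(down)  # [256, 128, 64]
print(up)    # [64, 128, 256]
print(disc)  # [256, 128, 64, 32, 16, 8, 4]
```

Under these assumptions the residual blocks of the generator work at 64x64, and six stride-2 layers reduce a 256x256 image to a 4x4 feature map.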

The hyperparameters are set as follows.

  • \(\lambda_1=1\), \(\lambda_2=\lambda_3=\lambda_4=10\), \(\lambda_5=10^{-6}\)
  • Adam optimizer with \(\beta_1=0.5\), \(\beta_2=0.999\)
  • learning rate: \(5 \times 10^{-5}\)
  • batch size: 4
  • 100K iterations (about 13.3 epochs)
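
The settings above can be collected in a small config object, which also makes it easy to check the epoch count. This is only a sketch: the class and function names are hypothetical, the loss terms are left generic because the mapping from each \(\lambda_i\) to a specific loss is not spelled out here, and the dataset size of 30,000 is my assumption (it is the size of CelebA-HQ).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    lambdas: tuple = (1.0, 10.0, 10.0, 10.0, 1e-6)  # lambda_1 .. lambda_5
    adam_betas: tuple = (0.5, 0.999)
    learning_rate: float = 5e-5
    batch_size: int = 4
    iterations: int = 100_000


def weighted_sum(terms, weights):
    """Combine loss terms as sum_i lambda_i * term_i."""
    if len(terms) != len(weights):
        raise ValueError("need one weight per loss term")
    return sum(w * t for w, t in zip(weights, terms))


def epochs_seen(config, dataset_size):
    """Number of passes over the data implied by iterations * batch_size."""
    return config.iterations * config.batch_size / dataset_size


config = TrainConfig()
# Assumption: 30,000 training images, the size of CelebA-HQ.
print(round(epochs_seen(config, 30_000), 1))  # 13.3
```

100K iterations at batch size 4 means 400K images seen in total, and 400,000 / 30,000 ≈ 13.3, which agrees with the epoch count quoted above.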

StarGAN and AttGAN, both multi-domain image-to-image translation models, are used as baselines for comparison.

facial attribute transfer

FID is used as the evaluation metric.

  • CelebA: 9 attributes
  • CelebA-HQ: 9 attributes
  • CelebA-HQ: 17 attributes

Classification accuracy

To evaluate the quality of image translation, a facial attribute classifier is trained: a ResNet-18 model on the CelebA-HQ dataset, with the data split 9:1 for training and evaluation.

Qualitative results

์•„๋ž˜๋Š” facial attribute transferํ•œ ๊ฒฐ๊ณผ ์ด๋ฏธ์ง€์ด๋‹ค. ์›ํ•˜๋Š” ์˜์—ญ๋งŒ ๋ณ€ํ™”ํ•˜๊ณ  ์›๋ณธ ์ด๋ฏธ์ง€์˜ ์ •์ฒด์„ฑ์€ ๊ทธ๋Œ€๋กœ ๋ณด์กด๋œ ๊ฒฐ๊ณผ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

์•„๋ž˜ ์ด๋ฏธ์ง€๋Š” baseline ๋ชจ๋ธ์ธ StarGAN๊ณผ AttGAN๊ณผ ๋น„๊ตํ•œ ๊ฒฐ๊ณผ์ด๋‹ค. StarGAN์€ ์–ผ๊ตด์˜ ์ •์ฒด์„ฑ์ด ๋ณด์กด๋˜์ง€ ์•Š๋Š” ๊ฒƒ์ด ๋ณด์ด๊ณ  AttGAN์€ ์›์น˜ ์•Š๋Š” attribute์ธ ์›ƒ๋Š” ๋ชจ์Šต์ด ํ•จ๊ป˜ ์ ์šฉ๋˜์–ด ์žˆ๋Š” ๊ฒƒ์ด ๋ณด์ธ๋‹ค. ๊ทธ์— ๋น„ํ•ด RelGAN์€ ์—ฌ์ž์˜ ์–ผ๊ตด๊ณผ ํ‘œ์ • ๋“ฑ์˜ ๋‹ค๋ฅธ attribute๊ฐ€ ๋ณด์กด๋˜๋ฉด์„œ ๋ณ€ํ˜•ํ•˜๊ณ ์žํ•˜๋Š” attribute๋งŒ ๋ฐ”๊ปด์ง„ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

์•„๋ž˜ ์ด๋ฏธ์ง€๋Š” loss์— ๋Œ€ํ•œ ์‹คํ—˜์ด๋‹ค. ๋งจ ๋ฐ‘ ์ด๋ฏธ์ง€๋“ค์€ ๋ชจ๋“  loss๋ฅผ ํ™œ์šฉํ–ˆ์„ ๋•Œ์˜ ์–ผ๊ตด์ด๋ฉฐ ์ž์—ฐ์Šค๋Ÿฌ์šด ์–ผ๊ตด ์ด๋ฏธ์ง€๋ฅผ ๋งŒ๋“ค์–ด๋‚ด๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ์ด์™ธ์—๋Š” loss๋ฅผ ํ•˜๋‚˜์”ฉ ์ œ๊ฑฐํ•˜์—ฌ ์‹คํ—˜์„ ํ–ˆ๋‹ค.

  • ์ฒซ๋ฒˆ์งธ, \(L_{Cycle}+L_{Self}\)๊ฐ€ ์—†์„ ๋•Œ์ด๋ฉฐ ์›๋ณธ ์ด๋ฏธ์ง€์˜ identity๊ฐ€ ๋ณด์กด๋˜์ง€ ์•Š๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ๋‘๋ฒˆ์งธ, \(L_{Match}\)๊ฐ€ ์—†์„ ๋•Œ์ด๋ฉฐ ๋ณ€ํ™”์‹œํ‚ค๊ณ  ์‹ถ์€ attribute๊ฐ€ ๋ณด์ด์ง€ ์•Š๋Š”๋‹ค.
  • ์„ธ๋ฒˆ์งธ, \(L_{Real}\)๊ฐ€ ์—†์„ ๋•Œ์ด๋ฉฐ ๋ณ€ํ™”์‹œํ‚ค๊ณ  ์‹ถ์€ attribute๋Š” ์–ด๋А์ •๋„ ๋ณ€ํ™”๋์ง€๋งŒ gender๋‚˜ mustache์€ ์ž์—ฐ์Šค๋Ÿฝ์ง€ ๋ชปํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋งŒ๋“ค์–ด ๋‚ด์—ˆ๋‹ค.

Facial Image Reconstruction

The most important advantage of RelGAN is that attributes that should not change are preserved. If no attribute changes (the target attribute vector equals the original attribute vector), facial attribute translation reduces to a reconstruction task that reproduces the original image. The authors compare the models on this reconstruction task using the L1 and L2 norms and SSIM. As the table below shows, RelGAN outperforms StarGAN and AttGAN even without \(L_{Cycle}\).

Facial Attribute Interpolation

To compare interpolation performance, interpolated images for StarGAN and AttGAN are generated with \(G(x, \alpha a+(1-\alpha) \hat{a})\), where \(a\) and \(\hat{a}\) are the original and target attribute vectors. In the images below, RelGAN gives the smoothest and most natural interpolation.

Interpolation quality is used as the quantitative metric. The SSIM score measures the similarity of two images by comparing three components: luminance, contrast, and structure (structural differences in pixel values). These three components are used so that the comparison works in a way similar to the human visual system.

AttGAN, StarGAN, RelGAN without \(L_{Interp}\), and the full RelGAN are compared, and RelGAN performs best.
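
As a rough illustration of how SSIM combines luminance, contrast, and structure, here is a simplified single-window version in plain Python. It is an assumption-laden sketch: real implementations slide a local window over the image and average the result, and pixel values are assumed to lie in [0, 1]. How the per-pair scores are aggregated into one interpolation-quality number is not stated above, so the helper only returns the scores for neighbouring frames.

```python
def global_ssim(x, y, dynamic_range=1.0):
    """Single-window SSIM over two equally sized flat pixel lists."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((p - mean_x) ** 2 for p in x) / n
    var_y = sum((q - mean_y) ** 2 for q in y) / n
    cov_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def consecutive_ssim(frames):
    """SSIM between each pair of neighbouring frames in an interpolation sequence."""
    return [global_ssim(a, b) for a, b in zip(frames, frames[1:])]


# Three tiny 4-pixel "frames" standing in for interpolated images.
frames = [
    [0.10, 0.20, 0.30, 0.40],
    [0.15, 0.25, 0.35, 0.45],
    [0.60, 0.10, 0.90, 0.20],
]
print(consecutive_ssim(frames))
```

A smooth interpolation should give similar SSIM values for every pair of neighbouring frames, whereas an abrupt jump, like the one between the last two toy frames, shows up as a much lower score for that pair.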

 

User Study

People were asked to answer 40 questions about randomly generated CelebA-HQ images, and RelGAN performed best on all but one.


RelGAN proposes a multi-domain image-to-image translation model based on relative attributes. In facial image translation, it is not easy to change only the required regions while keeping the facial identity of the original image. By adding three discriminators and five losses, RelGAN keeps the facial identity and leaves the regions that should not change intact. As future work, the authors plan to improve the model with adversarial learning and mask mechanisms.

I found it interesting that RelGAN introduces the notion of relative attributes to translate only the required regions of the original image. The effort to preserve identity by regenerating the original image and comparing the two images was new to me. In the generated images, regions that could easily be affected, such as the background and hair, are preserved, and because only a specific part of the face can be changed (dyeing just the hair, or turning a smiling face into a crying one), the model also looks useful in practice.

Skin color, however, still does not seem to be preserved. When the reference image is paler or darker than the original image, the color visibly changes, and I wonder how well it could be preserved if this were also added as an attribute. In the current implementation, pale skin is already one of the attributes, so subdividing skin tone further might help preserve skin identity.

 
