Basic principles of the Viterbi algorithm

The Viterbi algorithm is a dynamic programming algorithm that finds the most probable sequence of hidden states in an HMM (Hidden Markov Model). In ASR, it is used to find the most probable word or phoneme sequence given the observed acoustic features.

Main components of the Viterbi algorithm

  1. State space: all possible states of the HMM ($S = \{s_1, s_2, \dots, s_N\}$)
  2. Observation sequence: the acoustic feature vectors over time ($O = o_1, o_2, \dots, o_T$)
  3. State transition probability: the probability of moving from one state to another ($a_{ij}$)
  4. Emission probability: the probability of generating an observation in a given state ($b_j(o_t)$)

Equations of the Viterbi algorithm

  1. Initialization:

$$\delta_1(i) = \pi_i \cdot b_i(o_1), \quad 1 \leq i \leq N$$ $$\psi_1(i) = 0$$ Here $\pi_i$ is the initial probability of state $i$, and $\delta_t(i)$ is the probability of the most probable path that ends in state $i$ at time $t$ and accounts for the first $t$ observations. $\psi_t(j)$ stores the best predecessor of state $j$ at time $t$ and is used for backtracking.

  2. Recursion:

$$\delta_t(j) = \max_{1 \leq i \leq N} [\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(o_t)$$ $$2 \leq t \leq T, \quad 1 \leq j \leq N$$

$$\psi_t(j) = \arg\max_{1 \leq i \leq N} [\delta_{t-1}(i) \cdot a_{ij}]$$

  3. Termination:

$$P^* = \max_{1 \leq i \leq N} [\delta_T(i)]$$ $$q_T^* = \arg\max_{1 \leq i \leq N} [\delta_T(i)]$$

  4. Path backtracking:

$$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \dots, 1$$
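The four steps above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `viterbi`, the 0-based indexing, the layout `B[j][t]` for $b_j(o_t)$, and the toy 2-state numbers are all assumptions made for this example and do not come from any particular toolkit.

```python
def viterbi(pi, A, B):
    """Most likely state path for an HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][t] : emission probability b_j(o_t) of frame t in state j
    Returns (best_path, best_prob) with 0-based state indices.
    """
    N, T = len(pi), len(B[0])
    delta = [[0.0] * N for _ in range(T)]
    psi = [[0] * N for _ in range(T)]

    # 1. Initialization
    for i in range(N):
        delta[0][i] = pi[i] * B[i][0]

    # 2. Recursion
    for t in range(1, T):
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][t]

    # 3. Termination
    last = max(range(N), key=lambda i: delta[T - 1][i])
    best_prob = delta[T - 1][last]

    # 4. Path backtracking
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, best_prob


# Toy 2-state model with 3 frames (all numbers are made up for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],   # b_0(o_1), b_0(o_2), b_0(o_3)
     [0.1, 0.3, 0.6]]   # b_1(o_1), b_1(o_2), b_1(o_3)
path, prob = viterbi(pi, A, B)
print(path, prob)
```

Real decoders usually work with log probabilities and replace the products by sums, because the raw products underflow quickly on long utterances.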

A concrete Viterbi example

If the equations are hard to follow, an example makes them easier to understand. Let's walk through the Viterbi algorithm with a simple speech recognition example: recognizing the word “나는”.

  1. Problem setup:
  • Word to recognize: “나는”
  • Phoneme decomposition: /ㄴ/, /ㅏ/, /ㄴ/, /ㅡ/, /ㄴ/
  • Each phoneme is modeled as an HMM with 3 states (start, middle, end)
  • Observation sequence: 5 acoustic feature vectors $O = \{o_1, o_2, o_3, o_4, o_5\}$
  2. HMM parameters:
  • State set: /ㄴ1/, /ㄴ2/, /ㄴ3/, /ㅏ1/, /ㅏ2/, /ㅏ3/, /ㄴ1’/, /ㄴ2’/, /ㄴ3’/, /ㅡ1/, /ㅡ2/, /ㅡ3/, /ㄴ1’’/, /ㄴ2’’/, /ㄴ3’’/ (15 states in total)

Transition probabilities (a few examples): $$a_{/ㄴ1/, /ㄴ2/} = 0.7$$ (from the first state of the first /ㄴ/ to its second state) $$a_{/ㄴ2/, /ㄴ3/} = 0.8$$ $$a_{/ㄴ3/, /ㅏ1/} = 0.9$$ (from the last state of the first /ㄴ/ to the first state of /ㅏ/)

Emission probabilities (examples at t=1): $$b_{/ㄴ1/}(o_1) = 0.4$$ $$b_{/ㅏ1/}(o_1) = 0.1$$ $$b_{/ㅡ1/}(o_1) = 0.05$$

  3. Running the Viterbi algorithm:

Initialization (t=1): $$\delta_1(/ㄴ1/) = \pi_{/ㄴ1/} \cdot b_{/ㄴ1/}(o_1) = 1.0 \times 0.4 = 0.4$$ (assuming we start in the first state of the first phoneme). The initial probability of every other state is 0.

Computation at t=2 (using the additional assumed values $a_{/ㄴ1/, /ㄴ1/} = 0.2$, $b_{/ㄴ2/}(o_2) = 0.3$, $b_{/ㄴ1/}(o_2) = 0.25$): $$\delta_2(/ㄴ2/) = \delta_1(/ㄴ1/) \cdot a_{/ㄴ1/, /ㄴ2/} \cdot b_{/ㄴ2/}(o_2)$$ $$= 0.4 \times 0.7 \times 0.3 = 0.084$$ $$\delta_2(/ㄴ1/) = \delta_1(/ㄴ1/) \cdot a_{/ㄴ1/, /ㄴ1/} \cdot b_{/ㄴ1/}(o_2)$$ $$= 0.4 \times 0.2 \times 0.25 = 0.02$$

Computation at t=3 (using the additional assumed values $a_{/ㄴ2/, /ㄴ2/} = 0.1$, $b_{/ㄴ3/}(o_3) = 0.5$, $b_{/ㄴ2/}(o_3) = 0.45$): $$\delta_3(/ㄴ3/) = \delta_2(/ㄴ2/) \cdot a_{/ㄴ2/, /ㄴ3/} \cdot b_{/ㄴ3/}(o_3)$$ $$= 0.084 \times 0.8 \times 0.5 = 0.0336$$ $$\delta_3(/ㄴ2/) = \max[\delta_2(/ㄴ1/) \cdot a_{/ㄴ1/, /ㄴ2/}, \delta_2(/ㄴ2/) \cdot a_{/ㄴ2/, /ㄴ2/}] \cdot b_{/ㄴ2/}(o_3)$$ $$= \max[0.02 \times 0.7, 0.084 \times 0.1] \times 0.45$$ $$= \max[0.014, 0.0084] \times 0.45 = 0.014 \times 0.45 = 0.0063$$

Continuing through t=4 and t=5:
The probabilities are computed in the same way for every possible state transition.

Final result:
Once the computation is completed with the assumed values, the most probable state sequence has the form: /ㄴ1/ → /ㄴ2/ → /ㄴ3/ → /ㅏ1/ → /ㅏ2/ → /ㅏ3/ → /ㄴ1’/ → … Note that a Viterbi path contains exactly one state per frame, so passing through all 15 states requires at least 15 frames. The five-frame sequence above only illustrates the first few steps.
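As a sanity check, the hand computation for t=2 and t=3 can be reproduced with a short script. The transition and emission values are the ones assumed in the example above. The names `n1`, `n2`, `n3` (standing for /ㄴ1/, /ㄴ2/, /ㄴ3/) and the helper `step` are hypothetical and only used here.

```python
# Values taken from the worked example; n1, n2, n3 are the three
# states of the first /ㄴ/.
a = {("n1", "n1"): 0.2, ("n1", "n2"): 0.7,
     ("n2", "n2"): 0.1, ("n2", "n3"): 0.8}
# b[(state, t)] = b_state(o_t); only the values given in the example.
b = {("n1", 2): 0.25, ("n2", 2): 0.3,
     ("n2", 3): 0.45, ("n3", 3): 0.5}

# Initialization: pi = 1.0 for n1, b_n1(o_1) = 0.4
delta1 = {"n1": 1.0 * 0.4}


def step(prev, t):
    """One Viterbi recursion step over the transitions listed in `a`."""
    cur = {}
    for (i, j), a_ij in a.items():
        # Skip states that are unreachable or whose emission was not specified.
        if i in prev and (j, t) in b:
            score = prev[i] * a_ij * b[(j, t)]
            cur[j] = max(cur.get(j, 0.0), score)
    return cur


delta2 = step(delta1, 2)
delta3 = step(delta2, 3)
print(delta2)
print(delta3)
```

Up to floating-point rounding, `delta2` holds 0.02 for `n1` and 0.084 for `n2`, and `delta3` holds 0.0063 for `n2` and 0.0336 for `n3`, matching the numbers above. $\delta_3(/ㄴ1/)$ is left out because the example does not specify $b_{/ㄴ1/}(o_3)$.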

Emission probability ($b_j(o_t)$) and state transition probability ($a_{ij}$)

So how do we actually obtain the ‘emission probability ($b_j(o_t)$)’ and the ‘state transition probability ($a_{ij}$)’?

Obtaining the state transition probability ($a_{ij}$)

  1. Initialization based on expert knowledge

When it is used: setting initial values before model training starts

  • In a 3-state left-to-right HMM topology:
  • Self-loop: $a_{ii} \approx 0.6$
  • Transition to the next state: $a_{i,i+1} \approx 0.4$
  • In Kaldi, initial values of this kind are defined in the topo file
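To make the left-to-right structure concrete, here is a small sketch that builds such an initial transition matrix. The function name and the extra last column (representing the exit from the phoneme model) are assumptions of this example, not the Kaldi topo format.

```python
def left_to_right_transitions(num_states=3, self_loop=0.6):
    """Initial transition matrix of a left-to-right HMM.

    Rows are source states. Column num_states is a hypothetical
    exit state that leads out of the phoneme model.
    """
    forward = 1.0 - self_loop
    A = [[0.0] * (num_states + 1) for _ in range(num_states)]
    for i in range(num_states):
        A[i][i] = self_loop      # stay in the same state
        A[i][i + 1] = forward    # move on to the next state (or exit)
    return A


A = left_to_right_transitions()
for row in A:
    print(row)
```

Every other entry is zero, which is what encodes the rule that a state can only repeat or move forward.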
  2. Baum-Welch algorithm (EM-based)

When it is used: during model training

  • The Baum-Welch algorithm is the EM algorithm specialized to HMMs
  • E (Expectation) step: compute the forward ($\alpha$) and backward ($\beta$) probabilities
  • Statistics collection: accumulate $\xi_t(i,j)$ and $\gamma_t(i)$ from $\alpha$ and $\beta$
  • M (Maximization) step: update the state transition probabilities $$a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

where:
$\xi_t(i,j)$: the probability of being in state $i$ at time $t$ and in state $j$ at time $t+1$, given the observation sequence
$\gamma_t(i)$: the probability of being in state $i$ at time $t$, given the observation sequence
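The update above can be sketched for a single observation sequence as follows. It uses the standard identities $\gamma_t(i) = \alpha_t(i)\beta_t(i) / P(O)$ and $\xi_t(i,j) = \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) / P(O)$. The function name, the `B[j][t]` layout, and the toy numbers are assumptions of this example, and no scaling is applied, so it is only meant for short sequences.

```python
def reestimate_transitions(pi, A, B):
    """One Baum-Welch update of A from a single observation sequence.

    B[j][t] holds b_j(o_t). Unscaled, so only suitable for short inputs.
    """
    N, T = len(pi), len(B[0])

    # Forward pass: alpha[t][i] = P(o_1..o_t, q_t = i)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][0]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][t]

    # Backward pass: beta[t][i] = P(o_{t+1}..o_T | q_t = i)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][t + 1] * beta[t + 1][j] for j in range(N))

    likelihood = sum(alpha[T - 1])  # P(O)

    new_A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        # Denominator: sum of gamma_t(i) over the first T-1 frames
        gamma_sum = sum(alpha[t][i] * beta[t][i] for t in range(T - 1)) / likelihood
        for j in range(N):
            # Numerator: sum of xi_t(i, j) over the first T-1 frames
            xi_sum = sum(alpha[t][i] * A[i][j] * B[j][t + 1] * beta[t + 1][j]
                         for t in range(T - 1)) / likelihood
            # Keep the old value if state i was never occupied.
            new_A[i][j] = xi_sum / gamma_sum if gamma_sum > 0 else A[i][j]
    return new_A


# Toy 2-state model with 3 frames (made-up numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]
new_A = reestimate_transitions(pi, A, B)
print(new_A)
```

Each row of `new_A` still sums to 1, because summing $\xi_t(i,j)$ over $j$ gives back $\gamma_t(i)$.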

  3. Forced-alignment-based estimation

When it is used: model refinement stage

  • Prepare training data consisting of speech-text pairs
  • Force-align each utterance to its known text with the current model
  • Count the transitions in the aligned state sequences and normalize the counts to obtain probabilities: $$a_{ij} = \frac{\text{count}(\text{transitions from state } i \text{ to state } j)}{\text{count}(\text{all transitions out of state } i)}$$
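The counting step can be sketched as below, assuming each alignment is already available as a list with one state label per frame. The labels and the two toy alignments are made up for illustration.

```python
from collections import Counter, defaultdict


def transitions_from_alignments(alignments):
    """Estimate a_ij by counting state-to-state transitions and normalizing."""
    counts = defaultdict(Counter)
    for states in alignments:
        for i, j in zip(states, states[1:]):
            counts[i][j] += 1
    return {i: {j: c / sum(row.values()) for j, c in row.items()}
            for i, row in counts.items()}


# Two hypothetical frame-level alignments of the same 3-state phoneme model.
alignments = [
    ["n1", "n1", "n2", "n3", "n3"],
    ["n1", "n2", "n2", "n2", "n3"],
]
a_hat = transitions_from_alignments(alignments)
print(a_hat)
```

Unlike Baum-Welch, every frame here is assigned to exactly one state, so the counts are integers rather than expected (soft) counts.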

Obtaining the emission probability ($b_j(o_t)$)

A GMM is a probability density function expressed as a weighted sum of several Gaussian distributions. In an HMM-GMM system, the emission probability of each HMM state $j$ is modeled with a GMM:
$b_j(o_t) = \sum_{m=1}^M c_{jm} \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})$

$c_{jm}$: weight of the $m$-th Gaussian component of state $j$ (the weights of a state sum to 1)
$\mu_{jm}$: mean vector of the $m$-th Gaussian
$\Sigma_{jm}$: covariance matrix of the $m$-th Gaussian
$\mathcal{N}(o_t; \mu, \Sigma)$: multivariate Gaussian density function with mean $\mu$ and covariance $\Sigma$
(I should probably write a separate post on GMMs at some point. From this equation alone it is hard to get a feel for how the parameters are actually obtained. For now, let's just note that they come out of the training process below and move on.)
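Evaluating the equation is straightforward once the parameters are known. The sketch below assumes diagonal covariance matrices, a common simplification that is not stated in the text above. The function names and the toy parameters are made up for this example.

```python
import math


def diag_gaussian_pdf(o, mean, var):
    """Density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    log_p = 0.0
    for x, m, v in zip(o, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)


def gmm_emission(o, weights, means, variances):
    """b_j(o_t) = sum_m c_jm * N(o_t; mu_jm, Sigma_jm) for one state j."""
    return sum(c * diag_gaussian_pdf(o, mu, var)
               for c, mu, var in zip(weights, means, variances))


# Hypothetical 2-component GMM over 2-dimensional features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
b = gmm_emission([0.0, 0.0], weights, means, variances)
print(b)
```

Because the observation sits exactly on the first component's mean, almost all of the value comes from that component. The second one contributes only through the tiny factor $e^{-9}$.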

How the emission probabilities are trained

  1. GMM initialization

When it is used: before model training starts

Method:

  • k-means clustering of the feature vectors assigned to each state
  • Create the initial GMM components (means, covariances, weights)
  • In Kaldi, the initial model is created with commands such as gmm-init-mono
  2. GMM update within the Baum-Welch algorithm

When it is used: during model training

Method:

  • In the E-step, compute the state occupancy probability $\gamma_t(j)$ of each frame
  • Compute the responsibility of each Gaussian component, $\gamma_t(j,m)$: $$\gamma_t(j,m) = \gamma_t(j) \cdot \frac{c_{jm} \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})}{\sum_{k=1}^M c_{jk} \mathcal{N}(o_t; \mu_{jk}, \Sigma_{jk})}$$
  • Update the GMM parameters:
    Weight: $c_{jm} = \frac{\sum_{t=1}^T \gamma_t(j,m)}{\sum_{t=1}^T \gamma_t(j)}$
    Mean: $\mu_{jm} = \frac{\sum_{t=1}^T \gamma_t(j,m) \cdot o_t}{\sum_{t=1}^T \gamma_t(j,m)}$
    Covariance: $\Sigma_{jm} = \frac{\sum_{t=1}^T \gamma_t(j,m) \cdot (o_t - \mu_{jm})(o_t - \mu_{jm})^\top}{\sum_{t=1}^T \gamma_t(j,m)}$
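Here is a minimal sketch of these three updates for the GMM of a single state $j$. To keep it short it assumes scalar (1-D) observations, so the covariance reduces to a variance, and it takes the occupancies $\gamma_t(j)$ as given. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_gmm(obs, gamma_j, weights, means, variances):
    """One EM update of a 1-D GMM attached to a single state j.

    obs[t]     : scalar observation o_t
    gamma_j[t] : state occupancy gamma_t(j) from the E-step
    """
    M = len(weights)
    # Responsibilities gamma_t(j, m) = gamma_t(j) * c_jm N(...) / b_j(o_t)
    resp = []
    for o, g in zip(obs, gamma_j):
        dens = [weights[m] * normal_pdf(o, means[m], variances[m]) for m in range(M)]
        total = sum(dens)  # this is b_j(o_t)
        resp.append([g * d / total for d in dens])

    occ_j = sum(gamma_j)
    new_w, new_mu, new_var = [], [], []
    for m in range(M):
        occ = sum(r[m] for r in resp)
        mu = sum(r[m] * o for r, o in zip(resp, obs)) / occ
        # The variance is computed around the updated mean.
        var = sum(r[m] * (o - mu) ** 2 for r, o in zip(resp, obs)) / occ
        new_w.append(occ / occ_j)
        new_mu.append(mu)
        new_var.append(var)
    return new_w, new_mu, new_var


obs = [-1.0, 0.0, 1.0, 4.0, 5.0, 6.0]
gamma_j = [1.0] * len(obs)  # pretend every frame belongs to state j
w, mu, var = update_gmm(obs, gamma_j, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
print(w, mu, var)
```

In practice a variance floor is usually added, since a component that ends up responsible for very few frames can collapse to a near-zero variance.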
  3. Alignment-based GMM refinement

When it is used: model refinement stage

Method:

  • Assign feature vectors to HMM states by forced alignment
  • Train a more complex GMM on the aligned data (e.g., increase the number of mixture components)
  • In Kaldi, this is implemented with gmm-acc-stats-ali and gmm-est

The responsibility equation

$$\gamma_t(j,m) = \gamma_t(j) \cdot \frac{c_{jm} \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})}{\sum_{k=1}^M c_{jk} \mathcal{N}(o_t; \mu_{jk}, \Sigma_{jk})}$$ This equation gives the responsibility of the $m$-th Gaussian component of state $j$ for the observation $o_t$. It is the probability of being in state $j$ at time $t$, multiplied by the probability that, given $o_t$ was generated in state $j$, it came from the $m$-th Gaussian component.

The emission probability equation

$$b_j(o_t) = \sum_{m=1}^M c_{jm} \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})$$ This equation gives the emission probability of observation $o_t$ in state $j$. Strictly speaking it is a probability density modeled by a GMM, not a probability.

Relationship between the two equations

The key point is that the denominator of the first equation is exactly the second equation:
$$\sum_{k=1}^M c_{jk} \mathcal{N}(o_t; \mu_{jk}, \Sigma_{jk}) = b_j(o_t)$$ So the first equation can be rewritten as:

$$\gamma_t(j,m) = \gamma_t(j) \cdot \frac{c_{jm} \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})}{b_j(o_t)}$$ This value is computed in the E-step of the Baum-Welch algorithm, where:
$\gamma_t(j)$: the probability of being in state $j$ at time $t$
$\frac{c_{jm} \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})}{b_j(o_t)}$: the contribution of the $m$-th Gaussian component within state $j$

The two equations mean different things, but they are used together when training the GMM parameters:

  1. The emission probability $b_j(o_t)$ is a basic building block of the HMM
  2. The responsibility $\gamma_t(j,m)$ is a statistic used to update the GMM parameters

How the DNN replaces the GMM in a DNN-HMM hybrid system

What changes conceptually

The shift from GMM-HMM systems to DNN-HMM hybrid systems was an important paradigm change in the development of ASR. The main changes are as follows.

In GMM-HMM:

  • The GMM is a generative model that directly models $p(o_t|s_j)$, the likelihood of observation $o_t$ given state $j$
  • Each HMM state has its own GMM
  • Emission probability: $b_j(o_t) = p(o_t|s_j) = \sum_{m=1}^M c_{jm} \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})$

In DNN-HMM:

  • The DNN is a discriminative model that predicts $p(s_j|o_t)$, the posterior probability of state $j$ given observation $o_t$
  • A single DNN outputs the posteriors of all states at once
  • The posterior is converted to a likelihood with Bayes' rule: $p(o_t|s_j) \propto \frac{p(s_j|o_t)}{p(s_j)}$

The mechanism by which the DNN replaces the GMM

The DNN takes over the role of the GMM as follows:
Input: acoustic features (MFCC, FBANK, etc.) together with their context (preceding and following frames)
Output: the posterior probability of each HMM state (senone)
Use in decoding: converted to a likelihood via Bayes' rule

Training stage

Perform an initial forced alignment with the GMM-HMM
Train the DNN on the aligned frame labels (cross-entropy loss function)
The output layer uses a softmax activation to output a probability for every HMM state
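For a single frame, the softmax output and the cross-entropy loss against the aligned state label look like this. This is a plain-Python sketch of the two formulas only. The three logits are made up, and a real system would compute this inside a deep learning framework over mini-batches.

```python
import math


def softmax(logits):
    """Turn the output-layer activations into a distribution over HMM states."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits, target):
    """Negative log posterior of the aligned state `target` for one frame."""
    return -math.log(softmax(logits)[target])


# Hypothetical output-layer activations for one frame with 3 states.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(logits, target=0)    # the alignment says this frame is state 0
print(probs, loss)
```

Minimizing this loss over all frames pushes $p(s_j|o_t)$ toward 1 for the state that the forced alignment assigned to each frame.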

Decoding stage

The DNN outputs the state posteriors $p(s_j|o_t)$ for each frame
The posteriors are converted to likelihoods: $p(o_t|s_j) \propto \frac{p(s_j|o_t)}{p(s_j)}$
These likelihoods are passed to the HMM decoder (Viterbi, beam search, etc.)
Here $p(s_j)$ is the prior probability of the state, obtained from the relative frequency of each state in the training data. The factor $p(o_t)$ from Bayes' rule is the same for every state at a given frame, so it does not change which path maximizes the score and can be dropped.
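The conversion is usually done in the log domain, where the division becomes a subtraction. The sketch below assumes the priors are estimated by counting frame-level state labels. The function names, the flooring constant, and the toy numbers are assumptions of this example.

```python
import math


def priors_from_alignment(labels, num_states):
    """Relative frequency of each state in the frame-level training labels."""
    counts = [0] * num_states
    for s in labels:
        counts[s] += 1
    return [c / len(labels) for c in counts]


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """log p(s_j | o_t) - log p(s_j) for every frame t and state j."""
    return [[math.log(max(p, floor)) - math.log(max(q, floor))
             for p, q in zip(frame, priors)]
            for frame in posteriors]


labels = [0, 0, 0, 1, 1, 2]               # hypothetical aligned state labels
priors = priors_from_alignment(labels, 3)
posteriors = [[0.7, 0.2, 0.1],            # hypothetical DNN outputs, one row per frame
              [0.2, 0.3, 0.5]]
loglikes = scaled_log_likelihoods(posteriors, priors)
print(priors)
print(loglikes)
```

Dividing by the prior boosts rare states: in the second frame, state 2 has posterior 0.5 but prior 1/6, so its scaled likelihood is 3 times larger than the posterior alone would suggest.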

Key innovations

  1. Representational power

GMM: models the probability distribution explicitly, but is limited in recognizing complex patterns
DNN: can recognize more complex patterns through nonlinear transformations and learns more powerful feature representations

  2. Use of context

GMM: mainly uses the features of the current frame only
DNN: takes several frames as input and can therefore exploit longer context (e.g., a 9-frame window)

  3. Parameter sharing

GMM: a separate set of parameters for each state
DNN: a single network computes the probabilities of all states, and the lower layers share feature representations
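The 9-frame window mentioned under the use of context can be built by splicing each frame with its 4 left and 4 right neighbours. The sketch below repeats the first or last frame at the utterance edges, which is one common choice but an assumption here. The function name and toy features are hypothetical.

```python
def splice_frames(features, left=4, right=4):
    """Concatenate each frame with its neighbours to form the DNN input.

    features[t] is the feature vector of frame t. At the edges the
    first or last frame is repeated.
    """
    T = len(features)
    spliced = []
    for t in range(T):
        window = []
        for k in range(t - left, t + right + 1):
            k = min(max(k, 0), T - 1)     # clamp to a valid frame index
            window.extend(features[k])
        spliced.append(window)
    return spliced


# Six 1-dimensional toy frames whose value equals the frame index.
features = [[float(t)] for t in range(6)]
spliced = splice_frames(features)
print(spliced[0])
print(spliced[5])
```

With 1-dimensional toy features each spliced vector has 9 entries. With 40-dimensional FBANK features, for example, the same window would give a 360-dimensional input.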