Beam Search์˜ ๊ธฐ๋ณธ ๊ฐœ๋…๊ณผ ์›๋ฆฌ

beam search, ์Œ์„ฑ ์ธ์‹์„ ๊ตฌํ˜„ ํ•ด๋ณธ ์‚ฌ๋žŒ์ด๋ผ๋ฉด, ํ•œ๋ฒˆ ์ฏค์€ ๋“ค์–ด๋ณด๊ฒŒ ๋˜๋Š” ์šฉ์–ด์ด๋‹ค. ๋‹ค๋ฅธ ๋ถ„์•ผ์—์„œ๋Š” ๋งŽ์ด ์‚ฌ์šฉํ•˜๋Š”์ง€๋Š” ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ search ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๊ฐˆ๋ž˜์ด๊ธฐ๋„ ํ•˜๊ณ , ์„ ํƒ์ง€๊ฐ€ ๋งŽ์•„์ง€๋Š” task์ธ ๊ฒฝ์šฐ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์œ ์šฉํ•œ(?) ๋ฐฉ๋ฒ• ๊ฐ™๋‹ค. ๊ทธ๋ž˜์„œ ์˜ค๋Š˜์€ ASR์„ ๋ชจ๋ธ๋ง ํ•ด๋ณธ ์‚ฌ๋žŒ์œผ๋กœ์จ beam search์— ๋Œ€ํ•ด์„œ ๊ธ€๋กœ์จ ์ •๋ฆฌํ•ด๋ณด๊ณ ์ž ํ•œ๋‹ค. ์ฃผ๋กœ ์ฝ”๋“œ๋กœ ์ ‘ํ•˜๋Š” beam search ์ด์ง€๋งŒ ์ด๋ฒˆ ๊ธฐํšŒ์— ๊ฐœ๋…๋ถ€ํ„ฐ ์•Œ์•„๋ณด์ž.

Beam Search๋ž€ ๋ฌด์—‡์ธ๊ฐ€?

Beam Search๋Š” ์ž๋™ ์Œ์„ฑ ์ธ์‹(ASR) ์‹œ์Šคํ…œ์—์„œ ๋””์ฝ”๋”ฉ ๊ณผ์ •์— ์‚ฌ์šฉ๋˜๋Š” ํ•ต์‹ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค. ์Œ์„ฑ์„ ํ…์ŠคํŠธ๋กœ ๋ณ€ํ™˜ํ•  ๋•Œ, ์ž…๋ ฅ๋œ ์Œํ–ฅ ์‹ ํ˜ธ์— ๋Œ€ํ•ด ๊ฐ€๋Šฅํ•œ ์—ฌ๋Ÿฌ ๊ฐ€์„ค(hypothesis)๋“ค์„ ํšจ์œจ์ ์œผ๋กœ ํƒ์ƒ‰ํ•˜์—ฌ ๊ฐ€์žฅ ํ™•๋ฅ ์ด ๋†’์€ ์‹œํ€€์Šค๋ฅผ ์ฐพ์•„๋‚ด๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.
ASR์—์„œ ๋””์ฝ”๋”ฉ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ˆ˜์‹์œผ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค:

$$Y^* = \arg\max_Y P(Y|X)$$

์ด ์ˆ˜์‹์ด ASR task๋ฅผ ํ•œ๋งˆ๋””๋กœ ํ•˜์ž๋ฉด, ์ตœ์ข… ์ˆ˜์‹์ด๋‹ค. ์—ฌ๊ธฐ์„œ $X$๋Š” ์ž…๋ ฅ ์Œํ–ฅ ํŠน์ง•(acoustic feature), $Y$๋Š” ์ถœ๋ ฅ ํ…์ŠคํŠธ ์‹œํ€€์Šค๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ์ด๋ก ์ ์œผ๋กœ๋Š” ๋ชจ๋“  ๊ฐ€๋Šฅํ•œ $Y$์— ๋Œ€ํ•ด ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜์—ฌ ์ตœ๋Œ€๊ฐ’์„ ์ฐพ์•„์•ผ ํ•˜์ง€๋งŒ, ์‹ค์ œ๋กœ๋Š” ๊ฐ€๋Šฅํ•œ ๊ฒฝ์šฐ์˜ ์ˆ˜๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์•„ ์ „์ˆ˜ ์กฐ์‚ฌ(exhaustive search)๊ฐ€ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์‹ค์ œ๋กœ ์ „์ˆ˜ ์กฐ์‚ฌ๋ฅผ ํ•˜๋Š” ์ปดํ“จํŒ… ๋น„์šฉ๋Œ€๋น„ ์–ป๋Š” ๋ณด์ƒ์ด ํฌ์ง€ ์•Š๋‹ค. ๊ฐ€์„ฑ๋น„๊ฐ€ ๋–จ์–ด์ง€๋Š” ๋А๋‚Œ์ด๋ž„๊นŒ?

Greedy Search์™€์˜ ์ฐจ์ด์ 

๊ฐ€์žฅ ๋‹จ์ˆœํ•œ ๋””์ฝ”๋”ฉ ๋ฐฉ๋ฒ•์€ Greedy Search(ํƒ์š• ํƒ์ƒ‰)์ด๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ๊ฐ ์‹œ์ ๋งˆ๋‹ค ๊ฐ€์žฅ ํ™•๋ฅ ์ด ๋†’์€ ํ•˜๋‚˜์˜ ํ† ํฐ๋งŒ์„ ์„ ํƒํ•˜์—ฌ ์‹œํ€€์Šค๋ฅผ ๊ตฌ์„ฑํ•œ๋‹ค.
$$y_t = \arg\max_{y} P(y|X, y_1, y_2, \ldots, y_{t-1})$$
๊ทธ๋Ÿฌ๋‚˜ Greedy Search๋Š” ์ง€์—ญ ์ตœ์ ํ•ด(local optimum)์— ์‰ฝ๊ฒŒ ๋น ์งˆ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ „์ฒด์ ์œผ๋กœ ์ตœ์ ์˜ ์‹œํ€€์Šค๋ฅผ ๋†“์น  ๊ฐ€๋Šฅ์„ฑ์ด ํฌ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ํŠน์ • ์‹œ์ ์—์„œ ๋‘ ๋ฒˆ์งธ๋กœ ํ™•๋ฅ ์ด ๋†’์€ ์„ ํƒ์ด ์ดํ›„์˜ ๋‹จ๊ณ„์—์„œ ๋” ์ข‹์€ ์ „์ฒด ๊ฒฝ๋กœ๋กœ ์ด์–ด์งˆ ์ˆ˜ ์žˆ๋Š”๋ฐ, Greedy Search๋Š” ์ด๋Ÿฌํ•œ ๊ฐ€๋Šฅ์„ฑ์„ ๊ณ ๋ คํ•˜์ง€ ์•Š๋Š”๋‹ค. ๋ฐ˜๋ฉด Beam Search๋Š” ๊ฐ ๋””์ฝ”๋”ฉ ๋‹จ๊ณ„์—์„œ ์ƒ์œ„ k๊ฐœ์˜ ๊ฐ€๋Šฅ์„ฑ(beam)์„ ์œ ์ง€ํ•˜๋ฉฐ ํƒ์ƒ‰ํ•œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ง€์—ญ ์ตœ์ ํ•ด์— ๋น ์งˆ ์œ„ํ—˜์„ ์ค„์ด๊ณ , ์ „์ฒด์ ์œผ๋กœ ๋” ๋‚˜์€ ํ•ด๋ฅผ ์ฐพ์„ ํ™•๋ฅ ์„ ๋†’์ธ๋‹ค.

Beam Search์˜ ์ž‘๋™ ์›๋ฆฌ

Beam Search์˜ ๊ธฐ๋ณธ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค:

  1. ์ดˆ๊ธฐ ์ƒํƒœ์—์„œ ์‹œ์ž‘ํ•˜์—ฌ ์ฒซ ๋ฒˆ์งธ ํƒ€์ž„ ์Šคํ…์—์„œ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  ํ† ํฐ์˜ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•œ๋‹ค.
  2. ํ™•๋ฅ ์ด ๊ฐ€์žฅ ๋†’์€ ์ƒ์œ„ k๊ฐœ์˜ ํ† ํฐ์„ ์„ ํƒํ•œ๋‹ค (k๋Š” beam width ๋˜๋Š” beam size).
  3. ์„ ํƒ๋œ ๊ฐ ํ† ํฐ์— ๋Œ€ํ•ด, ๋‹ค์Œ ํƒ€์ž„ ์Šคํ…์—์„œ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  ํ† ํฐ์˜ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•œ๋‹ค.
  4. ์ด์ „ ๊ฒฝ๋กœ์˜ ํ™•๋ฅ ๊ณผ ํ˜„์žฌ ํ† ํฐ์˜ ํ™•๋ฅ ์„ ๊ฒฐํ•ฉํ•˜์—ฌ $k \times |V|$ ๊ฐœ์˜ ๊ฐ€๋Šฅํ•œ ๊ฒฝ๋กœ ์ค‘ ์ƒ์œ„ k๊ฐœ๋ฅผ ๋‹ค์‹œ ์„ ํƒํ•œ๋‹ค (|V|๋Š” ์–ดํœ˜ ํฌ๊ธฐ).
  5. ์ข…๋ฃŒ ํ† ํฐ์ด ๋‚˜ํƒ€๋‚˜๊ฑฐ๋‚˜ ์ตœ๋Œ€ ๊ธธ์ด์— ๋„๋‹ฌํ•  ๋•Œ๊นŒ์ง€ 3-4 ๋‹จ๊ณ„๋ฅผ ๋ฐ˜๋ณตํ•œ๋‹ค.

์ˆ˜์‹์ ์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด, ์‹œ๊ฐ„ $t$์—์„œ์˜ ๊ฒฝ๋กœ ์ ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋œ๋‹ค: $$score(y_1, y_2, \ldots, y_t) = \log P(y_1, y_2, \ldots, y_t|X) = \sum_{i=1}^{t} \log P(y_i|X, y_1, \ldots, y_{i-1})$$

4๋ฒˆ ์„ค๋ช…์—์„œ ‘์ด์ „ ๊ฒฝ๋กœ์˜ ํ™•๋ฅ ๊ณผ ํ˜„์žฌ ํ† ํฐ์˜ ํ™•๋ฅ ์˜ ๊ฒฐํ•ฉ’์€ ๋‹จ์ˆœ multiply๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ํ™•๋ฅ ์€ 0๊ณผ 1์‚ฌ์ด์˜ ๊ฐ’์ด๋ฏ€๋กœ ํƒ€์ž… ์Šคํ… $t$๊ฐ€ ๊ธธ์–ด์ง€๋ฉด 0๋ณด๋‹ค ์ž‘์€ ํ™•๋ฅ ๊ฐ’ $y_t$๊ฐ€ ์ ์  0๊ณผ ๊ฐ€๊นŒ์›Œ ์งˆ ๊ฒƒ์ด๋ฏ€๋กœ ์ปดํ“จํ„ฐ ์—ฐ์‚ฐ์—์„œ underflow ๋ฌธ์ œ๋ฅผ ์•ผ๊ธฐํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ž˜์„œ ๋กœ๊ทธ ํ™•๋ฅ ์˜ ํ•ฉ์œผ๋กœ, ์ˆ˜์น˜์  ์•ˆ์ •์„ฑ์„ ์œ„ํ•ด ์ผ๋ฐ˜์ ์œผ๋กœ ๋กœ๊ทธ ๋„๋ฉ”์ธ์—์„œ ๊ณ„์‚ฐํ•จ์œผ๋กœ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค.

Beam Width(Size)์˜ ์˜๋ฏธ์™€ ์„ ํƒ

Beam width(k)๋Š” ๊ฐ ๋””์ฝ”๋”ฉ ๋‹จ๊ณ„์—์„œ ์œ ์ง€ํ•˜๋Š” ๊ฐ€์„ค์˜ ์ˆ˜๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜์ด๋‹ค. ์ด๋Š” ํƒ์ƒ‰ ๊ณต๊ฐ„๊ณผ ๊ณ„์‚ฐ ๋ณต์žก์„ฑ, ๊ฒฐ๊ณผ์˜ ํ’ˆ์งˆ ์‚ฌ์ด์˜ ์ค‘์š”ํ•œ ๊ท ํ˜•์ ์„ ์ œ๊ณตํ•œ๋‹ค.
์ž‘์€ beam width (k=1):
Greedy Search์™€ ๋™์ผํ•˜๋ฉฐ, ๊ณ„์‚ฐ์ด ๋น ๋ฅด์ง€๋งŒ ์ตœ์ ํ•ด๋ฅผ ๋†“์น  ๊ฐ€๋Šฅ์„ฑ์ด ๋†’๋‹ค.
ํฐ beam width:
๋” ๋„“์€, ์ข…ํ•ฉ์ ์ธ ํƒ์ƒ‰์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜์ง€๋งŒ ๊ณ„์‚ฐ ๋น„์šฉ์ด ์ฆ๊ฐ€ํ•œ๋‹ค. ASR ์‹œ์Šคํ…œ์—์„œ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ k=5~20 ์‚ฌ์ด์˜ ๊ฐ’์„ ์‚ฌ์šฉํ•˜๋Š”๋ฐ, ์ด๋Š” ๋””์ฝ”๋”ฉ ํ’ˆ์งˆ๊ณผ ์†๋„ ์‚ฌ์ด์˜ ํ•ฉ๋ฆฌ์ ์ธ ํƒ€ํ˜‘์ ์„ ์ œ๊ณตํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ตœ์ ์˜ beam width๋Š” ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜, ํƒœ์Šคํฌ ๋ณต์žก์„ฑ, ํ•˜๋“œ์›จ์–ด ๋ฆฌ์†Œ์Šค ๋“ฑ์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ๋‹ค.

ํฅ๋ฏธ๋กญ๊ฒŒ๋„, beam width๋ฅผ ๋ฌดํ•œํžˆ ํฌ๊ฒŒ ํ•˜๋Š” ๊ฒƒ์ด ํ•ญ์ƒ ์ตœ์ƒ์˜ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์žฅํ•˜์ง€๋Š” ์•Š๋Š”๋‹ค๊ณ  ํ•œ๋‹ค. ์ด๋Š” ‘beam search curse’๋ผ๋Š” ํ˜„์ƒ์œผ๋กœ, ํŠน์ • ์ž„๊ณ„๊ฐ’ ์ด์ƒ์—์„œ๋Š” ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ๋ฏธ๋ฏธํ•˜๊ฑฐ๋‚˜ ์˜คํžˆ๋ ค ๊ฐ์†Œํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๊ณ  ํ•œ๋‹ค.

ASR ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜๋ณ„ Beam Search ์ ์šฉ

HMM-GMM ์‹œ์Šคํ…œ์—์„œ Beam Search๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ž‘๋™ํ•œ๋‹ค: ๊ฐ ์‹œ๊ฐ„ ํ”„๋ ˆ์ž„๋งˆ๋‹ค ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  HMM ์ƒํƒœ์˜ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•œ๋‹ค. ํ™•๋ฅ ์ด ์ž„๊ณ„๊ฐ’(beam threshold) ์ดํ•˜์ธ ์ƒํƒœ๋Š” ์ œ๊ฑฐ(pruning)ํ•œ๋‹ค. ๋‚จ์€ ํ™œ์„ฑ ์ƒํƒœ(active states)์— ๋Œ€ํ•ด์„œ๋งŒ ๋‹ค์Œ ํ”„๋ ˆ์ž„์˜ ๊ณ„์‚ฐ์„ ์ง„ํ–‰ํ•œ๋‹ค.
์ˆ˜์‹์ ์œผ๋กœ, ์‹œ๊ฐ„ $t$์—์„œ ์ƒํƒœ $s$์˜ Viterbi ๊ฒฝ๋กœ ํ™•๋ฅ ์€: $$\delta_t(s) = \max_{s’} {\delta_{t-1}(s’) \cdot a_{s’s} \cdot b_s(o_t)}$$

์—ฌ๊ธฐ์„œ:

$\delta_t(s)$: ์‹œ๊ฐ„ $t$์—์„œ ์ƒํƒœ $s$๊นŒ์ง€์˜ ์ตœ๋Œ€ ๊ฒฝ๋กœ ํ™•๋ฅ 
$a_{s’s}$: ์ƒํƒœ $s’$์—์„œ $s$๋กœ์˜ ์ „์ด ํ™•๋ฅ 
$b_s(o_t)$: ์ƒํƒœ $s$์—์„œ ๊ด€์ธก๊ฐ’ $o_t$์˜ ๋ฐฉ์ถœ ํ™•๋ฅ 

Beam Search์—์„œ๋Š” ๊ฐ ์‹œ๊ฐ„ ํ”„๋ ˆ์ž„๋งˆ๋‹ค ๋‹ค์Œ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” ์ƒํƒœ๋งŒ ์œ ์ง€ํ•œ๋‹ค: $$\delta_t(s) \geq \max_j {\delta_t(j)} \cdot \theta$$ ์—ฌ๊ธฐ์„œ $\theta$๋Š” beam threshold(0๊ณผ 1 ์‚ฌ์ด์˜ ๊ฐ’)์ด๋‹ค.