AI policy model optimization for large-scale conversation systems
This technology concerns a method for optimizing the dialogue policy model of a dialogue system, and a dialogue system that implements it.
Dialogue policy models in existing dialogue systems were rule-based, which limited their flexibility and scalability; in particular, applying reinforcement learning over large dialogue state spaces has been difficult. To address this, this technology represents the dialogue state, including confidence scores for the user's speech intentions, as a continuous vector.
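As a rough illustration of such a continuous state representation, the sketch below concatenates intent confidence scores with slot-filled flags into a single vector. The intent and slot names are hypothetical placeholders, not taken from the source:

```python
# Hypothetical intent inventory and dialogue slots; purely illustrative.
INTENTS = ["inform", "request", "confirm", "goodbye"]
SLOTS = ["goal", "method", "request"]

def encode_dialogue_state(intent_confidences, slot_filled):
    """Concatenate intent confidence scores and slot-filled flags
    into one continuous state vector."""
    vec = [float(intent_confidences.get(i, 0.0)) for i in INTENTS]
    vec += [1.0 if slot_filled.get(s, False) else 0.0 for s in SLOTS]
    return vec

state = encode_dialogue_state(
    {"inform": 0.7, "request": 0.2},   # confidence scores from intent recognition
    {"goal": True, "method": False},   # which dialogue slots are currently filled
)
# state is a 7-dimensional continuous vector
```

Representing the state this way lets a single learned policy generalize across partially observed user intentions instead of matching discrete rules.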
The dialogue policy model is then trained by reinforcement learning with the Experience Replay technique, which stores the data received at each time step and reuses it for training. By efficiently reusing experience data and reducing the correlation between samples, this significantly improves data efficiency and enables a next-generation dialogue system capable of natural conversation.
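A minimal sketch of an experience replay buffer, assuming a standard fixed-capacity design (the capacity and tuple layout are assumptions, not details from the source). Sampling minibatches uniformly at random breaks the temporal correlation between consecutive dialogue turns:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples.
    Old experiences are evicted automatically once capacity is reached."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        """Record one time step of experience."""
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a uniformly random minibatch, decorrelating samples."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Example usage with a toy two-dimensional state.
buf = ReplayBuffer(capacity=1000)
buf.push([0.7, 0.2], "ask_goal", 0.0, [0.9, 0.1])
minibatch = buf.sample(1)
```

Because each stored tuple can be sampled many times, the same interaction data contributes to many gradient updates, which is where the data-efficiency gain comes from.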
Key Features:
- Dialogue state determination unit: Receives speech-act information from the user's utterance, determines the dialogue state, including the goal, method, and request of the dialogue, and converts it into a continuous vector
- Dialogue management unit: Inputs the generated dialogue state vector into the dialogue policy model to determine the optimal dialogue action for the current state
- Dialogue policy model learning unit: Trains the dialogue policy model by reinforcement learning on the experience data (state, action, reward, next state) accumulated as conversations progress; the Experience Replay technique, which reuses stored experiences in random order rather than the order of collection, reduces correlation between training samples and continuously optimizes the model
- Enables efficient learning even with a small amount of data
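The three units above can be tied together in a toy training loop. This is a sketch under strong assumptions: a linear Q-function stands in for the dialogue policy model, the action names, reward, and randomly generated states are all illustrative, and nothing here reflects the actual system's design.

```python
import random

ACTIONS = ["ask_goal", "ask_method", "answer", "close"]  # hypothetical dialogue acts

class DialoguePolicy:
    """Linear Q-function over the continuous state vector,
    with one weight vector per dialogue action."""

    def __init__(self, state_dim, lr=0.1, gamma=0.9):
        self.w = {a: [0.0] * state_dim for a in ACTIONS}
        self.lr, self.gamma = lr, gamma

    def q(self, state, action):
        return sum(wi * si for wi, si in zip(self.w[action], state))

    def act(self, state, epsilon=0.1):
        if random.random() < epsilon:                        # explore
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q(state, a))  # exploit

    def train(self, batch):
        """One Q-learning update per sampled (s, a, r, s') tuple."""
        for s, a, r, s_next in batch:
            target = r + self.gamma * max(self.q(s_next, b) for b in ACTIONS)
            td_error = target - self.q(s, a)
            self.w[a] = [wi + self.lr * td_error * si
                         for wi, si in zip(self.w[a], s)]

random.seed(0)
buffer = []                                      # experience replay memory
policy = DialoguePolicy(state_dim=3)
for step in range(200):                          # toy interaction loop
    s = [random.random() for _ in range(3)]      # stand-in for an encoded state
    a = policy.act(s)
    r = 1.0 if a == "answer" else 0.0            # toy reward signal
    s_next = [random.random() for _ in range(3)]
    buffer.append((s, a, r, s_next))             # accumulate experience
    if len(buffer) >= 32:
        policy.train(random.sample(buffer, 32))  # random minibatch -> decorrelated
```

In a real system the random states would come from the dialogue state determination unit, the reward from task success or user satisfaction signals, and the linear Q-function would typically be replaced by a neural network.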