Reading Note: An ASP Methodology for Understanding Narratives about Stereotypical Activities

Paper

An ASP Methodology for Understanding Narratives about Stereotypical Activities

Introduction

What is the concept of a script to model stereotypical activities?

A script models a stereotypical activity (e.g., dining at a restaurant) as a sequence of actions that are normally executed in a specific order by its participants.

What is the problem the authors want to solve?

The authors want to model the reasoning of a reader of narratives about stereotypical activities. In particular, one of their goals is to understand which parts of the AIA architecture can be adapted to this purpose, and how.

What did this paper do?

  • The authors proposed a new representation methodology and reasoning approach that make it possible to answer, in both normal and exceptional scenarios, questions about events that did or did not take place.

  • The system they built may be the first scalable approach to the understanding of exceptional scenarios.

  • The authors provided a prototypical implementation of the end-to-end system in which understanding is tested via question answering.

Literature Review

  • Theory of intentions

    This theory introduces the concept of an activity - a sequence of agent actions and sub-activities that are supposed to achieve a goal. The theory of intentions is written in an action language and is translatable into Answer Set Prolog.

  • Architecture of an intentional agent (AIA)

    AIA is an agent that obeys the theory of intentions. According to AIA, at each time step, the agent observes the world, explains observations incompatible with its expectations (diagnosis), and determines what action to execute next (planning).

    The reasoning module implementing AIA allows an agent to reason about a wide variety of scenarios, including the serendipitous achievement of its goal by exogenous actions or the realization that an active activity has no chance to achieve its goal anymore.

    How does AIA handle normal and exceptional stereotypical activities?

    • Scenario 1 (Base case)

      Nicole went to a vegetarian restaurant. She ordered lentil soup. The waitress set the soup in the middle of the table. Nicole enjoyed the soup. She left the restaurant.

      • What should readers know?

        Readers are expected to know the conventions of dining out: since customers are supposed to pay for their meal, a reader can infer that Nicole paid even though this information is missing from the text.

    • Scenario 2 (Serendipity)

      Nicole went to a vegetarian restaurant. She ordered lentil soup. When the waitress brought her the soup, she told her that it was on the house. Nicole enjoyed the soup and then left.

      • What should readers know?

        The reader should understand that Nicole did not pay for the soup.

    • Scenario 3 (Detecting Futile Activity)

      Nicole went to a vegetarian restaurant. She sat down and wanted to order lentil soup, but it was not on the menu.

      • What should readers know?

        The reader should deduce that Nicole abandoned her plan of eating lentil soup.

    • Scenario 4 (Diagnosis)

      Nicole went to a vegetarian restaurant. She ordered lentil soup. The waitress brought her a miso soup instead.

      • What should readers know?

        The reader is supposed to produce some explanations for what may have gone wrong: either the waitress or the cook misunderstood the order.

    AIA encodes an agent’s reasoning process about its own goals, intentions, and ways to achieve them.

Story Understanding

  • An extensive review of narrative processing systems can be found in Mueller’s paper.

The task undertaken in this paper is more difficult because stories about stereotypical activities tend to omit more information about the events taking place than other texts do, as such information is expected to be filled in by the reader.

Restaurant Narratives

Erik Mueller’s work is based on the hypothesis that readers of a text understand it by constructing a mental model of the narrative.

Mueller’s system relied on two important pieces of background knowledge:

  1. a commonsense knowledge base about actions occurring in a restaurant, with their effects and preconditions, encoded in the Event Calculus
  2. a script describing a sequence of actions performed by different characters in a normal unfolding of a restaurant episode.

Activity Recognition

The task the authors undertook here presents some similarities to activity recognition, in that it requires observing agents and their environment in order to complete the picture of the agents’ actions and activities. However, unlike activity recognition, understanding narratives limited to a single stereotypical activity does not require identifying agents’ goals, which are always the same for each role in their case.

Preliminary: Theory of Intentions

Theory of intentions of a goal-driven agent

Each sequence of actions (i.e., plan) of an agent was associated with a goal that it was meant to achieve, and the combination of the two was called an activity.

  • Activities could have nested sub-activities, and were encoded using the predicates:

    • activity(m) (m is an activity)
    • goal(m, g) (the goal of activity m is g)
    • length(m, n) (the length of activity m is n)
    • comp(m, k, x) (the kth component of activity m is x, where x is either an action or a sub-activity); a concrete encoding using these predicates is sketched after the control loop below
  • AIA control loop

    Observe the world and initialize history with observations

    1. interpret observations

      The agent uses diagnostic reasoning to explain unexpected observations, which involves determining which exogenous actions may have occurred without being observed.

    2. find an intended action e

      The goal of this step is to allow the agent to find an intended action. The following intended actions are considered:

      • To continue executing an ongoing activity that is expected to achieve its goal
      • To stop an ongoing activity whose goal is no longer active (because it has been either achieved or abandoned)
      • To stop an activity that is no longer expected to achieve its goal
      • To start a chosen activity that is expected to achieve its goal

      Under certain conditions, there may be no way for the agent to achieve its goal, or the agent may simply have no goal. In either case, the agent’s intended action is to wait. For the case when the agent continues executing an ongoing activity, the fluent next_action(m, a) in the theory of intentions becomes relevant, as it indicates the action of activity m that the agent would have to attempt next.

    3. Attempt to perform e and update history with a record of the attempt

      The agent acts and records its attempt to perform the intended action.

    4. observe the world, update history with observations, and go to step 1

      The agent observes the values of fluents, the result of its attempt to act from step 3, and possibly the occurrence of some exogenous actions.
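As referenced in the predicate list above, here is a minimal ASP sketch of how a customer’s dining plan could be encoded as a TI activity. The activity name c_act, the instance names, and the choice of components (taken from the restaurant actions listed later in this note) are illustrative assumptions, not the paper’s exact encoding.

% hypothetical activity: customer c dines at restaurant r
activity(c_act).
goal(c_act, satiated_and_out(c)).
length(c_act, 6).
comp(c_act, 1, go(c, r)).       % enter the restaurant
comp(c_act, 2, sit(c)).         % take a seat
comp(c_act, 3, order(c, f, w)). % order food f from waiter w
comp(c_act, 4, eat(c, f)).
comp(c_act, 5, pay(c)).
comp(c_act, 6, leave(c)).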

Blount et al. introduced the concept of an intentional agent - one that has goals that it intends to pursue, ‘only attempts to perform those actions that are intended and does so without delay.’

The agent is expected to possess knowledge about the changing world around it. This knowledge can be represented as a transition diagram in which nodes denote physical states of the world and arcs are labeled by physically executable actions that may take the world from one state to another. States describe the values of relevant properties of the world, where properties are divided into fluents (those that can be changed by actions) and statics (those that cannot).
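As a concrete illustration, here is a minimal, self-contained ASP sketch of one fragment of such a transition diagram: a dynamic causal law for the action go(c, r), plus the standard inertia axioms for inertial fluents. The step range, the sample instances, and the fluent declaration are simplifying assumptions for this note, not the paper’s exact encoding.

customer(c). restaurant(r).
step(0..3).
fluent(inertial, in(C, R)) :- customer(C), restaurant(R).

% initially, the customer is not in the restaurant
-holds(in(c, r), 0).

% dynamic causal law: going to a restaurant puts the customer in it
holds(in(C, R), I + 1) :- occurs(go(C, R), I), step(I).

% inertia: an inertial fluent keeps its value unless an action changes it
holds(F, I + 1) :- fluent(inertial, F), holds(F, I),
                   not -holds(F, I + 1), step(I), step(I + 1).
-holds(F, I + 1) :- fluent(inertial, F), -holds(F, I),
                    not holds(F, I + 1), step(I), step(I + 1).

occurs(go(c, r), 0). % hence holds(in(c, r), I) for every I >= 1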

To accommodate intentions and decisions of an intentional agent, Blount et al. expanded the traditional transition diagram with mental fluents and actions.

Three important mental fluents in their theory are:

  • status(m, k) (activity m is in progress if k ≥ 0; not yet started or already stopped if k = -1)

    => The authors added an extra argument ag to associate an agent with each mental fluent and action of TI (e.g., status(m, k) became status(ag, m, k))

  • active_goal(g) (goal g is active)

  • next_action(m, a) (‘the next physical action to be executed as part of activity m is a’)

Axioms describe how the execution of physical actions affects the status of activities and sub-activities, activates (or inactivates) goals and sub-goals, and determines the selection of the next action to be executed.

The authors also extended TI with the ASP axioms below, needed to make explicit Blount et al.’s assumption that an agent has only one top-level goal at a time.

This restriction is important when modeling an external observer and was not fully captured previously by TI.

The first two axioms say that an agent cannot select a goal if it already has an active goal or if it selects another goal at the same time.

The third rule says that stopping an activity inactivates the goals of all of its sub-activities.

impossible(select(Ag, G), I) :- holds(active_goal(Ag, G1), I),
                                possible_goal(Ag, G).

impossible(select(Ag, G), I) :- occurs(select(Ag, G1), I),
                                possible_goal(Ag, G),
                                G != G1.

-holds(active_goal(Ag, G1), I + 1) :- goal(M1, G1),
                                      holds(descendant(Ag, M1, M), I),
                                      occurs(stop(Ag, M), I).

Mental actions include:

  • select(g) and abandon(g) for goals
  • start(m) and stop(m) for activities
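In a model of a scenario, these mental actions occur alongside physical ones. A hypothetical trace fragment for a customer agent c, reusing the agent-indexed predicates introduced above and the illustrative activity c_act from the earlier sketch, might look like:

occurs(select(c, satiated_and_out(c)), 0). % c selects the dining goal
occurs(start(c, c_act), 1).                % c starts its dining activity
% ... the physical actions of c_act occur at steps 2..7, driven by next_action ...
occurs(stop(c, c_act), 8).                 % the activity stops once the goal is achieved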

Methodology

Assumptions

The authors assumed that a wide-coverage commonsense knowledge base written in ASP is available and that it contains information about a large number of actions, their effects, and their preconditions, including the actions in the stereotypical activity.

  • In order to be able to evaluate the methodology, the authors built a basic knowledge base with core information about restaurants and, whenever a scenario needed new information, expanded it with new actions and fluents.

To simplify the first attempt to use a theory of intentions to reason about stereotypical activities, the authors made the following assumptions:

  • there is only one customer that wants to dine
  • there is only one waiter
  • there is only one cook
  • there is only one ordered dish

Methodology

For each input text $t$ and set of questions $\mathbf{Q}$, the authors construct a logic program $\Pi(t, \mathbf{Q})$ (simply $\Pi(t)$ if $\mathbf{Q}$ is empty). Its answer sets represent models of the narrative and answers to the questions in $\mathbf{Q}$.

The program $\Pi(t, \mathbf{Q})$ consists of two parts:

The pre-defined part

This part consists of the following items:

  1. The knowledge base, with a core describing sorts, fluents, actions, and some pre-defined objects relevant to the stereotypical activity of focus
  2. The ASP theory of intentions $TI$
  3. A module encoding stereotypical activities as $TI$ activities for each character
  4. A reasoning module, encoding
    • a mapping of time points on the story time line into points on the reasoning time line
    • reasoning components adapted from the $AIA$ architecture to reflect a reader’s reasoning process and expected to allow reasoning about serendipitous achievement of goals, decisions to stop futile activities, and diagnosis
    • a question answering component

The input-dependent part

This part, a logic form obtained by translating the English text $t$ and the questions in $\mathbf{Q}$ into ASP facts, consists of the following:

  1. Facts defining objects mentioned in the text $t$ as instances of relevant sorts in the knowledge base
  2. Observations about the values of fluents and the occurrences of actions at different points on the story time line
  3. Default information about the values of fluents in the initial situation
  4. Facts encoding each question in $\mathbf{Q}$
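For Scenario 1, the input-dependent part might look like the sketch below. The predicate names hpd/2, obs/3, and question/2 are hypothetical stand-ins for the paper’s logic form, and the story time line is compressed.

% 1. objects mentioned in the text, as instances of KB sorts
customer(nicole). waiter(w1). cook(ck1).
restaurant(veg_r). food(lentil_soup).

% 2. observed action occurrences on the story time line
hpd(go(nicole, veg_r), 0).
hpd(order(nicole, lentil_soup, w1), 1).
hpd(eat(nicole, lentil_soup), 2).
hpd(leave(nicole), 3).

% 3. default values of fluents in the initial situation
obs(hungry(nicole), true, 0).
obs(in(nicole, veg_r), false, 0).

% 4. question encoding, e.g., "Did Nicole pay for the soup?"
question(q1, occurred(pay(nicole))).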

The core of the knowledge base

It defines knowledge related to the restaurant environment and includes:

  • A hierarchy of sorts with main sorts:

    • person, with subsorts:
      • customer
      • waiter
      • cook
    • thing, with subsorts:
      • food
      • menu
      • bill
    • restaurant
    • location

Predefined instances of sorts:

  • instances of location

    • entrance
    • kt (kitchen)
    • ct (counter)
    • outside
    • t (table)
  • m, instance of menu

  • b, instance of bill
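Assuming the subsort links are encoded as plain ASP rules, the hierarchy and the pre-defined instances might be declared as follows (a sketch, not the paper’s exact code):

% sort hierarchy
person(P) :- customer(P).
person(P) :- waiter(P).
person(P) :- cook(P).
thing(T) :- food(T).
thing(T) :- menu(T).
thing(T) :- bill(T).

% pre-defined instances
location(entrance). location(kt). location(ct).
location(outside). location(t).
menu(m). bill(b).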

Actions: go(c, r), greet(w, c), move(p, l1, l2), lead_to(w, c, t), sit(c), pick_up(p, t, l), put_down(p, t, l), order(c, f, w), request(p1, t, p2), prepare(ck, f), eat(c, f), pay(c), stand_up(c), leave(c), make_unavailable(f, r), interference

Fluents: hungry(c), open(r), at_loc(t, l), in(c, r), welcomed(c), standing_by(p, l), sitting(c), holding(p, t), menu_read(c), informed(p1, t, p2), available(f, r), food_prepared(ck, f), served(c), bill_generated(c), paid(b), order_transmitted(c), done_with_payment(c), satiated_and_out(c), served_and_billed(c), done_with_request(ck, w)

Where:

  • c for a customer
  • w for a waiter
  • ck for a cook
  • f for a food
  • r for a restaurant
  • t1 and t2 for things
  • l, l1 and l2 for locations
  • p, p1 and p2 for persons

How to denote the agent performing each action

The authors denote the agent performing each action a by the static actor(a, p). Each action has a unique actor, except lead_to(w, c, t), in which both w and c (the waiter and the customer) are considered actors.
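A minimal sketch of how actor/2 could be declared over these sorts; the sample instances and the particular rules are illustrative assumptions, not an exhaustive encoding:

customer(c). waiter(w). cook(ck).
restaurant(r). food(f). location(t).

actor(go(C, R), C)        :- customer(C), restaurant(R).
actor(pay(C), C)          :- customer(C).
actor(prepare(CK, F), CK) :- cook(CK), food(F).

% the exception: lead_to has two actors, the waiter and the customer
actor(lead_to(W, C, T), W) :- waiter(W), customer(C), location(T).
actor(lead_to(W, C, T), C) :- waiter(W), customer(C), location(T).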

(Doubt) All fluents are inertial except…

First, what does inertial mean? An inertial fluent normally maintains its previous value unless it is changed by an action. The exceptions are order_transmitted, done_with_payment, satiated_and_out, served_and_billed, and done_with_request, which are defined-positive fluents, i.e., their positive value is completely defined in terms of other fluents; otherwise their default value is false.
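To make the distinction concrete, here is a minimal runnable ASP sketch of one defined-positive fluent. The defining condition chosen for done_with_payment (the bill b having been paid) is an assumption for illustration:

customer(c). step(0..1).
holds(paid(b), 1). % the bill is observed paid at step 1

% positive value completely defined in terms of other fluents
holds(done_with_payment(C), I) :- holds(paid(b), I), customer(C), step(I).

% default: a defined-positive fluent is false unless derived
-holds(done_with_payment(C), I) :- not holds(done_with_payment(C), I),
                                   customer(C), step(I).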