Internet Engineering Task Force SIPPING WG Internet Draft J. Rosenberg dynamicsoft [-draft-rosenberg-sipping-conferencing-framework-00.txt October 28, 2002-] {+draft-rosenberg-sipping-conferencing-framework-01.txt February 12, 2003+} Expires: [-April-] {+August+} 2003

A Framework for Conferencing with the Session Initiation Protocol

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

Abstract

The Session Initiation Protocol (SIP) supports the initiation, modification, and termination of media sessions between user agents. These sessions are managed by SIP dialogs, which represent a SIP relationship between a pair of user agents. Because dialogs are between pairs of user agents, SIP's usage for two-party communications (such as a phone call) is obvious. Communications sessions with multiple participants, generally known as conferencing, [-is-] {+are+} more complicated. This document defines a framework for how such conferencing can occur. This framework describes the overall architecture, terminology, and protocol components needed for multi-party conferencing.

J. Rosenberg [Page 1] Internet Draft Conferencing Framework [-October 28, 2002-] {+February 12, 2003+}

Table of Contents

1 Introduction ........................................
[-3-] {+4+} 2 Terminology ......................................... {+4+} 3 [-3 Basic-] {+Overview of Conferencing+} Architecture [-..................................-] {+...............+} 7 [-4-] {+3.1+} Usage of URIs ....................................... [-11 5-] {+10 4+} Functions of the Elements ........................... 12 [-5.1-] {+4.1+} Focus ............................................... 12 [-5.2-] {+4.2+} Conference Policy Server ............................ 13 [-5.3-] {+4.3+} Mixers .............................................. 14 [-5.4 Media Policy Server ................................. 14 5.5-] {+4.4+} Conference Notification Service ..................... 15 [-5.6-] {+4.5+} Participants ........................................ [-16 5.7-] {+15 4.6+} Conference Policy ................................... {+15 5 Common Operations ................................... 16 5.1 Creating Conferences ................................+} 16 {+5.1.1 SIP Mechanisms ...................................... 17 5.1.2 CPCP Mechanisms ..................................... 18 5.1.3 Non-Automated Mechanisms ............................ 18 5.2 Adding Participants ................................. 18 5.2.1 SIP Mechanisms ...................................... 18 5.2.2 CPCP Mechanisms ..................................... 18 5.2.3 Non-Automated Mechanisms ............................ 19 5.3 Conditional Joins ................................... 19 5.4 Removing Participants ............................... 19 5.4.1 SIP Mechanisms ...................................... 19 5.4.2 CPCP Mechanisms ..................................... 20 5.4.3 Non-Automated Mechanisms ............................ 20 5.5 Approving Policy Changes ............................ 20 5.6 Creating Sidebars ................................... 22 5.7 Destroying Conferences .............................. 23 5.7.1 SIP Mechanisms ...................................... 
23 5.7.2 CPCP Mechanisms ..................................... 23 5.7.3 Non-Automated Mechanisms ............................ 23+} 5.8 {+Obtaining Membership ................................ 24 5.8.1 SIP Mechanisms ...................................... 24 5.8.2 CPCP Mechanisms ..................................... 24 5.8.3 Non-Automated Mechanisms ............................ 24 5.9 Adding and Removing+} Media [-Policy ........................................ 17-] {+........................... 24 5.9.1 SIP Mechanisms ...................................... 25 5.9.2 CPCP Mechanisms ..................................... 25 5.9.3 Non-Automated Mechanisms ............................ 25 5.10 Conference Announcements and Recordings ............. 25 5.11 Floor Control ....................................... 27 J. Rosenberg [Page 2] Internet Draft Conferencing Framework February 12, 2003 5.12 Camera and Video Controls ........................... 27+} 6 Physical Realization ................................ [-17-] {+28+} 6.1 Centralized Server .................................. [-17-] {+28+} 6.2 Endpoint Server ..................................... [-17-] {+28+} 6.3 Media Server Component .............................. [-18-] {+28+} 6.4 Distributed Mixing .................................. [-21-] {+31+} 6.5 Cascaded Mixers ..................................... [-22-] {+33+} 7 [-Common Operations ................................... 22 7.1 Creating Conferences ................................ 22 7.2 Adding Participants ................................. 25 7.3 Removing Participants ............................... 27 7.4 Approving Policy Changes ............................ 27 7.5 Creating Sidebars ................................... 28 8-] Security Considerations ............................. [-28 9-] {+33 8+} Contributors ........................................ 
[-29-] {+33 9 Changes since draft-rosenberg-sipping- conferencing-framework-00 ...................................... 35+} 10 Authors Addresses ................................... [-29-] {+35+} 11 Normative References ................................ [-29-] {+35+} 12 Informative References .............................. [-29-] {+35+} J. Rosenberg [Page [-2]-] {+3]+} Internet Draft Conferencing Framework [-October 28, 2002-] {+February 12, 2003+} 1 Introduction The Session Initiation Protocol (SIP) [1] supports the initiation, modification, and termination of media sessions between user agents. These sessions are managed by SIP dialogs, which represent a SIP relationship between a pair of user agents. Because dialogs are between pairs of user agents, SIP's usage for two-party communications (such as a phone call), is obvious. Communications sessions with multiple participants, however, are more complicated. SIP can support many models of multi-party communications. One, referred to as loosely coupled conferences, makes use of multicast media groups. In the loosely coupled model, there is no signaling relationship between participants in the conference. There is no central point of control or conference server. Participation is gradually learned through control information that is passed as part of the conference (using the Real Time Control Protocol (RTCP) [2], for example). Loosely coupled conferences are easily supported in SIP by using multicast addresses within its session descriptions. In another model, referred to as fully distributed multiparty conferencing, each participant maintains a signaling relationship with each other participant, using SIP. There is no central point of control; it is completely distributed amongst the participants. 
[-SIP does not yet support-] {+This model is outside the scope of+} this [-model.-] {+document.+}

In another model, sometimes [-referrred-] {+referred+} to as the tightly coupled conference, there is a central point of control. Each participant connects to this central point. It provides a variety of conference functions, and may possibly perform media mixing functions as well. Tightly coupled conferences are not directly addressed by [-the SIP specification,-] {+RFC 3261,+} although basic [-ones are-] {+participation is+} possible without any additional protocol support.

This document is one of a series of specifications that discusses tightly coupled conferences. Here, we present the overall framework for tightly coupled conferencing, referred to simply as "conferencing" from this point forward. This framework presents a general architectural model for these conferences, presents terminology used to discuss such conferences, and describes the sets of protocols involved in a conference. The aim of the framework is to meet the general requirements for conferencing that are outlined in [3].

2 Terminology

Conference: [-Sadly, conference-] {+Conference+} is an overused term which has different meanings in different contexts. In SIP, a conference is an instance of a multi-party conversation. Within the context of this specification, a conference is always a tightly coupled conference.

Loosely Coupled Conference: A loosely coupled conference is a conference without coordinated signaling relationships amongst participants. Loosely coupled conferences {+frequently+} use multicast for distribution of conference memberships.

Tightly Coupled Conference: A tightly coupled conference is a conference in which a single user agent, referred to as a focus, maintains a dialog with each participant.
The focus plays the role of the centralized manager of the conference, and is addressed by a conference URI. Focus: The focus is a SIP user agent that is addressed by a conference [-URI.-] {+URI and identifies a conference (recall that a conference is a unique instance of a multi-party conversation).+} The focus maintains a SIP signaling relationship with each participant in the conference. The focus is responsible for [-insuring,-] {+ensuring,+} in some way, that each participant receives the media that make up the conference. The focus also implements conference policies. The focus is a logical role. Conference URI: A URI, usually a SIP URI, which identifies the focus of a conference. [-Participants:-] {+Participant:+} The [-set of user agents, each identified by-] {+software element that connects+} a [-URI, which are connected-] {+user or automata+} to [-the focus for-] a [-particular-] conference. [-Conference Notification Service: A conference notification service is-] {+It implements, at a minimum, a SIP user agent, but may also include a conference policy control protocol client, for example. Conference Notification Service: A conference notification service is+} a logical function provided by the focus. The focus can act as a notifier [4], accepting subscriptions to the conference state, and notifying subscribers about changes to that state. The state includes the state maintained by the focus itself, the conference policy, and the media policy. Conference Policy Server: A conference policy server is a logical function which can store and manipulate {+the conference policy. The conference policy is the overall set of+} rules [-associated with-] {+governing operation of the conference. It is broken into membership policy and media policy. Unlike the focus, there is not an instance of the conference policy server for each conference. Rather, there is an instance of J. 
Rosenberg [Page 5] Internet Draft Conferencing Framework February 12, 2003 the membership and media policies for each conference. Conference Policy: The complete set of rules manipulated by the conference policy server. It includes the membership policy and the media policy. Membership Policy: A set of rules manipulated by the conference policy server regarding+} participation in [-a-] {+the+} conference. These rules include directives on the lifespan of the conference, who can and cannot join the conference, definitions of roles available in the conference and the responsibilities associated with those roles, and policies on who is allowed to request which roles. [-The conference policy server is a logical role.-] Media [-Policy Server:-] {+Policy:+} A [-media-] {+set of rules manipulated by the conference+} policy server [-is a logical function J. Rosenberg [Page 4] Internet Draft Conferencing Framework October 28, 2002 which can store and manipulate rules associated with-] {+regarding+} the media [-distribution-] {+composition+} of the conference. [-These-] {+The media policy is used by the focus to determine the mixing characteristics for the conference. The media policy includes+} rules [-can specify-] {+about+} which participants receive media from which other participants, and the ways in which that media is combined for each participant. In the case of audio, these rules can include the relative volumes at which each participant is mixed. In the case of video, these rules can indicate whether the video is tiled, whether the video indicates the loudest speaker, and so on. Conference [-Policy: The set of rules manipulated by the conference policy server. Conference-] Policy Control [-Protocol:-] {+Protocol (CPCP):+} The [-client-server-] protocol used by clients to manipulate the conference policy. [-Media Policy: The set of rules manipulated by the media policy server. 
The media policy is used by the focus to determine the mixing characteristics for the conference. Media Policy Control Protocol: The client-server protocol used by clients to manipulate the media policy.-] Mixer: [-As defined in the Real Time Transport Protocol [2], a-] {+A+} mixer receives a set of media [-streams, and-] {+streams of the same type, and+} combines their media in a type-specific manner, redistributing the result to each participant. [-We use-] {+This includes media transported using RTP [2]. As a result,+} the term {+defined+} here [-to include combining-] {+is a superset+} of [-non-RTP-] {+the mixer concept defined in RFC 1889, since it allows for non-RTP-based+} media [-streams as well,-] such as instant messaging sessions [5]. [-Basic Conference: A basic conference is one where there is no conference policy server, media policy server, or conference subscription server - only a focus. Basic-] {+Conference-Unaware+} Participant: A [-basic-] {+conference-unaware+} participant is a participant in a conference that is not aware that it is actually in a conference. As far as the UA is concerned, it is a [-point- to-point-] {+point-to-point+} call. Cascaded [-Conference:-] {+Conferencing:+} A [-conference-] {+mechanism for group communications+} in which a [-participant is-] {+set of conferences are linked by having their focuses interact in some fashion. Simplex Cascaded Conferences: a group of conferences which are linked such that the user agent which represents+} the focus {+J. Rosenberg [Page 6] Internet Draft Conferencing Framework February 12, 2003+} of [-another conference. Complex Conference: A complex conference includes at least-] one [-of a conference policy server, media policy server, or-] conference [-subscription server,-] {+is a conference-unaware participant+} in [-addition to the focus. Complex-] {+another conference. 
Conference-Aware+} Participant: A [-complex-] {+conference-aware+} participant is a participant in a conference that has learned, through automated means, that [-J. Rosenberg [Page 5] Internet Draft Conferencing Framework October 28, 2002-] it is in a conference, and that can use a conference policy control protocol, media policy control protocol, or conference subscription, to implement advanced functionality. Conference Server: A conference server is a physical server which contains, at a minimum, the focus. It may also include a [-media policy server, a-] conference policy [-server,-] {+server+} and [-a mixer. Singleton: In this context, a singleton is a conference participant that is not a focus.-] {+mixers. Mass Invitation:+} A [-singleton represents a single user in-] {+conference policy control protocol request to invite+} a {+large number of users into the+} conference. [-Conference Topology: The-] {+Mass Ejection: A+} conference [-topology is-] {+policy control protocol request to remove+} a [-graph that defines-] {+large number of users from+} the [-connectivity amongst participants connected through conferences. Each node in-] {+conference. Sidebar: A sidebar appears to+} the [-graph represents-] {+users within the sidebar as+} a [-user agent, whether it-] {+"conference within the conference". It+} is a [-focus or-] {+conversation amongst+} a [-singleton. Each leaf node in-] {+subset of+} the [-tree represents an singleton, and an internal node represents a focus.-] {+participants to which the remaining participants are not privy. Anonymous Participant:+} An [-edge between two nodes implies that there is a SIP dialog between them. Ideally, conference topologies are trees, not arbitrary graphs. Conversation Space: For each conference URI, there is a unique conversation space. The conversation space is defined as the set of singleton in the conference topology associated with that URI. 
The conference topology associated with a conference URI is the one that is constructed by starting with the focus for that URI. Under normal circumstances, the set of singleton in a conversation space will all receive each others media. Instant Conference: A conference in which the focus is constructed the instant the first INVITE for a URI is received, and then destroyed in which the last participant has left. Mass Invitation: A conference policy control protocol request to invite a large number of users into the conference. Mass Ejection: A conference policy control protocol request to remove a large number of users from the conference. Sidebar: A sidebar appears to the users as a "conference within the conference". It is a dicsussion amongst a subset of the participants, not heard by the remaining participants in the conference. J. Rosenberg [Page 6] Internet Draft Conferencing Framework October 28, 2002 Anonymous Participant: An anonymous participant is one-] {+anonymous participant is one+} that is known to other participants [-(through-] {+through+} the conference notification [-service),-] {+service,+} but whose identity is being withheld. [-Invisible-] {+Hidden+} Participant: [-An invisible-] {+A hidden+} participant is one that is not known to other participants in the conference. They may be known to the moderator, depending on conference policy. 3 [-Basic Architecture A SIP conference is represented by a URI. This URI identifies the focus, which is the user agent at the center-] {+Overview+} of [-the conference. Any participant that is involved-] {+Conferencing Architecture The central component (literally)+} in [-the-] {+a SIP+} conference is [-connected to-] the {+focus. The+} focus [-by-] {+maintains+} a SIP [-dialog.-] {+signaling relationship with each participant in the conference.+} The result is a star topology, shown in Figure 1. 
The focus [-has access to a conference policy and-] {+is responsible for making sure that the+} media [-policy, an instance of which exist for each focus. In a basic SIP conference, these policies are administratively defined. Users join the conference by sending an INVITE to the conference URI. As long as the conference policy allows, the INVITE is accepted by the focus and the user is brought into the conference. Users can leave the conference by sending a BYE, as they would in a normal call. Indeed, a participant in a basic conference does not need to know that the focus is anything other than a normal SIP user agent. Similarly, the focus can terminate a dialog with a participant, should the conference policy change to indicate that the participant is no longer allowed in the conference. A focus can also initiate an INVITE, should the conference policy indicate that the focus needs to bring a participant into the conference. The focus is responsible for making sure that the media streams-] {+streams+} which constitute the conference are available to the participants in the conference. It does that through the use of one or more mixers, each of which combines a number of input media streams to produce one or more output media streams. The focus uses the media policy to determine the proper configuration of the mixers. [-With these basic capabilities, a large number of common conferencing applications can be built. None of them require any extensions to SIP; they merely require that the focus is aware of its role and responsibilities in maintaining the conference. However, basic conferences do not allow for the participants to control the way in which the conference operates.-] J. 
Rosenberg [Page 7] Internet Draft Conferencing Framework [-October 28, 2002-] {+February 12, 2003+} +-----------+ | | | | |Participant| | {+4+} | | | +-----------+ | |SIP |Dialog [-|-] {+|4+} | +-----------+ +-----------+ +-----------+ | | | | | | | | | | | | |Participant|-----------| Focus |------------|Participant| | {+1+} | SIP | | SIP | {+3+} | | | Dialog | | Dialog | | +-----------+ {+1+} +-----------+ {+3+} +-----------+ | | |SIP |Dialog [-|-] {+|2+} | +-----------+ | | | | |Participant| | {+2+} | | | +-----------+ Figure 1: [-Basic-] SIP Conference [-A complex SIP-] {+Architecture The focus has access to the+} conference [-is one in which additional interfaces are exposed, allowing for a richer set-] {+policy (composed+} of [-controls and information on-] the {+membership and media policies), an instance of which exist for each+} conference. [-In particular, a complex SIP-] {+Effectively, the+} conference {+policy+} can [-include-] {+be thought of as+} a J. Rosenberg [Page 8] Internet Draft Conferencing Framework [-October 28, 2002 conference policy server and a media policy server, and-] {+February 12, 2003 database which describes+} the [-focus can expose a conference notification service. The model for these conferences is shown in Figure 2. This figure shows-] {+way that+} the [-view from one participant. The-] conference [-now encompasses an additional set of functions. In addition to maintaining the dialog with the focus, the participant now has access to these other functions.-] {+should operate.+} It [-can, using a conference event package [6], SUBSCRIBE to the conference URI, and be connected to the conference notification service provided by the focus. Through this package, it can learn about changes in participants (effectively, the state of the dialogs), the media policy, and the conference policy. The participant can also communicate with the conference policy server, using a conference policy control protocol. 
This-] is [-a strictly client-server transactional protocol. This protocol might not be a protocol at all; it can be performed using a web interface. In this case, no standardized protocols or policies are needed. However, the web interface can only be manipulated by humans, not automata. For this reason,-] the [-participant can use a protocol designed specifically for this purpose. The participant can also communicate with-] {+responsibility of+} the [-media policy server, using a media policy control protocol. This is a strictly client- server transactional operation. This can also be through a web interface, or through an explicit protocol. The-] focus [-will access the media and conference-] {+to enforce those+} policies. [-There is a tight coupling between these policies and the focus.-] Not only does [-it-] {+the focus+} need read access to [-these policies,-] {+the database,+} but it needs to know when [-they have-] {+it has+} changed. Such changes might result in SIP signaling (for example, the ejection of a user from the conference using BYE), and most changes will require a notification to be sent to subscribers [-to-] {+using+} the conference notification service. The conference [-policy and media policy servers need not be available in any particular conference. Even when available, they need not be used-] {+is represented+} by [-all participants. A participant in-] a [-conference that does not access any of these functions, and-] {+URI,+} which [-doesn't even know that-] {+identifies+} the [-focus is a focus, is called a basic participant. A-] {+focus. Each+} conference [-participant that can discover and access these additional function is-] {+has+} a [-complex participant. Any conference can include basic and complex participants. The interfaces between (1) the-] {+unique+} focus and {+a unique URI identifying that focus. Requests to+} the [-media policy, (2)-] {+conference URI are routed to+} the focus [-and-] {+for that specific conference. 
Users usually join+} the conference [-policy, (3)-] {+by sending an INVITE to the conference URI. As long as+} the conference policy [-server-] {+allows, the INVITE is accepted by the focus+} and the {+user is brought into the conference. Users can leave the+} conference [-policy, and (4)-] {+by sending a BYE, as they would in a normal call. Similarly,+} the [-media-] {+focus can terminate a dialog with a participant, should the conference+} policy [-server and-] {+change to indicate that+} the [-media-] {+participant is no longer allowed in the conference. A focus can also initiate an INVITE, should the conference+} policy [-are not subject-] {+indicate that the focus needs+} to [-standardization at-] {+bring a participant into+} the [-time-] {+conference. The notion+} of {+a conference-unaware participant is important in+} this [-writing.-] {+framework. A conference-unaware participant does not even know that the UA it is communicating with happens to be a focus. As far as it's concerned, it's a UA just like any other. The focus, of course, knows that it's a focus, and it performs the tasks needed for the conference to operate. Conference-unaware participants have access to a good deal of functionality.+} They [-are intended primarily-] {+can join and leave conferences using SIP, and obtain more advanced features through stimulus signaling, as discussed in [6]. However, if the participant wishes+} to [-show-] {+explicitly control aspects of+} the [-logical roles-] {+conference using functional signaling protocols, the participant must be conference-aware. A conference-aware participant is one that has access to advanced functionality through additional protocol interfaces. The client uses these protocols to interact with the conference policy server and the focus. A model for this interaction is shown in Figure 2. The participant can interact with the focus using extensions, such as REFER, in order to access enhanced call control functions [7].
The participant can SUBSCRIBE to the conference URI, and be connected to the conference notification service provided by the focus. Through+} J. Rosenberg [Page 9] Internet Draft Conferencing Framework [-October 28, 2002 Conference ..................................... Policy . +-----------+ . Control . | | . Protocol . |Participant| . +------------------->| Policy | . | . | Server | . | . | | \ . | Media . +-----------+ \ . | Policy . +-----------+ \ //-----\\ . | Control . | | > || || . | Protocol . | Media | \\-----// . | +------------->| Policy | | | . | | . | Server |----> |Conference . | | . | | | | . | | . +-----------+ | & | . | | . | | . | | . | Media | . +-----------+ . +-----------+ | Policy| . | | . | | \ // . | | . | | \-----/ . |Participant|<--------->| Focus | | . | | SIP . | | | . | | Dialog . | |<-----------+ . +-----------+ . |...........| . ^ . | Conference| . | . |Notification . +------------>| Service | . Subscription. +-----------+ . . . . . . . . . ..................................... Conference Functions Figure 2: Complex SIP Conference J. Rosenberg [Page 10] Internet Draft Conferencing Framework October 28, 2002-] {+February 12, 2003 this mechanism, it can learn about changes in participants (effectively, the state of the dialogs), the media policy, and the membership policy. The participant can communicate with the conference policy server using a conference policy control protocol. Through this protocol, it can affect the conference policy. The conference policy server need not be available in any particular conference, although there is always a conference policy. The interfaces between the focus and the conference policy, and the conference policy server and the conference policy, are not subject to standardization at the time of this writing. They are intended primarily to show the logical roles involved in a conference, as opposed to suggesting a physical decomposition. 
The separation of these functions is documented here+} to encourage clarity in the requirements and to allow individual implementations the flexibility to compose a conferencing system in a scalable and robust manner. [-4-] {+3.1+} Usage of URIs It is fundamental to this framework that a conference is uniquely identified by a URI, and that this URI [-identify-] {+identifies+} the focus which is responsible for the conference. [-This-] {+The conference URI is unique, such that no two conferences have the same conference URI. A conference+} URI is always a SIP or SIPS URI. The conference URI is opaque to any participants which might use it. There is no way to look at the URI, and know for certain whether it identifies a focus, as opposed to a user or an interface on a PSTN gateway. This is in line with the general philosophy of URI usage [-[7].-] {+[8].+} However, contextual information surrounding the URI (for example, SIP header parameters) may indicate that the URI represents a conference. {+When a SIP request is sent to the conference URI, that request is routed to the focus, and only to the focus. The element or system that creates the conference URI is responsible for guaranteeing this property.+} The conference URI can represent a long-lived conference or interest group, such as "sip:discussion-on-dogs@example.com". The focus identified by this URI would always exist, and always be managing the conference for whatever participants are currently joined. [-The-] {+Other+} conference [-URI-] {+URIs+} can [-also-] represent {+short-lived conferences, such as+} an [-"instant" conference, for example, "sip:a8sd9998as-9s8daa@example.com". An instant conference is one where the focus is instantiated when the first URI for it arrives, and then destroyed when the last participant leaves. 
Both of these represent variations in the policies implemented by the focus, and cannot be determined from inspection of the URI.-] {+ad-hoc conference.+} Ideally, a conference URI is never constructed or guessed by a user. [-Rather, conference URIs are learned through many mechanisms. A conference URI can be emailed or sent in an instant-] {+J. Rosenberg [Page 10] Internet Draft Conferencing Framework February 12, 2003 ..................................... . . . . . . . . . Conference . . Policy . Conference . . Policy . +-----------+ //-----\\ . Control . | | || || . Protocol . | Conference| \\-----// . +---------------->| Policy | | | . | . | Server |----> |Membership . | . | | | | . | . +-----------+ | & | . | . | | . | . | Media | . +-----------+ . +-----------+ | Policy| . | | . | | \ // . | | . | | \-----/ . |Participant|<--------->| Focus | | . | | SIP . | | | . | | Dialog . | |<-----------+ . +-----------+ . |...........| . ^ . | Conference| . | . |Notification . +------------>| Service | . Subscription. +-----------+ . . . . . . . . . ..................................... Conference Functions Figure 2: Conference-Aware Participant J. Rosenberg [Page 11] Internet Draft Conferencing Framework February 12, 2003 Rather, conference URIs are learned through many mechanisms. A conference URI can be emailed or sent in an instant+} message. A conference URI can be linked on a web page. A conference URI can be obtained from a conference policy control protocol, which can be used to create conferences and the policies associated with them. To determine that a SIP URI does represent a focus, standard techniques for URI capability discovery can be used. [-First, a participant can send an OPTIONS to a SIP URI, and if it represents a focus,-] {+Specifically,+} the [-response will-] {+caller preferences specification [9] provides the "isfocus" feature tag to+} indicate [-such [TBD]. The response will-] {+that the URI is a focus. 
Caller preferences parameters are+} also {+used to+} indicate [-whether or not the-] {+that a+} focus [-has implemented-] {+supports+} the [-subscription-] {+conference+} notification service. This is [-known-] {+done+} by [-the presence of an Allow header in the response, indicating-] {+declaring+} support for the SUBSCRIBE [-method, along-] {+method and the relevant package(s) in the caller preferences feature parameters associated+} with [-an Allow-Events header, indicating support for-] the [-conferencing package. A second method for determining that a URI represents a focus is through a refresh request. The Allow and Allow-Events headers, along with the caller preferences specification [8] can indicate the same information that would be learned through J. Rosenberg [Page 11] Internet Draft Conferencing Framework October 28, 2002 an OPTIONS query.-] {+conference URI.+} The other functions in a conference are also represented by URIs. If the conference policy [-and media policy servers are-] {+server is+} implemented through web pages, [-these servers are regular-] {+this server is identified by+} HTTP URIs. If [-they are-] {+it is+} accessed using an explicit protocol, [-they are the URIs-] {+it is a URI+} defined for [-those protocols.-] {+that protocol.+} Starting with the conference URI, the URIs for the other logical entities in the conference can be learned using [-[TBD]. OPEN ISSUE: I suppose we cannot say more until the protocol work is done. But, we have a requirement here - that there be a way to learn these URIs starting only with-] the conference [-URI. 5-] {+notification service. 4+} Functions of the Elements This section gives a more detailed description of the functions typically implemented in each of the elements. [-5.1-] {+4.1+} Focus As its name implies, the focus is the center of the conference. All participants in the conference are connected to it using a SIP dialog. The focus is responsible for maintaining the dialogs connected to it. 
It [-insures-] {+ensures+} that the dialogs are connected to a set of participants who are allowed to participate in the conference, as defined by the [-conference-] {+membership+} policy. The focus also uses SIP to manipulate the media sessions, in order to make sure each participant obtains all the media for the conference. To do that, the focus makes use of [-the services of a mixer.-] {+mixers.+} When a focus receives an INVITE, it checks the [-conference-] {+membership+} policy. The [-conference-] {+membership+} policy might indicate that this participant is not allowed to join, in which case the call can be rejected. It might indicate that another participant, acting as a moderator, needs to approve this new participant. In that case, the INVITE might be parked on a music-on-hold server, or a 183 response might be sent to indicate progress. A notification, using the conference notification service, would be sent to the moderator. The moderator then has the ability to manipulate the policies using the conference policy control protocol. If the policies are changed to allow this new participant, the focus can accept the INVITE (or unpark it from the music-on-hold server). The interpretation of the [-conference-] {+membership+} policy by the focus is, itself, a matter of local policy, and not subject to standardization. If a participant manipulated the [-conference-] {+membership+} policy to indicate that a certain other participant was no longer allowed in the conference, the focus would send a BYE to that other participant to remove them. This is often referred to as "ejecting" a user from the conference.
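The admission check and the ejection step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not something this framework defines: the MembershipPolicy class, the Decision values, and both function names are hypothetical, and because the interpretation of the membership policy is a matter of local policy, a real focus may reach different decisions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    REJECT = "reject"            # policy forbids the caller: reject the INVITE
    HOLD_FOR_APPROVAL = "hold"   # park the call or send a 183, then notify the moderator
    ACCEPT = "accept"            # accept the INVITE


@dataclass
class MembershipPolicy:
    # Hypothetical representation; the framework does not restrict the kinds of rules.
    allowed: set = field(default_factory=set)
    denied: set = field(default_factory=set)
    moderated: bool = False      # True if a moderator may approve unknown callers


def on_invite(policy, caller_uri):
    """Decide how a focus might treat an incoming INVITE (assumed local policy)."""
    if caller_uri in policy.denied:
        return Decision.REJECT
    if caller_uri in policy.allowed:
        return Decision.ACCEPT
    if policy.moderated:
        return Decision.HOLD_FOR_APPROVAL
    return Decision.REJECT


def participants_to_eject(policy, current_participants):
    """After a policy change, list the participants the focus would send a BYE to."""
    return sorted(uri for uri in current_participants if uri in policy.denied)


if __name__ == "__main__":
    policy = MembershipPolicy(allowed={"sip:alice@example.com"}, moderated=True)
    print(on_invite(policy, "sip:alice@example.com"))
    print(on_invite(policy, "sip:bob@example.com"))
    # A client changes the policy; the focus then implements it with a BYE.
    policy.denied.add("sip:alice@example.com")
    print(participants_to_eject(policy, {"sip:alice@example.com", "sip:carol@example.com"}))
```

The sketch keeps the policy decision separate from the SIP signaling, mirroring the split between establishing a policy and having the focus implement it.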
The process of ejecting fundamentally constitutes these two steps - the establishment of the policy through the conference policy protocol, and the implementation of that policy (using a BYE) by the focus. Similarly, if a [-participant-] {+user+} manipulated the [-conference-] {+membership+} policy to indicate that a number of users need to be added to the conference, the focus would send an INVITE to those participants. This is often referred to as the "mass invitation" function. As with ejection, it is fundamentally composed of the policy functions that specify the participants which should be present, and the implementation of those [-functions using SIP.-] {+functions.+} A policy request to add a set of users might not require an INVITE to execute it; those users might already be participants in the conference. A similar model exists for media policy. If the media policy indicates that a participant should not receive any video, the focus might implement that policy by sending a re-INVITE, removing the media stream to that participant. Alternatively, if the video is being centrally mixed, it could inform the mixer to send a black screen to that participant. The means by which the policy is implemented are not subject to specification. [-5.2-] {+4.2+} Conference Policy Server The conference policy server allows clients to manipulate and interact with the conference policy. The conference policy is used by the focus to make authorization decisions and guide its overall behavior. Logically speaking, there is a one-to-one mapping between a conference policy and a focus. The conference policy is represented by a URI. There is a unique conference policy for each [-focus.-] {+conference.+} The conference policy URI points to a conference policy server which can manipulate that conference policy. A conference policy server also has a "top level" URI which can be used to access functions that are independent of any conference. 
Perhaps the most important of these functions is the creation of a new conference. [-This-] {+Creation of a new conference+} will result in the construction of a new {+focus and a corresponding+} conference URI, which can then be used to join the conference [-itself.-] {+itself, along with a media policy and conference policy.+} The conference policy server is accessed using a client-server transactional protocol. The client can be a participant in the conference, or it can be a third party. Access control lists for who can modify a conference policy are themselves part of the conference policy. The conference policy server [-also allows clients to create new conferences. This would result in the instantiation-] {+is responsible for reconciliation+} of [-a focus (and therefore, a conference URI associated with that focus), a conference policy, and a media policy. The conference-] {+potentially conflicting requests regarding the+} policy [-server will also have rules about who can create conferences.-] {+for the conference.+} The {+client of the+} conference policy [-also includes per-participant policies that specify how-] {+control protocol can be any entity interested in manipulating+} the [-focus is to handle a particular participant. These include whether-] {+conference policy. Clearly, participants might be interested in manipulating them. A participant might want to raise+} or [-not-] {+lower the volume for one of+} the {+other participants it is hearing. Or, a+} participant {+might want to add a user to the conference. A client of the conference policy protocol could also be another server whose job+} is [-anonymous,-] {+to determine the conference policy. As an example, a floor control server is responsible+} for [-example.
5.3-] {+determining which participant(s) in a conference are allowed to speak at any given time, based on participant requests and access rules. The floor control server would act as a client of the conference policy server, and change the media policy based on who is allowed to speak. The client of the conference policy control protocol could also be another conference policy server. 4.3+} Mixers A mixer is responsible for combining the media streams that make up the conference, and generating one or more output streams that are distributed to recipients (which could be participants or other mixers). The [-combination-] process {+of combining media+} is specific to the media type, and is directed by the focus, under the guidance of the rules described in the media policy. A mixer is not aware of a "conference" as an entity, per se. A mixer receives media streams as inputs, and based on directions provided by the focus, generates media streams as outputs. There is no grouping of media streams beyond the policies that describe the ways in which the streams are mixed. A mixer is always under the control of a focus. The focus is responsible for interpreting the media policy, and then installing the appropriate rules in the mixer. If the focus is directly controlling a mixer, the mixer can either be co-resident with the focus, or can be controlled through [-a protocol like Megaco [9].-] {+some kind of protocol.+} However, a focus need not directly control a mixer. Rather, a focus can delegate the mixing to the participants, each of which has their own mixer. This is described in Section 6.4. [-5.4 Media Policy Server-] {+4.4 Conference Notification Service+} The [-media policy server is similar to the-] {+focus can provide a+} conference [-policy server. It is accessed using-] {+notification service.
In this role, it acts as+} a [-transactional client-server protocol.-] {+notifier, as defined in RFC 3265 [4].+} It [-manipulates a media policy, identified by a URI. The focus has-] {+accepts subscriptions from clients for+} the [-responsibility-] {+conference URI, and generates notifications to them as the state+} of [-acting on that media policy, implementing it through direct or indirect control-] {+the conference changes. This state is composed+} of [-mixers.-] {+two separate pieces.+} The [-media policy describes-] {+first is+} the [-way in which the set-] {+state+} of [-inputs to the mixer are combined to generate-] the [-set of outputs. Media policies can span media types. In other words,-] {+focus and+} the [-policy on how one media stream-] {+second+} is [-mixed can be based on characteristics of other media streams. J. Rosenberg [Page 14] Internet Draft Conferencing Framework October 28, 2002 Media policies can be based on any quantifiable characteristic of the media stream (its source, volume, codecs, speaking/silence, etc.), and they can be based on internal or external variables accessible by-] the [-media-] {+conference+} policy. The [-media policy server is responsible for reconciliation-] {+state+} of [-potentially conflicting requests regarding-] the [-media policy for the conference. The client of-] {+focus includes+} the [-media policy protocol can be any entity interested in manipulating media policies. Clearly,-] participants [-might be interested in manipulating them. A participant might want-] {+connected+} to [-raise or lower-] the [-volume for one of-] {+focus, and information about+} the [-other-] {+dialogs associated with them. As new+} participants [-it-] {+join, this state changes, and+} is [-hearing. Or, a participant might want to switch from a tiled video view,-] {+reported through the notification service. Similarly, when someone leaves, this state also changes, allowing subscribers+} to [-just viewing-] {+learn about this fact. 
As described previously,+} the [-active speaker. A client of-] {+conference policy includes+} the [-media-] {+membership+} policy [-protocol could also be another server whose job is to determine-] {+and+} the media policy. As {+those policies change, due to usage of the CPCP, direct change by the focus, or through+} an [-example, a floor control server is responsible for determining which participant(s)-] {+application, the conference notification service informs subscribers of these changes. 4.5 Participants A participant+} in a conference [-are allowed to speak at-] {+is+} any [-given time, based on participant requests and access rules. The floor control server would act as-] {+SIP user agent that has+} a [-client of the media policy server, and inform the media policy server about who is allowed to speak. The client of-] {+dialog with+} the [-media policy protocol could-] {+focus. This SIP user agent can be a PC application, a SIP hardphone, or a PSTN gateway. It can+} also be another [-media policy server, as described in Section 6.4. Some examples of media policies include: o The video output-] {+focus. A conference which has a participant that+} is the [-picture-] {+focus+} of [-the loudest speaker (video follows audio). o The audio from each participant will-] {+another conference is called a simplex cascaded conference. They can also+} be [-mixed with equal weight, and distributed-] {+used+} to [-all other participants. o The audio and video that is distributed-] {+provide scalable conferences where there are regional sub- conferences, each of which+} is {+connected to+} the [-one selected by the floor control server. 5.5-] {+main conference. 4.6+} Conference [-Notification Service-] {+Policy+} The [-focus can provide a-] conference [-notification service. In this role, it acts as a notifier, as defined in RFC 3265 [4]. 
It accepts subscriptions from clients for-] {+policy contains+} the [-conference URI, and generates notifications to them as-] {+rules that guide+} the [-state-] {+operation+} of the [-conference changes. This state is composed of three separate pieces.-] {+focus.+} The [-first is-] {+rules can be simple, such as an access list that defines+} the [-state-] {+set+} of [-the focus, the second is the conference policy, and the third is the media policy.-] {+allowed participants in a conference. The rules can also be incredibly complex, specifying time-of-day based rules on+} J. Rosenberg [Page 15] Internet Draft Conferencing Framework [-October 28, 2002 The state of the focus includes the participants connected to the focus, and information about the dialogs associated with them. As new participants join, this state would change, allowing subscribers to learn about them. Similarly, when someone leaves, this state also changes, allowing subscribers to learn about this fact. The state of the conference policy includes the set of participants that are allowed, or not allowed, to join the conference, and-] {+February 12, 2003 participation conditional on+} the [-set-] {+presence+} of [-participants who are to be explicitly added to the conference.-] {+other participants.+} It [-includes the roles which are assigned-] {+is important+} to [-each participant, such as whether they are a moderator. If-] {+understand that+} there [-was a change in role, for example, a new moderator was selected,-] {+is no restriction on+} the [-focus would inform subscribers. The state-] {+type+} of [-the media policy includes the media streams being received by each participant, the audio or video modalities, and so on. 5.6 Participants A participant-] {+rules that can be encapsulated+} in a conference [-is any SIP user agent that has a dialog with the focus. This SIP user agent-] {+policy. 
The conference policy+} can be [-a PC application, a SIP hardphone,-] {+manipulated using web applications+} or [-a PSTN gateway.-] {+voice applications.+} It can also be [-another focus. A-] {+manipulated with proprietary protocols. However, the+} conference [-which has-] {+policy control protocol can be used as+} a [-participant that is-] {+standardized means of manipulating+} the [-focus-] {+conference policy. By the nature+} of [-another-] conference [-is called a cascaded conference. They-] {+policies, not all aspects of the policy+} can [-also-] be [-used to provide scalable conferences where there are regional sub- conferences, each of which is connected to-] {+manipulated with+} the [-main conference. A-] conference [-topology refers to a graph which shows each focus and each participant as a vertex, with a connection between each participant and its focus. 5.7 Conference Policy-] {+policy control protocol.+} The conference policy [-contains-] {+includes+} the [-rules that guide-] {+membership policy and+} the [-operation of-] {+media policy. The membership policy includes per-participant policies that specify how+} the [-focus.-] {+focus is to handle a particular participant.+} These [-rules can be simple, such as an access list that defines-] {+include whether or not the participant is anonymous, for example. The media policy describes the way in which+} the set of [-allowed participants in-] {+inputs to+} a [-conference. The rules-] {+mixer are combined to generate the set of outputs. Media policies can span media types. In other words, the policy on how one media stream is mixed+} can [-also-] be [-incredibly complex, specifying time-of-day-] based [-rules on participation conditional-] on [-the presence-] {+characteristics+} of other [-participants. It is important to understand that there is no restriction-] {+media streams. 
Media policies can be based+} on [-the type-] {+any quantifiable characteristic+} of [-rules that-] {+the media stream (its source, volume, codecs, speaking/silence, etc.), and they+} can be [-encapsulated in a conference policy. However, there does exist a protocol means-] {+based on internal or external variables accessible+} by [-which a client can request a change in-] the [-conference-] {+media+} policy. [-This-] {+Some examples of media policies include: o The video output+} is [-done by communicating with the conference policy server, which manipulates the conference policy. By-] the [-nature of conference policies, not all aspects-] {+picture+} of the [-policy can-] {+loudest speaker (video follows audio). o The audio from each participant will+} be [-manipulated-] {+mixed+} with [-the conference policy control protocol. It is the responsibility of the conference policy server-] {+equal weight, and distributed+} to [-reconcile the various requests with the conference policy. J. Rosenberg [Page 16] Internet Draft Conferencing Framework October 28, 2002 5.8 Media Policy-] {+all other participants. o+} The [-media policy contains the rules-] {+audio and video+} that [-guide-] {+is distributed is+} the [-operation of-] {+one selected by+} the [-mixer. The focus uses these rules to-] {+floor control server. 5 Common Operations There are a large number of ways in which users can+} interact with [-the mixer to implement them. These rules can be simple (mix all media from all participants), or they-] {+a conference. They+} can [-be incredibly complex. It is important to understand that there-] {+join, leave, set policies, approve members, and so on. This section+} is [-no restriction on-] {+meant as an overview of+} the [-type-] {+major conferencing operations, summarizing how they operate. More detailed examples+} of [-rules that-] {+the SIP mechanisms+} can be [-encapsulated-] {+found in [7]. 5.1 Creating Conferences There are many ways+} in [-a media policy. 
However, there does exist a protocol means by-] which a [-client-] {+conference+} can [-request-] {+be created. The creation of+} a [-change-] {+conference actually constructs several elements all at J. Rosenberg [Page 16] Internet Draft Conferencing Framework February 12, 2003 the same time. It results+} in the [-media-] {+creation of a focus and a conference+} policy. [-This is done by communicating with-] {+It also results in+} the [-media policy server,-] {+construction of a conference URI,+} which [-manipulates-] {+uniquely identifies+} the [-media policy. By-] {+focus. Since+} the [-nature of media policies, not all aspects of-] {+conference URI needs to be unique,+} the [-policy-] {+element which creates conferences is responsible for guaranteeing that uniqueness. This+} can be [-manipulated-] {+accomplished deterministically, by keeping records of conference URIs, or probabilistically, by creating random URI+} with [-the-] {+sufficiently low probabilities of collision. When a+} media {+and conference+} policy [-control protocol. It is-] {+are created, they are established with default rules that are implementation dependent. If+} the [-responsibility-] {+creator+} of the [-media policy server-] {+conference wishes+} to [-reconcile the various requests with-] {+change those rules, they would do so using+} the [-media policy. 6 Physical Realization In this section, we present several physical instantiations of these components, to show how these basic functions can be combined to solve a variety of problems. 6.1 Centralized Server In-] {+conference policy control protocol (CPCP), for example. Of course, using+} the [-most simplistic realization of this framework, there is a single physical server in-] {+CPCP requires that an element know+} the [-network which implements-] {+URI for manipulating+} the [-focus,-] {+policy. That requires a means to learn+} the conference policy [-server, the media policy server, and-] {+URI from+} the [-mixer. 
This is-] {+conference URI, since+} the [-classic "one box" solution, shown in Figure 3. 6.2 Endpoint Server Another important model-] {+conference URI+} is [-that of a locally-mixed ad-hoc conference. In this scenario, two users (A and B) are in a regular point-to-point call. One of-] {+frequently+} the [-participants (A) decides-] {+sole result returned+} to [-conference in a third participant, C. To do this, A begins acting-] {+the client+} as a [-focus. Its existing dialog-] {+result of conference creation. Any other URIs associated+} with [-B becomes-] the [-first dialog attached to-] {+conference are learned through+} the [-focus. B would re-INVITE A on that dialog, changing its Contact URI to a new value which identifies-] {+conference notification service. They are carried as elements in+} the [-focus. In essence, A "mutates" from a single- user UA-] {+notifications. 5.1.1 SIP Mechanisms One way+} to {+create+} a [-focus plus-] {+conference is through+} a [-single user UA, and in the process of such-] {+conferencing application. As an example,+} a [-mutation, its URI changes. Then, the focus makes-] {+user can send+} an [-outbound-] INVITE {+request+} to [-C. When C accepts, it mixes-] {+sip:conferences@service.com. This URI identifies an IVR application which interacts with+} the [-media from A and C together, redistributing-] {+user, collects information about+} the [-results.-] {+desired conference, and creates it.+} The [-mixed media is also played locally. Figure 4 shows a diagram-] {+user can then be placed into their newly created conference. Creation+} of [-this transition. It-] {+conferences where the focus resides in an endpoint operates differently. There, the endpoint itself creates the conference URI, and hands it out to other endpoints which are to be the participants. What differs from case to case+} is {+how the endpoint decides to create a conference. One+} important {+case is the ad-hoc conference described in Section 6.2. 
There, an endpoint unilaterally decides+} to [-note-] {+create the conference based on local policy. The dialogs+} that {+were connected to+} the [-external interfaces-] {+UA are migrated to the endpoint-hosted focus, using a re-INVITE to pass the conference URI to the newly joined participants. Alternatively, one UA can ask another UA to create an endpoint-hosted conference. This is accomplished with the SIP Join header [10]. The UA which receives the Join header+} in [-this model,-] {+an invitation may need to create a new conference URI (a new one is not needed if the dialog that is being joined is already part of a conference). The conference URI is+} J. Rosenberg [Page 17] Internet Draft Conferencing Framework [-October 28, 2002 Conference Server ................................... . . . +------+ +------------+ . . |Media | | Conference | . . |Policy| |Notification| . . |Server| | Server | . . +------+ +------------+ . . +----------+ . . |Conference| . . | Policy | +-------+ +-----+ . . | Server | | Focus | |Mixer| . . +----------+ +-------+ +-----+ . ................//.\.......--./.... // \ ---- / // -\- /RTP SIP // ---- \ / // --- \SIP / // ---- RTP \ / / -- \ / +-----------+ +-----------+ |Participant| |Participant| +-----------+ +-----------+ Figure 3: Centralized-] {+February 12, 2003 then handed to the recently joined participants through a re-INVITE. 5.1.2 CPCP Mechanisms Another way to create a conference is through interaction with the conference policy server. Using the conference policy control protocol, a client can instruct the conference policy+} server [-architecture between A-] {+to create a new conference+} and [-B,-] {+return the conference URI+} and [-between B-] {+conference policy URI. 5.1.3 Non-Automated Mechanisms Of course, a user can also create conferences by interacting with a web server. 
The web server would prompt the user for the necessary information (start+} and [-C, are exactly-] {+stop times of+} the [-same-] {+conference, participants, etc.) and return the conference URI+} to [-those that-] {+the user. The user+} would [-be used-] {+copy this URI into their SIP phone, and send it an INVITE+} in {+order to join the newly-created conference. 5.2 Adding Participants There are many mechanisms for adding participants to+} a [-centralized server model. B could also-] {+conference. These+} include [-a media-] {+SIP, the conference+} policy [-server-] {+control protocol,+} and {+non-automated means. In all cases, participant additions can be first party (a user adds themself) or third party (a user adds another user). 5.2.1 SIP Mechanisms First person additions using SIP are trivially accomplished with a standard INVITE. A participant can send an INVITE request to the+} conference [-subscription server too, allowing-] {+URI, and if+} the [-participants to have access to-] {+conference policy allows+} them [-if-] {+to join,+} they [-so desired. Just because-] {+are added to+} the [-focus is co-resident with-] {+conference. If+} a [-participant-] {+UA+} does not [-mean any aspect of-] {+know+} the [-behaviors and external interfaces will change. 6.3 Media Server Component-] {+conference URI, but has learned about a dialog which is connected to a conference (by using the dialog event package, for example [11]), the UA can join the conference by using the Join header to join the dialog. Third party additions with SIP are done using REFER [12]. The client can send a REFER request to the participant, asking them to send an INVITE request to the conference URI. Additionally, the client can send a REFER request to the focus, asking it to send an INVITE to the participant. The latter technique has the benefit of allowing a client to add a conference-unaware participant that does not support the REFER method. 5.2.2 CPCP Mechanisms+} J.
Rosenberg [Page 18] Internet Draft Conferencing Framework [-October 28, 2002 B B +------+ +------+ | | | | | UA | | UA | | | | | +------+ +------+ | . | . | . | . | . | . | . Transition | . | . ------------> | . SIP| .RTP SIP| .RTP | . | . | . | . | . | . | . | . | . +----------+ +------+ | +------+ | SIP +------+ | | | |Focus | |----------| | | UA | | |M.Pol.| | | UA | | | | |C.Pol.| |..........| | +------+ | |Mixer | | RTP +------+ | +------+ |-] {+February 12, 2003+} A [-| + | C | + <..|....... | + | . | +------+ | . | |Parti-| | . | |cipant| | . | | | | . | +------+ | . +----------+ . B . . Internal Interface Figure 4: Transition from two-party call to conference J. Rosenberg [Page 19] Internet Draft Conferencing Framework October 28, 2002 +------------+ +------------+ | App Server| SIP |Conf. Cmpnt.| | |-------------| | | Focus | Conf. Proto | Focus | | C.Pol |-------------| M.Pol | | M.Pol | Media Proto | Mixer | |Notification|-------------| | | | | | +------------+ +------------+ | \ .. . | \\ RTP... . | \\ .. . | SIP \\ ... . SIP | \\ ... .RTP | ..\ . | ... \\ . | ... \\ . | .. \\ . | ... \\ . | .. \ . +-----------+ +-----------+ |Participant| |Participant| +-----------+ +-----------+ Figure 5: Media server component model In this model, shown in Figure 5, each conference involves two centralized servers. One-] {+basic function+} of [-these servers, referred-] {+the conference policy control protocol is+} to [-as-] {+add participants. A client of+} the [-"application server" owns and manages-] {+protocol can specify any SIP URI (which may identify themself) that is to be added. If+} the [-conference and media policies, and maintains-] {+URI does not identify+} a [-dialog with each participant. As-] {+user that is already+} a [-result, it represents-] {+participant in the conference,+} the focus [-seen by all participants-] {+will send an INVITE to that URI+} in {+order to add them in. 
5.2.3 Non-Automated Mechanisms There are countless non-automated means for asking+} a {+participant to join the+} conference. [-However, this server doesn't provide any media support. To perform-] {+Generally, they involve conveying+} the [-actual media mixing function, it makes use of a second server, called-] {+conference URI to+} the [-"mixing server". This server includes-] {+desired participant, so that they can send an INVITE to it. These mechanisms all require some kind of human interaction. As an example,+} a [-focus, but has no J. Rosenberg [Page 20] Internet Draft Conferencing Framework October 28, 2002 conference policy server or conference notification service. It has a default conference policy, which accepts all invitations from the top-level focus. Its media policy server accepts any controls made by the application server. The focus in-] {+user can send an instant message [13] to+} the [-application server uses-] third [-party call control to connect-] {+party, containing an HTML document which requests+} the [-media streams of each-] user to {+click on+} the [-mixing server, as needed. If-] {+hyperlink to join+} the [-focus in-] {+conference: Hey, would you like to join +} the [-application server receives a media policy control command from-] {+conference now? 5.3 Conditional Joins In many cases,+} a [-client, it delegates that-] {+new participant will not wish+} to {+join+} the [-media server by making-] {+conference unless they can join with a particular set of policies. As an example, a participant may want to join anonymously, so that other participants know that someone has joined, but not who. To accomplish this,+} the [-same media-] {+conference+} policy control [-command-] {+protocol is used to establish these policies prior+} to [-it. This model allows for-] the [-mixing server-] {+generation or acceptance of an invitation+} to [-be used as-] {+the conference. 
For example, if+} a [-resource for-] {+user wishes to join+} a [-variety of different conferencing applications. This is because it is unaware of any-] conference [-or media policies; it is merely-] {+with+} a [-"slave" to-] {+known conference URI,+} the [-top-level server, doing whatever it asks. This is consistent with-] {+user would obtain+} the [-SIP Application Server Component Model [10]. 6.4 Distributed Mixing In a distributed mixed conference, there is still a centralized server which implements-] {+URI for+} the [-focus,-] conference {+policy, manipulate the+} policy [-server,-] {+to set themself as an anonymous participant,+} and [-media policy server. However, there is no centralized mixer. Rather, there is a mixer in each endpoint, along with a media policy server. The focus distributes-] {+then actually join+} the [-media-] {+conference+} by [-using third party call control [11]-] {+sending an INVITE request+} to [-move a media stream between each participant and each other participant.-] {+the conference URI. 5.4 Removing Participants+} As [-a result, if-] {+with additions,+} there are [-N participants in the conference, there will-] {+several mechanisms for departures. These include SIP mechanisms and CPCP mechanisms. Removals can also+} be {+first person or third person. 5.4.1 SIP Mechanisms First person departures are trivially accomplished by sending+} a [-single-] {+BYE J. Rosenberg [Page 19] Internet Draft Conferencing Framework February 12, 2003 request to the focus. This terminates the+} dialog [-between each participant-] {+with the focus+} and {+removes+} the [-focus, but-] {+participant from+} the [-session description associated with that dialog will-] {+conference. Third person departures can also+} be [-constructed-] {+done using SIP, through the REFER method. 5.4.2 CPCP Mechanisms The CPCP can be used by a client+} to [-allow media-] {+remove any participant (including themself). 
When CPCP is used for this purpose, the focus will send a BYE request+} to [-be distributed amongst-] the [-participants. This-] {+participant that+} is [-shown in Figure 6. There are several ways-] {+being removed. The focus will execute any other signaling that is needed to remove them (for example, manipulate other dialogs+} in [-which-] {+order to manage+} the {+change in+} media {+streams). The conference policy control protocol+} can {+also+} be [-distributed-] {+used+} to [-each participant for mixing. In a multi-unicast model, each participant sends-] {+remove+} a [-copy-] {+large number+} of [-its media-] {+users. This is generally referred+} to [-each other participant. In this case,-] {+as mass ejection. 5.4.3 Non-Automated Mechanisms As with+} the [-session description manages N-1 media streams. In a multicast model, each participant joins a-] {+other+} common [-multicast group, and each participant sends a single copy of its media stream-] {+conferencing functions, there are many non- automated ways+} to [-that group.-] {+remove a participant.+} The [-underlying multicast infrastructure then distributes-] {+identity of+} the [-media, so that each-] participant [-gets a copy. In-] {+can be entered into+} a [-single-source multicast model (SSM), each participant-] {+web form. When the user clicks submit, the focus+} sends [-its media stream to-] a [-central point, using unicast. The central point then redistributes the media-] {+BYE+} to [-all participants using multicast. The focus is responsible for selecting the modality of media distribution, and for handling any hybrids-] that [-would be necessitated-] {+participant, removing them+} from [-clients with mixed capabilities. When a new participant joins or is added, the focus will perform-] the [-necessary third party call control to distribute-] {+conference. Alternatively,+} the [-media from-] {+conference can expose an IM interface, where+} the [-J. 
Rosenberg [Page 21] Internet Draft Conferencing Framework October 28, 2002 new participant-] {+user can send an IM+} to [-all-] the [-other participants, and vice-a-versa. The central-] conference [-server also includes a media policy server. Of course,-] {+saying "remove Bob", causing+} the [-central-] conference server [-cannot implement any of the media policies directly. Rather, it would delegate the implementation-] to {+remove Bob. 5.5 Approving Policy Changes OPEN ISSUE: The basic mechanism described here depends on+} the {+actual protocols used for conference and+} media policy [-servers co-resident with a participant. As an example, if a participant decides to switch-] {+manipulation. If+} the [-overall-] {+protocol itself provides change notifications, sip-events may not be needed for that purpose. Thus, the description here is tentative. A+} conference [-mode from "video follows audio" to "tiled video", they would communicate with the central media-] policy [-server. This-] {+for a particular conference may designate one or more users as moderators for some set of+} media policy [-server, in turn, would communicate with-] {+or conference policy change requests. This means that those moderators need to approve+} the [-media-] {+specific+} policy [-servers co- resident-] {+change. Typically, moderators are used to approve member additions and removals. However, the framework allows for moderators to be associated+} with [-each participant,-] {+any policy change that can be made. Moderating a policy request is done+} using {+a combination of+} the [-same media policy control protocol,-] {+conference notification service+} and [-instruct them to use "tiled video".-] {+the CPCP. First, a client makes a policy change.+} This [-model requires additional functionality in user agents, which may or may not-] {+can+} be [-present.-] {+done directly, using the CPCP, or indirectly.
An indirect policy change request is any non-CPCP action that requires approval.+} The [-participants, therefore, must be able to advertise this capability-] {+simplest example is an INVITE+} to the [-focus. 6.5 Cascaded Mixers In very large conferences, it may not be possible to have-] {+focus from+} a [-single mixer that can handle all-] {+new participant. That represents a request to change the membership+} of the [-media. A solution to this-] {+conference. From a moderation perspective, it+} is {+handled identically+} to [-use cascaded mixers. In this architecture, there is a centralized focus, but-] the [-mixing function is implemented by-] {+case where+} a [-multiplicity of mixers, scattered throughout-] {+client used+} the [-network. Each participant is connected-] {+CPCP+} to [-one, and only one of-] {+request that+} the [-mixers. The focus uses some kind of control protocol (such as MEGACO [9])-] {+same user be added+} to [-connect-] the [-mixers together, so that all-] {+conference. Part+} of the [-participants can hear each other.-] {+conference policy itself may designate any policy change as moderated.+} This [-architecture is shown in Figure 7. 7 Common Operations There are-] {+means that the change cannot be performed by the client directly. As+} a [-large number-] {+result, any such CPCP request will fail, and the failure response informs the client that their request failed due to insufficient authorization. That completes the CPCP transaction. In the case+} of [-ways in which users can interact with-] a [-conference. They can-] {+policy change requested indirectly through some other means, the behavior depends on the mechanism. For example, if a user sends a SIP INVITE request to the conference in order to+} join, [-leave, set policies, approve members,-] and [-so on. This section-] {+that join request+} is [-meant as an overview of-] {+moderated,+} the [-basic primitives, summarizing how they operate.
More detailed examples with complete call flows-] {+focus+} can [-be found in [12]. 7.1 Creating Conferences There are many ways in which a conference-] {+reject the INVITE, or it+} can [-be created. Ultimately, all of them-] {+accept it and play music-on-hold until the request is approved. Even though the CPCP transaction failed, it does+} result in {+a change in internal state. Specifically,+} the [-establishment of-] {+requested change shows up as+} a {+"pending" state within the media and+} conference [-URI which identifies-] {+policies. This means that the change has been requested, but has not taken effect. It is almost+} a [-focus. In all cases,-] {+form of change request history. However, because it is+} a [-conference URI must be created by the focus itself, or an element which-] {+state change, it+} is [-responsible for managing URIs-] {+something+} that [-are used by-] {+can result in notifications through+} the [-focus. Otherwise,-] {+conference notification service. Therefore, in order to moderate requests, the moderator subscribes to+} the [-uniqueness of-] conference [-URIs could-] {+policy notification service. Normally, the notifications from the focus do+} not [-be guaranteed. J. Rosenberg [Page 22] Internet Draft Conferencing Framework October 28, 2002 +---------+ |Partcpnt | media | | media ...............| |.................. . | Mixer | . . |M.Pol.Srv| . . +---------+ . . | . . | . . | . . dialog | . . | . . | . . | . . +---------+ . . |Cnf.Srvr.| . . | | . . | Focus | . . |M.Pol.Srv| . . / |C.Pol.Srv| \ . . / +---------+ \ . . / \ . . / \ . . / dialog \ . . / \ . . /dialog \ . . / \ . . / \ . . / \ . . . +---------+ +---------+ |Partcpnt | |Partcpnt | | | | | | | ......................... | | | Mixer | | Mixer | |M.Pol.Srv| media |M.Pol.Srv| +---------+ +---------+ Figure 6: Dialog-] {+reflect pending state changes. 
That is, the service will not normally send a notification informing a subscriber that a policy change request was made and failed due to lack of authorization. However, notifications to the moderator do reflect these changes. That is because the policy of the focus is to inform moderators, and only moderators, of these changes. Indeed, different users can be moderators for different parts of the conference+} and media [-streams in-] {+policies. For example, one user can be+} a [-distributed mixed-] {+moderator for membership changes, and another, a moderator for whether users can be anonymously joined or not. There are two ways that the focus knows whether a subscriber to the+} conference [-J. Rosenberg [Page 23] Internet Draft Conferencing Framework October 28, 2002 +---------+ +-----------------------| |------------------------+ | ++++++++++++++++++++| |++++++++++++++++++ | | + +------| Focus |---------+ + | | + | | | | + | | + | +-| |--+ | + | | + | | +---------+ | | + | | + | | + | | + | | + | | + | | + | | + | | + | | + | | + | | +---------+ | | + | | + | | | | | | + | | + | | | Mixer 2 | | | + | | + | | | | | | + | | + | | +---------+ | | + | | + | |... . .... | | + | | + .|....| . .|.... | + | | + ...... | | . | ..|... + | | + ... | | . | | ....+ | | +---------+ | | +---------+ | | +---------+ | | | | | | | | | | | | | | | Mixer 2 | | | | Mixer 3 | | | | Mixer 4 | | | | | | | | | | | | | | | +---------+ | | +---------+ | | +---------+ | | . . | | . . | | . . | | . . | | .. . | | .. . | | . . | | . . | | . .-] {+notification service is a moderator. The first is configured policy (once again through CPCP). That policy can specify that a particular user is the moderator for a particular piece of policy. Therefore, if that user subscribes to the conference notification service, any notification sent to that user will include J. Rosenberg [Page 21] Internet Draft Conferencing Framework February 12, 2003 pending changes to that piece of policy. 
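The visibility rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the framework: the class `ConferenceState`, its method names, and the dictionary-shaped notification body are all hypothetical, and the sketch assumes that a policy piece with no configured moderator is simply unmoderated.

```python
from dataclasses import dataclass, field


@dataclass
class ConferenceState:
    """Hypothetical in-memory model of one conference's policy state."""
    # policy piece -> users configured as moderators for that piece
    moderators: dict = field(default_factory=dict)
    # changes that have taken effect: (piece, value)
    applied: list = field(default_factory=list)
    # changes requested but not in effect: (piece, value, requester)
    pending: list = field(default_factory=list)

    def request_change(self, user, piece, value):
        """Return True if the change is performed, False if it fails."""
        mods = self.moderators.get(piece)
        # Assumption: an unmoderated piece, or a request from one of its
        # moderators, is performed immediately.
        if mods is None or user in mods:
            self.applied.append((piece, value))
            return True
        # The transaction fails, but the request is kept as pending state.
        self.pending.append((piece, value, user))
        return False

    def notification_for(self, subscriber):
        """Build the notification body that one subscriber would receive."""
        body = {"applied": list(self.applied)}
        # Only moderators of a piece see pending changes to that piece.
        visible = [p for p in self.pending
                   if subscriber in self.moderators.get(p[0], set())]
        if visible:
            body["pending"] = visible
        return body
```

For example, with `alice` configured as the membership moderator, a membership request from `bob` fails and shows up as pending only in the notification built for `alice`.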
As an alternative, a SUBSCRIBE request from a user can include a filter [14] that requests receipt of these pending state changes. If the conference policy allows, that request is honored, and the subscriber will receive notifications about pending state changes. Once the moderator receives a notification about the pending state change, they use the CPCP to implement their decision. If the moderator decides to approve the change, they use the CPCP or MPCP to actually perform the change themselves. Since the moderator for a piece of policy is allowed to change that piece of policy, by definition, their change is accepted and performed. If the moderator decides to reject the change, they use the CPCP to remove the pending state from the database. The pending state persists in the database for a period of time which is, itself, part of the conference policy. If the moderator neither approves nor rejects the change, the pending state eventually disappears, as if the change had been explicitly rejected. If the pending change is approved, a real change to the conference or media policy takes place, and this change will be reflected in the conference notification service. In this way, if a client makes a policy change, and their request is rejected because they are not authorized, the client can subscribe to the conference notification service to learn whether their change is eventually approved or rejected. This general mechanism for moderating policy requests is consistent with the moderation of presence subscriptions [15] [16].

5.6 Creating Sidebars

A sidebar is a "conference within a conference", allowing a subset of the participants to converse amongst themselves. Frequently, participants in a sidebar will still receive media from the main conference, but "in the background". For audio, this may mean that the volume of the media is reduced, for example. A sidebar is represented by a separate conference URI. This URI is a type of "alias" for the main conference URI.
Both route to the same focus. Like any other conference, the sidebar conference URI has a conference policy and a media policy associated with it. Like any other conference, one can join it by sending an INVITE to this URI, or ask others to join by referring them to it. However, it differs from a normal conference URI in several ways. First, users in the main conference do not need to establish a separate dialog to the sidebar conference. The focus recognizes the sidebar as a special URI, and knows to use the existing dialog to the main conference as a "virtual" connection to the sidebar URI. The second difference is the way in which conference and media policies are implemented. If the conference policy control protocol is used to add a user to a normal conference, the focus will typically send an INVITE to the participant to ask them to join. For a sidebar conference, it is done differently. If the conference policy control protocol is used to add a user to it, and that user is already part of the main conference, the focus will use the conference notification service to alert the existing participant that they have been asked to join the sidebar. The invited user can then make use of the CPCP to formally add themselves to the sidebar.

5.7 Destroying Conferences

Conferences can be destroyed in several ways. Generally, whether those means are applicable for any particular conference is a component of the conference policy. When a conference is destroyed, the conference and media policies associated with it are destroyed. Any attempt to read or write those policies results in a protocol error. Furthermore, the conference URI becomes invalid. Any attempt to send an INVITE to it, or to SUBSCRIBE to it, would result in a SIP error response.
Typically, if a conference is destroyed while there are still participants, the focus would send a BYE to those participants before actually destroying the conference. Similarly, if there were any users subscribed to the conference notification service, those subscriptions would be terminated by the server before the actual destruction.

5.7.1 SIP Mechanisms

There is no explicit means in SIP to destroy a conference. However, a conference may be destroyed as a by-product of a user leaving the conference, which can be done with BYE. In particular, if the conference policy states that the conference is destroyed once the last user leaves, when that user does leave (using a SIP BYE request), the conference is destroyed.

5.7.2 CPCP Mechanisms

The CPCP contains mechanisms for explicitly destroying a conference.

5.7.3 Non-Automated Mechanisms

As with conference creation, a conference can be destroyed by interacting with a web application or voice application that prompts the user for the conference to be destroyed.

5.8 Obtaining Membership

A participant in a conference will frequently wish to know the set of other users in the conference. This information can be obtained in many ways.

5.8.1 SIP Mechanisms

The conference notification service allows a conference-aware participant to subscribe to it, and receive notifications that contain the list of participants. When a participant joins or leaves, subscribers are notified. The conference notification service also allows a user to do a "fetch" [4] to obtain the current listing.

5.8.2 CPCP Mechanisms

The CPCP contains mechanisms for querying for the current set of conference participants.

5.8.3 Non-Automated Mechanisms

Users can also interact with applications to obtain conference membership.
There may be a conference web page associated with the conference, which has a link that will fetch the current list of participants and display them in the browser. Similarly, an interactive voice response application connected to the focus can be used to obtain the current membership. A user in the conference could press the pound key on their phone, and hear a listing of the current participants.

5.9 Adding and Removing Media

Each conference is composed of a particular set of media that the focus is managing. For example, a conference might contain a video stream and an audio stream. The set of media streams that constitute the conference can be changed by participants. When the set of media in the conference changes, the focus will need to generate a re-INVITE to each participant in order to add or remove the media stream for that participant. When a media stream is being added, a participant can reject the offered media stream, in which case it will not receive or contribute to that stream. Rejection of a stream by a participant does not imply that the stream is no longer part of the conference, just that the participant is not involved in it. There are several ways in which a media stream can be added or removed from a conference.

5.9.1 SIP Mechanisms

A SIP re-INVITE can be used by a participant to add or remove a media stream. This is accomplished using the standard offer/answer techniques for adding media streams to a session [17]. This will trigger the focus to generate its own re-INVITEs.

5.9.2 CPCP Mechanisms

The CPCP can be used to add or remove a media stream. This too will trigger the focus to generate a re-INVITE to each participant in order to effect the change.

5.9.3 Non-Automated Mechanisms

As with most of the other common functions, addition and removal of media streams can be accomplished with a web application or interactive voice application.
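The bookkeeping that Section 5.9 implies for the focus can be sketched as follows. This is a minimal model under stated assumptions rather than a definitive implementation: the class `Conference`, its attributes, and the `accepts` callback (a stand-in for the outcome of the offer/answer exchange with each participant) are hypothetical names introduced for illustration, and no SIP messages are actually sent.

```python
class Conference:
    """Hypothetical model of the media set managed by a focus."""

    def __init__(self, participants):
        # Streams that are part of the conference as a whole.
        self.media = set()
        # participant -> streams that participant is involved in
        self.sessions = {p: set() for p in participants}
        # Log of (participant, action, stream) standing in for re-INVITEs.
        self.reinvites = []

    def add_stream(self, stream, accepts):
        """Add a stream; accepts(participant, stream) models each answer."""
        self.media.add(stream)
        for participant, streams in self.sessions.items():
            # One re-INVITE per participant offers the new stream.
            self.reinvites.append((participant, "add", stream))
            if accepts(participant, stream):
                streams.add(stream)
            # A rejection leaves the stream in the conference; only this
            # participant is not involved in it.

    def remove_stream(self, stream):
        """Remove a stream from the conference and from every session."""
        self.media.discard(stream)
        for participant, streams in self.sessions.items():
            if stream in streams:
                self.reinvites.append((participant, "remove", stream))
                streams.discard(stream)
```

In this sketch a participant who rejects the offered video stream keeps an audio-only session, while the conference itself still lists both streams.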
5.10 Conference Announcements and Recordings Conference announcements and recordings play a key role in many real conferencing systems. Examples of such features include: 1. Asking a user to state their name before joining the conference, in order to support a roll call 2. Allowing a user to request a roll call, so they can hear who else is in the conference 3. Allowing a user to press some keys on their keypad in order to record the conference 4. Allowing a user to press some keys on their keypad in order to be connected with a human operator 5. Allowing a user to press some keys on their keypad to mute or unmute their line In this framework, these capabilities are modeled as an application which acts as a participant in the conference. This is shown pictorially in Figure 3. The conference has four participants. Three of these participants are end users, and the fourth is the announcement application. J. Rosenberg [Page 25] Internet Draft Conferencing Framework February 12, 2003 User 1 +-----------++} | [-+---------+ .-] | [-+---------+ .-] | [-+---------+ .-] | {+|Participant|+} | [-Prtcpnt-] {+4+} | [-.-] | | [-Prtcpnt-] {++-----------+ |SIP |Dialog Conference |1 Policy +---|--------+ User 2 Server+} | [-.-] | | [-Prtcpnt-] {+Application +-----------+ +-----------++} | [-.-] {+CPCP *************+} | | [-1-] | [-.-] | {+|-------- * *+} | [-1-] | [-.-] | | [-1-] | [-.-] {+* * |Participant|-----------| Focus |------------*Participant*+} | [-+---------+ .-] {+1+} | [-+---------+ .-] {+SIP+} | [-+---------+ .-] | [-.-] | [-.-] {+SIP * 3 *+} | [-.-] | [-+---------+ +---------+ +---------+-] {+Dialog+} | [-Prtcpnt-] {+|--+ Dialog * * +-----------+ 2 +-----------+ 4 *************+} | | [-Prtcpnt-] {+|SIP |Dialog |3+} | {++-----------++} | [-Prtcpnt-] | | [-1-] | {+|Participant|+} | [-1-] {+2+} | | [-1-] | [-+---------+ +---------+ +---------+ ------- SIP Dialog ....... 
Media Flow +++++++ Control Protocol-] {++-----------+ User 3 Figure 3: Conference announcement application+} J. Rosenberg [Page [-24]-] {+26]+} Internet Draft Conferencing Framework [-October 28, 2002 Figure 7: Cascaded Mixers protocol,-] {+February 12, 2003 If the announcement application wishes to play an announcement to all the conference members (for example, to announce+} a [-client-] {+join), it merely sends media to the mixer as would any other participant. The announcement is mixed in with the conversation and played to the participants. Similarly, the announcement application can play an announcement to a specific user by using the CPCP to configure its media policy so that the media it generates is only heard by the target user. The application then generates the desired announcement, and it will be heard only by the selected recipient. The announcement application can also receive input from a specific user through the conference. The announcement application would use the CPCP to cause in-band DTMF to be dropped from the mix, and sent only to itself. When a user wishes to invoke an operation, such as to obtain a roll call, the user would press the appropriate key sequence. That sequence would be heard only by the announcement application. Once the application determines that the user wishes to hear a roll call, it can use the CPCP to set the media policy so that media from that user is delivered only to the announcement application. This "disconnects" the user from the rest of the conference so they+} can [-instruct-] {+interact with+} the [-conference policy server-] {+application. Once the interaction is done, the announcement application uses the CPCP+} to [-create a new-] {+"reconnect" the user to the+} conference. [-The result of this operation-] {+5.11 Floor Control Floor control+} is {+similar to+} a conference [-URI, which-] {+announcement application.
Within this framework, floor control+} is [-returned-] {+managed by an application (possibly one that is not a participant) that uses the CPCP+} to {+enforce+} the [-client. Another way-] {+resulting floor control decisions. [[Need more work here]] 5.12 Camera and Video Controls OPEN ISSUE: Originally, I was just going+} to [-obtain a conference URI-] {+say that this+} is [-to literally guess. In an instant conferencing server, there are literally an infinite number of conference URIs which can be used. Each-] {+outside the scope+} of [-them is a valid conference URI, since-] {+conferencing. But,+} it [-identifies-] {+does impact conferencing. Effectively, camera control is treated like+} a [-focus,-] {+media stream. The mixer would combine the various requests across participants+} and [-when an INVITE is sent-] {+direct them+} to [-it, will join-] the [-user into-] {+appropriate device. How does+} that [-conference. As a result,-] {+work though? In+} a [-client can simply choose one of them at random, so long as it is configured-] {+video conference+} with {+4 participants,+} the [-domain portion of the URI and any naming conventions in use by-] {+camera control needs to identify+} the [-instant conferencing server. OPEN ISSUE: Do-] {+specific user whose camera is to be controlled. That is something unique to conferencing. J. Rosenberg [Page 27] Internet Draft Conferencing Framework February 12, 2003 6 Physical Realization In this section,+} we [-need-] {+present several physical instantiations of these components,+} to [-specify standards for this? The previous two approaches are used-] {+show how these basic functions can be combined+} to [-obtain conference URIs for focuses that are hosted within centralized servers. Creation-] {+solve a variety+} of [-conferences where-] {+problems. 6.1 Centralized Server In+} the [-focus resides in an endpoint operates differently. 
There,-] {+most simplistic realization of this framework, there is a single physical server in+} the [-endpoint itself creates-] {+network which implements the focus,+} the conference [-URI,-] {+policy server,+} and [-hands it out to other endpoints which are to be-] the [-participants. What differs from case to case-] {+mixers. This+} is [-how-] the [-endpoint decides to create a conference. One-] {+classic "one box" solution, shown in Figure 4. 6.2 Endpoint Server Another+} important [-case-] {+model+} is [-the-] {+that of a locally-mixed+} ad-hoc [-conference described-] {+conference. In this scenario, two users (A and B) are+} in [-Section 6.2. There, an endpoint unilaterally-] {+a regular point-to-point call. One of the participants (A)+} decides to [-create the-] conference [-based on local policy. The dialogs that were connected to-] {+in a third participant, C. To do this, A begins acting as a focus. Its existing dialog with B becomes+} the [-UA are migrated-] {+first dialog attached+} to the [-endpoint-hosted focus, using a-] {+focus. A would+} re-INVITE [-to pass the conference-] {+B on that dialog, changing its Contact+} URI to {+a new value which identifies+} the [-newly joined participants. Alternatively, one UA can ask another-] {+focus. In essence, A "mutates" from a single- user+} UA to [-create an endpoint-hosted conference. This is accomplished with-] {+a focus plus a single user UA, and in+} the [-SIP Join header [13]. The UA which receives-] {+process of such a mutation, its URI changes. Then,+} the [-Join header in-] {+focus makes+} an [-invitation may need-] {+outbound INVITE+} to [-create a new conference URI (a new one is not needed if-] {+C. When C accepts, it mixes+} the [-dialog that is being joined-] {+media from B and C together, redistributing the results. The mixed media+} is [-already part of-] {+also played locally. Figure 5 shows+} a [-conference). The conference URI-] {+diagram of this transition. 
It+} is [-then handed-] {+important+} to {+note that+} the [-recently joined participants through a re-INVITE. 7.2 Adding Participants There-] {+external interfaces in this model, between A and B, and between B and C,+} are [-two modes for adding participants-] {+exactly the same+} to {+those that would be used in a centralized server model. B could also include+} a conference [-- first party additions,-] {+policy server+} and [-third party additions. In a first party addition,-] {+conference notification service, allowing+} the [-participant that wishes-] {+participants+} to [-join makes a direct attempt-] {+have access+} to [-join. In-] {+them if they so desired. Just because the focus is co-resident with+} a [-third party addition, some other-] participant [-takes action with-] {+does not mean any aspect of+} the [-aim-] {+behaviors and external interfaces will change. 6.3 Media Server Component In this model, shown in Figure 6, each conference involves two centralized servers. One+} of [-causing a third party to be added-] {+these servers, referred+} to {+as the "application server" owns and manages the membership and media policies, and maintains a dialog with each participant. As a result, it represents+} the {+focus seen by all participants in a+} conference. {+However, this server doesn't provide any media support. To perform+} J. Rosenberg [Page [-25]-] {+28]+} Internet Draft Conferencing Framework [-October 28, 2002 First person additions are trivially accomplished with-] {+February 12, 2003 Conference Server ................................... . . . +------------+ . . | Conference | . . |Notification| . . | Server | . . +------------+ . . +----------+ . . |Conference| +-----+ . . | Policy | +-------+ +-----+| . . | Server | | Focus | |Mixer|+ . . +----------+ +-------+ +-----+ . ................//.\.....***....... 
// \ *** * // *** * RTP SIP // *** \ * // *** \SIP * // *** RTP \ * / ** \ * +-----------+ +-----------+ |Participant| |Participant| +-----------+ +-----------+ Figure 4: Centralized server architecture the actual media mixing function, it makes use of+} a [-standard INVITE. A participant can send an INVITE request to-] {+second server, called+} the [-conference URI,-] {+"mixing server". This server includes a focus,+} and [-if the conference policy allows them to join, they are added to the conference. If-] a [-UA does not know the-] conference [-URI,-] {+policy server,+} but has [-learned about-] {+no conference notification service. It has+} a [-dialog-] {+default membership policy,+} which [-is connected to a conference (by using the dialog event package, for example [14]), the UA can join-] {+accepts all invitations from+} the {+top-level focus. Its+} conference {+policy server accepts any controls made+} by [-using the Join header to join-] the [-dialog. Third party invitations can be done in one of several ways.-] {+application server.+} The [-first approach is for-] {+focus in+} the [-user-] {+application J. Rosenberg [Page 29] Internet Draft Conferencing Framework February 12, 2003 B B +------+ +------+ | | | | | UA | | UA | | | | | +------+ +------+ | . | . | . | . | . | . | . Transition | . | . ------------> | . SIP| .RTP SIP| .RTP | . | . | . | . | . | . | . | . | . +----------+ +------+ | +------+ | SIP +------+ | | | |Focus | |----------| | | UA | | |C.Pol.| | | UA | | | | |Mixers| |..........| | +------+ | | | | RTP +------+ | +------+ | A | + | C | + <..|....... | + | . | +------+ | . | |Parti-| | . | |cipant| | . | | | | . | +------+ | . +----------+ . A . . Internal Interface Figure 5: Transition from two-party call+} to [-ask the-] {+conference server uses+} third party {+call control+} to [-send an INVITE to the conference URI. This can be done automatically through-] {+connect+} the [-usage-] {+media streams+} of [-REFER [15]. 
The participant would send a REFER request to the third party. The Refer-To header field in that request would contain the conference URI. There are countless non-automated means for asking a participant to send an INVITE to the conference URI. A user can send an instant message [16] to the third party, containing an HTML document which requests the-] {+each+} user to [-click on the hyperlink to join the conference: Hey, would you like to join the conference now? The second approach for third party additions is for-] the [-participant to ask-] {+mixing server, as needed. If+} the focus [-to add the third party to-] {+in+} the [-conference. In this case, however,-] {+application server receives+} a [-REFER cannot be used. REFER would have the effect of telling the focus to send an INVITE to the new potential participant. However, just sending this INVITE is not sufficient-] {+conference