Voice SDK - Automation Anywhere EKB Docs

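VoiceSDK provides advanced voice conversation capabilities, with React hooks for automatic chat system integration and for building voice-enabled applications. You can create natural voice interactions with AI agents, complete with audio visualization, transcription, and conversation management. This article starts with quick-start examples that get you running while also showing how things work. The rest of the article covers the available methods, what they are for, and best practices.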

Installation

npm install @odin-ai-staging/sdk @elevenlabs/react

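Quick Start

Basic Voice Conversation

This example shows how to create a real-time voice conversation with VoiceSDK in both plain TypeScript and React applications.

First, initialize VoiceSDK with your API credentials, including the agentId that identifies the AI voice agent that will handle the conversation. Then call startVoiceConversation() to start a voice session with callback handlers for connection events, incoming AI messages, disconnection, and real-time transcription of what the user says (isFinal indicates that a complete phrase has been recognized). The saveToChat option saves the voice conversation to the chat history as text so it can be referenced later.

In React applications, use the useVoiceConversation hook, which offers a cleaner interface with built-in state management. It provides a status variable that tracks the connection state, the startSession() and endSession() methods that control the conversation, setVolume() for adjusting the audio level, getInputByteFrequencyData() for visualizing audio input (useful for frequency-bar style displays), and conversationState, which holds real-time information such as the current volume level.

Together these cover what you need to build voice-enabled AI applications with real-time speech recognition and synthesis, such as voice assistants, hands-free interfaces, and conversational AI experiences. The React hook handles the state management and the WebSocket connection for you.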

import { VoiceSDK } from '@odin-ai-staging/sdk';

// Initialize the SDK
const voiceSDK = new VoiceSDK({
  baseUrl: 'https://your-api-endpoint.com/',
  projectId: 'your-project-id',
  apiKey: 'your-api-key',
  apiSecret: 'your-api-secret',
  agentId: 'your-agent-id'
});

// Start a voice conversation
async function startVoiceChat() {
  const sessionId = await voiceSDK.startVoiceConversation({
    saveToChat: true,
    callbacks: {
      onConnect: () => console.log('Voice connected'),
      onMessage: (message) => console.log('Voice message:', message),
      onDisconnect: () => console.log('Voice disconnected'),
      onTranscription: (text, isFinal) => {
        if (isFinal) console.log('User said:', text);
      }
    }
  });
  
  console.log('Voice session started:', sessionId);
}

Using the React Hook

import { useVoiceConversation } from '@odin-ai-staging/sdk';

function VoiceChat() {
  const {
    status,
    startSession,
    endSession,
    setVolume,
    getInputByteFrequencyData,
    conversationState
  } = useVoiceConversation({
    sdkConfig: {
      baseUrl: 'https://your-api-endpoint.com/',
      projectId: 'your-project-id',
      agentId: 'your-agent-id'
    },
    callbacks: {
      onConnect: () => console.log('Connected!'),
      onMessage: (message) => console.log('Message:', message)
    }
  });

  return (
    <div>
      <button 
        onClick={() => startSession()}
        disabled={status === 'connected'}
      >
        Start Voice Chat
      </button>
      
      <button 
        onClick={() => endSession()}
        disabled={status !== 'connected'}
      >
        End Chat
      </button>
      
      <div>Status: {status}</div>
      <div>Volume: {conversationState.volume}</div>
    </div>
  );
}

Configuration

VoiceSDKConfig Interface

interface VoiceSDKConfig extends BaseClientConfig {
  agentId?: string;            // Default agent ID for conversations
  defaultVoiceSettings?: VoiceSettings;  // Default voice configuration
}

VoiceSettings

interface VoiceSettings {
  stability?: number;          // Voice stability (0.0 to 1.0)
  similarityBoost?: number;    // Voice similarity boost (0.0 to 1.0)
  style?: number;             // Voice style (0.0 to 1.0)
  useSpeakerBoost?: boolean;  // Enable speaker boost
}

Configuration example:

const voiceSDK = new VoiceSDK({
  baseUrl: 'https://api.example.com/',
  projectId: 'proj_123',
  apiKey: 'your-api-key',
  apiSecret: 'your-api-secret',
  agentId: 'agent_456',
  defaultVoiceSettings: {
    stability: 0.8,
    similarityBoost: 0.7,
    style: 0.3,
    useSpeakerBoost: true
  }
});
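
The comments on VoiceSettings give a 0.0 to 1.0 range for the numeric fields, but this article does not say whether the SDK validates them. The sketch below is one way to clamp values before passing them in. The helper names `clamp01` and `clampVoiceSettings` are hypothetical and not part of the SDK, and the interface is a local copy of the one documented above.

```typescript
// Local copy of the VoiceSettings shape documented above.
interface VoiceSettings {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  useSpeakerBoost?: boolean;
}

// Hypothetical helper (not part of the SDK): clamp a value into [0, 1].
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Hypothetical helper: return a copy with the numeric fields clamped.
// Fields that were not set stay undefined.
function clampVoiceSettings(settings: VoiceSettings): VoiceSettings {
  const result: VoiceSettings = { ...settings };
  if (settings.stability !== undefined) {
    result.stability = clamp01(settings.stability);
  }
  if (settings.similarityBoost !== undefined) {
    result.similarityBoost = clamp01(settings.similarityBoost);
  }
  if (settings.style !== undefined) {
    result.style = clamp01(settings.style);
  }
  return result;
}

const safeSettings = clampVoiceSettings({
  stability: 1.4,
  style: -0.2,
  useSpeakerBoost: true
});
console.log(safeSettings);
```

The result could then be passed as `defaultVoiceSettings` or as the per-session `voiceSettings` option.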

Core Features

Voice Conversation Sessions

VoiceSDK manages voice conversation sessions with automatic chat integration:

interface VoiceConversationSession {
  id: string;                    // Session identifier
  chatId?: string;               // Associated chat ID
  startTime: number;             // Session start timestamp
  endTime?: number;              // Session end timestamp
  messages: VoiceMessage[];      // Voice messages in session
  metadata?: {
    agentId?: string;
    voiceSettings?: VoiceSettings;
    totalDuration?: number;
    userInfo?: { name: string; id: string };
  };
}

Voice Messages

interface VoiceMessage {
  id: string;                    // Message ID
  type: 'user_speech' | 'ai_speech' | 'system';
  text: string;                  // Transcribed/generated text
  audioUrl?: string;             // Audio file URL
  timestamp: number;             // Message timestamp
  duration?: number;             // Audio duration in seconds
  voiceSettings?: VoiceSettings; // Voice settings used
  saved?: boolean;               // Whether saved to database
}
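
As an illustration of how the VoiceMessage fields fit together, the following sketch turns a session's messages into a plain-text transcript. `formatTranscript` is a hypothetical helper, not an SDK function, and the interface here is a trimmed local copy that keeps only the fields the helper reads. The "You" and "AI" labels mirror the ones used in the React example in this article.

```typescript
// Trimmed local copy of the VoiceMessage fields used by this sketch.
interface VoiceMessage {
  id: string;
  type: 'user_speech' | 'ai_speech' | 'system';
  text: string;
  timestamp: number;
}

// Hypothetical helper (not part of the SDK): build a readable transcript.
// System messages are skipped and the rest are ordered by timestamp.
function formatTranscript(messages: VoiceMessage[]): string {
  return messages
    .filter((m) => m.type !== 'system') // filter() returns a new array, so the input is not mutated by sort()
    .sort((a, b) => a.timestamp - b.timestamp)
    .map((m) => `${m.type === 'user_speech' ? 'You' : 'AI'}: ${m.text}`)
    .join('\n');
}

const transcriptText = formatTranscript([
  { id: '2', type: 'ai_speech', text: 'Hello! How can I help?', timestamp: 2000 },
  { id: '1', type: 'user_speech', text: 'Hi', timestamp: 1000 },
  { id: '3', type: 'system', text: 'connected', timestamp: 500 }
]);
console.log(transcriptText);
```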

Session Management

`startVoiceConversation(options?)`

Starts a new voice conversation session.

async startVoiceConversation(
  options?: StartVoiceConversationOptions
): Promise<string>

StartVoiceConversationOptions:

interface StartVoiceConversationOptions {
  callbacks?: VoiceConversationCallbacks;
  saveToChat?: boolean;          // Auto-save to chat history
  existingChatId?: string;       // Continue existing chat
  agentId?: string;             // Override default agent
  voiceSettings?: VoiceSettings; // Custom voice settings
  userInfo?: { name: string; id: string };
}

Example:

const sessionId = await voiceSDK.startVoiceConversation({
  saveToChat: true,
  existingChatId: 'chat_123',
  voiceSettings: {
    stability: 0.9,
    similarityBoost: 0.8
  },
  userInfo: {
    name: 'John Doe',
    id: 'user_456'
  },
  callbacks: {
    onConnect: () => console.log('Voice conversation started'),
    onMessage: (message) => handleVoiceMessage(message),
    onTranscription: (text, isFinal) => {
      if (isFinal) displayTranscription(text);
    },
    onConversationSaved: (chatId, messageId) => {
      console.log(`Conversation saved to chat ${chatId}`);
    }
  }
});

`endVoiceSession(sessionId, reason?)`

Ends a voice conversation session.

async endVoiceSession(sessionId: string, reason?: string): Promise<void>

Example:

await voiceSDK.endVoiceSession(sessionId, 'User ended conversation');

`getVoiceState(sessionId)`

Gets the current state of a voice conversation.

getVoiceState(sessionId: string): VoiceConversationState | null

Example:

const state = voiceSDK.getVoiceState(sessionId);
if (state) {
  console.log('Connection status:', state.connectionStatus);
  console.log('Is speaking:', state.isSpeaking);
  console.log('Volume:', state.volume);
}

React Integration

useVoiceConversation Hook

The useVoiceConversation hook provides React integration with built-in state management:

function useVoiceConversation(options: VoiceHookOptions): {
  // Hook properties
  status: VoiceStatus;
  isSpeaking: boolean;
  startSession: (config?: VoiceSessionConfig) => Promise<string>;
  endSession: () => Promise<void>;
  setVolume: (options: { volume: number }) => void;
  
  // Enhanced SDK properties
  conversationState: VoiceConversationState;
  currentSessionId: string | null;
  getInputByteFrequencyData: () => Uint8Array | null;
  getOutputByteFrequencyData: () => Uint8Array | null;
}

Complete React example:

import React, { useState } from 'react';
import { useVoiceConversation } from '@odin-ai-staging/sdk';

function VoiceConversationComponent() {
  const [messages, setMessages] = useState<string[]>([]);
  const [isRecording, setIsRecording] = useState(false);

  const {
    status,
    isSpeaking,
    startSession,
    endSession,
    setVolume,
    conversationState,
    currentSessionId,
    getInputByteFrequencyData
  } = useVoiceConversation({
    sdkConfig: {
      baseUrl: process.env.REACT_APP_API_BASE_URL,
      projectId: process.env.REACT_APP_PROJECT_ID,
      agentId: process.env.REACT_APP_AGENT_ID
    },
    callbacks: {
      onConnect: () => {
        console.log('Connected to voice chat');
        setIsRecording(true);
      },
      onDisconnect: () => {
        console.log('Disconnected from voice chat');
        setIsRecording(false);
      },
      onTranscription: (text, isFinal) => {
        if (isFinal) {
          setMessages(prev => [...prev, `You: ${text}`]);
        }
      },
      onMessage: (message) => {
        if (message.type === 'ai_speech') {
          setMessages(prev => [...prev, `AI: ${message.text}`]);
        }
      },
      onError: (error) => {
        console.error('Voice error:', error);
        setIsRecording(false);
      }
    }
  });

  const handleStartConversation = async () => {
    try {
      await startSession({
        saveToChat: true,
        voiceSettings: {
          stability: 0.8,
          similarityBoost: 0.7
        }
      });
    } catch (error) {
      console.error('Failed to start conversation:', error);
    }
  };

  const handleEndConversation = async () => {
    try {
      await endSession();
    } catch (error) {
      console.error('Failed to end conversation:', error);
    }
  };

  const handleVolumeChange = (volume: number) => {
    setVolume({ volume });
  };

  return (
    <div className="voice-conversation">
      <div className="controls">
        <button 
          onClick={handleStartConversation}
          disabled={status === 'connected'}
        >
          Start Voice Chat
        </button>
        
        <button 
          onClick={handleEndConversation}
          disabled={status !== 'connected'}
        >
          End Voice Chat
        </button>
      </div>

      <div className="status">
        <div>Status: {status}</div>
        <div>Speaking: {isSpeaking ? 'Yes' : 'No'}</div>
        <div>Recording: {isRecording ? 'Yes' : 'No'}</div>
        <div>Volume: {conversationState.volume}</div>
      </div>

      <div className="volume-control">
        <label>Volume:</label>
        <input
          type="range"
          min="0"
          max="100"
          value={conversationState.volume}
          onChange={(e) => handleVolumeChange(parseInt(e.target.value, 10))}
        />
      </div>

      <div className="messages">
        {messages.map((message, index) => (
          <div key={index} className="message">
            {message}
          </div>
        ))}
      </div>

      {currentSessionId && (
        <AudioVisualizer 
          getInputData={getInputByteFrequencyData}
          isActive={status === 'connected'}
        />
      )}
    </div>
  );
}

Voice Controls

Volume Control

// Set volume (0-100)
await voiceSDK.setVolume(sessionId, 75);

Microphone Control

// Mute/unmute microphone
await voiceSDK.setMicrophoneMuted(sessionId, true);  // Mute
await voiceSDK.setMicrophoneMuted(sessionId, false); // Unmute

Updating Voice Settings

// Update voice settings during conversation
await voiceSDK.updateVoiceSettings(sessionId, {
  stability: 0.9,
  similarityBoost: 0.8,
  style: 0.4
});

Audio Visualization

Real-Time Audio Data

// Get audio frequency data for visualization
const audioData = voiceSDK.getAudioFrequencyData(sessionId);

if (audioData) {
  const inputData = audioData.input;   // User's audio input
  const outputData = audioData.output; // AI's audio output
  
  // Use for audio visualization
  renderAudioVisualization(inputData, outputData);
}
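
If a full visualization is more than you need, the frequency data can be reduced to a single level, for example to drive a simple "speaking" indicator. This is a minimal sketch that assumes the data is a Uint8Array of byte values (0 to 255), as the hook's getInputByteFrequencyData() signature suggests. `averageLevel` is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper (not part of the SDK): average the byte frequency
// values and normalize the result to the range 0.0 to 1.0.
// Returns 0 when no data is available.
function averageLevel(data: Uint8Array | null): number {
  if (!data || data.length === 0) return 0;

  let sum = 0;
  for (let i = 0; i < data.length; i++) {
    sum += data[i];
  }
  // Each bin is a byte, so 255 is the maximum possible value per bin.
  return sum / (data.length * 255);
}

const sampleLevel = averageLevel(new Uint8Array([0, 255]));
console.log(sampleLevel); // 0.5
```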

Audio Visualization Component

import React, { useRef, useEffect } from 'react';

interface AudioVisualizerProps {
  getInputData: () => Uint8Array | null;
  isActive: boolean;
}

function AudioVisualizer({ getInputData, isActive }: AudioVisualizerProps) {
  const canvasRef = useRef<HTMLCanvasElement>(null);

  useEffect(() => {
    if (!isActive) return;

    const canvas = canvasRef.current;
    if (!canvas) return;

    const ctx = canvas.getContext('2d');
    if (!ctx) return;

    // Id of the pending animation frame, kept so the loop can be cancelled
    let frameId = 0;

    const animate = () => {
      const data = getInputData();

      if (data) {
        // Clear canvas
        ctx.clearRect(0, 0, canvas.width, canvas.height);

        // Draw frequency bars
        const barWidth = canvas.width / data.length;

        for (let i = 0; i < data.length; i++) {
          const barHeight = (data[i] / 255) * canvas.height;

          ctx.fillStyle = `hsl(${i * 2}, 100%, 50%)`;
          ctx.fillRect(
            i * barWidth,
            canvas.height - barHeight,
            barWidth - 1,
            barHeight
          );
        }
      }

      // Schedule the next frame and remember its id
      frameId = requestAnimationFrame(animate);
    };

    animate();

    // Stop the loop when the component unmounts or becomes inactive
    return () => cancelAnimationFrame(frameId);
  }, [isActive, getInputData]);

  return (
    <canvas 
      ref={canvasRef}
      width={400}
      height={100}
      className="audio-visualizer"
    />
  );
}
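
The component above draws one bar per frequency bin. If the array has more bins than the canvas has pixels, each bar becomes narrower than one pixel and `barWidth - 1` goes negative. One option, sketched here under the assumption that the data is byte frequency data (0 to 255), is to average neighbouring bins into a fixed number of bars first. `downsampleBars` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper (not part of the SDK): group frequency bins into
// `barCount` buckets and return each bucket's average, normalized to 0..1.
function downsampleBars(data: Uint8Array, barCount: number): number[] {
  // Never produce more bars than there are bins.
  const count = Math.min(Math.floor(barCount), data.length);
  if (count <= 0) return [];

  const bars: number[] = [];
  for (let b = 0; b < count; b++) {
    // Bucket b covers the bins in [start, end).
    const start = Math.floor((b * data.length) / count);
    const end = Math.floor(((b + 1) * data.length) / count);

    let sum = 0;
    for (let i = start; i < end; i++) {
      sum += data[i];
    }
    bars.push(sum / ((end - start) * 255));
  }
  return bars;
}

const sampleBars = downsampleBars(new Uint8Array([0, 255, 255, 255]), 2);
console.log(sampleBars); // [0.5, 1]
```

Inside the draw loop, each returned value could be multiplied by `canvas.height` to get the bar height, with `canvas.width / bars.length` as the bar width.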

Chat Integration

Automatic Chat Saving

Voice conversations can be saved to the chat system automatically:

const sessionId = await voiceSDK.startVoiceConversation({
  saveToChat: true,  // Enable automatic saving
  existingChatId: 'chat_123',  // Optional: continue existing chat
  callbacks: {
    onConversationSaved: (chatId, messageId) => {
      console.log(`Voice conversation saved to chat ${chatId}`);
      // Update UI to show the saved conversation
      refreshChatHistory(chatId);
    }
  }
});

Manual Chat Integration

// Get conversation history from voice session
const messages = voiceSDK.getConversationHistory(sessionId);

// Save to chat manually
for (const message of messages) {
  if (message.type === 'user_speech') {
    await chatSDK.sendMessage(message.text, {
      chatId: 'chat_123',
      metadata: {
        voiceMessage: true,
        audioUrl: message.audioUrl,
        sessionId: sessionId
      }
    });
  }
}

Context Updates

Send additional context to a voice conversation:

// Send context from chat history
await voiceSDK.sendContextualUpdate(
  sessionId,
  'User previously asked about pricing. Current conversation is about features.'
);
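
The context string has to come from somewhere, and a common source is earlier chat turns. The sketch below shows one way to build a short summary string from recent turns while staying under a character budget. Everything in it is an assumption made for illustration: the `ChatTurn` shape, the `buildContextSummary` helper, and the 500-character default are all hypothetical, and this article does not state whether sendContextualUpdate() enforces any length limit.

```typescript
// Hypothetical shape for a chat turn; adapt it to your own chat history type.
interface ChatTurn {
  role: 'user' | 'assistant';
  text: string;
}

// Hypothetical helper (not part of the SDK): keep the most recent turns
// that fit within maxChars, oldest first, one turn per line.
function buildContextSummary(turns: ChatTurn[], maxChars: number = 500): string {
  const lines = turns.map(
    (t) => `${t.role === 'user' ? 'User' : 'Assistant'}: ${t.text.trim()}`
  );

  const kept: string[] = [];
  let used = 0;
  // Walk backwards so the newest turns are kept when the budget runs out.
  for (let i = lines.length - 1; i >= 0; i--) {
    // Every line after the first also costs one newline separator.
    const cost = lines[i].length + (kept.length > 0 ? 1 : 0);
    if (used + cost > maxChars) break;
    kept.unshift(lines[i]);
    used += cost;
  }
  return kept.join('\n');
}

const contextText = buildContextSummary([
  { role: 'user', text: 'How much does the Pro plan cost?' },
  { role: 'assistant', text: 'Pricing depends on the number of seats.' },
  { role: 'user', text: 'Which features does it include?' }
]);
console.log(contextText);
```

The resulting string could then be passed as the second argument to `sendContextualUpdate(sessionId, ...)`.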

Error Handling

try {
  const sessionId = await voiceSDK.startVoiceConversation({
    callbacks: {
      onError: (error) => {
        console.error('Voice conversation error:', error);
        
        // Handle specific error types
        if (error.message.includes('microphone')) {
          showMicrophonePermissionDialog();
        } else if (error.message.includes('network')) {
          showNetworkErrorMessage();
        }
      },
      onDisconnect: (details) => {
        console.log('Disconnected:', details?.reason);
        
        // Handle different disconnection reasons
        if (details?.reason === 'user_ended') {
          showConversationSummary();
        } else if (details?.reason === 'error') {
          showReconnectOption();
        }
      }
    }
  });
} catch (error) {
  console.error('Failed to start voice conversation:', error);
  
  if (error instanceof Error && error.message.includes('agent')) {
    showAgentConfigError();
  }
}
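
The example above matches on substrings of the error message in several places. If you use that approach, it may help to centralize it in one function so the UI code switches on a small set of categories. This is only a sketch: `classifyVoiceError` and `VoiceErrorKind` are hypothetical names, and the keywords are the ones used in the example above, not a documented list of SDK error messages.

```typescript
// Hypothetical categories, based on the keywords used in the example above.
type VoiceErrorKind = 'microphone' | 'network' | 'agent' | 'unknown';

// Hypothetical helper (not part of the SDK): map any thrown value to a category.
// Matching is case-insensitive and also works for non-Error values.
function classifyVoiceError(error: unknown): VoiceErrorKind {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();

  if (message.includes('microphone')) return 'microphone';
  if (message.includes('network')) return 'network';
  if (message.includes('agent')) return 'agent';
  return 'unknown';
}

const errorKind = classifyVoiceError(new Error('Microphone permission denied'));
console.log(errorKind); // "microphone"
```

String matching on messages is fragile. If the SDK exposes error codes or error classes, those would be a better basis for the mapping.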

Examples

Voice-Enabled Customer Support

import { VoiceSDK, ChatSDK } from '@odin-ai-staging/sdk';

class VoiceCustomerSupport {
  private voiceSDK: VoiceSDK;
  private chatSDK: ChatSDK;
  private activeSession?: string;

  constructor() {
    const config = {
      baseUrl: process.env.API_BASE_URL,
      projectId: process.env.PROJECT_ID,
      apiKey: process.env.API_KEY,
      apiSecret: process.env.API_SECRET,
    };

    this.voiceSDK = new VoiceSDK(config);
    this.chatSDK = new ChatSDK(config);
  }

  async startSupportSession(customerId: string, issueType: string) {
    try {
      // Create a new chat for this support session
      const chat = await this.chatSDK.createChat(
        `Voice Support - ${issueType}`,
        [] // Could add relevant document keys based on issue type
      );

      // Start voice conversation
      this.activeSession = await this.voiceSDK.startVoiceConversation({
        saveToChat: true,
        existingChatId: chat.chat_id,
        agentId: this.getAgentForIssueType(issueType),
        userInfo: {
          name: `Customer ${customerId}`,
          id: customerId
        },
        callbacks: {
          onConnect: () => {
            console.log('Support session started');
            this.logSupportEvent('session_started', { customerId, issueType });
          },
          onTranscription: (text, isFinal) => {
            if (isFinal) {
              this.logSupportEvent('customer_spoke', { 
                customerId, 
                text: text.substring(0, 100) // Log first 100 chars
              });
            }
          },
          onMessage: (message) => {
            if (message.type === 'ai_speech') {
              this.logSupportEvent('agent_responded', {
                customerId,
                responseLength: message.text.length
              });
            }
          },
          onConversationSaved: (chatId, messageId) => {
            console.log(`Support conversation saved to chat ${chatId}`);
          },
          onDisconnect: (details) => {
            this.logSupportEvent('session_ended', {
              customerId,
              reason: details?.reason,
              duration: this.getSessionDuration()
            });
          }
        }
      });

      return {
        sessionId: this.activeSession,
        chatId: chat.chat_id
      };
    } catch (error) {
      console.error('Failed to start support session:', error);
      throw error;
    }
  }

  async endSupportSession() {
    if (this.activeSession) {
      await this.voiceSDK.endVoiceSession(this.activeSession);
      this.activeSession = undefined;
    }
  }

  private getAgentForIssueType(issueType: string): string {
    const agentMap: Record<string, string> = {
      'technical': 'agent_technical_support',
      'billing': 'agent_billing_support',
      'general': 'agent_general_support'
    };
    return agentMap[issueType] || agentMap['general'];
  }

  private logSupportEvent(event: string, data: any) {
    console.log(`Support Event: ${event}`, data);
    // Send to your analytics/logging system
  }

  private getSessionDuration(): number {
    // Calculate session duration
    return 0; // Placeholder
  }
}
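
The getSessionDuration() method above is a placeholder that always returns 0. One way to fill it in, sketched here, is a small timer that is started in onConnect and stopped in onDisconnect. `SessionTimer` is a hypothetical class, not part of the SDK. The clock function is injectable only to make the class easy to test.

```typescript
// Hypothetical helper (not part of the SDK): measures elapsed session time.
class SessionTimer {
  private now: () => number;
  private startedAt?: number;
  private endedAt?: number;

  // `now` returns the current time in milliseconds; defaults to Date.now.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Begin (or restart) timing.
  start(): void {
    this.startedAt = this.now();
    this.endedAt = undefined;
  }

  // Freeze the duration; repeated calls keep the first stop time.
  stop(): void {
    if (this.startedAt !== undefined && this.endedAt === undefined) {
      this.endedAt = this.now();
    }
  }

  // Seconds elapsed so far, or 0 if the timer was never started.
  durationSeconds(): number {
    if (this.startedAt === undefined) return 0;
    const end = this.endedAt ?? this.now();
    return Math.max(0, (end - this.startedAt) / 1000);
  }
}

const sessionTimer = new SessionTimer();
sessionTimer.start();
sessionTimer.stop();
console.log(sessionTimer.durationSeconds());
```

In the class above, `start()` would be called from onConnect, and getSessionDuration() would return `durationSeconds()`. Because the onDisconnect handler reads the duration, calling `stop()` at the top of that handler keeps the value stable.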

Best Practices

Error Handling and Fallback

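import { VoiceSDK, ChatSDK } from '@odin-ai-staging/sdk';

// Try voice first; fall back to a text-only chat if the voice session cannot start.
async function startWithFallback(
  voiceSDK: VoiceSDK,
  chatSDK: ChatSDK,
  options?: Parameters<VoiceSDK['startVoiceConversation']>[0]
) {
  try {
    const sessionId = await voiceSDK.startVoiceConversation(options);
    return { mode: 'voice' as const, sessionId };
  } catch (error) {
    console.warn('Voice failed, falling back to text chat:', error);

    // Fallback to text-only chat
    const chat = await chatSDK.createChat('Support Chat (Text)');
    return { mode: 'text' as const, chat };
  }
}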

Resource Management

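import { VoiceSDK } from '@odin-ai-staging/sdk';

class VoiceManager {
  private activeSessions = new Set<string>();

  // The SDK instance is injected so the manager can start and end sessions
  constructor(private voiceSDK: VoiceSDK) {}

  async startSession(options?: Parameters<VoiceSDK['startVoiceConversation']>[0]) {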
    const sessionId = await this.voiceSDK.startVoiceConversation(options);
    this.activeSessions.add(sessionId);
    return sessionId;
  }

  async cleanup() {
    // End all active sessions
    for (const sessionId of this.activeSessions) {
      try {
        await this.voiceSDK.endVoiceSession(sessionId);
      } catch (error) {
        console.warn('Failed to end session:', sessionId, error);
      }
    }
    this.activeSessions.clear();
  }
}

Performance Optimization

// Use React.memo for audio visualization components
const AudioVisualizer = React.memo(({ getInputData, isActive }) => {
  // Throttle animation updates
  const throttledAnimate = useCallback(
    throttle(() => {
      // Animation logic
    }, 16), // ~60fps
    []
  );
  
  // ... component logic
});
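
The snippet above calls `throttle` without showing where it comes from. Utility libraries such as lodash provide one. For reference, this is a minimal leading-edge throttle sketch and not the SDK's or any library's implementation: the first call runs immediately, and later calls are dropped until `intervalMs` has passed. The `now` parameter is an assumption added so the behaviour can be tested with a fake clock.

```typescript
// Minimal leading-edge throttle (illustrative sketch).
// Calls that arrive less than intervalMs after the last executed call are dropped.
function throttle<Args extends unknown[]>(
  fn: (...args: Args) => void,
  intervalMs: number,
  now: () => number = Date.now
): (...args: Args) => void {
  let lastRun = -Infinity; // guarantees that the first call always runs

  return (...args: Args) => {
    const current = now();
    if (current - lastRun >= intervalMs) {
      lastRun = current;
      fn(...args);
    }
  };
}

const logLevel = throttle((level: number) => console.log('level', level), 16);
logLevel(0.2); // runs immediately
logLevel(0.4); // dropped if fewer than 16 ms have passed since the previous run
```

Note that a loop driven by requestAnimationFrame is already limited to the display's refresh rate, so a 16 ms throttle mainly matters when updates come from another source, such as a timer or an event callback.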

Accessibility

function VoiceAccessibleChat() {
  const [transcript, setTranscript] = useState('');
  
  const { startSession } = useVoiceConversation({
    callbacks: {
      onTranscription: (text, isFinal) => {
        setTranscript(text);
        
        // Update screen reader
        if (isFinal) {
          announceToScreenReader(`You said: ${text}`);
        }
      }
    }
  });

  return (
    <div>
      <button 
        aria-label="Start voice conversation"
        onClick={() => startSession()}
      >
        🎤 Start Voice Chat
      </button>
      
      <div 
        aria-live="polite"
        aria-label="Voice transcript"
      >
        {transcript}
      </div>
    </div>
  );
}
