Observatorio de I+D+i UPM

Research reports

Conference papers:

Cooperative off-policy prediction of Markov decision processes in adaptive networks

Year: 2013

Research areas

Electronics and communications technology

Data

Description
We apply diffusion strategies to propose a cooperative reinforcement learning algorithm in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning, even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain of cooperation, in the form of greater stability and lower bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents differs from the target policy under evaluation.
International	Yes
Conference name	2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Participation type	960
Conference location
Peer reviewed	Yes
ISBN or ISSN	1520-6149
DOI
Conference start date	26/05/2013
Conference end date	31/05/2013
From page	4539
To page	4543
Proceedings title	Proceedings of ICASSP
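
The cooperative scheme summarized in the description can be illustrated with a small sketch. This is not the paper's algorithm: the record does not state the local update rule, so the code below assumes a tabular off-policy TD(0) update with importance-sampling ratios as a stand-in, followed by an adapt-then-combine diffusion step in which each agent averages its neighbors' intermediate estimates. The toy two-state MDP, the policies, the step-size, the network topology, and all function names are hypothetical choices made for illustration.

```python
import random

GAMMA = 0.5      # discount factor (assumed for this toy example)
MU = 0.05        # constant step-size (assumed)
TARGET_P1 = 0.9  # target policy: probability of choosing action 1 (assumed)


def step_env(action):
    """Toy 2-state MDP: the chosen action becomes the next state;
    the reward is 1 when the agent moves to state 1, else 0."""
    return action, float(action == 1)


def adapt(w, state, behavior_p1, rng):
    """Local off-policy TD(0) step (a stand-in for the paper's update).

    The agent acts with its own behavior policy and reweights the TD
    error by rho = pi(a) / b(a) so that, in expectation, the update
    follows the target policy."""
    action = 1 if rng.random() < behavior_p1 else 0
    next_state, reward = step_env(action)
    pi = TARGET_P1 if action == 1 else 1.0 - TARGET_P1
    b = behavior_p1 if action == 1 else 1.0 - behavior_p1
    rho = pi / b
    td_error = reward + GAMMA * w[next_state] - w[state]
    psi = list(w)  # intermediate estimate, shared with neighbors
    psi[state] += MU * rho * td_error
    return psi, next_state


def combine(psis, weights):
    """Convex combination of the neighbors' intermediate estimates."""
    dim = len(psis[0])
    return [sum(c * psi[i] for c, psi in zip(weights, psis)) for i in range(dim)]


def run_diffusion(behaviors, neighbors, steps=5000, seed=0):
    """Adapt-then-combine diffusion over a fixed network."""
    rng = random.Random(seed)
    n = len(behaviors)
    w = [[0.0, 0.0] for _ in range(n)]  # one value estimate per state
    states = [0] * n
    for _ in range(steps):
        psis = []
        for k in range(n):
            psi, states[k] = adapt(w[k], states[k], behaviors[k], rng)
            psis.append(psi)
        # Each agent averages uniformly over its neighborhood (itself included).
        w = [
            combine([psis[l] for l in neighbors[k]],
                    [1.0 / len(neighbors[k])] * len(neighbors[k]))
            for k in range(n)
        ]
    return w


# Three agents on a line graph, each with a different behavior policy.
BEHAVIORS = [0.3, 0.5, 0.7]
NEIGHBORS = [[0, 1], [0, 1, 2], [1, 2]]
estimates = run_diffusion(BEHAVIORS, NEIGHBORS)
print(estimates)
```

For this toy problem the value of the target policy is the same in both states and solves V = 0.9(1 + 0.5V) + 0.1(0.5V), which gives V = 1.8, so every agent's estimates should settle near that value even though none of the agents follows the target policy. With a constant step-size the estimates keep fluctuating slightly around the solution rather than converging exactly.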

This activity belongs to the research reports.

Participants

Autor: Santiago Zazo Bello UPM

Related research groups, departments, centers, and R&D&I institutes

Creator: Research group: Grupo de Aplicaciones del Procesado de Señal (GAPS)
Department: Señales, Sistemas y Radiocomunicaciones