
Paweł Wawrzyński

J. Łyskawa, P. Wawrzyński,
"Actor-Critic with Variable Time Discretization via Sustained Actions",
Neural Information Processing (ICONIP), pp. 476-489, 2023.

Ł. Neumann, Ł. Lepak, P. Wawrzyński,
"Least Redundant Gated Recurrent Neural Network", International Joint Conference on Neural Networks (IJCNN), pp. 1-10, 2023.

W. Masarczyk, P. Wawrzyński, D. Marczak, K. Deja, T. Trzciński,
"Logarithmic Continual Learning", *IEEE Access,* vol. 10, pp. 117001-117010,
DOI:10.1109/ACCESS.2022.3218907, 2022.

J. Łyskawa, P. Wawrzyński,
"ACERAC: Efficient reinforcement learning in fine time discretization",
*IEEE Transactions on Neural Networks and Learning Systems,* DOI:10.1109/TNNLS.2022.3190973, 2022.

A. Małkowski, J. Grzechociński, P. Wawrzyński,
"Graph autoencoder with constant dimensional latent space",
International Conference on Neural Information Processing (ICONIP), 2022.

G. Rypeść, Ł. E. Lepak, P. Wawrzyński,
"Reinforcement Learning for on-line Sequence Transformation",
Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 133-139, 2022.

K. Deja, P. Wawrzyński, D. Marczak, W. Masarczyk, T. Trzciński,
"Multiband VAE: Latent Space Partitioning for Knowledge Consolidation in Continual Learning",
International Joint Conference on Artificial Intelligence (IJCAI), 2022.

J. Arabas, R. Biedrzycki, K. Budzyńska, J. Chudziak, P. Cichosz, W. Daszczuk,
T. Gambin, P. Gawrysiak, M. Muraszkiewicz, R. Nowak, K. Piczak,
P. Wawrzyński, P. Zawistowski,
*Sztuczna inteligencja dla inżynierów. Metody ogólne*,
Oficyna Wydawnicza Politechniki Warszawskiej, 2022.

P. Wawrzyński,
*Uczące się systemy decyzyjne*, Oficyna Wydawnicza Politechniki Warszawskiej, 2021.

P. Wawrzyński, W. Masarczyk, M. Ostaszewski,
"Reinforcement learning with experience replay and adaptation of action dispersion",
arXiv:2208.00156, 2022.

K. Deja, P. Wawrzyński, D. Marczak, W. Masarczyk, T. Trzciński,
"BinPlay: A Binary Latent Autoencoder for Generative Replay Continual Learning",
International Joint Conference on Neural Networks (IJCNN), 2021.

D. Chen, P. Wawrzynski, Z. Lv,
"Cyber security in smart cities: A review of deep learning-based applications and case studies",
*Sustainable Cities and Society,* vol. 66, pp. 1-12, 2021.

M. Szulc, J. Łyskawa, P. Wawrzyński, "A Framework for Reinforcement Learning
with Autocorrelated Actions", International Conference on Neural Information Processing (ICONIP),
pp. 90-101, 2020.

P. Wawrzyński, P. Zawistowski, Ł. Lepak,
"Automatic hyperparameter tuning in on-line learning: Classic Momentum and ADAM",
International Joint Conference on Neural Networks (IJCNN), 2020.

K. Checinski, P. Wawrzyński,
"DCT-Conv: Coding filters in convolutional networks with Discrete Cosine Transform",
International Joint Conference on Neural Networks (IJCNN), 2020.

P. Wawrzyński,
"Efficient on-line learning with diagonal approximation of loss function Hessian",
International Joint Conference on Neural Networks (IJCNN), 2019.

A. Zanetti, A. Testolin, M. Zorzi, P. Wawrzynski,
"Numerosity Representation in InfoGAN: An Empirical Study",
International Work-Conference on Artificial Neural Networks (IWANN), pp.49-60, 2019.

P. Wawrzyński,
"ASD+M: Automatic parameter tuning in stochastic optimization and on-line learning",
*Neural Networks,* vol. 96, pp. 1-10, 2017.

P. Wawrzyński,
"Robot’s Velocity and Tilt Estimation Through Computationally Efficient Fusion
of Proprioceptive Sensors Readouts",
International Conference on Methods and Models in Automation and Robotics (MMAR), pp. 738-743, 2015.

M. Majczak, P. Wawrzyński,
"Comparison of two efficient control strategies for two-wheeled balancing robot",
International Conference on Methods and Models in Automation and Robotics (MMAR), pp. 744-749, 2015.

P. Wawrzyński,
"Control policy with autocorrelated noise in reinforcement learning for robotics",
*International Journal of Machine Learning and Computing,* Vol. 5, No. 2, pp. 91-95, IACSIT Press, 2015.

P. Wawrzyński, J. Mozaryn, J. Klimaszewski,
"Robust estimation of walking robots velocity and tilt using proprioceptive sensors data fusion",
*Robotics and Autonomous Systems,* Elsevier, Vol. 66, pp. 44-54, 2015.

J. Możaryn, J. Klimaszewski, D. Swieczkowski-Feiz, P. Kolodziejczyk, P. Wawrzyński,
"Design process and experimental verification of the quadruped robot wave gait",
International Conference on Methods and Models in Automation and Robotics (MMAR), pp. 206-211, 2014.

P. Wawrzyński,
"Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization,"
*International Journal of Humanoid Robotics,* Vol. 11, No. 3, art. no. 1450024, 2014.

P. Wawrzyński,
*Podstawy sztucznej inteligencji,*
Oficyna Wydawnicza Politechniki Warszawskiej, 2014 and 2019.

B. Papis, P. Wawrzyński,
"dotRL: A platform for rapid Reinforcement Learning methods development and validation,"
Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 129-136, 2013.

P. Wawrzyński, J. Mozaryn, J. Klimaszewski,
"Robust velocity estimation for legged robot using on-board sensors data fusion,"
International Conference on Methods and Models in Automation and Robotics (MMAR),
August 26-29, 2013, Międzyzdroje, Poland, pp. 717-722, IEEE, 2013.

P. Suszynski, P. Wawrzyński,
"Learning population of spiking neural networks with perturbation of conductances,"
International Joint Conference on Neural Networks (IJCNN), August 4-9, 2013, Dallas TX, USA,
pp. 332-337, IEEE, 2013.

P. Wawrzyński, A. K. Tanwani,
"Autonomous Reinforcement Learning with Experience Replay,"
*Neural Networks,* vol. 41, pp. 156-167, Elsevier, 2013.

P. Wawrzyński,
*Sterowanie Adaptacyjne i Uczenie Maszynowe - preskrypt wykładu,*
Politechnika Warszawska, 2012.

P. Wawrzyński,
"Autonomous Reinforcement Learning with Experience Replay for Humanoid Gait Optimization,"
International Neural Network Society Winter Conference (INNS-WC2012), pp. 205-211, Elsevier, 2012.

P. Wawrzyński, B. Papis,
"Fixed point method for autonomous on-line neural network training,"
*Neurocomputing 74,* pp. 2893-2905, Elsevier, 2011.

P. Wawrzyński,
"Fixed Point Method of Step-size estimation for on-line neural network training,"
International Joint Conference on Neural Networks (IJCNN), July 18-23, 2010, Barcelona, Spain,
IEEE, pp. 2012-2017.

P. Wawrzyński,
*Systemy adaptacyjne i uczące się - preskrypt wykładu,*
Oficyna Wydawnicza Politechniki Warszawskiej, 2009.

P. Wawrzyński,
"Real-Time Reinforcement Learning by Sequential Actor-Critics and Experience Replay,"
*Neural Networks,* vol. 22, pp. 1484-1497, Elsevier, 2009.

P. Wawrzyński,
"A Cat-Like Robot Real-Time Learning to Run," *Lecture Notes in Computer Science 5495,* pp. 380-390,
Springer-Verlag, 2009.

P. Wawrzyński, J. Arabas, P. Cichosz,
"Predictive Control for Artificial Intelligence in Computer Games,"
*Lecture Notes in Artificial Intelligence 5097,* pp. 1137-1148, Springer-Verlag, 2008.

P. Wawrzyński, A. Pacut,
"Truncated Importance Sampling for Reinforcement Learning with Experience Replay,"
International Multiconference on Computer Science and Information Technology, pp. 305-315, 2007.

P. Wawrzyński,
"Learning to Control a 6-Degree-of-Freedom Walking Robot,"
EUROCON 2007, The International Conference on Computer as a Tool, pp. 698-705, 2007.

P. Wawrzyński,
"Reinforcement Learning in Fine Time Discretization,"
*Lecture Notes in Computer Science 4431,* pp. 470-479, 2007.

P. Wawrzyński, A. Pacut,
"Balanced Importance Sampling Estimation,"
International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems (IPMU),
Paris, July 2-7, 2006, pp. 66-73.

P. Wawrzyński,
"Symulacja Płaskich łańcuchów Kinematycznych,"
Report No. 05-06, Institute of Control and Computation Engineering, November 2005.

P. Wawrzyński, A. Pacut,
"Reinforcement Learning in Quasi-Continuous Time,"
International Conference on Computational Intelligence for Modelling, Control and Automation,
November 2005, Vienna, Austria, pp. 1031-1036.

P. Wawrzyński,
"Intensive Reinforcement Learning," Ph.D. dissertation,
Institute of Control and Computation Engineering, Warsaw University of Technology, May 2005.

P. Wawrzyński, A. Pacut,
"Model-free off-policy reinforcement learning in continuous environment,"
International Joint Conference on Neural Networks (IJCNN), Budapest, July 2004, pp. 1091-1096.

P. Wawrzyński, A. Pacut,
"Intensive versus nonintensive actor-critic algorithms of reinforcement learning,"
*Lecture Notes in Artificial Intelligence 3070,* pp. 934-941, Springer-Verlag, 2004.

P. Wawrzyński, A. Pacut,
"A simple actor-critic algorithm for continuous environments,"
International Conference on Methods and Models in Automation and Robotics (MMAR), August 2004, pp. 1143-1149.

P. Wawrzyński, P. Podsiadly, G. Lehmann,
"IOT Methodology of Frequency Assignment in Cellular Network,"
The MOST International Conference, October 2002, pp. 313-324.

P. Wawrzyński, A. Pacut,
"Modeling of distributions with neural approximation of conditional quantiles,"
IASTED International Conference Artificial Intelligence and Applications,
Malaga, Spain, September 2002, pp. 539-543.

J. Łyskawa, P. Wawrzyński,
"Actor-Critic with Variable Time Discretization via Sustained Actions",
Neural Information Processing (ICONIP), pp. 476-489, 2023.

ABSTRACT:
Reinforcement learning (RL) methods work in discrete time. In order to apply RL to inherently continuous problems like robotic
control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to
train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL
algorithm that combines the advantages of different time discretization settings. Initially, it operates with sparse time
discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in robotic control
environments: Ant, HalfCheetah, Hopper, and Walker2D. In all cases our proposed algorithm outperforms the state of the art.

Keywords - reinforcement learning, frame skipping, robotic control.

Ł. Neumann, Ł. Lepak, P. Wawrzyński,
"Least Redundant Gated Recurrent Neural Network",
International Joint Conference on Neural Networks (IJCNN), pp. 1-10, 2023.

ABSTRACT:
Recurrent neural networks are important tools for sequential data processing. However, they are notorious for
problems regarding their training. Challenges include capturing complex relations between consecutive states,
as well as the stability and efficiency of training. In this paper, we introduce a recurrent neural architecture called
Deep Memory Update (DMU). It is based on updating the previous memory state with a deep transformation of
the lagged state and the network input. The architecture is able to learn to transform its internal state
using any nonlinear function. Its training is stable and fast due to relating its learning rate to the size
of the module. Even though DMU is based on standard components, experimental results presented here confirm
that it can compete with and often outperform state-of-the-art architectures such as Long Short-Term Memory,
Gated Recurrent Units, and Recurrent Highway Networks.

Keywords - Recurrent neural networks, universal function approximation.

W. Masarczyk, P. Wawrzyński, D. Marczak, K. Deja, T. Trzciński,
"Logarithmic Continual Learning", *IEEE Access,* vol. 10, pp. 117001-117010,
DOI:10.1109/ACCESS.2022.3218907, 2022.

ABSTRACT:
We introduce a neural network architecture that logarithmically reduces the number of self-rehearsal steps
in the generative rehearsal of continually learned models. In continual learning (CL), training samples come
in subsequent tasks, and the trained model can access only a current task. Contemporary CL methods employ
generative models to replay previous samples and train them recursively with a combination of current and
regenerated past data. This recurrence leads to superfluous computations as the same past samples are
regenerated after each task, and the reconstruction quality successively degrades. In this work, we address
these limitations and propose a new generative rehearsal architecture that requires, at most, a logarithmic
number of retraining sessions for each sample. Our approach leverages the allocation of past data in a set
of generative models such that most of them do not require retraining after a task. The experimental evaluation
of our logarithmic continual learning approach shows the superiority of our method with respect to the
state-of-the-art generative rehearsal methods.

Keywords - Continual learning, generative rehearsal.

J. Łyskawa, P. Wawrzyński,
"ACERAC: Efficient reinforcement learning in fine time discretization",
*IEEE Transactions on Neural Networks and Learning Systems,*
DOI:10.1109/TNNLS.2022.3190973, 2022.

ABSTRACT:
We propose a framework for reinforcement learning (RL) in fine time discretization and a learning algorithm
in this framework. One of the main goals of RL is to provide a way for physical machines to learn optimal behavior
instead of being programmed. However, the machines are usually controlled in fine time discretization. The most
common RL methods apply independent random elements to each action, which is not suitable in that setting: it
causes the controlled system to jerk and does not ensure sufficient exploration, since
a single action is not long enough to create a significant experience that could be translated into policy
improvement. In the RL framework introduced in this paper, policies are considered that produce actions based
on states and random elements autocorrelated in subsequent time instants. The RL algorithm introduced here
approximately optimizes such a policy. The efficiency of this algorithm is verified against three other RL
methods (PPO, SAC, ACER) in four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D)
in diverse time discretization. The algorithm introduced here outperforms the competitors in most cases considered.

Keywords - Reinforcement learning, experience replay, fine time discretization.

A. Małkowski, J. Grzechociński, P. Wawrzyński,
"Graph autoencoder with constant dimensional latent space",
International Conference on Neural Information Processing (ICONIP),
arXiv:2201.12165, 2022.

ABSTRACT:
Invertible transformation of large graphs into constant dimensional vectors (embeddings) remains a challenge.
In this paper we address it with recursive neural networks: The encoder and the decoder. The encoder network
transforms embeddings of subgraphs into embeddings of larger subgraphs, and eventually into the embedding of
the input graph. The decoder does the opposite. The dimension of the embeddings is constant regardless of
the size of the (sub)graphs. Simulation experiments presented in this paper confirm that our proposed graph
autoencoder can handle graphs with even thousands of vertices.

Keywords - Graph neural networks, Autoencoders, Recursive neural networks.

G. Rypeść, Ł. E. Lepak, P. Wawrzyński,
"Reinforcement Learning for on-line Sequence Transformation",
Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 133-139, 2022.

ABSTRACT:
A number of problems in the processing of sound and natural language, as well as in other areas, can be reduced
to simultaneously reading an input sequence and writing an output sequence of generally different length. There
are well-developed methods that produce the output sequence based on the entirely known input. However, efficient
methods that enable such transformations on-line do not exist. In this paper we introduce an architecture that
learns with reinforcement to make decisions about whether to read a token or write another token. This architecture
is able to transform potentially infinite sequences on-line. In an experimental study we compare it with
state-of-the-art methods for neural machine translation. While it produces slightly worse translations than
Transformer, it outperforms the autoencoder with attention, even though our architecture translates texts
on-line, thereby solving a more difficult problem than both reference methods.

Keywords - Reinforcement learning, neural machine translation.

K. Deja, P. Wawrzyński, D. Marczak, W. Masarczyk, T. Trzciński,
"Multiband VAE: Latent Space Partitioning for Knowledge Consolidation in Continual Learning",
International Joint Conference on Artificial Intelligence (IJCAI), 2022.

ABSTRACT:
We propose a new method for unsupervised continual knowledge consolidation in generative models that relies
on the partitioning of Variational Autoencoder's latent space. Acquiring knowledge about new data samples
without forgetting previous ones is a critical problem of continual learning. Currently proposed methods
achieve this goal by extending the existing model while constraining its behavior not to degrade on the past
data, which does not exploit the full potential of relations within the entire training dataset. In this work,
we identify this limitation and posit the goal of continual learning as a knowledge accumulation task. We solve
it by continuously re-aligning latent space partitions that we call bands which are representations of samples
seen in different tasks, driven by the similarity of the information they contain. In addition, we introduce
a simple yet effective method for controlled forgetting of past data that improves the quality of reconstructions
encoded in latent bands and a latent space disentanglement technique that improves knowledge consolidation. On top
of the standard continual learning evaluation benchmarks, we evaluate our method on a new knowledge consolidation
scenario and show that the proposed approach outperforms the state of the art by up to twofold across all testing
scenarios.

Keywords - Continual learning.

P. Wawrzyński, W. Masarczyk, M. Ostaszewski,
"Reinforcement learning with experience replay and adaptation of action dispersion",
arXiv:2208.00156, 2022.

ABSTRACT:
Effective reinforcement learning requires a proper balance of exploration and exploitation defined by the dispersion
of action distribution. However, this balance depends on the task, the current stage of the learning process, and the
current environment state. Existing methods that designate the action distribution dispersion require problem-dependent
hyperparameters. In this paper, we propose to automatically designate the action distribution dispersion using the
following principle: This distribution should have sufficient dispersion to enable the evaluation of future policies. To
that end, the dispersion should be tuned to assure a sufficiently high probability (densities) of the actions in the
replay buffer and the modes of the distributions that generated them, yet this dispersion should not be higher. This
way, a policy can be effectively evaluated based on the actions in the buffer, but exploratory randomness in actions
decreases when this policy converges. The above principle is verified here on challenging benchmarks Ant, HalfCheetah,
Hopper, and Walker2D, with good results. Our method makes the action standard deviations converge to values similar to
those resulting from trial-and-error optimization.

J. Arabas, R. Biedrzycki, K. Budzyńska, J. Chudziak, P. Cichosz, W. Daszczuk,
T. Gambin, P. Gawrysiak, M. Muraszkiewicz, R. Nowak, K. Piczak,
P. Wawrzyński, P. Zawistowski,
*Sztuczna inteligencja dla inżynierów. Metody ogólne*,
Oficyna Wydawnicza Politechniki Warszawskiej, 2022.

ABSTRACT:
The chapters of this book were written by lecturers of the Faculty of Electronics and Information Technology
of the Warsaw University of Technology who take an active part in the development of artificial intelligence.
When planning the scope and content of the book, from a wide range of methods and topics the authors selected
those they consider particularly important and applicable across the entire field of AI, also within other
methods not discussed here. The first chapter presents the history, nature, and applications of artificial
intelligence. The second chapter deals with the problem, fundamental not only to artificial intelligence,
of searching a state space for solutions to a given problem. It is accompanied by a discussion of optimization
methods, which indicate the best solution with respect to an adopted criterion. The next chapter is devoted
to machine learning. Chapter four covers architectures of artificial neural networks, including deep networks.
Chapter five presents and discusses the mutual relations of ethics and artificial intelligence, with particular
emphasis on the necessity of presenting the results of AI systems in a way understandable to humans. Each
chapter is accompanied by a bibliographic note listing works that extend the material discussed.

The book can serve as a textbook and teaching aid for AI courses and as reference material for the methods
and algorithms presented in it.

Keywords - Artificial intelligence.

P. Wawrzyński,
*Uczące się systemy decyzyjne*,
Oficyna Wydawnicza Politechniki Warszawskiej, 2021.

ABSTRACT:
This textbook discusses approaches to the sequential decision problem under uncertainty, in the situation where
the decision-making agent does not know in advance the environment in which it acts. The agent must therefore
learn to make the right decisions by trial and error. The key fields of knowledge that provide solutions to this
problem are discussed. Most space is devoted to the branch of machine learning called reinforcement learning.
The main results of dynamic programming and adaptive control are discussed exhaustively. Other approaches are
also briefly covered, such as iterative learning control, approximate dynamic programming, stochastic adaptive
control, and the Kalman filter.

Keywords - reinforcement learning, adaptive control, dynamic programming.

K. Deja, P. Wawrzyński, D. Marczak, W. Masarczyk, T. Trzciński,
"BinPlay: A Binary Latent Autoencoder for Generative Replay Continual Learning",
International Joint Conference on Neural Networks (IJCNN), arXiv:2011.14960, 2021.

ABSTRACT:
We introduce a binary latent space autoencoder architecture to rehearse training samples for the continual
learning of neural networks. The ability to extend the knowledge of a model with new data without forgetting
previously learned samples is a fundamental requirement in continual learning. Existing solutions address it
by either replaying past data from memory, which is unsustainable with growing training data, or by reconstructing
past samples with generative models that are trained to generalize beyond training data and, hence, miss important
details of individual samples. In this paper, we take the best of both worlds and introduce a novel generative
rehearsal approach called BinPlay. Its main objective is to find a quality-preserving encoding of past samples
into precomputed binary codes living in the autoencoder's binary latent space. Since we parametrize the formula
for precomputing the codes only on the chronological indices of the training samples, the autoencoder is able
to compute the binary embeddings of rehearsed samples on the fly without the need to keep them in memory.
Evaluation on three benchmark datasets shows up to a twofold accuracy improvement of BinPlay versus competing
generative replay methods.

Keywords - Continual Learning, Binary latent space autoencoder.

D. Chen, P. Wawrzynski, Z. Lv,
"Cyber security in smart cities: A review of deep learning-based applications and case studies",
*Sustainable Cities and Society,* vol. 66, pp. 1-12, 2021.
DOI:10.1016/j.scs.2020.102655

ABSTRACT:
On the one hand, smart cities have brought about various changes, aiming to revolutionize people's lives.
On the other hand, while smart cities bring better life experiences and great convenience to people's lives,
there are more hidden dangers of cyber security, including information leakage and malicious cyber attacks.
The current development of cyber security cannot keep up with the eager adoption of smart city technologies
worldwide, so a correct design based on deep learning methods is essential to protect smart city cyberspace.
This paper summarizes the knowledge and interpretation of the Smart City (SC), Cyber Security (CS), and Deep
Learning (DL) concepts, and discusses existing related work on IoT security in smart cities. Specifically, we
briefly review several deep learning models, including Boltzmann machines, restricted Boltzmann machines, deep
belief networks, recurrent neural networks, convolutional neural networks, and generative adversarial networks.
Then we introduce cyber security applications and use cases based on deep learning technology in smart cities.
Finally, we describe the future development trend of smart city cyber security.

Keywords - Smart cities, Cyber security, Deep learning, A review.

M. Szulc, J. Łyskawa, P. Wawrzyński,
"A Framework for Reinforcement Learning with Autocorrelated Actions",
International Conference on Neural Information Processing (ICONIP), pp. 90-101, 2020.

ABSTRACT:
The subject of this paper is reinforcement learning. Policies are considered here that produce
actions based on states and random elements autocorrelated in subsequent time instants.
Consequently, an agent learns from experiments that are distributed over time and potentially
give better clues to policy improvement. Also, physical implementation of such policies,
e.g. in robotics, is less problematic, as it avoids making robots shake. This is in opposition
to most RL algorithms, which add white noise to the control, causing unwanted shaking of the robots.
An algorithm is introduced here that approximately optimizes the aforementioned policy.
Its efficiency is verified for four simulated learning control problems (Ant, HalfCheetah,
Hopper, and Walker2D) against three other methods (PPO, SAC, ACER). The algorithm outperforms
others in three of these problems.

Keywords - Reinforcement learning, Actor-Critic, Experience replay,
Fine time discretization.

P. Wawrzyński, P. Zawistowski, Ł. Lepak,
"Automatic hyperparameter tuning in on-line learning: Classic Momentum and ADAM",
International Joint Conference on Neural Networks (IJCNN), 2020.

ABSTRACT: We propose a method that adapts hyperparameters,
namely step-sizes and momentum decay factors, in on-line learning with classic momentum
and ADAM. The approach is based on the estimation of the short- and long-term influence
of these hyperparameters on the loss value. In the experimental study, our approach is
applied to on-line learning in small neural networks and deep autoencoders. Automatically
tuned coefficients surpass or roughly match the best ones selected manually in terms of
learning speed. As a result, on-line learning can be a fully automatic process, producing
results from the first run, without preliminary experiments aimed at manual hyperparameter
tuning.

Keywords - online optimization, Stochastic Gradient Descent, hyperparameter
tuning, neural networks, Classic Momentum, ADAM.

K. Checinski, P. Wawrzyński,
"DCT-Conv: Coding filters in convolutional networks with Discrete Cosine Transform",
International Joint Conference on Neural Networks (IJCNN), 2020.

ABSTRACT: Convolutional neural networks are based on a huge
number of trained weights. Consequently, they are often data-greedy, sensitive to overtraining,
and learn slowly. We follow the line of research in which filters of convolutional neural layers
are determined on the basis of a smaller number of trained parameters. In this paper,
the trained parameters define a frequency spectrum which is transformed into convolutional
filters with the Inverse Discrete Cosine Transform (IDCT, the same transform applied in JPEG
decompression). We analyze how switching off selected components of the spectra, thereby reducing
the number of trained weights of the network, affects its performance. Our experiments show
that coding the filters with trained DCT parameters leads to improvement over traditional
convolution. Also, the performance of the networks modified this way decreases very slowly
with the increasing extent of switching off these parameters. In some experiments, a good
performance is observed when even 99.9% of these parameters are switched off.

Keywords - neural networks, parameters reduction, convolution, discrete cosine transform.

P. Wawrzyński,
"Efficient on-line learning with diagonal approximation of loss function Hessian",
International Joint Conference on Neural Networks (IJCNN), 2019.

ABSTRACT: The subject of this paper is stochastic
optimization as a tool for on-line learning. New ingredients are introduced
to Nesterov's Accelerated Gradient that increase the efficiency of this algorithm
and determine its parameters that are otherwise tuned manually: the step-size and
momentum decay factor. To this end, a diagonal approximation of the Hessian
of the loss function is estimated. In the experimental study the approach
is applied to various types of neural networks, deep ones among others.

Keywords - on-line learning, accelerated gradient, parameter autotuning, deep learning.

A. Zanetti, A. Testolin, M. Zorzi, P. Wawrzynski,
"Numerosity Representation in InfoGAN: An Empirical Study",
International Work-Conference on Artificial Neural Networks (IWANN), pp.49-60, 2019.

ABSTRACT: It has been shown that “visual numerosity emerges as
a statistical property of images in ‘deep networks’ that learn a hierarchical generative
model of the sensory input”, through unsupervised deep learning [1]. The original deep
generative model was based on stochastic neurons and, more importantly, on input (image)
reconstruction. Statistical analysis highlighted a correlation between the numerosity present
in the input and the population activity of some neurons in the second hidden layer of the network,
whereas population activity of neurons in the first hidden layer correlated with total area (i.e.,
number of pixels) of the objects in the image. Here we further investigate whether numerosity
information can be isolated as a disentangled factor of variation of the visual input.
We train in unsupervised and semi-supervised fashion a latent-space generative model that has been
shown capable of disentangling relevant semantic features in a variety of complex datasets, and we
test its generative performance under different conditions. We then propose an approach to the problem
based on the assumption that, in order to let numerosity emerge as disentangled factor of variation,
we need to cancel out the sources of variation at the graphical level.

P. Wawrzyński,
"ASD+M: Automatic parameter tuning in stochastic optimization and on-line learning",
*Neural Networks,* vol. 96, pp. 1-10, 2017.

ABSTRACT: In this paper the classic
momentum algorithm for stochastic optimization is considered.
A method is introduced that adjusts coefficients for this
algorithm during its operation. The method does not depend
on any preliminary knowledge of the optimization problem.
In the experimental study, the method is applied to on-line
learning in feed-forward neural networks, including deep
auto-encoders, and outperforms any fixed coefficients. The
method eliminates coefficients that are difficult to determine,
with profound influence on performance. While the method itself
has some coefficients, they are easy to determine, and the sensitivity
of performance to them is low. Consequently, the method makes on-line
learning a practically parameter-free process and broadens the area
of potential application of this technology.

Keywords - Stochastic gradient descent, classic momentum, step-size,
learning rate, on-line learning, deep learning.

P.Wawrzyński,
"Robot’s Velocity and Tilt Estimation Through Computationally Efficient Fusion of Proprioceptive Sensors Readouts",
International Conference on Methods and Models in Automation and Robotics (MMAR),
pp. 738-743, 2015. [pdf]

ABSTRACT: In this paper a method is introduced that combines
Inertial Measurement Unit (IMU) readouts with low accuracy
and temporarily unavailable velocity measurements (e.g., based
on kinematics or GPS) to produce high accuracy estimates of
velocity and orientation with respect to gravity. The method is
computationally cheap enough to be readily implementable in
sensors. The main area of application of the introduced method
is mobile robotics.

Keywords - velocity estimation, Kalman filter, mobile robotics.
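
The fusion idea can be illustrated with a deliberately simplified one-dimensional Kalman filter: velocity is predicted by integrating IMU acceleration and corrected whenever a possibly low-accuracy, possibly missing velocity measurement arrives. The noise variances `q` and `r` are illustrative assumptions; this is a sketch of the idea, not the paper's filter.

```python
def fuse_velocity(acc, vel_meas, dt=0.01, q=0.05, r=0.2):
    """1-D Kalman filter sketch: predict velocity by integrating acceleration,
    correct with noisy velocity measurements that may be missing (None)."""
    v, p = 0.0, 1.0                       # velocity estimate and its variance
    estimates = []
    for a, z in zip(acc, vel_meas):
        v, p = v + a * dt, p + q          # prediction step
        if z is not None:                 # correction step, when available
            k = p / (p + r)               # Kalman gain
            v, p = v + k * (z - v), (1.0 - k) * p
        estimates.append(v)
    return estimates
```

With a measurement available only every tenth step, the estimate still tracks the true velocity, since the prediction step carries it through the gaps.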

M.Majczak, P.Wawrzyński,
"Comparison of two efficient control strategies for two-wheeled balancing robot",
International Conference on Methods and Models in Automation and Robotics (MMAR),
pp. 744-749, 2015. [pdf]

ABSTRACT: The subject of this paper is a two-wheeled balancing
robot with the center of mass above its wheels. Two control
strategies for this robot are analyzed. The first one combines a
kinematic model of the robot and a PI controller. The second
one is a cascade of two PIDs. These strategies are compared
experimentally.

Keywords - mobile robots, inverted pendulum, cost-effective
robots.

P.Wawrzyński,
"Control policy with autocorrelated noise in reinforcement learning for robotics",
*International Journal of Machine Learning and Computing,* Vol. 5, No. 2, pp. 91-95, IACSIT Press, 2015.
doi:10.7763/IJMLC.2015.V5.489.

ABSTRACT: Direct application of reinforcement learning in
robotics raises the issue of discontinuity of the control signal.
Consecutive actions are selected independently at random,
which often makes them excessively far from one another. Such
control is hardly ever appropriate in robots and may even lead to
their destruction. This paper considers a control policy in which
consecutive actions are modified by autocorrelated noise. That
policy generally solves the aforementioned problems and it is
readily applicable in robots. In the experimental study it is
applied to three robotic learning control tasks: Cart-Pole
SwingUp, Half-Cheetah, and a walking humanoid.

Index Terms - Machine learning, reinforcement learning,
actor-critics, robotics.
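
The kind of policy described above can be sketched with a standard first-order autoregressive (AR(1)) noise process; this illustrates the autocorrelation idea and is not necessarily the paper's exact construction:

```python
import numpy as np

def autocorrelated_noise(n, alpha=0.9, sigma=1.0, seed=0):
    """AR(1) noise: xi[t] = alpha*xi[t-1] + sqrt(1-alpha^2)*sigma*eps[t].
    The marginal standard deviation stays sigma, but consecutive samples
    are close, so actions perturbed by this noise change smoothly."""
    rng = np.random.default_rng(seed)
    xi = np.zeros(n)
    for t in range(1, n):
        xi[t] = alpha * xi[t - 1] + np.sqrt(1.0 - alpha**2) * sigma * rng.standard_normal()
    return xi

noise = autocorrelated_noise(10000)
mean_step = np.abs(np.diff(noise)).mean()  # much smaller than for white noise
```

Compared with independent noise of the same marginal variance, the average change between consecutive samples is far smaller, which is what makes the control signal acceptable for physical actuators.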

P.Wawrzyński, J.Mozaryn, J.Klimaszewski,
"Robust estimation of walking robots velocity and tilt using proprioceptive sensors data fusion",
*Robotics and Autonomous Systems,* Elsevier, Vol. 66, pp. 44-54, 2015.
doi:10.1016/j.robot.2014.12.012.

ABSTRACT: Availability of the instantaneous velocity of a legged robot is usually required for
its efficient control. However, estimation of velocity only on the basis of robot kinematics
has a significant drawback: the robot is not in touch with the ground all the time, or its feet
may twist. In this paper we introduce a method for velocity and tilt estimation in a walking robot.
This method combines a kinematic model of the supporting leg and readouts from an inertial sensor.
It can be used in any terrain, regardless of the robot's body design or the control strategy applied,
and it is robust in regard to foot twist. It is also immune to limited foot slide and temporary lack
of foot contact.

J.Mozaryn, J.Klimaszewski, D.Swieczkowski-Feiz, P.Kolodziejczyk, P.Wawrzyński,
"Design process and experimental verification of the quadruped robot wave gait",
International Conference on Methods and Models in Automation and Robotics (MMAR),
pp. 206-211, 2014.
doi:10.1109/MMAR.2014.6957352.

ABSTRACT: This paper presents the design process and experimental verification of
the quadruped robot wave gait. The mathematical model of the robot's movement results from
linking the derived leg movement equations with a scheme of their locomotion. The gait is designed
and analysed based on a two-step design procedure which consists of simulations using the MSC Adams
and Matlab environments and experimental verification using a real quadruped robot.

P.Wawrzyński,
"Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization,"
*International Journal of Humanoid Robotics,* Vol. 11, No. 3, pp. 1450024, 2014.
doi:10.1142/S0219843614500248.

ABSTRACT: In this paper a control system for humanoid robot walking is approximately optimized
by means of reinforcement learning. Given is an 18 DOF humanoid whose gait is based on replaying
a simple trajectory. This trajectory is translated into a reactive policy. A neural network
whose input represents the robot state learns to produce appropriate output that additively
modifies the initial control. The learning algorithm applied is Actor-Critic with experience
replay. In 50 minutes of learning, the slow initial gait changes into dexterous and fast walking.
No model of the robot dynamics is engaged. The methodology in use is generic and can be applied
to optimize control systems for diverse robots of comparable complexity.

Keywords: reinforcement learning, learning in robots, humanoids, bipedal walking.
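
The control architecture described above, in which a neural network additively modifies a replayed base trajectory, can be sketched as follows; the single-hidden-layer network and its dimensions are illustrative assumptions:

```python
import numpy as np

def additive_policy(state, base_control, params):
    """Action = control replayed from the initial trajectory, plus a learned
    neural correction of the same dimension (a sketch of the idea above)."""
    W1, b1, W2, b2 = params
    hidden = np.tanh(W1 @ state + b1)
    return base_control + W2 @ hidden + b2

# Hypothetical sizes: 36-dimensional state, 18 actuated joints.
rng = np.random.default_rng(0)
params = (0.1 * rng.standard_normal((40, 36)), np.zeros(40),
          0.1 * rng.standard_normal((18, 40)), np.zeros(18))
action = additive_policy(rng.standard_normal(36), np.zeros(18), params)
```

During learning only `params` change; the base trajectory keeps the robot near a viable gait while the network learns to improve it.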

P.Wawrzyński,
*Podstawy sztucznej inteligencji,* Oficyna Wydawnicza Politechniki Warszawskiej, 2014 and 2019.

The textbook provides introductory material for the field of artificial intelligence.
It is divided into three parts corresponding to the field's main areas: search, machine learning,
and inference. Artificial intelligence is presented as a collection of methods that together
form the arsenal of modern computer science.

B.Papis, P.Wawrzyński,
"dotRL: A platform for rapid Reinforcement Learning methods development and validation,"
Federated Conference on Computer Science and Information Systems (FedCSIS),
pp. 129-136, 2013. [pdf]

ABSTRACT: This paper introduces dotRL, a platform that enables
fast implementation and testing of Reinforcement Learning algorithms against diverse environments.
dotRL has been written under .NET framework and its main characteristics include: (i) adding
a new learning algorithm or environment to the platform only requires implementing a simple interface,
from then on it is ready to be coupled with other environments and algorithms, (ii) a set of tools
is included that aids running and reporting experiments, (iii) a set of benchmark environments is included,
with some as demanding as Octopus-Arm and Half-Cheetah, (iv) the platform is available for instantaneous download,
compilation, and execution, without libraries from different sources.

Index Terms - Reinforcement learning, evaluation platform, software engineering.

P.Wawrzyński, J.Mozaryn, J.Klimaszewski,
"Robust velocity estimation for legged robot using on-board sensors data fusion,"
International Conference on Methods and Models in Automation and Robotics (MMAR),
August 26-29, 2013, Międzyzdroje, Poland, pp. 717-722, IEEE, 2013.
[pdf]

ABSTRACT: Availability of momentary velocity of a legged robot
is essential for its efficient control. However, estimation of
the velocity is difficult, because the robot does not need to touch
the ground all the time or its feet may twist. In this paper we
introduce a method for velocity estimation in a legged robot that
combines kinematic model of the supporting leg, readouts from an
inertial sensor, and a Kalman Filter. The method alleviates all
the above-mentioned difficulties.

Index Terms - legged locomotion, velocity estimation, Kalman Filter.

P.Suszynski, P.Wawrzyński,
"Learning population of spiking neural networks with perturbation of conductances,"
International Joint Conference on Neural Networks (IJCNN), August 4-9, 2013, Dallas TX, USA,
pp. 332-337, IEEE, 2013. [pdf]

ABSTRACT: In this paper a method is presented for learning of
spiking neural networks. It is based on perturbation of synaptic
conductances. While this approach is known to be model-free, it
is also known to be slow, because it applies improvement direction
estimates with large variance. Two ideas are analysed to alleviate
this problem: first, learning many networks at the same
time instead of one; second, autocorrelating perturbations in
time. In the experimental study the method is validated on three
learning tasks in which information is conveyed with frequency
and spike timing.

Index terms - Spiking neural networks, learning.

P.Wawrzyński, A.K.Tanwani, "Autonomous Reinforcement Learning with Experience Replay,"
*Neural Networks,* vol. 41, pp. 156-167, Elsevier, 2013.
doi:10.1016/j.neunet.2012.11.007.

ABSTRACT: This paper considers the issues of efficiency and autonomy that are
required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement
learning algorithm is presented that repeatedly adjusts the control policy with the use of previously
collected samples, and autonomously estimates the appropriate step-sizes for the learning updates.
The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line
by an enhanced fixed point algorithm for on-line neural network training. An experimental study with
simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve
difficult learning control problems in an autonomous way within reasonably short time.

Keywords: reinforcement learning, autonomous learning, step-size estimation, actor-critic

P.Wawrzyński, *Sterowanie Adaptacyjne i Uczenie Maszynowe - preskrypt wykładu,*
Politechnika Warszawska, 2012.

ABSTRACT: The lecture notes discuss various approaches to adaptation applied
to optimizing the operation of control systems. These approaches are: reinforcement learning,
model reference adaptive control, and self-tuning regulators. Additionally, the notes review
other forms of adaptation that can be used in technical systems; e.g., the Kalman Filter is discussed.

P.Wawrzyński,
"Autonomous Reinforcement Learning with Experience Replay for Humanoid Gait Optimization,"
Proceedings of the International Neural Network Society Winter Conference (INNS-WC2012),
pp. 205-211, Elsevier, 2012.
doi:10.1016/j.procs.2012.09.130.

ABSTRACT: This paper demonstrates application of Reinforcement Learning
to optimization of control of a complex system in realistic setting that requires efficiency and
autonomy of the learning algorithm. Namely, Actor-Critic with experience replay (which addresses
efficiency), and the Fixed Point method for step-size estimation (which addresses autonomy) is applied
here to approximately optimize humanoid robot gait. With complex dynamics and tens of continuous state
and action variables, humanoid gait optimization represents a challenge for analytical synthesis of control.
The presented algorithm learns a nimble gait within 80 minutes of training.

Keywords: Reinforcement learning; Autonomous learning; Learning in robots

P.Wawrzyński, B.Papis, "Fixed point method for autonomous on-line neural network training,"
*Neurocomputing 74,* pp. 2893-2905, Elsevier, 2011.
doi:10.1016/j.neucom.2011.03.029.

ABSTRACT: This paper considers on-line training of feedforward neural networks.
Training examples are only available through sampling from a certain, possibly infinite, distribution.
In order to make the learning process autonomous, one can employ Extended Kalman Filter or stochastic
steepest descent with adaptively adjusted step-sizes. Here the latter is considered. A scheme of determining
step-sizes is introduced that satisfies the following requirements: (i) it does not need any auxiliary
problem-dependent parameters, (ii) it does not assume any particular loss function that the training process
is intended to minimize, (iii) it makes the learning process stable and efficient. An experimental study with
several approximation problems is presented. Within this study the presented approach is compared with Extended
Kalman Filter and LFI, with satisfactory results.

Keywords: On-line learning; Autonomous learning; Step-size adaptation;
Extended Kalman Filter

P.Wawrzyński, "Fixed point method of step-size estimation for on-line neural network training,"
International Joint Conference on Neural Networks (IJCNN), July, 18-23, 2010, Barcelona, Spain,
IEEE, pp. 2012-2017. [pdf]

ABSTRACT: This paper considers on-line training of feedforward neural networks. Training
examples are only available sampled randomly from a given generator. What emerges in this setting
is the problem of step-size, or learning rate, adaptation. A scheme of determining step-sizes is
introduced here that satisfies the following requirements: (i) it does not need any auxiliary
problem-dependent parameters, (ii) it does not assume any particular loss function that the training
process is intended to minimize, (iii) it makes the learning process stable and efficient.
An experimental study with the 2D Gabor function approximation is presented.

Keywords: neural networks, on-line learning, step-size adaptation, reinforcement learning.

P.Wawrzyński,
*Systemy adaptacyjne i uczące się - preskrypt wykładu,*
Oficyna Wydawnicza Politechniki Warszawskiej, 2009.

The lecture notes discuss adaptation mechanisms that can be applied in human-made systems.
The goal of adaptation is to improve the system's operation while it works. The functioning
of a designed system is not always satisfactory, so the system must *learn* how to operate
optimally. The work presents the methods and algorithms needed to design adaptive
and learning systems.

P.Wawrzyński,
"Real-Time Reinforcement Learning by Sequential Actor-Critics and Experience Replay,"
*Neural Networks,* vol. 22, pp. 1484-1497, Elsevier, 2009.
doi:10.1016/j.neunet.2009.05.011.

ABSTRACT: Actor-Critics constitute an important class
of reinforcement learning algorithms that can deal with continuous actions and states
in an easy and natural way. This paper shows how these algorithms can be augmented by
the technique of experience replay without degrading their convergence properties,
by appropriately estimating the policy change direction. This is achieved by truncated
importance sampling applied to the recorded past experiences. It is formally shown that
the resulting estimation bias is bounded and asymptotically vanishes, which allows
the experience replay-augmented algorithm to preserve the convergence properties
of the original algorithm. The technique of experience replay makes it possible
to utilize the available computational power to reduce the required number of interactions
with the environment considerably, which is essential for real-world applications.
Experimental results are presented that demonstrate that the combination of experience
replay and Actor-Critics yields extremely fast learning algorithms that achieve successful
policies for nontrivial control tasks in considerably short time. Namely, the policies
for the cart-pole swing-up (Doya, 2000) are obtained after as little as 20 minutes
of the cart-pole time and the policy for Half-Cheetah (a walking 6-degree-of-freedom robot)
is obtained after four hours of Half-Cheetah time.

P.Wawrzyński, "A Cat-Like Robot Real-Time Learning to Run,"
*Lecture Notes in Computer Science 5495,*
pp. 380-390, Springer-Verlag, 2009.
doi:10.1007/978-3-642-04921-7_39.
For demo see here.

ABSTRACT: Actor-Critics constitute an important class of reinforcement learning
algorithms that can deal with continuous actions and states in an easy and natural way. In their original,
sequential form, these algorithms are usually too slow to be applicable to real-life problems. However,
they can be augmented by the technique of experience replay to obtain a satisfying speed of learning
without degrading their convergence properties. In this paper experimental results are presented that
show that the combination of experience replay and Actor-Critics yields very fast learning algorithms
that achieve successful policies for nontrivial control tasks in considerably short time. Namely,
a policy for a model of 6-degree-of-freedom walking robot is obtained after 4 hours of the robot's time.

P.Wawrzyński, J. Arabas, P. Cichosz,
"Predictive Control for Artificial Intelligence in Computer Games,"
*Lecture Notes in Artificial Intelligence 5097,* pp. 1137-1148, Springer-Verlag, 2008.
doi:10.1007/978-3-540-69731-2_107.

ABSTRACT: The subject of this paper is artificial intelligence (AI)
of non-player characters in computer games, i.e. bots. We develop an idea of game AI based
on predictive control. Bot's activity is defined by a currently realized plan. This plan results
from an optimization process in which random plans are continuously generated and reselected.
We apply our idea to implement a bot for the game Half-Life. Our bot, Randomly Planning Fighter (RPF),
defeats the bot earlier designed for Half-Life with the use of behavior-based techniques.
The experiments prove that on-line planning can be feasible in the rapidly changing environments
of modern computer games.

P.Wawrzyński, A.Pacut "Truncated Importance Sampling for Reinforcement Learning
with Experience Replay," International Multiconference on Computer Science
and Information Technology (MultCSIS), pp. 305-315, 2007. [pdf]

ABSTRACT: Reinforcement Learning (RL) is considered here as an adaptation
technique of neural controllers of machines. The goal is to make Actor-Critic algorithms require less
agent-environment interaction to obtain policies of the same quality, at the cost of additional
background computations. We propose to achieve this goal in the spirit of *experience replay*.
An estimation method of improvement direction of a changing policy, based on preceding experience,
is essential here. We propose one that uses truncated importance sampling. We derive bounds of bias
of that type of estimators and prove that this bias asymptotically vanishes. In the experimental study
we apply our approach to the classic Actor-Critic and obtain 20-fold increase in speed of learning.
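
The estimator discussed above weights past experience by the likelihood ratio between the evaluated and the data-generating policy, with the ratio truncated to bound variance. A minimal sketch (the truncation level `c` and the Gaussian toy distributions are illustrative assumptions):

```python
import numpy as np

def truncated_is_mean(x, w, c=10.0):
    """Importance-sampling estimate of E_pi[X] from samples x drawn under mu,
    with weights w = pi(x)/mu(x) truncated at c. Truncation bounds the
    variance at the cost of a bias that vanishes as c grows."""
    return float(np.mean(np.minimum(w, c) * x))

# Toy check: behavior mu = N(0,1), target pi = N(1,1), so E_pi[X] = 1.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200000)   # samples from mu
w = np.exp(x - 0.5)                # density ratio pi(x)/mu(x) for these Gaussians
estimate = truncated_is_mean(x, w)
```

A low truncation level shrinks the estimate toward the behavior-policy mean, while a high level recovers the unbiased but high-variance importance-sampling estimator.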

P.Wawrzyński, "Learning to Control a 6-Degree-of-Freedom Walking Robot,"
EUROCON 2007 The International Conference on Computer as a Tool, pp. 698-705, 2007.
[pdf]

ABSTRACT: We analyze the issue of optimizing a control policy for
a complex system in a simulated trial-and-error learning process. The approach to this problem
we consider is Reinforcement Learning (RL). Stationary policies, applied by most RL methods,
may be improper in control applications, since for fine enough time discretization they do not
exhibit exploration capabilities and define policy gradient estimators of very large variance.
As a remedy to those difficulties, we proposed earlier the use of piecewise non-Markov policies.
In the experimental study presented here we apply our approach to a 6-degree-of-freedom walking
robot and obtain an efficient policy for this object.

P.Wawrzyński, "Reinforcement Learning in Fine Time Discretization,"
*Lecture Notes in Computer Science 4431,* pp. 470-479, 2007.
doi:10.1007/978-3-540-71618-1_52.

ABSTRACT: Reinforcement Learning (RL) is analyzed here as a tool
for control system optimization. State and action spaces are assumed to be continuous.
Time is assumed to be discrete, yet the discretization may be arbitrarily fine. It is
shown here that stationary policies, applied by most RL methods, are improper in control
applications, since for fine time discretization they cannot assure bounded variance
of policy gradient estimators. As a remedy to that difficulty, we propose the use of
piecewise non-Markov policies. Policies of this type can be optimized by means of most
RL algorithms, namely those based on likelihood ratio.

P.Wawrzyński, A.Pacut, "Balanced Importance Sampling Estimation,"
International Conference on Information Processing and Management
of Uncertainty in Knowledge-based Systems (IPMU), Paris, July 2-7, 2006, pp. 66-73.
[pdf ps]

ABSTRACT: In this paper we analyze a particular issue of estimation,
namely the estimation of the expected value of an unknown function for
a given distribution, with the samples drawn from other distributions.
A motivation of this problem comes from machine learning. In reinforcement
learning, an intelligent agent that learns to make decisions in an unknown
environment encounters the problem of judging an arbitrary decision policy
(the given distribution) on the basis of previous decisions and their outcomes
suggested by previous policies (other distributions).

The problem can be solved with the use of well established importance sampling estimators. To overcome a potential problem of excessive variance of such estimators, we introduce the family of balanced importance sampling estimators, prove their consistency and demonstrate empirically their superiority over the classical counterparts.


Keywords: Estimation, Importance Sampling, Machine Learning, Reinforcement Learning.

P.Wawrzyński, "Symulacja Płaskich łańcuchów Kinematycznych,"
Report no. 05-06 of the Institute of Control and Computation Engineering, November 2005.
[pdf]

ABSTRACT: The report presents an algorithm for simulating the dynamics of planar kinematic
chains. It is based on the Newton-Euler method. It is assumed that accelerations in the object
are constant within each time quantum; the essence of the algorithm is therefore finding these
accelerations. The computational cost of this operation is linear in the number of elements of the object.

Planar kinematic chains are analyzed in the form of rods (links) connected by rotational degrees of freedom (joints). The links are rigid and the entire mass of the object is located in the joints. Joints of several types are analyzed: moving without constraints, moving along a straight line, and moving with a given acceleration. Moreover, the angle between links adjacent to a joint can be constant or can change according to the joint accelerations induced in the chain. Typical phenomena accompanying the simulation, such as collisions (falls) of joints, are also analyzed.

P.Wawrzyński, A.Pacut, "Reinforcement Learning in Quasi-Continuous Time,"
International Conference on Computational Intelligence for Modelling,
Control and Automation, November 2005, Vienna, Austria, pp. 1031-1036.

ABSTRACT: Reinforcement Learning (RL) is used here as a tool
for control systems optimization. State and action spaces are assumed to be continuous.
Time is assumed to be discrete, yet the discretization may be arbitrarily fine.
Within the proposed algorithm, a piece of information that leads to a policy improvement,
is inferred from an experiment that lasts for several consecutive steps, rather than from
a single step, as in more traditional RL methods. Simulations reveal that the algorithm
is able to optimize the control policies for plants for which it is very difficult to
apply the traditional methods.

Keywords: Machine Learning, Reinforcement Learning, Adaptive Control.

P.Wawrzyński, "Intensive Reinforcement Learning,"
Ph.D. dissertation, Institute of Control and Computation Engineering,
Warsaw University of Technology, May 2005.
[ps pdf]

ABSTRACT: The Reinforcement Learning (RL) problem is analyzed
in this dissertation in the language of statistics as an estimation issue. A family of
RL algorithms is introduced. They determine a control policy by processing the entire
known history of plant-controller interactions. Stochastic approximation as a mechanism
that makes the classical RL algorithms converge is replaced with batch estimation.
The experimental study shows that the algorithms obtained are able to identify
parameters of nontrivial controllers within a few dozen minutes of control.
This makes them a number of times more efficient than their existing equivalents.

P.Wawrzyński, A.Pacut,
"Model-free off-policy reinforcement learning in continuous environment,"
International Joint Conference on Neural Networks (IJCNN),
Budapest, July 2004, pp. 1091-1096.
[ps pdf]

ABSTRACT: We introduce an algorithm of reinforcement learning in continuous
state and action spaces. In order to construct a control policy, the algorithm
utilizes the entire history of agent-environment interaction. The policy is
a result of an estimation process based on all available information rather
than the result of stochastic convergence as in classical reinforcement learning
approaches. The policy is derived from the history directly, not through any
kind of a model of the environment.

We test our algorithm in the Cart-Pole Swing-Up simulated environment. The algorithm learns to control this plant in about 100 trials, which corresponds to 15 minutes of plant's real time. This time is several times shorter than the one required by other algorithms.


P.Wawrzyński, A.Pacut,
"Intensive versus nonintensive actor-critic algorithms of reinforcement learning,"
*Lecture Notes in Artificial Intelligence 3070,* pp. 934-941, Springer-Verlag, 2004.
doi: 10.1007/978-3-540-24844-6_145.

ABSTRACT: Algorithms of reinforcement learning usually employ the agent's consecutive
actions to construct gradient estimators to adjust the agent's policy. The policy is
a result of some kind of stochastic approximation. Because of the slowness of stochastic
approximation, such algorithms are usually much too slow to be employed, e.g. in real-time
adaptive control.

In this paper we analyze replacing the stochastic approximation with estimation based on the entire available history of the agent-environment interaction. We design an algorithm of reinforcement learning in a continuous state/action domain that is orders of magnitude faster than the classical methods.


P.Wawrzyński, A.Pacut, "A simple actor-critic algorithm for continuous environments,"
International Conference on Methods and Models in Automation and Robotics (MMAR), August 2004,
pp. 1143-1149.

ABSTRACT: In reference to methods analyzed recently by Sutton *et al*,
and Konda & Tsitsiklis, we propose their modification called Randomized Policy Optimizer (RPO).
The algorithm has a modular structure and is based on the value function rather than on
the action-value function. The modules include neural approximators and a parameterized
distribution of control actions. The distribution must belong to a family of
*smoothly exploring* distributions that enables sampling from the control action set
to approximate a certain gradient. A *pre-action-value function* is introduced similarly
to the action-value function, with the first action replaced by the first action distribution parameter.

The paper contains an experimental comparison of this approach to reinforcement learning with model-free Adaptive Critic Designs, specifically with Action-Dependent Adaptive Heuristic Critic. The comparison is favorable for our algorithm.


P.Wawrzyński, P.Podsiadly, G.Lehmann,
"IOT Methodology of Frequency Assignment in Cellular Network,"
MOST International Conference, October 2002, pp. 313-324.

ABSTRACT: We present the constraints-based methodology of solving
the frequency assignment problem in a cellular phone network. The methodology
is based on taking radio measurements in the territory where a given network operates.
The measurements are exploited to approximate the areas of cells and the areas
of interference that would occur if transceivers' frequencies were
assigned too close. A set of constraints for the frequency assignment
is computed in order to minimize the interference. The frequencies
are then assigned in a process of discrete optimization with constraints.

The standard method of dealing with the frequency assignment problem places emphasis on optimization. The shape of the minimized function is determined with the use of signal propagation models. Unfortunately, these models lack precision; thus emerges the need for empirical assessment of signal strength. Determining the constraint set as restrictively as possible becomes in practice even more important than the efficiency of the optimization process.


P.Wawrzyński, A.Pacut,
"Modeling of distributions with neural approximation of conditional quantiles,"
IASTED International Conference Artificial Intelligence and Applications,
Malaga, Spain, September 2002, pp. 539-543. [pdf, ps]

ABSTRACT: We propose a method of recurrent estimation of conditional
quantiles stemming from stochastic approximation. The method employs a sigmoidal neural network
and specialized training algorithm to approximate the conditional quantiles. The approach may
be used in a wide range of fields, in particular in econometrics, medicine, data mining, and modeling.

P. Wawrzyński, "A two-wheeled bicycle with variable configuration", 2018-02-16, PL 424613, WO 2019/159140A1, US 11292549B2.

P. Wawrzyński, "Pojazd latający z kilkoma zespołami napędowymi i sposób sterowania jego lotem", 2017-07-03, PL 232732.

P. Wawrzyński, "Multikopter z wirnikami o zmiennym kącie natarcia i sposób sterowania jego lotem", 2017-07-03, PL 232731.

P. Wawrzyński, G. Lehmann, "Method for determination of the minimum distance between frequency
channels within pre-selected base station cells", 2001-05-23, WO 02/096141, US 6987973B2.

Copyright © Marianna Krzewińska 2016