Paweł Wawrzyński

Publications

Dominik Jacek Bogucki, Łukasz Lepak, Sonam Parashar, Bartłomiej Błachowski, Paweł Wawrzyński, "EnEnv 1.0: Energy Grid Environment for Multi-Agent Reinforcement Learning Benchmarking", International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 361-370, 2025.

Patryk Krukowski, Anna Bielawska, Kamil Książek, Paweł Wawrzyński, Paweł Batorski, Przemysław Spurek, "HINT: Hypernetwork approach to training weight interval regions in continual learning", Information Sciences, vol. 717, 2025.

P.Wawrzyński, Podstawy sztucznej inteligencji [Foundations of Artificial Intelligence], Oficyna Wydawnicza Politechniki Warszawskiej, 2014, 2019, 2025.

Łukasz Lepak, Paweł Wawrzyński, "Reinforcement Learning Meets Microeconomics: Learning to Designate Price-Dependent Supply and Demand for Automated Trading", Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Part IX, pp. 368-384, 2024.

Radosław Nowak, Adam Małkowski, Daniel Cieślak, Piotr Sokół, Paweł Wawrzyński, "Graph Vertex Embeddings: Distance, Regularization and Community Detection", International Conference on Computational Science (ICCS), pp. 43-57, 2024.

J. Łyskawa, P. Wawrzyński, "Actor-Critic with Variable Time Discretization via Sustained Actions", International Conference on Neural Information Processing (ICONIP), pp. 476-489, 2023.

Ł. Neumann, Ł. Lepak, P. Wawrzyński, "Least Redundant Gated Recurrent Neural Network", International Joint Conference on Neural Networks (IJCNN), pp. 1-10, 2023.

W. Masarczyk, P. Wawrzyński, D. Marczak, K. Deja, T. Trzciński, "Logarithmic Continual Learning", IEEE Access, vol. 10, pp. 117001-117010, DOI:10.1109/ACCESS.2022.3218907, 2022.

J. Łyskawa, P. Wawrzyński, "ACERAC: Efficient reinforcement learning in fine time discretization", IEEE Transactions on Neural Networks and Learning Systems, DOI:10.1109/TNNLS.2022.3190973, 2022.

A. Małkowski, J. Grzechociński, P. Wawrzyński, "ReGAE: Graph Autoencoder Based on Recursive Neural Networks", International Conference on Neural Information Processing (ICONIP), 2022.

G. Rypeść, Ł. E. Lepak, P. Wawrzyński, "Reinforcement Learning for on-line Sequence Transformation", Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 133-139, 2022.

K. Deja, P. Wawrzyński, D. Marczak, W. Masarczyk, T. Trzciński, "Multiband VAE: Latent Space Partitioning for Knowledge Consolidation in Continual Learning", International Joint Conference on Artificial Intelligence (IJCAI), 2022.

J. Arabas, R. Biedrzycki, K. Budzyńska, J. Chudziak, P. Cichosz, W. Daszczuk, T. Gambin, P. Gawrysiak, M. Muraszkiewicz, R. Nowak, K. Piczak, P. Wawrzyński, P. Zawistowski, Sztuczna inteligencja dla inżynierów. Metody ogólne [Artificial Intelligence for Engineers: General Methods], Oficyna Wydawnicza Politechniki Warszawskiej, 2022.

P. Wawrzyński, Uczące się systemy decyzyjne [Learning Decision Systems], Oficyna Wydawnicza Politechniki Warszawskiej, 2021.

P. Wawrzyński, W. Masarczyk, M. Ostaszewski, "Reinforcement learning with experience replay and adaptation of action dispersion", arXiv:2208.00156, 2022.

K. Deja, P. Wawrzyński, D. Marczak, W. Masarczyk, T. Trzciński, "BinPlay: A Binary Latent Autoencoder for Generative Replay Continual Learning", International Joint Conference on Neural Networks (IJCNN), 2021.

D. Chen, P. Wawrzyński, Z. Lv, "Cyber security in smart cities: A review of deep learning-based applications and case studies", Sustainable Cities and Society, vol. 66, pp. 1-12, 2021.

M.Szulc, J.Łyskawa, P.Wawrzyński, "A Framework for Reinforcement Learning with Autocorrelated Actions", International Conference on Neural Information Processing (ICONIP), pp. 90-101, 2020.

P.Wawrzyński, P.Zawistowski, Ł.Lepak, "Automatic hyperparameter tuning in on-line learning: Classic Momentum and ADAM", International Joint Conference on Neural Networks (IJCNN), 2020.

K.Checinski, P.Wawrzyński, "DCT-Conv: Coding filters in convolutional networks with Discrete Cosine Transform", International Joint Conference on Neural Networks (IJCNN), 2020.

P.Wawrzyński, "Efficient on-line learning with diagonal approximation of loss function Hessian", International Joint Conference on Neural Networks (IJCNN), 2019.

A.Zanetti, A.Testolin, M.Zorzi, P.Wawrzyński, "Numerosity Representation in InfoGAN: An Empirical Study", International Work-Conference on Artificial Neural Networks (IWANN), pp. 49-60, 2019.

P.Wawrzyński, "ASD+M: Automatic parameter tuning in stochastic optimization and on-line learning", Neural Networks, vol. 96, pp. 1-10, 2017.

P.Wawrzyński, "Robot’s Velocity and Tilt Estimation Through Computationally Efficient Fusion of Proprioceptive Sensors Readouts", International Conference on Methods and Models in Automation and Robotics (MMAR), pp. 738-743, 2015.

M.Majczak, P.Wawrzyński, "Comparison of two efficient control strategies for two-wheeled balancing robot", International Conference on Methods and Models in Automation and Robotics (MMAR), pp. 744-749, 2015.

P.Wawrzyński, "Control policy with autocorrelated noise in reinforcement learning for robotics", International Journal of Machine Learning and Computing, Vol. 5, No. 2, pp. 91-95, IACSIT Press, 2015.

P.Wawrzyński, J.Możaryn, J.Klimaszewski, "Robust estimation of walking robots velocity and tilt using proprioceptive sensors data fusion", Robotics and Autonomous Systems, Elsevier, Vol. 66, pp. 44-54, 2015.

J.Możaryn, J.Klimaszewski, D.Swieczkowski-Feiz, P.Kolodziejczyk, P.Wawrzyński, "Design process and experimental verification of the quadruped robot wave gait", International Conference on Methods and Models in Automation and Robotics (MMAR), pp. 206-211, 2014.

P.Wawrzyński, "Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization," International Journal of Humanoid Robotics, Vol. 11, No. 3, pp. 1450024, 2014.

B.Papis, P.Wawrzyński, "dotRL: A platform for rapid Reinforcement Learning methods development and validation," Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 129-136, 2013.

P.Wawrzyński, J.Możaryn, J.Klimaszewski, "Robust velocity estimation for legged robot using on-board sensors data fusion," International Conference on Methods and Models in Automation and Robotics (MMAR), August 26-29, 2013, Międzyzdroje, Poland, pp. 717-722, IEEE, 2013.

P.Suszynski, P.Wawrzyński, "Learning population of spiking neural networks with perturbation of conductances," International Joint Conference on Neural Networks (IJCNN), August 4-9, 2013, Dallas TX, USA, pp. 332-337, IEEE, 2013.

P.Wawrzyński, A.K.Tanwani, "Autonomous Reinforcement Learning with Experience Replay," Neural Networks, vol. 41, pp. 156-167, Elsevier, 2013.

P.Wawrzyński, Sterowanie Adaptacyjne i Uczenie Maszynowe - preskrypt wykładu [Adaptive Control and Machine Learning: lecture notes], Politechnika Warszawska, 2012.

P.Wawrzyński, "Autonomous Reinforcement Learning with Experience Replay for Humanoid Gait Optimization," International Neural Network Society Winter Conference (INNS-WC2012), pp. 205-211, Elsevier, 2012.

P.Wawrzyński, B.Papis, "Fixed point method for autonomous on-line neural network training," Neurocomputing, vol. 74, pp. 2893-2905, Elsevier, 2011.

P.Wawrzyński, "Fixed Point Method of Step-size estimation for on-line neural network training," International Joint Conference on Neural Networks (IJCNN), July 18-23, 2010, Barcelona, Spain, IEEE, pp. 2012-2017.

P.Wawrzyński, Systemy adaptacyjne i uczące się - preskrypt wykładu [Adaptive and Learning Systems: lecture notes], Oficyna Wydawnicza Politechniki Warszawskiej, 2009.

P.Wawrzyński, "Real-Time Reinforcement Learning by Sequential Actor-Critics and Experience Replay," Neural Networks, vol. 22, pp. 1484-1497, Elsevier, 2009.

P.Wawrzyński, "A Cat-Like Robot Real-Time Learning to Run," Lecture Notes in Computer Science 5495, pp. 380-390, Springer-Verlag, 2009.

P.Wawrzyński, J. Arabas, P. Cichosz, "Predictive Control for Artificial Intelligence in Computer Games," Lecture Notes in Artificial Intelligence 5097, pp. 1137-1148, Springer-Verlag, 2008.

P.Wawrzyński, A.Pacut, "Truncated Importance Sampling for Reinforcement Learning with Experience Replay," International Multiconference on Computer Science and Information Technology, pp. 305-315, 2007.

P.Wawrzyński, "Learning to Control a 6-Degree-of-Freedom Walking Robot," EUROCON 2007: The International Conference on Computer as a Tool, pp. 698-705, 2007.

P.Wawrzyński, "Reinforcement Learning in Fine Time Discretization," Lecture Notes in Computer Science 4431, pp. 470-479, 2007.

P.Wawrzyński, A.Pacut, "Balanced Importance Sampling Estimation," International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems (IPMU), Paris, July 2-7, 2006, pp. 66-73.

P.Wawrzyński, "Symulacja płaskich łańcuchów kinematycznych [Simulation of Planar Kinematic Chains]," Report no. 05-06, Institute of Control and Computation Engineering, November 2005.

P.Wawrzyński, A.Pacut, "Reinforcement Learning in Quasi-Continuous Time," International Conference on Computational Intelligence for Modelling, Control and Automation, November 2005, Vienna, Austria, pp. 1031-1036.

P.Wawrzyński, "Intensive Reinforcement Learning," Ph.D. dissertation, Institute of Control and Computation Engineering, Warsaw University of Technology, May 2005.

P.Wawrzyński, A.Pacut, "Model-free off-policy reinforcement learning in continuous environment," International Joint Conference on Neural Networks (IJCNN), Budapest, July 2004, pp. 1091-1096.

P.Wawrzyński, A.Pacut, "Intensive versus nonintensive actor-critic algorithms of reinforcement learning," Lecture Notes in Artificial Intelligence 3070, pp. 934-941, Springer-Verlag, 2004.

P.Wawrzyński, A.Pacut, "A simple actor-critic algorithm for continuous environments," International Conference on Methods and Models in Automation and Robotics (MMAR), August 2004, pp. 1143-1149.

P.Wawrzyński, P.Podsiadly, G.Lehmann, "IOT Methodology of Frequency Assignment in Cellular Network," The MOST International Conference, October 2002, pp. 313-324.

P.Wawrzyński, A.Pacut, "Modeling of distributions with neural approximation of conditional quantiles," IASTED International Conference Artificial Intelligence and Applications, Malaga, Spain, September 2002, pp. 539-543.

Papers with Abstracts

Dominik Jacek Bogucki, Łukasz Lepak, Sonam Parashar, Bartłomiej Błachowski, Paweł Wawrzyński, "EnEnv 1.0: Energy Grid Environment for Multi-Agent Reinforcement Learning Benchmarking", International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 361-370, 2025.

ABSTRACT: Multi-agent reinforcement learning (MARL) offers prospects of efficient control in large distributed systems such as complex energy grids. The development of MARL algorithms is hampered by a scarcity of realistic benchmarks. In this paper, we introduce EnEnv 1.0, a simulation benchmark for MARL in modern energy grids. EnEnv 1.0 is a set of environments in which energy grids are simulated with uncontrollable renewable energy sources, fossil fuel generators, and consumers. The role of the learning agents is to control and coordinate batteries in a distributed Battery Energy Storage System (BESS) based on readouts such as weather forecasts and load demand forecasts. The energy grids in EnEnv 1.0 are based on standard test systems with different topological structures, namely the modified standard IEEE 33, Illinois 200, and PEGASE 89 bus systems. These networks are adapted to serve as a MARL benchmark by introducing real weather observations, demand data for European locations, and software interfaces that enable coupling with a number of existing implementations of MARL algorithms, as well as single-agent reinforcement learning (SARL) algorithms. In the experimental study, we evaluate the performance of a catalog of MARL and SARL methods on EnEnv 1.0.

Keywords - Multi-Agent Reinforcement Learning, Energy Grid, Battery Energy Storage System

Patryk Krukowski, Anna Bielawska, Kamil Książek, Paweł Wawrzyński, Paweł Batorski, Przemysław Spurek, "HINT: Hypernetwork approach to training weight interval regions in continual learning", Information Sciences, vol. 717, 2025.

ABSTRACT: Recently, a new continual learning (CL) paradigm called Interval Continual Learning (InterContiNet) was proposed to control catastrophic forgetting; it relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging due to the high dimensionality of the weight space, which makes intervals difficult to manage. To address this issue, we introduce HINT, a technique that employs interval arithmetic within the embedding space and utilizes a hypernetwork to map these intervals to the parameter space of the target network. We train interval embeddings for consecutive tasks and train a hypernetwork to transform these embeddings into weights of the target network. An embedding for a given task is trained along with the hypernetwork, preserving the response of the target network for the previous task embeddings. Interval arithmetic thus operates in a lower-dimensional embedding space rather than directly on intervals in a high-dimensional weight space. Furthermore, HINT maintains the guarantee of not forgetting. At the end of training, we can choose one universal embedding to produce a single network dedicated to all tasks, so that one set of weights can be used. HINT obtains significantly better results than InterContiNet and gives state-of-the-art results on several benchmarks.

Keywords - continual learning, regional methods, interval arithmetic.
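The abstract above relies on interval arithmetic. As a rough illustration only, the sketch below propagates an axis-aligned box of inputs through one affine layer and a ReLU using the standard center/radius rule. It is generic interval arithmetic, not the HINT hypernetwork, and the weights `W`, bias `b`, and input box are made-up values.

```python
import numpy as np

def linear_interval(W, b, lower, upper):
    """Propagate the box [lower, upper] through x -> W @ x + b.

    Center/radius form: the output center is W @ c + b and the output
    radius is |W| @ r, which gives exact bounds for a single affine map.
    """
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    out_center = W @ center + b
    out_radius = np.abs(W) @ radius
    return out_center - out_radius, out_center + out_radius

def relu_interval(lower, upper):
    # ReLU is monotone, so it can be applied to each bound separately.
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Illustrative (hypothetical) layer and input box.
W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.0, 1.0])
lo, hi = linear_interval(W, b, np.array([-1.0, 0.0]), np.array([1.0, 1.0]))
lo, hi = relu_interval(lo, hi)
```

Every output the layer produces for an input inside the box lies between `lo` and `hi`; the cost of the bound grows with the dimension of the space the intervals live in, which is why working in a low-dimensional embedding space is attractive.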

P.Wawrzyński, Podstawy sztucznej inteligencji [Foundations of Artificial Intelligence], Oficyna Wydawnicza Politechniki Warszawskiej, 2014, 2019, 2025.

The textbook contains introductory material on the field of artificial intelligence. It is divided into three parts corresponding to the field's main branches: search, machine learning, and reasoning. Artificial intelligence is presented here as a set of methods that together form part of the arsenal of modern computer science.

Łukasz Lepak, Paweł Wawrzyński, "Reinforcement Learning Meets Microeconomics: Learning to Designate Price-Dependent Supply and Demand for Automated Trading", Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Part IX, pp. 368-384, 2024.

ABSTRACT: The ongoing energy transition towards renewable sources increases the importance of energy exchanges and creates demand for automated trading tools on these exchanges. Day-ahead exchanges play a prominent role in this area. Participants in these exchanges place collections of buy/sell bids before each trading day. However, machine learning-based approaches to automated trading place a single bid for each time instant. The bid is either executed or not, depending on the relation between the market price and the bid price. This is contrary to economic rationality, which usually requires buying more when the market price is lower and selling more when it is higher. Single bids do not allow the expression of such preferences. In this paper, we fill this gap and design a policy that translates the information available to the trading agent into price-dependent supply and demand curves. We also demonstrate how to train this policy with reinforcement learning and real-life data. Our proposed method is now being deployed in a real system for energy storage management. Here, we demonstrate its performance in four data-driven simulations. The proposed method outperforms the alternatives in all cases.

Keywords - reinforcement learning, automated trading, energy market.
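To make the difference between a single bid and a price-dependent curve concrete, here is a toy sketch in which a demand curve and a supply curve are hand-written lists of limit-price steps. The `Bid` type, the execution rule, and the numbers are illustrative assumptions: real day-ahead exchanges apply their own clearing rules, and the policy in the paper produces such curves from the agent's information rather than from fixed values.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    price: float     # limit price (e.g. EUR/MWh), hypothetical units
    quantity: float  # energy volume (e.g. MWh), hypothetical units

def executed_buy(demand, market_price):
    # A buy step is executed when the market price does not exceed its limit price.
    return sum(b.quantity for b in demand if market_price <= b.price)

def executed_sell(supply, market_price):
    # A sell step is executed when the market price reaches at least its limit price.
    return sum(b.quantity for b in supply if market_price >= b.price)

# Demand curve: buy more as the price falls. Supply curve: sell more as it rises.
demand = [Bid(40.0, 1.0), Bid(30.0, 1.0), Bid(20.0, 2.0)]
supply = [Bid(60.0, 1.0), Bid(80.0, 2.0)]
```

With a single bid, the agent would buy either its whole volume or nothing at a given price; with the step curve, the executed quantity grows as the market price falls (2 units at 25, 4 units at 15).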

Radosław Nowak, Adam Małkowski, Daniel Cieślak, Piotr Sokół, Paweł Wawrzyński, "Graph Vertex Embeddings: Distance, Regularization and Community Detection", International Conference on Computational Science (ICCS), pp. 43-57, 2024.

ABSTRACT: Graph embeddings have emerged as a powerful tool for representing complex network structures in a low-dimensional space, enabling the use of efficient methods that employ the metric structure in the embedding space as a proxy for the topological structure of the data. In this paper, we explore several aspects that affect the quality of a vertex embedding of graph-structured data. To this effect, we first present a family of flexible distance functions that faithfully capture the topological distance between different vertices. Secondly, we analyze vertex embeddings as resulting from a fitted transformation of the distance matrix rather than as a direct result of optimization. Finally, we evaluate the effectiveness of our proposed embedding constructions by performing community detection on a host of benchmark datasets. The reported results are competitive with classical algorithms that operate on the entire graph while benefitting from a substantially reduced computational complexity due to the reduced dimensionality of the representations.

Keywords - graph neural networks, geometrical embeddings.
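The idea of deriving vertex embeddings from a distance matrix can be illustrated with two standard stand-ins: plain hop (shortest-path) distance computed by breadth-first search, and classical multidimensional scaling (MDS) as the transformation of the distance matrix. Neither is the distance family or the fitted transformation proposed in the paper; the path graph is a made-up example.

```python
import numpy as np
from collections import deque

def hop_distances(adj):
    """All-pairs hop distances of an unweighted, connected graph via BFS."""
    n = len(adj)
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0.0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1.0
                    queue.append(v)
    return dist

def classical_mds(dist, dim):
    """Embed vertices so that Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]               # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Path graph 0-1-2-3: its hop distances are exactly realizable on a line.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
D = hop_distances(adj)
X = classical_mds(D, dim=1)
```

For this path graph, a one-dimensional embedding reproduces all hop distances exactly; for general graphs the embedding is only an approximation, which is where the choice of distance function matters.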

J. Łyskawa, P. Wawrzyński, "Actor-Critic with Variable Time Discretization via Sustained Actions", International Conference on Neural Information Processing (ICONIP), pp. 476-489, 2023, preprint.

ABSTRACT: Reinforcement learning (RL) methods work in discrete time. In order to apply RL to inherently continuous problems like robotic control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings. Initially, it operates with sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.

Keywords - reinforcement learning, frame skipping, robotic control.
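The switch from sparse to fine time discretization can be pictured as holding each action for several environment steps and shortening that holding period as training progresses. The sketch below is a hypothetical illustration: the halving schedule, `max_repeat`, and the `policy`/`env_step` callables are assumptions, not the actual SusACER mechanism.

```python
from typing import Callable, Tuple

def sustain_length(step: int, total_steps: int, max_repeat: int = 8) -> int:
    """Hypothetical schedule: hold actions for max_repeat steps at first and
    halve the holding period in equal stages of training until it reaches 1."""
    progress = min(step / total_steps, 1.0)
    halvings = max_repeat.bit_length() - 1          # e.g. 8 -> 3 halvings (8, 4, 2, 1)
    stage = min(int(progress * (halvings + 1)), halvings)
    return max(max_repeat >> stage, 1)

def rollout(policy: Callable, env_step: Callable, state, horizon: int,
            repeat: int) -> Tuple[float, int]:
    """Run `horizon` environment steps, querying the policy only every `repeat` steps."""
    total_reward, decisions, t = 0.0, 0, 0
    while t < horizon:
        action = policy(state)                       # one decision...
        decisions += 1
        for _ in range(min(repeat, horizon - t)):    # ...sustained for several steps
            state, reward = env_step(state, action)
            total_reward += reward
            t += 1
    return total_reward, decisions
```

Early in training, fewer decisions per episode make the problem look like one with sparse time control; at the end of the schedule every step is a decision, as in fine time discretization.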

Ł. Neumann, Ł. Lepak, P. Wawrzyński, "Least Redundant Gated Recurrent Neural Network", International Joint Conference on Neural Networks (IJCNN), pp. 1-10, 2023, preprint.

ABSTRACT: Recurrent neural networks are important tools for sequential data processing. However, they are notoriously difficult to train. Challenges include capturing complex relations between consecutive states, as well as ensuring the stability and efficiency of training. In this paper, we introduce a recurrent neural architecture called Deep Memory Update (DMU). It is based on updating the previous memory state with a deep transformation of the lagged state and the network input. The architecture is able to learn to transform its internal state using any nonlinear function. Its training is stable and fast because its learning rate is related to the size of the module. Even though DMU is based on standard components, the experimental results presented here confirm that it can compete with and often outperform state-of-the-art architectures such as Long Short-Term Memory, Gated Recurrent Units, and Recurrent Highway Networks.

Keywords - Recurrent neural networks, universal function approximation.

W. Masarczyk, P. Wawrzyński, D. Marczak, K. Deja, T. Trzciński, "Logarithmic Continual Learning", IEEE Access, vol. 10, pp. 117001-117010, DOI:10.1109/ACCESS.2022.3218907, 2022.

ABSTRACT: We introduce a neural network architecture that logarithmically reduces the number of self-rehearsal steps in the generative rehearsal of continually learned models. In continual learning (CL), training samples come in subsequent tasks, and the trained model can access only the current task. Contemporary CL methods employ generative models to replay previous samples and train them recursively with a combination of current and regenerated past data. This recurrence leads to superfluous computations, as the same past samples are regenerated after each task, and the reconstruction quality successively degrades. In this work, we address these limitations and propose a new generative rehearsal architecture that requires at most a logarithmic number of retraining sessions for each sample. Our approach leverages the allocation of past data in a set of generative models such that most of them do not require retraining after a task. The experimental evaluation of our logarithmic continual learning approach shows the superiority of our method over state-of-the-art generative rehearsal methods.

Keywords - Continual learning, generative rehearsal.
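One way a logarithmic bound on retraining can arise is a binary-counter style allocation: each new task goes into a fresh generator, and generators holding equally many tasks are merged, like carries in binary addition. The sketch below only counts how often each task's data would be regenerated and relearned under that hypothetical scheme; the allocation of past data to generative models used in the paper may differ.

```python
def retrain_counts(num_tasks):
    """Count relearning sessions per task under a hypothetical binary-counter allocation."""
    blocks = []                  # stack of (level, task ids); one generator per block
    counts = [0] * num_tasks
    for task in range(num_tasks):
        level, merged = 0, []
        # Carry: absorb every existing block whose level equals the current level.
        while blocks and blocks[-1][0] == level:
            _, old_tasks = blocks.pop()
            merged.extend(old_tasks)
            level += 1
        for t in merged:
            counts[t] += 1       # old data is regenerated and relearned by the new generator
        blocks.append((level, merged + [task]))   # a level-L block holds 2**L tasks
    return counts

counts = retrain_counts(8)
```

With 8 tasks no task is relearned more than log2(8) = 3 times, whereas naive recursive rehearsal would regenerate the data of the first task after every one of the 7 later tasks.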

J. Łyskawa, P. Wawrzyński, "ACERAC: Efficient reinforcement learning in fine time discretization", IEEE Transactions on Neural Networks and Learning Systems, DOI:10.1109/TNNLS.2022.3190973, 2022.

ABSTRACT: We propose a framework for reinforcement learning (RL) in fine time discretization and a learning algorithm in this framework. One of the main goals of RL is to provide a way for physical machines to learn optimal behavior instead of being programmed. However, such machines are usually controlled in fine time discretization. The most common RL methods apply independent random elements to each action, which is not suitable in that setting: it causes the controlled system to jerk, and it does not ensure sufficient exploration, since a single action is not long enough to create a significant experience that could be translated into policy improvement. In the RL framework introduced in this paper, policies are considered that produce actions based on states and random elements autocorrelated in subsequent time instants. The RL algorithm introduced here approximately optimizes such a policy. The efficiency of this algorithm is verified against three other RL methods (PPO, SAC, ACER) in four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) under diverse time discretizations. The algorithm introduced here outperforms the competitors in most cases considered.

Keywords - Reinforcement learning, experience replay, fine time discretization.
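Random elements autocorrelated in subsequent time instants can be illustrated with first-order autoregressive (AR(1)) Gaussian noise, a standard construction. In the sketch below, the coefficient `alpha` and the unit-variance scaling are assumptions for illustration; the noise process and policy parametrization actually used by ACERAC may differ. A policy would add such noise, scaled by some dispersion, to the action it computes from the state.

```python
import numpy as np

def ar1_noise(num_steps, alpha, dim, rng):
    """Autocorrelated Gaussian noise: xi_t = alpha * xi_{t-1} + sqrt(1 - alpha^2) * eps_t.

    Starting from xi_0 ~ N(0, I), the process is stationary with unit variance,
    so alpha only controls how smoothly the noise changes between time steps
    (alpha = 0 gives white noise).
    """
    xi = np.empty((num_steps, dim))
    xi[0] = rng.standard_normal(dim)
    scale = np.sqrt(1.0 - alpha ** 2)
    for t in range(1, num_steps):
        xi[t] = alpha * xi[t - 1] + scale * rng.standard_normal(dim)
    return xi

rng = np.random.default_rng(0)
white = ar1_noise(50_000, 0.0, 1, rng)    # independent perturbations, as in most RL methods
smooth = ar1_noise(50_000, 0.95, 1, rng)  # consecutive perturbations are strongly correlated
```

With `alpha` close to 1, consecutive perturbations are similar, so the controlled system is not jerked and each exploratory deviation persists long enough to have a noticeable effect.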

A. Małkowski, J. Grzechociński, P. Wawrzyński, "ReGAE: Graph Autoencoder Based on Recursive Neural Networks", International Conference on Neural Information Processing (ICONIP), arXiv:2201.12165, 2022.

ABSTRACT: Invertible transformation of large graphs into constant-dimensional vectors (embeddings) remains a challenge. In this paper, we address it with recursive neural networks: the encoder and the decoder. The encoder network transforms embeddings of subgraphs into embeddings of larger subgraphs, and eventually into the embedding of the input graph. The decoder does the opposite. The dimension of the embeddings is constant regardless of the size of the (sub)graphs. Simulation experiments presented in this paper confirm that our proposed graph autoencoder can handle graphs with even thousands of vertices.

Keywords - Graph neural networks, Autoencoders, Recursive neural networks.

G. Rypeść, Ł. E. Lepak, P. Wawrzyński, "Reinforcement Learning for on-line Sequence Transformation", Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 133-139, 2022.

ABSTRACT: A number of problems in the processing of sound and natural language, as well as in other areas, can be reduced to simultaneously reading an input sequence and writing an output sequence of generally different length. There are well-developed methods that produce the output sequence based on the entirely known input. However, efficient methods that enable such transformations on-line do not exist. In this paper, we introduce an architecture that learns with reinforcement to decide whether to read a token or write another token. This architecture is able to transform potentially infinite sequences on-line. In an experimental study, we compare it with state-of-the-art methods for neural machine translation. While it produces slightly worse translations than the Transformer, it outperforms the autoencoder with attention, even though our architecture translates texts on-line, thereby solving a more difficult problem than both reference methods.

Keywords - Reinforcement learning, neural machine translation.
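The read/write decision loop can be sketched as follows. In the paper the decision is learned with reinforcement; here it is replaced by the fixed "wait-k" rule (stay k tokens ahead of the output), a known simultaneous-translation heuristic, and the task is a toy token-by-token mapping in which input and output have equal length. All names are illustrative.

```python
def wait_k_policy(num_read, num_written, k, source_done):
    """Stand-in for a learned policy: read until k tokens ahead of the output
    (or until the input ends), otherwise write."""
    if not source_done and num_read - num_written < k:
        return "READ"
    return "WRITE"

def transform_online(source, translate_token, k):
    """Transform a token stream on-line, choosing READ or WRITE at every step.

    Toy assumption: output token i depends only on input token i (requires k >= 1).
    """
    stream = iter(source)
    read, output, actions = [], [], []
    source_done = False
    while True:
        action = wait_k_policy(len(read), len(output), k, source_done)
        actions.append(action)
        if action == "READ":
            token = next(stream, None)
            if token is None:
                source_done = True       # input exhausted; only writing remains
            else:
                read.append(token)
        elif len(output) < len(read):
            output.append(translate_token(read[len(output)]))
        else:
            break                        # everything read has been written
    return output, actions

out, actions = transform_online(["a", "b", "c"], str.upper, k=2)
```

Since the input is consumed through an iterator one token at a time, output tokens are emitted before the whole input is known, which is the property that distinguishes on-line transformation from standard sequence-to-sequence models.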

K. Deja, P. Wawrzyński, D. Marczak, W. Masarczyk, T. Trzciński, "Multiband VAE: Latent Space Partitioning for Knowledge Consolidation in Continual Learning", International Joint Conference on Artificial Intelligence (IJCAI), 2022.

ABSTRACT: We propose a new method for unsupervised continual knowledge consolidation in generative models that relies on the partitioning of the Variational Autoencoder's latent space. Acquiring knowledge about new data samples without forgetting previous ones is a critical problem of continual learning. Currently proposed methods achieve this goal by extending the existing model while constraining its behavior not to degrade on the past data, which does not exploit the full potential of relations within the entire training dataset. In this work, we identify this limitation and posit the goal of continual learning as a knowledge accumulation task. We solve it by continuously re-aligning latent space partitions that we call bands, which are representations of samples seen in different tasks, driven by the similarity of the information they contain. In addition, we introduce a simple yet effective method for controlled forgetting of past data that improves the quality of reconstructions encoded in latent bands, and a latent space disentanglement technique that improves knowledge consolidation. On top of the standard continual learning evaluation benchmarks, we evaluate our method on a new knowledge consolidation scenario and show that the proposed approach outperforms the state of the art by up to twofold across all testing scenarios.

Keywords - Continual learning.

P. Wawrzyński, W. Masarczyk, M. Ostaszewski, "Reinforcement learning with experience replay and adaptation of action dispersion", arXiv:2208.00156, 2022.

ABSTRACT: Effective reinforcement learning requires a proper balance of exploration and exploitation, defined by the dispersion of the action distribution. However, this balance depends on the task, the current stage of the learning process, and the current environment state. Existing methods that designate the action distribution dispersion require problem-dependent hyperparameters. In this paper, we propose to designate the action distribution dispersion automatically using the following principle: this distribution should have sufficient dispersion to enable the evaluation of future policies. To that end, the dispersion should be tuned to ensure sufficiently high probability densities of the actions in the replay buffer and the modes of the distributions that generated them, but no higher than that. This way, a policy can be effectively evaluated based on the actions in the buffer, while the exploratory randomness in actions decreases as this policy converges. The above principle is verified here on the challenging benchmarks Ant, HalfCheetah, Hopper, and Walker2D, with good results. Our method makes the action standard deviations converge to values similar to those resulting from trial-and-error optimization.

J. Arabas, R. Biedrzycki, K. Budzyńska, J. Chudziak, P. Cichosz, W. Daszczuk, T. Gambin, P. Gawrysiak, M. Muraszkiewicz, R. Nowak, K. Piczak, P. Wawrzyński, P. Zawistowski, Sztuczna inteligencja dla inżynierów. Metody ogólne [Artificial Intelligence for Engineers: General Methods], Oficyna Wydawnicza Politechniki Warszawskiej, 2022.

SUMMARY: The authors of the individual chapters are lecturers of the Faculty of Electronics and Information Technology of the Warsaw University of Technology who actively take part in the development of artificial intelligence. In planning the scope and content of the book, the authors selected, from a wide range of methods and problems, those they consider particularly important and applicable across the entire field of AI, including within other methods not discussed here. The first chapter contains information on the history, nature, and applications of artificial intelligence. The second chapter deals with the fundamental problem (not only for artificial intelligence) of searching a state space for solutions to a given problem. This is accompanied by a discussion of optimization methods, which identify the best solution with respect to the adopted criterion. The subject of the next chapter is machine learning. The fourth chapter is devoted to artificial neural network architectures, including deep networks. The fifth chapter presents and discusses the mutual relationships between ethics and artificial intelligence, with particular emphasis on the need to present the results of AI systems in a way that is understandable to humans. Each chapter is accompanied by a bibliographic note listing works that extend the material covered.
The book can serve as a textbook and teaching aid for lectures on AI, and as reference material for the methods and algorithms presented in it.

Keywords - Artificial intelligence.

P. Wawrzyński, Uczące się systemy decyzyjne [Learning Decision Systems], Oficyna Wydawnicza Politechniki Warszawskiej, 2021.

SUMMARY: The textbook discusses approaches to the sequential decision problem under uncertainty, in the situation where the decision-making agent does not know in advance the environment in which it makes its decisions. Thus, the agent must learn to make the right decisions by trial and error. The key fields of knowledge that provide solutions to this problem are discussed. Most of the space is devoted to the branch of machine learning called reinforcement learning. The main results of dynamic programming and adaptive control are discussed exhaustively. Other approaches are also briefly discussed, such as iterative learning control, approximate dynamic programming, stochastic adaptive control, and the Kalman filter.

Keywords - reinforcement learning, adaptive control, dynamic programming.

K. Deja, P. Wawrzyński, D. Marczak, W. Masarczyk, T. Trzciński, "BinPlay: A Binary Latent Autoencoder for Generative Replay Continual Learning", International Joint Conference on Neural Networks (IJCNN), 2021.

ABSTRACT: We introduce a binary latent space autoencoder architecture to rehearse training samples for the continual learning of neural networks. The ability to extend the knowledge of a model with new data without forgetting previously learned samples is a fundamental requirement in continual learning. Existing solutions address it by either replaying past data from memory, which is unsustainable with growing training data, or by reconstructing past samples with generative models that are trained to generalize beyond training data and, hence, miss important details of individual samples. In this paper, we take the best of both worlds and introduce a novel generative rehearsal approach called BinPlay. Its main objective is to find a quality-preserving encoding of past samples into precomputed binary codes living in the autoencoder's binary latent space. Since we parametrize the formula for precomputing the codes only on the chronological indices of the training samples, the autoencoder is able to compute the binary embeddings of rehearsed samples on the fly without the need to keep them in memory. Evaluation on three benchmark datasets shows up to a twofold accuracy improvement of BinPlay versus competing generative replay methods.

Keywords - Continual Learning, Binary latent space autoencoder.
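The key property stated above is that codes depend only on a sample's chronological index, so they can be recomputed instead of stored. A minimal sketch of that property, assuming (hypothetically) that a code is drawn from a pseudo-random generator seeded with the index; the actual BinPlay formula is not reproduced here, and the trained decoder that maps codes back to samples is omitted.

```python
import numpy as np

def binary_code(sample_index: int, code_length: int, seed: int = 0) -> np.ndarray:
    """Hypothetical index-based binary code: the same index always yields the same code."""
    rng = np.random.default_rng([seed, sample_index])   # seeded from the index only
    return rng.integers(0, 2, size=code_length, dtype=np.uint8)

# Codes for rehearsal can be regenerated on the fly from indices alone.
codes = np.stack([binary_code(i, 64) for i in range(5)])
```

Memory cost is then independent of the number of rehearsed samples: only the decoder and a running sample counter need to be kept.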

D. Chen, P. Wawrzyński, Z. Lv, "Cyber security in smart cities: A review of deep learning-based applications and case studies", Sustainable Cities and Society, vol. 66, pp. 1-12, 2021, DOI:10.1016/j.scs.2020.102655.

ABSTRACT: On the one hand, smart cities have brought about various changes aimed at revolutionizing people's lives. On the other hand, while smart cities bring better life experiences and great convenience, they also carry more hidden cyber security dangers, including information leakage and malicious cyber attacks. The development of cyber security cannot keep up with the eager adoption of smart city technologies worldwide, so a correct design based on deep learning methods is essential to protect smart city cyberspace. This paper summarizes the knowledge and interpretation of the concepts of Smart Cities (SC), Cyber Security (CS), and Deep Learning (DL), and discusses existing related work on IoT security in smart cities. Specifically, we briefly review several deep learning models, including Boltzmann machines, restricted Boltzmann machines, deep belief networks, recurrent neural networks, convolutional neural networks, and generative adversarial networks. Then we introduce cyber security applications and use cases based on deep learning technology in smart cities. Finally, we describe future development trends in smart city cyber security.

Keywords - Smart cities, Cyber security, Deep learning, A review.

M.Szulc, J.Łyskawa, P.Wawrzyński, "A Framework for Reinforcement Learning with Autocorrelated Actions", International Conference on Neural Information Processing (ICONIP), pp. 90-101, 2020. [preprint]

ABSTRACT: The subject of this paper is reinforcement learning (RL). Policies are considered here that produce actions based on states and random elements autocorrelated in subsequent time instants. Consequently, an agent learns from experiments that are distributed over time and potentially give better clues to policy improvement. Also, the physical implementation of such policies, e.g. in robotics, is less problematic, as it avoids making robots shake. This is in contrast to most RL algorithms, which add white noise to the control signal, causing unwanted shaking of the robots. An algorithm is introduced here that approximately optimizes the aforementioned policy. Its efficiency is verified on four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) against three other methods (PPO, SAC, ACER). The algorithm outperforms the others in three of these problems.

Keywords - Reinforcement learning, Actor-Critic, Experience replay, Fine time discretization.
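To illustrate the difference between white and autocorrelated exploration noise, here is a minimal sketch that assumes (our choice, not necessarily the paper's) a first-order autoregressive process added to the output of a deterministic policy, a_t = mu(s_t) + xi_t. The helper names `ar1_noise` and `lag1_autocorr` and the parameters `alpha` and `sigma` are hypothetical. Scaling the innovation by sqrt(1 - alpha^2) keeps the stationary standard deviation at sigma for any alpha, so alpha only controls smoothness.

```python
import numpy as np


def ar1_noise(n_steps, dim, alpha, sigma, rng):
    """Hypothetical helper: autocorrelated exploration noise.

    xi_t = alpha * xi_{t-1} + sqrt(1 - alpha^2) * sigma * eps_t, eps_t ~ N(0, I).
    alpha = 0 gives white noise; alpha close to 1 gives smooth, slowly varying noise.
    """
    xi = np.zeros((n_steps, dim))
    # Start from the stationary distribution N(0, sigma^2 I).
    xi[0] = sigma * rng.standard_normal(dim)
    scale = np.sqrt(1.0 - alpha ** 2) * sigma
    for t in range(1, n_steps):
        xi[t] = alpha * xi[t - 1] + scale * rng.standard_normal(dim)
    return xi


def lag1_autocorr(x):
    """Empirical lag-1 autocorrelation of a 1-D series."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


rng = np.random.default_rng(0)
smooth = ar1_noise(20000, 1, alpha=0.9, sigma=0.5, rng=rng)[:, 0]
white = ar1_noise(20000, 1, alpha=0.0, sigma=0.5, rng=rng)[:, 0]
print(lag1_autocorr(smooth), lag1_autocorr(white))
```

Both series have the same spread, but consecutive values of `smooth` are strongly correlated, so actions built from it change gradually instead of jumping between time steps.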

P.Wawrzyński, P.Zawistowski, L.Lepak, "Automatic hyperparameter tuning in on-line learning: Classic Momentum and ADAM", International Joint Conference on Neural Networks (IJCNN), 2020. [pdf]

ABSTRACT: We propose a method that adapts hyperparameters, namely step-sizes and momentum decay factors, in on-line learning with classic momentum and ADAM. The approach is based on the estimation of the short- and long-term influence of these hyperparameters on the loss value. In the experimental study, our approach is applied to on-line learning in small neural networks and deep autoencoders. Automatically tuned coefficients surpass or roughly match the best ones selected manually in terms of learning speed. As a result, on-line learning can be a fully automatic process, producing results from the first run, without preliminary experiments aimed at manual hyperparameter tuning.

Keywords - online optimization, Stochastic Gradient Descent, hyperparameter tuning, neural networks, Classic Momentum, ADAM.
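For reference, the two update rules whose hyperparameters (step-size, momentum decay) are adapted can be sketched as follows. This is only the standard fixed-hyperparameter form of classic momentum and of ADAM (Kingma & Ba); the paper's automatic tuning is not reproduced. The toy on-line problem (noisy targets with mean 3) and the function names are hypothetical.

```python
import numpy as np


def classic_momentum_step(theta, velocity, grad, step_size, decay):
    """Classic momentum: v <- decay * v - step_size * g; theta <- theta + v."""
    velocity = decay * velocity - step_size * grad
    return theta + velocity, velocity


def adam_step(theta, m, v, grad, t, step_size, beta1=0.9, beta2=0.999, eps=1e-8):
    """ADAM: bias-corrected running estimates of the gradient's first and second moments."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - step_size * m_hat / (np.sqrt(v_hat) + eps), m, v


# Hypothetical on-line problem: minimize E[(theta - y)^2 / 2] with y ~ N(3, 1).
rng = np.random.default_rng(0)
theta_cm, vel = np.zeros(1), np.zeros(1)
theta_adam, m, v = np.zeros(1), np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    y = 3.0 + rng.standard_normal(1)
    # The stochastic gradient of (theta - y)^2 / 2 is (theta - y).
    theta_cm, vel = classic_momentum_step(theta_cm, vel, theta_cm - y, step_size=0.001, decay=0.9)
    theta_adam, m, v = adam_step(theta_adam, m, v, theta_adam - y, t, step_size=0.01)
print(theta_cm, theta_adam)
```

Here `step_size` and `decay` (or `beta1`) are fixed by hand; these are exactly the values that the described method would adjust during learning.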

K.Checinski, P.Wawrzyński, "DCT-Conv: Coding filters in convolutional networks with Discrete Cosine Transform", International Joint Conference on Neural Networks (IJCNN), 2020. [preprint]

ABSTRACT: Convolutional neural networks are based on a huge number of trained weights. Consequently, they are often data-greedy, sensitive to overtraining, and learn slowly. We follow the line of research in which filters of convolutional neural layers are determined on the basis of a smaller number of trained parameters. In this paper, the trained parameters define a frequency spectrum which is transformed into convolutional filters with the Inverse Discrete Cosine Transform (IDCT, the same transform that is applied in JPEG decompression). We analyze how switching off selected components of the spectra, thereby reducing the number of trained weights of the network, affects its performance. Our experiments show that coding the filters with trained DCT parameters leads to improvement over traditional convolution. Also, the performance of networks modified this way decreases very slowly as more of these parameters are switched off. In some experiments, good performance is observed even when 99.9% of these parameters are switched off.

Keywords - neural networks, parameters reduction, convolution, discrete cosine transform.
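The spectrum-to-filter mapping can be sketched with an orthonormal 2-D DCT. This is a minimal illustration under our own assumptions: a square k-by-k kernel, the orthonormal DCT-II, and a 0/1 `mask` that switches spectral components off. The function names are hypothetical and the layer itself (training, channels) is omitted.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x, and C.T @ X is its inverse."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def filter_from_spectrum(spectrum, mask):
    """Hypothetical DCT-Conv-style mapping: 2-D IDCT of a (masked) trainable spectrum.

    spectrum: (k, k) trainable parameters; mask: (k, k) array of 0/1,
    where zeros switch the corresponding frequency components off.
    """
    c = dct_matrix(spectrum.shape[0])
    return c.T @ (spectrum * mask) @ c


rng = np.random.default_rng(0)
spectrum = rng.standard_normal((3, 3))
# Keep only the 3 lowest-frequency components out of 9.
low_freq = (np.add.outer(np.arange(3), np.arange(3)) <= 1).astype(float)
kernel = filter_from_spectrum(spectrum, low_freq)

# A spectrum with only the DC component yields a constant filter.
dc_only = np.zeros((3, 3))
dc_only[0, 0] = 3.0
print(filter_from_spectrum(dc_only, np.ones((3, 3))))
```

With SciPy available, `scipy.fft.idctn(spectrum * mask, norm="ortho")` should give the same 2-D inverse transform; the explicit matrix is used here only to keep the sketch dependency-free.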

P.Wawrzyński, "Efficient on-line learning with diagonal approximation of loss function Hessian", International Joint Conference on Neural Networks (IJCNN), 2019. [pdf]

ABSTRACT: The subject of this paper is stochastic optimization as a tool for on-line learning. New ingredients are introduced to Nesterov's Accelerated Gradient that increase the efficiency of this algorithm and determine its parameters that are otherwise tuned manually: the step-size and the momentum decay factor. To this end, a diagonal approximation of the Hessian of the loss function is estimated. In the experimental study, the approach is applied to various types of neural networks, including deep ones.

Keywords - on-line learning, accelerated gradient, parameter autotuning, deep learning.

A.Zanetti, A.Testolin, M.Zorzi, P.Wawrzynski, "Numerosity Representation in InfoGAN: An Empirical Study", International Work-Conference on Artificial Neural Networks (IWANN), pp.49-60, 2019. [link]

ABSTRACT: It has been shown that “visual numerosity emerges as a statistical property of images in ‘deep networks’ that learn a hierarchical generative model of the sensory input”, through unsupervised deep learning [1]. The original deep generative model was based on stochastic neurons and, more importantly, on input (image) reconstruction. Statistical analysis highlighted a correlation between the numerosity present in the input and the population activity of some neurons in the second hidden layer of the network, whereas the population activity of neurons in the first hidden layer correlated with the total area (i.e., number of pixels) of the objects in the image. Here we further investigate whether numerosity information can be isolated as a disentangled factor of variation of the visual input. We train, in unsupervised and semi-supervised fashion, a latent-space generative model that has been shown capable of disentangling relevant semantic features in a variety of complex datasets, and we test its generative performance under different conditions. We then propose an approach to the problem based on the assumption that, in order to let numerosity emerge as a disentangled factor of variation, we need to cancel out the sources of variation at the graphical level.

P.Wawrzyński, "ASD+M: Automatic parameter tuning in stochastic optimization and on-line learning", Neural Networks, vol. 96, pp. 1-10, 2017. [DOI]

ABSTRACT: In this paper, the classic momentum algorithm for stochastic optimization is considered. A method is introduced that adjusts the coefficients of this algorithm during its operation. The method does not depend on any preliminary knowledge of the optimization problem. In the experimental study, the method is applied to on-line learning in feed-forward neural networks, including deep auto-encoders, and outperforms any fixed coefficients. The method eliminates coefficients that are difficult to determine yet have a profound influence on performance. While the method itself has some coefficients, they are easy to determine and the sensitivity of performance to them is low. Consequently, the method makes on-line learning a practically parameter-free process and broadens the area of potential application of this technology.

Keywords - Stochastic gradient descent, classic momentum, step-size, learning rate, on-line learning, deep learning.

ABSTRACT: In this paper a method is introduced that combines Inertial Measurement Unit (IMU) readouts with low accuracy and temporarily unavailable velocity measurements (e.g., based on kinematics or GPS) to produce high accuracy estimates of velocity and orientation with respect to gravity. The method is computationally cheap enough to be readily implementable in sensors. The main area of application of the introduced method is mobile robotics.

Keywords - velocity estimation, Kalman filter, mobile robotics.

ABSTRACT: The subject of this paper is a two-wheeled balancing robot with the center of mass above its wheels. Two control strategies for this robot are analyzed. The first one combines a kinematic model of the robot and a PI controller. The second one is a cascade of two PIDs. These strategies are compared experimentally.

Keywords - mobile robots, inverted pendulum, cost-effective robots.
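The second strategy, a cascade of two PID controllers, can be sketched as below. The abstract does not say which signals each loop handles, so the arrangement here is an assumption: one common layout in which an outer loop turns velocity error into a tilt setpoint and an inner loop turns tilt error into a motor command. The class, function name, and any gains are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete-time PID: u = kp*e + ki*sum(e*dt) + kd*(e - e_prev)/dt."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0  # note: the first step sees a derivative "kick" from 0

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def cascade_step(outer: PID, inner: PID, velocity_ref: float, velocity: float, tilt: float) -> float:
    """Hypothetical cascade: outer loop (velocity -> tilt setpoint),
    inner loop (tilt -> motor command)."""
    tilt_ref = outer.step(velocity_ref, velocity)
    return inner.step(tilt_ref, tilt)


pid = PID(kp=2.0, ki=1.0, kd=0.5, dt=0.1)
print(pid.step(1.0, 0.0), pid.step(1.0, 0.0))
```

In practice the inner (tilt) loop is usually tuned first and run fast, and the outer loop is tuned on top of it; the sketch leaves gains, saturation, and anti-windup out.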

ABSTRACT: Direct application of reinforcement learning in robotics raises the issue of discontinuity of the control signal. Consecutive actions are selected independently at random, which often makes them excessively far from one another. Such control is hardly ever appropriate for robots; it may even lead to their destruction. This paper considers a control policy in which consecutive actions are modified by autocorrelated noise. That policy generally solves the aforementioned problems and is readily applicable in robots. In the experimental study, it is applied to three robotic learning control tasks: Cart-Pole SwingUp, Half-Cheetah, and a walking humanoid.

Index Terms - Machine learning, reinforcement learning, actor-critics, robotics.

ABSTRACT: Availability of the instantaneous velocity of a legged robot is usually required for its efficient control. However, estimating velocity only on the basis of robot kinematics has a significant drawback: the robot is not in contact with the ground all the time, or its feet may twist. In this paper we introduce a method for velocity and tilt estimation in a walking robot. This method combines a kinematic model of the supporting leg and readouts from an inertial sensor. It can be used in any terrain, regardless of the robot's body design or the control strategy applied, and it is robust with respect to foot twist. It is also immune to limited foot slide and temporary lack of foot contact.

J.Mozaryn, J.Klimaszewski, D.Swieczkowski-Feiz, P.Kolodziejczyk, P.Wawrzyński, "Design process and experimental verification of the quadruped robot wave gait", International Conference on Methods and Models in Automation and Robotics (MMAR), pp. 206-211, 2014. doi:10.1109/MMAR.2014.6957352.

ABSTRACT: This paper presents the design process and experimental verification of the quadruped robot wave gait. The mathematical model of the robot's movement results from linking the derived leg movement equations with a scheme of their locomotion. The gait is designed and analysed using a two-step design procedure, which consists of simulations in the MSC Adams and Matlab environments and experimental verification on a real quadruped robot.

P.Wawrzyński, "Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization," International Journal of Humanoid Robotics, Vol. 11, No. 3, pp. 1450024, 2014. doi:10.1142/S0219843614500248.

ABSTRACT: In this paper, a control system for humanoid robot walking is approximately optimized by means of reinforcement learning. The subject is an 18-DOF humanoid whose gait is based on replaying a simple trajectory. This trajectory is translated into a reactive policy. A neural network whose input represents the robot state learns to produce an appropriate output that additively modifies the initial control. The learning algorithm applied is Actor-Critic with experience replay. In 50 minutes of learning, the slow initial gait changes into dexterous and fast walking. No model of the robot dynamics is engaged. The methodology is generic and can be applied to optimize control systems for diverse robots of comparable complexity.

Keywords: reinforcement learning, learning in robots, humanoids, bipedal walking.

ABSTRACT: This paper introduces dotRL, a platform that enables fast implementation and testing of Reinforcement Learning algorithms against diverse environments. dotRL is written in the .NET framework, and its main characteristics include: (i) adding a new learning algorithm or environment to the platform only requires implementing a simple interface, after which it is ready to be coupled with other environments and algorithms, (ii) a set of tools is included that aid running and reporting experiments, (iii) a set of benchmark environments is included, some as demanding as Octopus-Arm and Half-Cheetah, (iv) the platform is available for instant download, compilation, and execution, without libraries from different sources.

Index Terms - Reinforcement learning, evaluation platform, software engineering.

ABSTRACT: Availability of the momentary velocity of a legged robot is essential for its efficient control. However, estimating this velocity is difficult, because the robot is not in contact with the ground all the time or its feet may twist. In this paper we introduce a method for velocity estimation in a legged robot that combines a kinematic model of the supporting leg, readouts from an inertial sensor, and a Kalman Filter. The method alleviates all the above-mentioned difficulties.

Index Terms - legged locomotion, velocity estimation, Kalman Filter.
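The idea of fusing an inertial readout with an intermittently available kinematic velocity measurement can be sketched with a deliberately simplified one-dimensional Kalman filter. This is not the paper's filter: the state is a single horizontal velocity, the accelerometer drives the prediction, the kinematic velocity (NaN when, e.g., no foot is on the ground) drives the update, and the noise variances `q` and `r` as well as the simulated signals are hypothetical.

```python
import numpy as np


def kalman_velocity(accels, vel_meas, dt, q, r, v0=0.0, p0=1.0):
    """1-D Kalman filter for velocity.

    Predict: v <- v + a*dt, P <- P + q.
    Update (only when a measurement is available, i.e. not NaN):
        K = P / (P + r), v <- v + K*(z - v), P <- (1 - K)*P.
    """
    v, p = v0, p0
    estimates = []
    for a, z in zip(accels, vel_meas):
        v, p = v + a * dt, p + q  # integrate the inertial readout
        if not np.isnan(z):  # the kinematic measurement may be unavailable
            k = p / (p + r)
            v, p = v + k * (z - v), (1.0 - k) * p
        estimates.append(v)
    return np.array(estimates)


rng = np.random.default_rng(1)
dt, n = 0.01, 2000
true_acc = np.sin(np.arange(n) * dt)
true_vel = np.cumsum(true_acc) * dt
accels = true_acc + 0.05 + 0.2 * rng.standard_normal(n)  # biased, noisy accelerometer
vel_meas = true_vel + 0.1 * rng.standard_normal(n)  # noisy kinematic velocity
vel_meas[500:700] = np.nan  # e.g. no ground contact for 2 s
est = kalman_velocity(accels, vel_meas, dt, q=1e-4, r=0.01)
print(np.abs(est - true_vel).mean())
```

During the gap the estimate drifts with the accelerometer bias; once kinematic measurements return, the filter pulls it back, which mirrors the complementary roles of the two sensors described in the abstract.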

ABSTRACT: In this paper, a method is presented for learning in spiking neural networks. It is based on perturbation of synaptic conductances. While this approach is known to be model-free, it is also known to be slow, because it applies improvement direction estimates with large variance. Two ideas are analysed to alleviate this problem: first, training many networks at the same time instead of one; second, autocorrelating the perturbations in time. In the experimental study, the method is validated on three learning tasks in which information is conveyed by frequency and spike timing.

Index terms - Spiking neural networks, learning.

P.Wawrzyński, A.K.Tanwani, "Autonomous Reinforcement Learning with Experience Replay," Neural Networks, vol. 41, pp. 156-167, Elsevier, 2013. doi:10.1016/j.neunet.2012.11.007.

ABSTRACT: This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time.

Keywords: reinforcement learning, autonomous learning, step-size estimation, actor-critic

P.Wawrzyński, Sterowanie Adaptacyjne i Uczenie Maszynowe - preskrypt wykładu [Adaptive Control and Machine Learning: draft lecture notes], Politechnika Warszawska, 2012.

ABSTRACT: The lecture notes discuss various approaches to adaptation as applied to optimizing the operation of control systems. These approaches are: reinforcement learning, model reference adaptive control, and self-tuning regulators. In addition, the notes review other forms of adaptation that can be used in technical systems; for example, the Kalman filter is discussed.

P.Wawrzyński, "Autonomous Reinforcement Learning with Experience Replay for Humanoid Gait Optimization," Proceedings of the International Neural Network Society Winter Conference (INNS-WC2012), pp. 205-211, Elsevier, 2012. doi:10.1016/j.procs.2012.09.130.

ABSTRACT: This paper demonstrates the application of Reinforcement Learning to optimizing the control of a complex system in a realistic setting that requires efficiency and autonomy of the learning algorithm. Namely, Actor-Critic with experience replay (which addresses efficiency) and the Fixed Point method for step-size estimation (which addresses autonomy) are applied here to approximately optimize humanoid robot gait. With complex dynamics and tens of continuous state and action variables, humanoid gait optimization represents a challenge for analytical synthesis of control. The presented algorithm learns a nimble gait within 80 minutes of training.

Keywords: Reinforcement learning; Autonomous learning; Learning in robots

P.Wawrzyński, B.Papis, "Fixed point method for autonomous on-line neural network training," Neurocomputing 74, pp. 2893-2905, Elsevier, 2011. doi:10.1016/j.neucom.2011.03.029.

ABSTRACT: This paper considers on-line training of feedforward neural networks. Training examples are only available through sampling from a certain, possibly infinite, distribution. In order to make the learning process autonomous, one can employ Extended Kalman Filter or stochastic steepest descent with adaptively adjusted step-sizes. Here the latter is considered. A scheme of determining step-sizes is introduced that satisfies the following requirements: (i) it does not need any auxiliary problem-dependent parameters, (ii) it does not assume any particular loss function that the training process is intended to minimize, (iii) it makes the learning process stable and efficient. An experimental study with several approximation problems is presented. Within this study the presented approach is compared with Extended Kalman Filter and LFI, with satisfactory results.

Keywords: On-line learning; Autonomous learning; Step-size adaptation; Extended Kalman Filter

P.Wawrzyński, "Fixed point method of step-size estimation for on-line neural network training," International Joint Conference on Neural Networks (IJCNN), July 18-23, 2010, Barcelona, Spain, IEEE, pp. 2012-2017. [pdf]

ABSTRACT: This paper considers on-line training of feedforward neural networks. Training examples are only available as random samples from a given generator. What emerges in this setting is the problem of adapting step-sizes, or learning rates. A scheme of determining step-sizes is introduced here that satisfies the following requirements: (i) it does not need any auxiliary problem-dependent parameters, (ii) it does not assume any particular loss function that the training process is intended to minimize, (iii) it makes the learning process stable and efficient. An experimental study with 2D Gabor function approximation is presented.

Keywords: neural networks, on-line learning, step-size adaptation, reinforcement learning.

P.Wawrzyński, Systemy adaptacyjne i uczące się - preskrypt wykładu [Adaptive and Learning Systems: draft lecture notes], Oficyna Wydawnicza Politechniki Warszawskiej, 2009.

The lecture notes discuss adaptation mechanisms that can be applied in human-made systems. The goal of adaptation is to improve the performance of a system during its operation. The behavior of a designed system is not always satisfactory, so the system must learn how to act optimally. The notes present the methods and algorithms needed to design adaptive and learning systems.

P.Wawrzyński, "Real-Time Reinforcement Learning by Sequential Actor-Critics and Experience Replay," Neural Networks, vol. 22, pp. 1484-1497, Elsevier, 2009. doi:10.1016/j.neunet.2009.05.011.

ABSTRACT: Actor-Critics constitute an important class of reinforcement learning algorithms that can deal with continuous actions and states in an easy and natural way. This paper shows how these algorithms can be augmented by the technique of experience replay without degrading their convergence properties, by appropriately estimating the policy change direction. This is achieved by truncated importance sampling applied to the recorded past experiences. It is formally shown that the resulting estimation bias is bounded and asymptotically vanishes, which allows the experience replay-augmented algorithm to preserve the convergence properties of the original algorithm. The technique of experience replay makes it possible to utilize the available computational power to reduce the required number of interactions with the environment considerably, which is essential for real-world applications. Experimental results are presented which demonstrate that the combination of experience replay and Actor-Critics yields extremely fast learning algorithms that achieve successful policies for nontrivial control tasks in a considerably short time. Namely, the policies for the cart-pole swing-up (Doya, 2000) are obtained after as little as 20 minutes of the cart-pole time, and the policy for Half-Cheetah (a walking 6-degree-of-freedom robot) is obtained after four hours of Half-Cheetah time.
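The core of truncated importance sampling can be sketched independently of the actor-critic machinery: samples recorded under an old (behavior) policy are reweighted toward the current policy, and each weight is capped at a threshold b. This minimal sketch assumes one-dimensional Gaussian policies, estimates a simple expectation instead of a policy gradient, and uses hypothetical function names and values; it is not the paper's exact estimator.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log-density of N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))


def truncated_is_estimate(samples, f, logp_target, logp_behavior, b):
    """Estimate E_target[f(a)] from samples drawn from the behavior distribution.

    Weights rho = min(p_target / p_behavior, b) keep the variance bounded at the
    cost of a bounded bias; b = inf gives ordinary importance sampling.
    """
    rho = np.minimum(np.exp(logp_target(samples) - logp_behavior(samples)), b)
    return float(np.mean(rho * f(samples))), rho


rng = np.random.default_rng(0)
old_mean, new_mean, std = 0.0, 0.3, 1.0  # hypothetical old and current policies
actions = rng.normal(old_mean, std, size=100_000)  # "replayed" actions from the old policy
est, rho = truncated_is_estimate(
    actions,
    lambda a: a,
    lambda a: gaussian_logpdf(a, new_mean, std),
    lambda a: gaussian_logpdf(a, old_mean, std),
    b=5.0,
)
print(est, rho.max())
```

Because the two policies here are close, almost no weight reaches the cap and the estimate lands near `new_mean`; as the current policy drifts further from the recorded one, more weights are truncated, trading variance for bias.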

P.Wawrzyński, "A Cat-Like Robot Real-Time Learning to Run," Lecture Notes in Computer Science 5495, pp. 380-390, Springer-Verlag, 2009. doi:10.1007/978-3-642-04921-7_39. For demo see here.

ABSTRACT: Actor-Critics constitute an important class of reinforcement learning algorithms that can deal with continuous actions and states in an easy and natural way. In their original, sequential form, these algorithms are usually too slow to be applicable to real-life problems. However, they can be augmented by the technique of experience replay to obtain a satisfying speed of learning without degrading their convergence properties. In this paper, experimental results are presented which show that the combination of experience replay and Actor-Critics yields very fast learning algorithms that achieve successful policies for nontrivial control tasks in a considerably short time. Namely, a policy for a model of a 6-degree-of-freedom walking robot is obtained after 4 hours of the robot's time.

P.Wawrzyński, J. Arabas, P. Cichosz, "Predictive Control for Artificial Intelligence in Computer Games," Lecture Notes in Artificial Intelligence 5097, pp. 1137-1148, Springer-Verlag, 2008. doi:10.1007/978-3-540-69731-2_107.

ABSTRACT: The subject of this paper is the artificial intelligence (AI) of non-player characters in computer games, i.e. bots. We develop an idea of game AI based on predictive control. A bot's activity is defined by the currently executed plan. This plan results from an optimization process in which random plans are continuously generated and reselected. We apply our idea to implement a bot for the game Half-Life. Our bot, Randomly Planning Fighter (RPF), defeats a bot previously designed for Half-Life with the use of behavior-based techniques. The experiments show that on-line planning can be feasible in the rapidly changing environments of modern computer games.
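The "generate and reselect random plans" loop can be sketched in a toy one-dimensional world. Everything here is a hypothetical stand-in for the game: the `simulate` model, the cost (squared distance to a target at 5.0), the plan population sizes, and the mutation scale. At each tick the bot mutates its current plans, adds fresh random ones, keeps the cheapest, executes the first action of the best plan, and shifts the surviving plans one step forward (receding horizon).

```python
import numpy as np

HORIZON, N_PLANS, TARGET = 10, 32, 5.0


def simulate(pos, plan):
    """Hypothetical model: each action (clipped to [-1, 1]) moves the bot,
    and the bot pays its squared distance to TARGET after every move."""
    cost = 0.0
    for action in plan:
        pos += np.clip(action, -1.0, 1.0)
        cost += (pos - TARGET) ** 2
    return cost


def plan_step(pos, plans, rng):
    """One tick: mutate plans, add fresh random ones, keep the cheapest, act."""
    mutated = plans + 0.3 * rng.standard_normal(plans.shape)
    fresh = rng.uniform(-1.0, 1.0, size=plans.shape)
    candidates = np.vstack([plans, mutated, fresh])
    costs = np.array([simulate(pos, p) for p in candidates])
    best = candidates[np.argsort(costs)[: len(plans)]]
    action = float(np.clip(best[0, 0], -1.0, 1.0))
    # Drop the executed step and append a random one (receding horizon).
    shifted = np.hstack([best[:, 1:], rng.uniform(-1.0, 1.0, size=(len(best), 1))])
    return action, shifted


rng = np.random.default_rng(0)
pos, plans = 0.0, rng.uniform(-1.0, 1.0, size=(N_PLANS, HORIZON))
for _ in range(20):
    action, plans = plan_step(pos, plans, rng)
    pos += action
print(round(pos, 2))
```

Keeping the surviving plans between ticks is what makes the optimization continuous: the bot does not replan from scratch, it keeps refining the plans that worked in the previous situation.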

ABSTRACT: Reinforcement Learning (RL) is considered here as an adaptation technique for neural controllers of machines. The goal is to make Actor-Critic algorithms require less agent-environment interaction to obtain policies of the same quality, at the cost of additional background computations. We propose to achieve this goal in the spirit of experience replay. A method of estimating the improvement direction of a changing policy on the basis of preceding experience is essential here. We propose one that uses truncated importance sampling. We derive bounds on the bias of estimators of this type and prove that this bias asymptotically vanishes. In the experimental study, we apply our approach to the classic Actor-Critic and obtain a 20-fold increase in learning speed.

P.Wawrzyński, "Learning to Control a 6-Degree-of-Freedom Walking Robot," EUROCON 2007 The International Conference on Computer as a Tool, pp. 698-705, 2007. [pdf]

ABSTRACT: We analyze the issue of optimizing a control policy for a complex system in a simulated trial-and-error learning process. The approach to this problem we consider is Reinforcement Learning (RL). Stationary policies, applied by most RL methods, may be improper in control applications, since for sufficiently fine time discretization they do not exhibit exploration capabilities and define policy gradient estimators of very large variance. As a remedy to those difficulties, we proposed earlier the use of piecewise non-Markov policies. In the experimental study presented here, we apply our approach to a 6-degree-of-freedom walking robot and obtain an efficient policy for this object.

P.Wawrzyński, "Reinforcement Learning in Fine Time Discretization," Lecture Notes in Computer Science 4431, pp. 470-479, 2007. doi:10.1007/978-3-540-71618-1_52.

ABSTRACT: Reinforcement Learning (RL) is analyzed here as a tool for control system optimization. State and action spaces are assumed to be continuous. Time is assumed to be discrete, yet the discretization may be arbitrarily fine. It is shown here that stationary policies, applied by most RL methods, are improper in control applications, since for fine time discretization they cannot assure bounded variance of policy gradient estimators. As a remedy to that difficulty, we propose the use of piecewise non-Markov policies. Policies of this type can be optimized by means of most RL algorithms, namely those based on likelihood ratio.

ABSTRACT: In this paper we analyze a particular issue of estimation, namely the estimation of the expected value of an unknown function under a given distribution, with the samples drawn from other distributions. The motivation for this problem comes from machine learning. In reinforcement learning, an intelligent agent that learns to make decisions in an unknown environment encounters the problem of judging an arbitrary decision policy (the given distribution) on the basis of previous decisions, suggested by previous policies (other distributions), and their outcomes.
The problem can be solved with the use of well established importance sampling estimators. To overcome a potential problem of excessive variance of such estimators, we introduce the family of balanced importance sampling estimators, prove their consistency and demonstrate empirically their superiority over the classical counterparts.

Keywords: Estimation, Importance Sampling, Machine Learning, Reinforcement Learning.

P.Wawrzyński, "Symulacja Płaskich łańcuchów Kinematycznych" [Simulation of Planar Kinematic Chains], Report no. 05-06 of the Institute of Control and Computation Engineering (Instytut Automatyki i Informatyki Stosowanej), November 2005. [pdf]

ABSTRACT: The report presents an algorithm for simulating the dynamics of planar kinematic chains. It is based on the Euler-Newton method. The accelerations in the object are assumed to be constant within each time quantum, so the essence of the algorithm is finding these accelerations. The computational cost of this operation is linear in the number of elements of the object.
The analyzed planar kinematic chains consist of rods (links) connected by rotational degrees of freedom (joints). The links are rigid, and the entire mass of the object is located in the joints. Joints of several types are analyzed: moving without constraints, moving along a straight line, and moving with a prescribed acceleration. Moreover, the angle between the links adjacent to a joint may be constant or may change according to the joint accelerations induced in the chain. Typical phenomena accompanying the simulation, such as collisions (falls) of joints, are also analyzed.

Reinforcement Learning (RL) is used here as a tool for control system optimization. State and action spaces are assumed to be continuous. Time is assumed to be discrete, yet the discretization may be arbitrarily fine. Within the proposed algorithm, a piece of information that leads to policy improvement is inferred from an experiment that lasts for several consecutive steps, rather than from a single step as in more traditional RL methods. Simulations reveal that the algorithm is able to optimize control policies for plants to which it is very difficult to apply the traditional methods.

Keywords: Machine Learning, Reinforcement Learning, Adaptive Control.

P.Wawrzyński, "Intensive Reinforcement Learning," Ph.D. dissertation, Institute of Control and Computation Engineering, Warsaw University of Technology, May 2005. [ps pdf]

ABSTRACT: The Reinforcement Learning (RL) problem is analyzed in this dissertation in the language of statistics, as an estimation issue. A family of RL algorithms is introduced. They determine a control policy by processing the entire known history of plant-controller interactions. Stochastic approximation, the mechanism that makes the classical RL algorithms converge, is replaced with batch estimation. The experimental study shows that the resulting algorithms are able to identify parameters of nontrivial controllers within a few dozen minutes of control. This makes them several times more efficient than their existing equivalents.

P.Wawrzyński, A.Pacut, "Model-free off-policy reinforcement learning in continuous environment," International Joint Conference on Neural Networks (IJCNN), Budapest, July 2004, pp. 1091-1096. [ps pdf]

ABSTRACT: We introduce an algorithm of reinforcement learning in continuous state and action spaces. In order to construct a control policy, the algorithm utilizes the entire history of agent-environment interaction. The policy is the result of an estimation process based on all available information rather than the result of stochastic convergence, as in classical reinforcement learning approaches. The policy is derived from the history directly, not through any kind of model of the environment.
We test our algorithm in the simulated Cart-Pole Swing-Up environment. The algorithm learns to control this plant in about 100 trials, which corresponds to 15 minutes of the plant's real time. This is several times shorter than the time required by other algorithms.

P.Wawrzyński, A.Pacut, "Intensive versus nonintensive actor-critic algorithms of reinforcement learning," Lecture Notes in Artificial Intelligence 3070, pp. 934-941, Springer-Verlag, 2004. doi: 10.1007/978-3-540-24844-6_145.

ABSTRACT: Algorithms of reinforcement learning usually employ consecutive agent actions to construct gradient estimators that adjust the agent's policy. The policy is the result of some kind of stochastic approximation. Because stochastic approximation is slow, such algorithms are usually much too slow to be employed, e.g. in real-time adaptive control.
In this paper we analyze replacing stochastic approximation with estimation based on the entire available history of agent-environment interaction. We design an algorithm of reinforcement learning in continuous state/action domains that is orders of magnitude faster than the classical methods.

P.Wawrzyński, A.Pacut, "A simple actor-critic algorithm for continuous environments," International Conference on Methods and Models in Automation and Robotics (MMAR), August 2004, pp. 1143-1149.

ABSTRACT: In reference to methods analyzed recently by Sutton et al. and Konda & Tsitsiklis, we propose their modification called Randomized Policy Optimizer (RPO). The algorithm has a modular structure and is based on the value function rather than on the action-value function. The modules include neural approximators and a parameterized distribution of control actions. The distribution must belong to a family of smoothly exploring distributions, which makes it possible to sample from the control action set to approximate a certain gradient. A pre-action-value function is introduced similarly to the action-value function, with the first action replaced by the first action distribution parameter.
The paper contains an experimental comparison of this approach to reinforcement learning with model-free Adaptive Critic Designs, specifically with Action-Dependent Adaptive Heuristic Critic. The comparison is favorable for our algorithm.

P.Wawrzyński, P.Podsiadly, G.Lehmann, "IOT Methodology of Frequency Assignment in Cellular Network," MOST International Conference, October 2002, pp. 313-324.

ABSTRACT: We present a constraint-based methodology for solving the frequency assignment problem in a cellular phone network. The methodology is based on taking radio measurements in the territory where a given network operates. The measurements are used to approximate the areas of cells and the areas of interference that would occur if transceivers' frequencies were assigned too close to each other. A set of constraints for the frequency assignment is computed in order to minimize the interference. The frequencies are then assigned in a process of discrete optimization with constraints.
The standard method of dealing with the frequency assignment problem places emphasis on optimization. The shape of the minimized function is determined with the use of signal propagation models. Unfortunately, these models lack precision, hence the need for empirical assessment of signal strength. In practice, determining the constraint set as restrictively as possible becomes even more important than the efficiency of the optimization process.

ABSTRACT: We propose a method of recurrent estimation of conditional quantiles stemming from stochastic approximation. The method employs a sigmoidal neural network and a specialized training algorithm to approximate the conditional quantiles. The approach may be used in a wide range of fields, in particular econometrics, medicine, data mining, and modeling.
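The stochastic-approximation idea behind recurrent quantile estimation can be sketched with a much simpler model than the paper's. Here a linear function q(x) = w*x + c stands in for the sigmoidal neural network, and each sample moves the estimate by step * (tau - 1[y < q(x)]), a stochastic subgradient step on the pinball (quantile) loss. The model, the data distribution, and `step` are hypothetical choices for illustration.

```python
import numpy as np


def sa_quantile(xs, ys, tau, step=0.01):
    """Recursive estimate of the conditional tau-quantile q(x) = w * x + c.

    Each sample shifts the estimate by step * (tau - 1[y < q(x)]) along the
    feature vector (x, 1): a stochastic subgradient step on the pinball loss.
    """
    w, c = 0.0, 0.0
    for x, y in zip(xs, ys):
        g = tau - float(y < w * x + c)
        w += step * g * x
        c += step * g
    return w, c


rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=100_000)
ys = 2.0 * xs + rng.uniform(0.0, 1.0, size=xs.size)  # y | x ~ U(2x, 2x + 1)
w, c = sa_quantile(xs, ys, tau=0.9)
print(w, c)  # the true conditional 0.9-quantile is 2x + 0.9
```

At the fixed point, the fraction of samples falling below q(x) equals tau for every x, which is exactly the defining property of the conditional quantile; a nonlinear approximator would apply the same update along its parameter gradient.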

Patents

P. Wawrzyński, "A two-wheeled bicycle with variable configuration", 2018-02-16, PL 424613, WO 2019/159140A1, US 11292549B2.

P. Wawrzyński, "Pojazd latający z kilkoma zespołami napędowymi i sposób sterowania jego lotem" [A flying vehicle with several propulsion units and a method of controlling its flight], 2017-07-03, PL 232732.

P. Wawrzyński, "Multikopter z wirnikami o zmiennym kącie natarcia i sposób sterowania jego lotem" [A multicopter with variable-pitch rotors and a method of controlling its flight], 2017-07-03, PL 232731.

P. Wawrzyński, G. Lehmann, "Method for determination of the minimum distance between frequency channels within pre-selected base station cells", 2001-05-23, WO 02/096141, US 6987973B2.