{"id":11637,"date":"2023-03-18T17:02:40","date_gmt":"2023-03-18T17:02:40","guid":{"rendered":"https:\/\/aurelis.org\/blog\/?p=11637"},"modified":"2023-03-19T13:10:51","modified_gmt":"2023-03-19T13:10:51","slug":"more-about-rewards-also-in-a-i","status":"publish","type":"post","link":"https:\/\/aurelis.org\/blog\/artifical-intelligence\/more-about-rewards-also-in-a-i","title":{"rendered":"More about Rewards (also in A.I.)"},"content":{"rendered":"\n<h3>A reward is a nudge \u2013 with more or less lasting result \u2013 into some preferred direction. Anything can be experienced as a reward. Thinking about it as a pattern within a broader pattern is clarifying.<\/h3>\n\n\n\n<p><strong>Pattern recognition and completion (PRC)<\/strong><strong><\/strong><\/p>\n\n\n\n<p>Seeing rewards in the context of PRC, a reward is <em>always<\/em> just a part of the observation of reality \u2015 thus, not of reality itself, as explained in <a href=\"https:\/\/aurelis.org\/blog?p=11639\">Agents and Expected Rewards<\/a>.<\/p>\n\n\n\n<p>Thus, the reward doesn\u2019t need to be explicitly posited. Not doing so gives more freedom to the \u2018rewarded\u2019 agent to react in more freedom, in unexpected ways that may eventually also be more efficient.<\/p>\n\n\n\n<p><strong>The human case<\/strong><\/p>\n\n\n\n<p>Thinking about a reward as a pattern within a broader pattern corresponds with <a href=\"https:\/\/aurelis.org\/blog?p=5746\">how the brain works as a predictor<\/a>. The \u2018prize\u2019 then becomes the completion of the broader pattern.<\/p>\n\n\n\n<p>Brain happy \u2015 you happy.<\/p>\n\n\n\n<p>Note that this can make one vulnerable to \u2018rewards\u2019 that are not necessarily in one\u2019s best interest \u2015 for instance, the woman repeatedly choosing the bad guy. Anyway, inside the brain, <a href=\"https:\/\/aurelis.org\/blog?p=11639\">dopamine flows in expectation of the reward<\/a>, which can always be seen as some form of PRC.<\/p>\n\n\n\n<p><strong>A.I. 
Reinforcement Learning<\/strong><\/p>\n\n\n\n<p>Reinforcement Learning (R.L.) is already an important part of A.I.. It is bound to become an even (much) more important part. Contrary to other A.I. technologies, it deals with simultaneously sequential, stochastic\/evaluative, and partially observed reality (to be read twice, if you like). Within this, R.L. is about seeking a good policy (or strategy) to tackle pertinent problems.<\/p>\n\n\n\n<p>Whereas other A.I. is about solving problems, R.L. is about developing strategies to solve problems (and standard software is about making the tools for solving them). Of course, R.L. can incorporate other A.I. technologies (for instance, DNN in Deep R.L.) and the needed software to make it all happen.<\/p>\n\n\n\n<p><strong>Sorry for this explanation at the sideline. It is needed to see where the rewards fit in.<\/strong><\/p>\n\n\n\n<p>Strategies are developed in R.L. by making an agent (either in software or as a three-dimensional robot) that can roam about and gather experiences. This is: it observes its environment, including what it is being given as rewards (positive or negative). Rewards are the food that R.L. needs to form its policies, from state to state and action to action.<\/p>\n\n\n\n<p>Simply said, give it some rewards, and it will like you \u2015 asking for more.<\/p>\n\n\n\n<p><strong>An indirect reward is an (auto)suggestion.<\/strong><\/p>\n\n\n\n<p>Instead of a direct command or a direct saying what is a good or bad move, an indirect \u2018reward\u2019 lends more freedom to the rewarded agent, also if it\u2019s software. This has already been shown to lead to better results in most cases. The A.I. finds its own ways toward the preferred strategy. Not even formalizing what is or isn&#8217;t a reward gives it one more degree of freedom to look for the ideal solution.<\/p>\n\n\n\n<p>Of course, this means that one needs to think through very well what one wants the system to achieve. 
This also pushes us toward thinking through very carefully what the final goals may be \u2015 what <em>our<\/em> final goals may be.<\/p>\n\n\n\n<p>Unfortunately, there is still much to be done concerning the latter.<\/p>\n\n\n\n<p>We need to do that now.<\/p>","protected":false},"excerpt":{"rendered":"<p>A reward is a nudge \u2013 with a more or less lasting result \u2013 in some preferred direction. Anything can be experienced as a reward. Thinking about it as a pattern within a broader pattern is clarifying. Pattern recognition and completion (PRC) Seen in the context of PRC, a reward is always just a part <a class=\"moretag\" href=\"https:\/\/aurelis.org\/blog\/artifical-intelligence\/more-about-rewards-also-in-a-i\">Read the full article&#8230;<\/a><\/p>","protected":false},"author":2,"featured_media":11643,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":""},"categories":[28,30],"tags":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/aurelis.org\/blog\/wp-content\/uploads\/2023\/03\/2055.jpg?fit=960%2C560&ssl=1","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9Fdiq-31H","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/posts\/11637"}],"collection":[{"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/comments?post=11637"}],"version-history":[{"count":5,"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/posts\/11637\/revisions"}],"predecessor-version":[{"id":11681,"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/posts\/11637\/revisions\/11681"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/media\/11643"}],"wp:attachment":[{"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/media?parent=11637"}],"wp:term":[{"taxonomy":"category","embeddable":true,
"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/categories?post=11637"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aurelis.org\/blog\/wp-json\/wp\/v2\/tags?post=11637"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}