Adopted RL strategy
- When a seller wins a task at its current state a_c, that state and all lower-price states receive positive reinforcement.
- If the next state a_{i+1} (the next higher price) has a higher expected value than the current state a_i, i.e. Q(a_{i+1}) > Q(a_i), it is adopted immediately. Otherwise, exploration is performed with probability ε.
- Sellers bid the price defined for their current state, Pr(a_c).
“If we got a deal at the current price, we would also have got one at any lower price.”
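The rules above can be sketched as a small Python class. This is a minimal sketch under assumptions not stated in the original: the price states are kept in a list of increasing prices indexed by state, the positive reinforcement is an incremental update Q(a_i) ← Q(a_i) + α(r − Q(a_i)) with a hypothetical learning rate `alpha` and reward r = 1, and "exploration" is read as moving to the next higher-price state anyway. The names `Seller`, `reinforce`, `step`, `alpha` and `epsilon` are illustrative only.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Seller:
    # Pr(a_i) for each state i, assumed to increase with i (hypothetical layout).
    prices: list[float]
    alpha: float = 0.1    # hypothetical learning rate for the reinforcement update
    epsilon: float = 0.1  # exploration probability (assumed name and value)
    current: int = 0      # index c of the current state a_c
    q: list[float] = field(default_factory=list)  # Q(a_i), one value per state

    def __post_init__(self) -> None:
        # Start every state with a neutral expected value if none was given.
        if not self.q:
            self.q = [0.0] * len(self.prices)

    def bid(self) -> float:
        # Sellers bid the price of their current state, Pr(a_c).
        return self.prices[self.current]

    def reinforce(self, reward: float = 1.0) -> None:
        # A deal at a_c reinforces a_c and every lower-price state:
        # if the current price sold, any lower price would have sold too.
        for i in range(self.current + 1):
            self.q[i] += self.alpha * (reward - self.q[i])

    def step(self, rng: random.Random) -> None:
        nxt = self.current + 1
        if nxt >= len(self.prices):
            return  # already at the highest price state
        if self.q[nxt] > self.q[self.current]:
            # Higher expected value at the next price: adopt it immediately.
            self.current = nxt
        elif rng.random() < self.epsilon:
            # Otherwise explore the next higher price with probability epsilon.
            self.current = nxt
```

A typical round would call `bid()`, then `reinforce()` if the bid won a task, then `step(rng)`. The slide does not say what happens after a lost bid, so this sketch leaves Q unchanged in that case, which also means the seller here never moves back to a lower price.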