D365 Quality Evaluation Agent: Step-by-Step Configuration Guide
If you’ve been following along, you’ll know that in a previous article, “Understanding the Dynamics 365 Quality Evaluation Agent: A Complete Breakdown”, I dove into the different components that make up the Quality Evaluation Agent, explained what each of them means, and walked through how they all work together. If you haven’t had a chance to read it yet, I’d definitely recommend checking it out before continuing with this one, as it’ll give you the foundation you need to get the most out of this article.
Now that we have a solid understanding of what the Quality Evaluation Agent is and how it works under the hood, it’s time to roll up our sleeves and actually get it configured! In this article I’m going to walk you through the step-by-step process of configuring the Quality Evaluation Agent in Dynamics 365 Contact Center / Dynamics 365 Customer Service. Whether you’re setting this up for the first time or you’ve been waiting for a clear guide to help you get it right, you’re in the right place. I’ll cover everything you need to know so you can get the agent up and running in your environment with confidence.
Enable the agent
The first step is to enable the agent. This is done in the Copilot Service Admin Center. You’ll see the ‘Quality Management’ navigation link under ‘Customer Support’, which takes you to a screen that lets you go to the Quality Evaluation Agent setup. You’ll notice three steps that need to be completed under the prerequisites section at the top of the page. I love the new simplified setup of the agent! In step 1 you’ll need to configure the connection references. You can do this by clicking the ‘Manage Connections’ button on the tile, which will open a pop-up window with the newly created connections. You can use these connections for the agent, switch to a different connection or create a new one. To select the connections for the agent, click the ‘Update connection references to use your connector’ button. In the second step you’ll need to enable the Power Automate flows related to the agent. All you have to do is click the ‘Enable’ button to turn all flows on! The last step is to click the button to publish the agent. Once all prerequisites are completed, we can configure the record types we want the agent to evaluate.

There are currently three different tables that you can use for evaluation: cases (in D365 Customer Service only), conversations (in D365 Contact Center only) and emails (preview) (in D365 Customer Service only). To enable a table for evaluations you’ll need to check the box next to the table name. If you want to enable bulk evaluations, you’ll need to check that box too. Under ‘Specify Record Data’ you can configure the data that will be included in the evaluation, such as table columns and related tables like emails, notes and more. Admins can add a maximum of 10 one-to-one data types (these are the columns related to the source table) and 6 one-to-many data types (these are related tables), in addition to the ones that are already part of the configuration. You can add to or modify the out-of-the-box configurations by clicking ‘Add Data’ below the ‘Specify Record Data’ section.
To enable evaluation criteria scores you’ll need to check the related box. Please note that once you enable this, you can’t turn it off. This setting also requires a threshold between 1 and 100, which determines whether an evaluation meets the minimum score you configured here. Save the changes when completed.
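If you’re planning which extra data to include before you open the admin center, the limits can be sketched as a quick check. This is only a planning aid: the helper name `check_added_data`, the constants and the example column/table names are all hypothetical, and none of this is an API exposed by Dynamics 365.

```python
# Hypothetical planning helper; not part of Dynamics 365.
MAX_ONE_TO_ONE = 10   # extra columns related to the source table
MAX_ONE_TO_MANY = 6   # extra related tables

def check_added_data(one_to_one, one_to_many):
    """Return a list of problems with the planned additions (empty if none)."""
    problems = []
    if len(one_to_one) > MAX_ONE_TO_ONE:
        problems.append(
            f"{len(one_to_one)} one-to-one data types exceeds the limit of {MAX_ONE_TO_ONE}"
        )
    if len(one_to_many) > MAX_ONE_TO_MANY:
        problems.append(
            f"{len(one_to_many)} one-to-many data types exceeds the limit of {MAX_ONE_TO_MANY}"
        )
    return problems

# Made-up names, for illustration only.
print(check_added_data(["priority", "origin"], ["notes", "emails"]))  # []
```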

Evaluation Criteria
Evaluation criteria and evaluation plans are configured in the Customer Service Workspace. They can be accessed by clicking the hamburger menu on the top left. Let’s start with the evaluation criteria. There are several criteria that are provided out of the box. You can use these as-is or make a copy and adjust it. Out-of-the-box evaluation criteria can’t be modified directly.
Click ‘+New’ or make a copy of an existing evaluation criteria to create a new evaluation criteria record. You’ll need to give it a name and a description. Below that, you’ll need to enter instructions for the agent. These are criteria-level instructions, your big-picture guidance. This is where you define the overall goals, expectations, and any guardrails for how evaluations using this criteria should be handled. Think of this as setting the tone of what matters most, what good service looks like, and any constraints the agent should respect. Everything below, every question, answer and score, follows this direction.
Below the criteria details section we can create sections. Inside the sections is where we create the questions and answer options. We can add agent instructions for each question to guide the agent on how to evaluate that specific question. One question could be about empathy while another is about accuracy. Each question can have its own tailored instructions so the agent knows exactly what to look for. The key here is that these instructions are scoped just to that question. The same goes for the answers. The answer instructions are where we give the agent context for each available answer, explaining what the answer actually means. This helps the agent understand the intent behind each answer so it can evaluate responses better instead of making assumptions. This is also where you can add conditions. For example, if you have a Yes/No question you could enter: Answer ‘Yes’ if any of the following is true… and then clearly list out each condition. The more explicit you are, the more consistent the evaluations will be. To learn more about evaluation criteria best practices please take a look here.
Weight vs Score
If you enabled criteria scoring for an evaluation criteria, you’ll need to assign a weight to each section and a score to each question in the section. Keep in mind that these represent two completely different things, even though they both impact the final score! The question score is the raw score for an individual question. This is what the agent assigns based on the answer, so for example, a question might be worth 0, 5, or 10 points depending on how the interaction performed. Section weight is all about importance. You’re grouping questions into sections, and the section weight determines how important that entire section is to the overall evaluation score. For example, you could have a section with lower question scores, but if that section has a high weight (e.g. compliance), it will have a much bigger impact on the final score. This gives organizations more control, allowing them to prioritize what is most important. If compliance is critical, you weight it higher. If tone and empathy matter but aren’t make-or-break, you weight them accordingly. So here’s the key difference:
- Question score = how well did the agent perform on this specific item
- Section weight = how much does this group of items matter overall
These configuration options are what allow organizations to move from ‘everything is equally important’ (which isn’t very realistic) to a scoring model that is more focused on what matters most.
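To see how weight and score interact, here is a small worked sketch. The formula is an assumption for illustration (a weighted average of each section’s percentage of points earned); the product’s actual calculation may differ, and the function names and numbers are made up.

```python
# Illustrative only: this formula is an assumption, not the product's
# documented calculation.

def section_percent(scores, max_scores):
    """Percentage of the available points earned within one section."""
    return 100.0 * sum(scores) / sum(max_scores)

def overall_score(sections):
    """Weighted average of section percentages, using the section weights."""
    total_weight = sum(s["weight"] for s in sections)
    weighted = sum(
        s["weight"] * section_percent(s["scores"], s["max_scores"])
        for s in sections
    )
    return weighted / total_weight

# Compliance: two questions worth up to 5 points each, half the points earned.
compliance = {"weight": 70, "scores": [5, 0], "max_scores": [5, 5]}
# Tone and empathy: two questions worth up to 10 points each, all points earned.
empathy = {"weight": 30, "scores": [10, 10], "max_scores": [10, 10]}

print(overall_score([compliance, empathy]))  # 65.0

# Same answers, but with both sections weighted equally.
equal = [dict(compliance, weight=50), dict(empathy, weight=50)]
print(overall_score(equal))  # 75.0
```

The answers are identical in both runs; only the weights change. With compliance weighted at 70, the weak compliance result pulls the overall score down to 65, even though its questions carry fewer raw points than the empathy questions.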
There is also the ability to mark a question as ‘Critical’. This is very important, because questions marked as critical are flagged as must-pass items. Think of things like compliance, safety, or required process steps that simply cannot be missed. When you mark a question as critical you can also select the failing answer. Here is an example of a critical question with a failing answer: ‘Was customer consent clearly obtained before recording?’ failing answer: ‘No’. If a critical question receives its failing answer, as in the example I just gave you, the entire evaluation fails. It doesn’t matter how well the agent did everywhere else; that one missed critical question overrides everything!
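The override behavior can be sketched in a few lines. This is a hypothetical model, not product code: the function name, the dictionary shapes and the `score >= threshold` comparison are assumptions made for illustration.

```python
# Hypothetical model of the critical-question override; not product code.

def evaluation_passes(score, threshold, answers, critical_failing):
    """
    answers: question -> selected answer
    critical_failing: critical question -> its configured failing answer
    """
    for question, failing_answer in critical_failing.items():
        if answers.get(question) == failing_answer:
            # One failed critical question fails the whole evaluation,
            # regardless of the numeric score.
            return False
    # Assumption: otherwise the score is compared against the threshold.
    return score >= threshold

critical = {"Was customer consent clearly obtained before recording?": "No"}

answers = {"Was customer consent clearly obtained before recording?": "No"}
print(evaluation_passes(95, 70, answers, critical))  # False

answers = {"Was customer consent clearly obtained before recording?": "Yes"}
print(evaluation_passes(95, 70, answers, critical))  # True
```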
Once the evaluation criteria is completed, you can run a simulation by clicking the ‘Create Simulation’ button on the command bar. NOTE: Simulations can be created for draft and published criteria, and by default they run on the 25 most recent records that match the conditions selected in the ‘condition’ section of the simulation. Each simulation consumes Copilot Credits. Clicking the ‘Publish’ button makes the evaluation criteria available to use. Once published, edits can be made by clicking the ‘Edit’ button on the command bar. This will create a draft version, while the previously published versions stay available under the ‘Versioning History’ in the criteria record. Restoring a previous version is supported.
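The default simulation sample described above (the 25 most recent records that match the conditions) can be pictured like this. It is a rough sketch under assumptions: the record shape, the `created_on` field used for ‘most recent’ and the function name are all hypothetical, and the product may order records differently.

```python
from datetime import datetime, timedelta

# Rough sketch; the record shape and the 'created_on' ordering are assumptions.
def simulation_sample(records, matches, limit=25):
    """Filter records by the condition, then keep the most recent ones."""
    matching = [r for r in records if matches(r)]
    matching.sort(key=lambda r: r["created_on"], reverse=True)
    return matching[:limit]

# Made-up data: 60 cases, every second one is high priority.
start = datetime(2025, 1, 1)
records = [
    {"id": i, "priority": "high" if i % 2 == 0 else "normal",
     "created_on": start + timedelta(days=i)}
    for i in range(60)
]

sample = simulation_sample(records, lambda r: r["priority"] == "high")
print(len(sample))  # 25
```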
Evaluation Plans
Evaluation plans are used to automatically run evaluations, so instead of manually reviewing interactions, the plan does it for you based on plan-defined conditions. We can configure which interactions need to be evaluated (cases, conversations or emails (preview)), when they need to be evaluated and which evaluation criteria needs to be used in each plan. NOTE: We can only create plans for tables that are enabled for evaluations in the Quality Evaluation Agent configuration.
For example, we can configure a plan to automatically evaluate all high-priority cases, and configure a different plan to review chat conversations. The plan configuration options available depend on the record type/table that is being evaluated. For example, if we’re creating a plan to evaluate cases, only the recurring frequency is available and only ‘Daily’ is available in the ‘occurrence’ field. If you select the recurring option, you’ll also need to enter a start and end date for the plan. If we select conversations, we can select ‘real-time’ or ‘trigger’ as frequency types. When selecting the ‘trigger’ frequency type, only the ‘Closed Conversation’ trigger is available.
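The options described above can be summarized as a small lookup. This is just a way to keep the combinations straight: the dictionary keys and the `plan_problems` helper are hypothetical, only the options mentioned in this article are included, and emails (preview) are left out because their options aren’t covered here.

```python
from datetime import date

# Summary of the options described in this article; names are hypothetical.
PLAN_OPTIONS = {
    "case": {"frequencies": ["recurring"], "occurrences": ["Daily"]},
    "conversation": {
        "frequencies": ["real-time", "trigger"],
        "triggers": ["Closed Conversation"],
    },
}

def plan_problems(table, frequency, start=None, end=None):
    """Return a list of problems with a planned configuration (empty if none)."""
    options = PLAN_OPTIONS.get(table)
    if options is None:
        return [f"no options recorded for table '{table}'"]
    problems = []
    if frequency not in options["frequencies"]:
        problems.append(f"'{frequency}' is not available for table '{table}'")
    if frequency == "recurring" and (start is None or end is None):
        problems.append("recurring plans need a start and end date")
    return problems

print(plan_problems("case", "recurring", date(2025, 1, 1), date(2025, 12, 31)))  # []
print(plan_problems("case", "trigger"))
```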

In the ‘conditions’ settings you use filters to target the rows that need to be evaluated. In the ‘Assign Evaluation’ section we select the evaluation criteria that needs to be used and the evaluation method. NOTE: There are 3 evaluation methods that can be used with the evaluation agent: AI assisted, AI agent, or Manual. When you select ‘Manual’, the agent will not answer any of the questions; instead, the questions will be generated and will need to be completed by a supervisor. AI assisted will generate the questions with the agent’s selected answers, but a supervisor needs to review the responses provided by the agent and can edit them. The AI agent option will populate all the questions and answers, and editing will not be available. NOTE: In AI agent mode, all questions in the evaluation criteria must have the ‘AI-enabled’ box checked. In AI assisted mode, at least one question must be AI-enabled.
Each time a plan evaluation runs, it automatically creates a run history record. It captures details about the plan that was executed, such as when it ran, how many rows were processed and the final status of the run. This means we’ll have full visibility into what actually happened behind the scenes.
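The AI-enabled rules in the note above boil down to an ‘all’ versus ‘any’ check. Here is a minimal sketch of that logic; the function name is hypothetical, and treating Manual as always available is an assumption, since the article doesn’t list any requirement for it.

```python
# Minimal sketch of the AI-enabled rules; not product code.

def allowed_methods(ai_enabled_flags):
    """ai_enabled_flags: one boolean per question in the evaluation criteria."""
    flags = list(ai_enabled_flags)
    methods = ["Manual"]  # assumption: Manual has no AI-enabled requirement
    if any(flags):
        methods.append("AI assisted")  # at least one question is AI-enabled
    if flags and all(flags):
        methods.append("AI agent")     # every question is AI-enabled
    return methods

print(allowed_methods([True, False, True]))  # ['Manual', 'AI assisted']
print(allowed_methods([True, True]))         # ['Manual', 'AI assisted', 'AI agent']
```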
Run on-demand evaluations
Now, let’s talk about on-demand evaluations, because sometimes you don’t want to wait for a plan to run! On-demand evaluations let you manually trigger an evaluation for one or more specific interactions. For case evaluations we can select one or multiple cases from a grid and click the ‘+Request evaluation’ button. You’ll have to choose which evaluation criteria to use, the method (AI Assisted, Manual or AI Agent), who the evaluation should be assigned to and the expiration date for the evaluation. Once the fields are populated, click ‘Request’ to submit the evaluation request. You can do the same for conversations by navigating to activities > closed conversations and repeating the same steps. Emails (preview) can be evaluated by navigating to activities > email. You can review completed evaluations by navigating to evaluations on the sitemap. When you click on the evaluation name, the evaluation details will open in a side pane, showing an evaluation summary (containing suggestions) at the top, followed by the questions, answers and details about the answers (if AI assisted or AI agent was selected as the method).
I hope you enjoyed this article! Be sure to check in again soon or subscribe here to never miss another post!


