Fluent choosing the wrong dataset

Fluent is generally very good at choosing the relevant dataset. However, if you find that Fluent is consistently choosing the wrong dataset, check that you haven’t done any of the following.

#1 You have multiple datasets with very similar data

Like a human data analyst, Fluent will not know when to use which dataset unless you tell it what the difference between two very similar datasets is.

Fluent uses the dataset descriptions to determine which data is the most relevant for your question. Try editing the dataset descriptions to highlight the key differences between your datasets and indicate when Fluent should use dataset A vs dataset B.

How to fix

In the 'What can this dataset be used for?' and 'Are there specific use cases where this dataset may not be suitable?' sections, highlight what each dataset's intended use case is and what it shouldn't be used for.

It's always more powerful to contextualise the use cases – explain why a dataset isn't suitable for an analysis. Is it due to potential missing data? Different granularities?

#2 Your dataset description is too simple

If the description doesn’t encapsulate the data that’s available in your dataset, it may not provide enough information for Fluent to accurately determine its relevance to your question.

You want to strike a balance here – you don’t want to be too general, but you also don’t want to describe each individual column in your dataset. Provide a good high-level overview by focusing on the most important dimensions of data.

A good rule of thumb is to highlight 3-5 key metrics that are central to that business line.

How to fix

In the 'What does this dataset have?' section, explain the different dimensions of data available in the dataset.

In the 'What can this dataset be used for?' section, highlight 3-5 specific metrics your business cares about that can be answered with this dataset.

#3 You’ve always used one dataset for a topic and you want Fluent to use a different dataset for the same topic

This is a workflow we do not recommend – especially if the 2 datasets are very similar. Fluent's learnings, such as definitions and training examples, are specific to each dataset, so switching to a different dataset means starting afresh.

If you want to make changes to your dataset, edit the existing dataset’s SQL instead of creating a new one.

#4 You've given Fluent incorrect information

Fluent uses the dataset descriptions to determine which dataset to query and how to query it. If a description contains incorrect information, Fluent may choose the wrong dataset or give wrong answers.

For example, if you say a dataset only has data from 2020-2022, but it actually also includes data from 2023, Fluent will never choose that dataset to answer questions about 2023, because the description tells it that the data isn’t there.

How to fix

Review and edit the dataset description to ensure it is accurate.
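
If you would rather check a description's date range against the data than rely on memory, a small script can help. The sketch below is only an illustration and is not a Fluent feature: the function name and the sample dates are hypothetical, and it assumes you have already exported the relevant date values from your own dataset. It lists any years that appear in the data but are missing from the range stated in the description.

```python
from datetime import date


def years_missing_from_description(described_years, row_dates):
    """Return the years found in the data that the description does not mention.

    described_years: iterable of years the description claims to cover (assumed input).
    row_dates: iterable of datetime.date values exported from the dataset (assumed input).
    """
    actual_years = {d.year for d in row_dates}
    return sorted(actual_years - set(described_years))


# Hypothetical sample: the description says 2020-2022, but the data also has 2023 rows.
described = range(2020, 2023)  # 2020, 2021, 2022
rows = [date(2020, 5, 1), date(2022, 11, 30), date(2023, 2, 14)]

print(years_missing_from_description(described, rows))  # prints [2023]
```

Any year returned here is a sign that the description should be updated to cover it.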