On large language models and decision science
Viewing LLMs through the lens of the traveling salesperson problem
Introduction
Plainly put, I see current large language models (LLMs), the systems that represent the current state of generative AI, as models trained to infer logical tours, or routes, among characters, words, phrases, and larger linguistic patterns drawn from vocabularies, dictionaries, blogs, and papers, in order to generate text that matches a user’s prompted context. There is nothing special about it. Today’s LLMs are similar to advanced search engines, or to skilled researchers who know which keywords to search for to find the internet content that satisfies their needs, only with far less time wasted. And somehow, LLMs can reason by connecting ideas, text, and programming instructions from each observed resource.
You, as an Operations Research (OR) scientist, can think of LLMs as systems that repeatedly use learned distance graphs to solve a traveling-salesman problem. Each sentence, paragraph, or article is a route through a set of nodes (characters, words, concepts), and the “distance” between nodes is learned or adjusted during training or fine-tuning. Because of this property, frequent co-occurrences in widely shared sources and recurring patterns pull some nodes closer together, so the LLM tends to favor those routes. That tendency explains why it leans toward familiar phrasings and can overproduce common sequences, and why “hallucination” becomes more likely when it follows less-traveled routes. Through this mechanism, LLMs can follow formal logic, manipulate algebraic or code structures, and produce step-by-step instructions, demonstrating coherent chains of reasoning whenever the selected paths match valid patterns.
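To make the route-following intuition concrete, here is a deliberately toy sketch of my own (an illustration of the analogy only, not how real transformers work): edge “distances” come from bigram counts, and generation greedily follows the most-traveled edge out of each word.

from collections import Counter, defaultdict

# Toy illustration of the analogy: frequent co-occurrence pulls nodes closer,
# and greedy generation keeps to the most-traveled routes.
corpus = ("the cat sat on the mat . the cat ate . "
          "the dog sat on the rug .").split()

edges = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    edges[a][b] += 1  # higher count = "shorter" edge between the two nodes

word, route = "the", ["the"]
for _ in range(5):
    if word not in edges:
        break  # dead end: no learned route continues from this node
    word = edges[word].most_common(1)[0][0]  # follow the shortest learned edge
    route.append(word)

print(" ".join(route))  # favors the corpus's most common phrasing

Even this tiny “model” shows the behavior described above: it happily loops through familiar sub-routes, which is the toy analogue of overproducing common sequences.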
LLMs solve problems along paths they have learned. It is like the difference between a closed-book and an open-book exam. I wrote about optimization and learning two years ago, and I believe this perspective remains valid in 2025 and beyond (refer to my LinkedIn post). In the real world, or on the internet, you may find at least one or two experts who can instantly respond to your prompt, or who have documented how to respond to it. But no expert can instantly solve a decision problem; finding a solution requires time for inference. Decision problems require explicit preferences, context, and trade-offs. That is Decision Science.
Before 30 November 2022, people saved time and cost by outsourcing coding to experts in a field; now, they save time and cost by outsourcing prompting to experts in a field. However, even a small deviation from expectations in the decisions an LLM suggests can erode ROI and create real business losses. That is why, in many cases, adopting LLMs does not deliver a clear return on investment or value to the business. It is also unclear who truly captures the value of LLMs, the LLM user or their customers, and why the customers wouldn’t simply use LLMs themselves.
In this article, I aim to outline potential use cases for LLMs, and situations in which to avoid them, for non-experts and experts in the decision science field; the list will evolve over time, so stay tuned! Notably, when ChatGPT was first introduced on 30 November 2022, it was common for everyone to share their experiences using it. I try, however, to focus on key takeaways for you, the reader: not to judge, but to reveal some facts about today’s LLMs.
Using LLMs as a non-expert decision maker
I prompted OpenAI’s latest free and publicly available LLM, ChatGPT 5, through its API in Thinking mode, with the following prompt: “I want to select from a set of items with different weights because my backpack has limited capacity. These items have different values to me. How can I decide which to take?”
Before even asking this prompt, the non-expert must first recognize that there is a problem or weakness to decide on or address. This is not unique to using an LLM: failing to notice a critical aspect of a system, industry, or supply chain existed in the past, is observable today, and will likely remain in the future. It can be a barrier to even asking a question, as Adam Dejans Jr. pointed out on LinkedIn (see his post).
ChatGPT 5 structured its response with sections titled “Quick plan,” “When to pick which,” “Code — full, ready-to-run Python,” “Steps you should follow (short),” “Troubleshooting / if step fails, do this,” and “Complexity summary.” Because it’s designed to ask follow-ups, it ended the response with: “If you want, paste your item list and capacity (IDs, weights, values). I can run the appropriate routine and give you the exact selection quickly.”
Note that it responds based on how one has thought, rated, or interacted before, so reproducing the same result from a similar prompt is not guaranteed. It is constantly personalized through reinforcement learning from human feedback. In decision science, however, the way one thinks might be suboptimal; optimization algorithms can surpass human-level reasoning.
Everything about the outline seems fine, but it clearly preferred Python; unsurprising, since this language dominates most blogs and tutorials and has become the world’s most popular programming language. In other words, the LLM tends to treat information repeated frequently on the internet as a measure of reliability. But, of course, repetition does not guarantee reliability.
Looking at the code, the first thing it attempted was a heuristic in pure Python, without relying on any external solver. So the common claim that LLMs are simply an interface between your tool and the user is not entirely accurate: LLMs can, in fact, design the tool themselves, whether that is an LP solver, a MIP solver, or something else. But they still rely on a programming language; they cannot solve a decision problem by direct, timely inference alone.
Interestingly, the code provided by the LLM did not rely on a single solution algorithm; it offered two alternatives. Selecting among the alternatives it provides is itself a decision problem: a non-expert user might worry that the solution is incorrect, or be unsure which algorithm to use. LLMs may confuse you and lead you to overthink. The algorithms it used, the fractional knapsack (value-to-weight-ratio greedy) and 0/1 dynamic programming, are technically sound, but the typical user is unlikely to care about these details.
Here is the code it provided:
# knapsack_full.py
from typing import List, Tuple

def fractional_knapsack(items: List[Tuple[float, float, str]], capacity: float):
    """
    items: list of (weight, value, id)
    capacity: backpack capacity (float)
    Returns: (total_value, list_of_taken (id, taken_weight, taken_value))
    Greedy: takes highest value/weight ratio first.
    """
    # compute ratio and sort descending
    items_sorted = sorted(items, key=lambda it: (it[1]/it[0]) if it[0] > 0 else float('inf'), reverse=True)
    taken = []
    total_value = 0.0
    remaining = capacity
    for w, v, id_ in items_sorted:
        if remaining <= 0:
            break
        take_w = min(w, remaining)
        take_v = (v * (take_w / w)) if w > 0 else 0
        taken.append((id_, take_w, take_v))
        total_value += take_v
        remaining -= take_w
    return total_value, taken

def knapsack_01_dp(items: List[Tuple[int, int, str]], capacity: int):
    """
    0/1 knapsack DP.
    items: list of (weight:int, value:int, id)
    capacity: int
    Returns: (max_value, list_of_taken ids)
    Complexity: O(n * capacity) time and O(n * capacity) memory (we use optimized memory but still reconstruct choices).
    """
    n = len(items)
    # dp[w] = max value achievable with capacity w
    dp = [0] * (capacity + 1)
    # parent store: for reconstruction, store index of item used to reach dp[w] or -1
    parent = [[False] * (capacity + 1) for _ in range(n)]  # parent[i][w] = True if item i taken in optimal for capacity w after processing i
    for i, (wt, val, _) in enumerate(items):
        # iterate w from high to low to avoid reuse
        for w in range(capacity, wt - 1, -1):
            if dp[w - wt] + val > dp[w]:
                dp[w] = dp[w - wt] + val
                parent[i][w] = True
    max_value = dp[capacity]
    # reconstruct chosen items
    chosen = []
    w = capacity
    for i in range(n - 1, -1, -1):
        if parent[i][w]:
            wt, val, id_ = items[i]
            chosen.append(id_)
            w -= wt
    chosen.reverse()
    return max_value, chosen

# Example usage
if __name__ == "__main__":
    # Example for fractional
    items_frac = [
        (10, 60, "A"),  # weight, value, id
        (20, 100, "B"),
        (30, 120, "C")
    ]
    capacity = 50
    total_val, taken = fractional_knapsack(items_frac, capacity)
    print("Fractional knapsack result:")
    print(" Total value:", total_val)
    print(" Taken:", taken)

    # Example for 0/1 knapsack (weights and capacity must be integers)
    items_01 = [
        (10, 60, "A"),
        (20, 100, "B"),
        (30, 120, "C")
    ]
    cap_int = 50
    max_val, chosen = knapsack_01_dp(items_01, cap_int)
    print("\n0/1 knapsack result:")
    print(" Max value:", max_val)
    print(" Chosen items:", chosen)
Then I followed its suggestion and asked:
“Please run the appropriate routine with simulated data.”
It responded by generating a simulated dataset, assuming the knapsack capacity is 50:
| id | weight | value | value/weight |
|---|---|---|---|
| item_1 | 4 | 93 | 23.25 |
| item_2 | 9 | 36 | 4.00 |
| item_3 | 5 | 14 | 2.80 |
| item_4 | 4 | 91 | 22.75 |
| item_5 | 3 | 10 | 3.33 |
| item_6 | 19 | 59 | 3.11 |
| item_7 | 2 | 8 | 4.00 |
| item_8 | 3 | 32 | 10.67 |
| item_9 | 8 | 69 | 8.63 |
| item_10 | 5 | 63 | 12.60 |
Since the LLM can be connected to a Python backend, or any other tool, it can execute the code directly and produce the results. The output is as follows:
Fractional Knapsack (you can take fractions)
Total value: 308.75
Items (fully or partially taken):
item_4: 4 weight, 91.0 value
item_8: 3 weight, 32.0 value
item_9: 8 weight, 69.0 value
item_2: 9 weight, 36.0 value
item_7: 2 weight, 8.0 value
item_6: 19 weight, 59.0 value
item_3: 5 weight (partially), 13.75 value
and
0/1 Knapsack (you must take whole items)
Max total value: 303
Optimal selection:
item_1, item_2, item_4, item_6, item_7, item_8, item_9
The LLM concluded its answer with a concise summary sentence, of course:
"If your items cannot be split, select item_1, item_2, item_4, item_6, item_7, item_8, item_9. They fit within the 50-unit capacity and give the maximum total value = 303."
A non-expert who is not familiar with optimization might stop here, and that might seem perfectly fine. Non-experts often care more about practical outcomes than technical details. They were presented with two reasonable options, and which one applies depends on whether the items can be taken in fractions or only whole. Writing from a non-expert’s perspective, I find the answer satisfactory and see no need to explore it further.
Using LLMs as a Decision Optimization Expert
Now, let’s assume the person using the LLM is an expert in the field. Experts typically use LLMs as tools, mainly to reduce repetitive or tedious tasks, when given the opportunity. They usually work with either simulated datasets or real-world data. Suppose I want to solve the same problem using the simulated data the LLM generated earlier for the non-expert. After clearing the previous chat, I entered the following prompt:
“Please provide a Python code that solves the knapsack problem with the following data, using FelooPy:”
| id | weight | value |
|---|---|---|
| item_1 | 4 | 93 |
| item_2 | 9 | 36 |
| item_3 | 5 | 14 |
| item_4 | 4 | 91 |
| item_5 | 3 | 10 |
| item_6 | 19 | 59 |
| item_7 | 2 | 8 |
| item_8 | 3 | 32 |
| item_9 | 8 | 69 |
| item_10 | 5 | 63 |
The first thing the LLM did was search the web, without my asking. The likely reason is that FelooPy, the integrated decision environment used here for solving decision problems, is still not well documented despite being around for three years. It then presented the following code:
# pip install feloopy[stock]   # uncomment & run once if FelooPy is not installed
import feloopy as flp

# --- data (from your table) ---
items = [
    ("item_1", 4, 93),
    ("item_2", 9, 36),
    ("item_3", 5, 14),
    ("item_4", 4, 91),
    ("item_5", 3, 10),
    ("item_6", 19, 59),
    ("item_7", 2, 8),
    ("item_8", 3, 32),
    ("item_9", 8, 69),
    ("item_10", 5, 63),
]

# Change this capacity as needed
CAPACITY = 50

def build_knapsack_model(m):
    # create a binary decision var for each item
    x_vars = []
    for name, w, v in items:
        # m.bvar creates a binary variable (per FelooPy examples)
        x = m.bvar(name=name)
        x_vars.append(x)
    # objective: maximize total value
    obj_expr = sum(v * x for (_, _, v), x in zip(items, x_vars))
    m.obj(obj_expr)
    # knapsack capacity constraint: sum(weights * x) <= CAPACITY
    cap_expr = sum(w * x for (_, w, _), x in zip(items, x_vars))
    m.con(cap_expr <= CAPACITY, name="capacity")
    # expose variables so we can inspect them later
    m._x_vars = x_vars
    return m

def try_get_var_value(var):
    """
    Try several common attribute/method names to retrieve the solved value of a variable.
    FelooPy interfaces may wrap solver variables differently, so we attempt multiple ways.
    """
    for attr in ("value", "x", "X", "value_", "solution_value"):
        if hasattr(var, attr):
            val = getattr(var, attr)
            # if it's callable (method), call it
            try:
                return val() if callable(val) else val
            except Exception:
                pass
    # try call directly (some wrappers overload __call__)
    try:
        return var()
    except Exception:
        pass
    # couldn't read
    return None

# Build model using FelooPy 'exact' modelling interface and a common open solver ('pulp')
m = flp.model('exact', 'knapsack_problem', 'pulp')

# Add the knapsack structure
m = build_knapsack_model(m)

# Solve: objective direction 'max' with 'pulp' solver
# (You can replace 'pulp' with any solver available on your machine/framework)
m.sol(['max'], 'pulp')

# Print FelooPy report (human-readable)
m.report()

# Try to display variables using FelooPy helper (prints to console)
# This typically shows variable names and values in FelooPy
try:
    m.dis_variable(*m._x_vars)
except Exception:
    # if the helper is not available for the chosen interface, ignore
    pass

# Now attempt to collect solution values programmatically and print chosen items + totals
selected = []
total_weight = 0
total_value = 0
for (name, w, v), var in zip(items, m._x_vars):
    val = try_get_var_value(var)
    # Val might be float-like; treat > 0.5 as chosen (binary)
    chosen = False
    if val is None:
        # fallback: try to query model's solution summary if available
        chosen = False
    else:
        try:
            chosen = float(val) > 0.5
        except Exception:
            chosen = bool(val)
    if chosen:
        selected.append(name)
        total_weight += w
        total_value += v

# Print concise solution summary
print("\n--- Solution summary ---")
print("Capacity:", CAPACITY)
print("Chosen items:", selected)
print("Total weight:", total_weight)
print("Total value:", total_value)
The code it generated contains several errors and is unnecessarily complex. It seems the LLM could not keep up with the updates FelooPy has undergone over the past three years. The error is:
raise ValueError(f"Invalid '{label}'. Please choose from: {str(list_of_allowed_values)}.")
ValueError: Invalid 'method'. Please choose from: ['constraint', 'convex', 'exact', 'heuristic', 'uncertain'].
I fixed the code myself and tried running it again:
# pip install feloopy[stock]   # uncomment & run once if FelooPy is not installed
import feloopy as flp

# --- data (from your table) ---
items = [
    ("item_1", 4, 93),
    ("item_2", 9, 36),
    ("item_3", 5, 14),
    ("item_4", 4, 91),
    ("item_5", 3, 10),
    ("item_6", 19, 59),
    ("item_7", 2, 8),
    ("item_8", 3, 32),
    ("item_9", 8, 69),
    ("item_10", 5, 63),
]

# Change this capacity as needed
CAPACITY = 50

def build_knapsack_model(m):
    # create a binary decision var for each item
    x_vars = []
    for name, w, v in items:
        # m.bvar creates a binary variable (per FelooPy examples)
        x = m.bvar(name=name)
        x_vars.append(x)
    # objective: maximize total value
    obj_expr = sum(v * x for (_, _, v), x in zip(items, x_vars))
    m.obj(obj_expr)
    # knapsack capacity constraint: sum(weights * x) <= CAPACITY
    cap_expr = sum(w * x for (_, w, _), x in zip(items, x_vars))
    m.con(cap_expr <= CAPACITY, name="capacity")
    # expose variables so we can inspect them later
    m._x_vars = x_vars
    return m

def try_get_var_value(var):
    """
    Try several common attribute/method names to retrieve the solved value of a variable.
    FelooPy interfaces may wrap solver variables differently, so we attempt multiple ways.
    """
    for attr in ("value", "x", "X", "value_", "solution_value"):
        if hasattr(var, attr):
            val = getattr(var, attr)
            # if it's callable (method), call it
            try:
                return val() if callable(val) else val
            except Exception:
                pass
    # try call directly (some wrappers overload __call__)
    try:
        return var()
    except Exception:
        pass
    # couldn't read
    return None

# Build model using FelooPy's 'exact' modelling method with the 'pulp' interface
m = flp.model(method='exact', name='knapsack_problem', interface='pulp')

# Add the knapsack structure
m = build_knapsack_model(m)

# Solve: objective direction 'max' with the 'cbc' solver
# (You can replace 'cbc' with any solver available on your machine/framework)
m.sol(['max'], 'cbc')

# Print FelooPy report (human-readable)
m.report()

# Try to display variables using FelooPy helper (prints to console)
# This typically shows variable names and values in FelooPy
try:
    m.dis_variable(*m._x_vars)
except Exception:
    # if the helper is not available for the chosen interface, ignore
    pass

# Now attempt to collect solution values programmatically and print chosen items + totals
selected = []
total_weight = 0
total_value = 0
for (name, w, v), var in zip(items, m._x_vars):
    val = try_get_var_value(var)
    # Val might be float-like; treat > 0.5 as chosen (binary)
    chosen = False
    if val is None:
        # fallback: try to query model's solution summary if available
        chosen = False
    else:
        try:
            chosen = float(val) > 0.5
        except Exception:
            chosen = bool(val)
    if chosen:
        selected.append(name)
        total_weight += w
        total_value += v

# Print concise solution summary
print("\n--- Solution summary ---")
print("Capacity:", CAPACITY)
print("Chosen items:", selected)
print("Total weight:", total_weight)
print("Total value:", total_value)
The result was:
┌─ FelooPy v0.3.9 ───────────────────────────────────────────────────────────────────────┐
│ │
│ Date: 2025-11-11 Time: 11:13:25 │
│ Interface: pulp Solver: cbc │
│ │
└────────────────────────────────────────────────────────────────────────────────────────┘
┌─ Model ────────────────────────────────────────────────────────────────────────────────┐
│ │
│ Name: knapsack_problem │
│ Feature: Class: Total: │
│ Binary variable 10 10 │
│ Total variables 10 10 │
│ Objective - 1 │
│ Constraint 1 1 │
│ │
└────────────────────────────────────────────────────────────────────────────────────────┘
┌─ Solve ────────────────────────────────────────────────────────────────────────────────┐
│ │
│ Method: exact Objective value │
│ Status: max │
│ Optimal 429.00 │
│ │
└────────────────────────────────────────────────────────────────────────────────────────┘
┌─ Metric ───────────────────────────────────────────────────────────────────────────────┐
│ │
│ CPT (microseconds) 8.70e+04 │
│ CPT (hour:min:sec) 00:00:00 │
│ │
└────────────────────────────────────────────────────────────────────────────────────────┘
┌─ Decision ─────────────────────────────────────────────────────────────────────────────┐
│ │
│ item_1 = 1.0 │
│ item_3 = 1.0 │
│ item_4 = 1.0 │
│ item_6 = 1.0 │
│ item_7 = 1.0 │
│ item_8 = 1.0 │
│ item_9 = 1.0 │
│ item_10 = 1.0 │
│ │
└────────────────────────────────────────────────────────────────────────────────────────┘
--- Solution summary ---
Capacity: 50
Chosen items: ['item_1', 'item_3', 'item_4', 'item_6', 'item_7', 'item_8', 'item_9', 'item_10']
Total weight: 50
Total value: 429
This run reached a value of 429! That is 38.96% better than the solution the non-expert obtained with the LLM. So yes, an LLM can act as an interface between user and solver, but it is more like navigating a crowded road than walking an empty one: you save coding time, but may spend more time debugging. Even experts can take much longer if they do not know how to fix the issues. The code it produced works after the fix, but it could be written more cleanly as follows:
import feloopy as flp

# --- data ---
n = 10
ids = [f"item_{i+1}" for i in range(n)]
weight = [4, 9, 5, 4, 3, 19, 2, 3, 8, 5]
value = [93, 36, 14, 91, 10, 59, 8, 32, 69, 63]
capacity = 50

def knapsack(m):
    # one binary variable per item
    x = m.bvar(name="x", dim=[n])
    m.con(m.sum(weight[i] * x[i] for i in range(n)) <= capacity)
    m.obj(m.sum(value[i] * x[i] for i in range(n)))
    return m

problem = {
    "environment": knapsack,
    "interface": "pulp",
    "solver": "cbc",
    "directions": ["max"],
}

model = flp.search(**problem)
x = model.get("x")
for i in range(n):
    if x[i] == 1:
        print(ids[i], "is selected")
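With only ten items there are 2^10 = 1,024 possible selections, so the solver’s answer can be verified independently. A minimal brute-force check (my own sketch, not part of the LLM exchange) confirms that 429 is indeed the optimum:

from itertools import combinations

# Enumerate every subset of the ten items and keep the best feasible one.
ids = [f"item_{i+1}" for i in range(10)]
weight = dict(zip(ids, [4, 9, 5, 4, 3, 19, 2, 3, 8, 5]))
value = dict(zip(ids, [93, 36, 14, 91, 10, 59, 8, 32, 69, 63]))
capacity = 50

best_value, best_subset = 0, ()
for r in range(len(ids) + 1):
    for subset in combinations(ids, r):
        if sum(weight[i] for i in subset) <= capacity:
            v = sum(value[i] for i in subset)
            if v > best_value:
                best_value, best_subset = v, subset

print(best_value)           # 429
print(sorted(best_subset))  # the same set of items as the FelooPy/CBC run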
Knowing the right tips and tricks can help you code more efficiently, build stronger models, and find better solutions. How much you improve depends on the time you invest in learning. It is not just about writing constraints in the correct mathematical form; each constraint on its own is simple math. The real challenge lies in how constraints interact, and diagnosing those interactions is difficult even for experts.
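As a toy illustration of such an interaction (my own sketch, written directly in PuLP, the same solver stack FelooPy called above): each of the two constraints below is trivial on its own, yet together they make the whole model infeasible, and neither constraint reveals that by itself.

import pulp

# Two individually simple constraints that interact badly: a tightened
# capacity and a business rule forcing item_6 (weight 19) into the knapsack.
ids = [f"item_{i+1}" for i in range(10)]
weight = dict(zip(ids, [4, 9, 5, 4, 3, 19, 2, 3, 8, 5]))
value = dict(zip(ids, [93, 36, 14, 91, 10, 59, 8, 32, 69, 63]))

prob = pulp.LpProblem("knapsack_interaction", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", ids, cat="Binary")
prob += pulp.lpSum(value[i] * x[i] for i in ids)
prob += pulp.lpSum(weight[i] * x[i] for i in ids) <= 15  # tightened capacity
prob += x["item_6"] == 1  # "item_6 must always be packed"
prob.solve(pulp.PULP_CBC_CMD(msg=False))

print(pulp.LpStatus[prob.status])  # Infeasible: 19 > 15, visible only jointly

In a model with thousands of constraints, locating the offending pair (or set) is exactly the kind of diagnosis that still requires expertise.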
Conclusion
Getting stakeholders to implement your recommendations, especially when they have relied on intuition for years, is one of the most challenging tasks for industrial engineers. When you write a research proposal or prepare a model or algorithm for deployment, you must clearly demonstrate business value to both the paper reviewer and the industry decision-maker; without that, your work risks rejection.
LLMs can deliver real value for routine, repetitive tasks. In operations research, as in any field that automates repetitive work, code, models, and algorithms are tools, not sources of competitive advantage. The real edge comes from continually clarifying and updating the assumptions that define the problem, ensuring solutions genuinely fit real-world expectations or address system weaknesses. That is continuous improvement: be context- and result-oriented, not tool-oriented.
We must also consider who builds the systems, who makes decisions with LLMs, and who benefits. If an LLM only produces what an equally skilled developer could have written unaided, its added value is limited. Trust and expertise matter: a practitioner who understands the underlying theory and can actively collaborate with an LLM is far more reliable than someone who treats the process as a black box. Non-experts often remain uncertain about when to stop prompting or whether their prompts are adequate; experts, by contrast, can use LLMs to simplify repetitive tasks, but should avoid overreliance.