NeuraLinux: Bringing GenAI to the Linux desktop

Toggle the command line between English and shell

Natural language is the holy grail of HCI, but its principal challenge is describing fine-grained instructions precisely. Think of how many times you’ve failed to get Alexa or Google Home to do a slightly complicated task. This happens because natural language is inherently ambiguous, varied, and nuanced, so the number of possible interpretations of any request is incredibly large. The NLP community has attempted and struggled for decades to create such an interface, and the effort was supercharged by the LLM craze, which successfully merged human intention-smarts with code-smarts.

The problem is that NLP models can’t do anything on their own except respond with text or call functions. These models, while impressive on the surface, manipulate and generate text without an intrinsic understanding of the tasks they describe. This limitation is crucial in settings that demand precise, context-aware execution, such as programming or system management. Recently, developers have started using LLMs to call cloud APIs that affect the user’s environment, a pattern that has taken on the AI term “agents.” However, I strongly believe the most powerful translation medium is shell commands.

Historically, the development of natural language interfaces (NLIs) has focused on reducing human commands to computable actions. Early systems like SHRDLU in the early 1970s offered interfaces capable of understanding and acting on structured natural language commands within a limited domain. The 1980s and 1990s saw the advent of more sophisticated systems that integrated voice recognition and deeper parsing capabilities but still struggled with the variability and ambiguity of natural language. The turn of the millennium brought smarter, more contextual systems thanks to advances in machine learning and contextual analysis, setting the stage for modern assistants like Siri, Alexa, and Google Assistant.

Despite these advancements, the transition from simple task execution to handling complex computational tasks through natural language has been slow. The granularity and specificity required in fields such as software development or system administration typically exceed the capabilities of current NLIs, leading to frustration when attempting more complex commands.

In bridging the gap between natural language and the precise, granular control provided by shell environments, we see an opportunity to harness the strengths of both. The shell, with its capacity for detailed and specific commands, offers a robust interface for computer operations, free from the ambiguity of natural language. It provides a direct method to leverage the operating system’s capabilities through concise syntax and well-established conventions. This system’s reliability and speed are unmatched, particularly for complex tasks that require detailed control and immediate feedback.

First, you may need some convincing if the word “shell” leaves a sour taste in your mouth and evokes images of green text and poor error messages. I believe the shell is the ultimate middleman between your brain and your computer because it can do literally any task, tersely. This endless bag of functions lets you compose instructions of arbitrary granularity through flags, pipes, and scripts, which makes it infinitely versatile. It does not suffer from network latency, it has a consistent API thanks to decades of conventions, and it’s free as in speech. A major win for data scientists is that shell commands are documented very consistently, again thanks to old conventions (man, -h, README.md, Stack Overflow).

Next, you may need some convincing that natural language is even the right tool for an interface. If you’ve read a research paper, you understand painfully well the tradeoff between precision and interpretability in language: what the authors are saying may be extremely granular, but that comes at the cost of understanding. Machine learning architectures face the same tradeoff, which forms a Pareto frontier between model interpretability and predictive power.

The lesson here is that we don’t need to maximize precision at the expense of interpretability, which is what shell commands do. Our speech can get most of the point across while still being easy to understand and interpret, just like decision trees, which give up some predictive power to stay readable. If you were to describe your desired tasks to another person, you could be really imprecise and still get your point across for ~80% of them. Really think about this for a second: how many unique or complex commands do you run in an average session? Unless you’re trying to impress someone with your shell-fu, you want to be lazy and describe your workflow in the same language in which you think, or use a shell script that’s already written, rather than this weird archaic pseudo-programming language. That’s a very basic explanation of why a natural language interface can make computer interfaces more efficient.

So what solutions are out there to specify common tasks in natural language? As I mentioned before, the two options are voice assistants and GPT-based programs. My problem with the GPT programs is that they are a middleman that adds cognitive overhead. A separate program makes iterative translations clunky, and overall not as fun. I’m less willing to type “sgpt” every time I run a command because it’s another program to keep track of. The tradeoff is that a standalone program is more powerful than a shell integration. But what I want is for English to be a first-class citizen, as if the shell were designed from the ground up to understand natural language. This can be achieved using the zsh line editor (ZLE), which lets you run arbitrary functions on the contents of the command-line buffer, triggered through key bindings.

Introducing: englizsh

Underutilized feature: zsh line editor (ZLE)

Natural-language-to-bash translation is a powerful tool that I believe deserves its own built-in interface in the shell rather than a separate program. I propose an interface where users can toggle between natural language and shell commands with a key binding to create a more user-friendly shell. Not to bash bash, but simple tasks should not require researching which program + flag combo will yield the right result with external resources like man and Google. It is more intuitive to toggle between a description and its command with a key binding that invokes a GPT agent automatically, rather than invoking chatgpt -s “command” every time. This is useful if you forget a flag midway through a command or want to figure out what a command does. Developers of any experience level, and especially people new to the terminal, frequently face this issue; thus, there should be an interface integrated directly with the shell to save both key presses and the cognitive load of another program.

The interface behaves as follows. A key binding transforms the contents of the buffer based on what is currently in it. There are 4 cases:

  1. Shell command -> natural language description
  2. Natural language -> shell command
  3. No text -> natural language description of previous command
  4. If there is natural language in the buffer and the user presses Enter, the text is translated to shell before executing. If the result is a non-destructive command, it is executed directly. Otherwise, the user is prompted before execution, just like the default behavior.
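
As a rough sketch, the four cases could be dispatched like this. It is written in Python rather than zsh to keep the logic readable, and it is not the englizsh implementation. Everything named here is an assumption for illustration: `looks_like_shell` is a crude heuristic (the first word is a builtin or is found on PATH), `SHELL_BUILTINS` is an incomplete hypothetical list, and `to_shell`, `to_english`, and `is_destructive` are placeholders for whatever model call and safety check are actually used.

```python
import shlex
import shutil
from typing import Callable, Optional, Tuple

# Hypothetical, incomplete list of builtins that are not found on PATH.
SHELL_BUILTINS = {"cd", "export", "alias", "source", "set", "unset"}


def looks_like_shell(buffer: str) -> bool:
    """Heuristic guess: the first word is a builtin or an executable on PATH."""
    try:
        words = shlex.split(buffer)
    except ValueError:  # unbalanced quote, e.g. the apostrophe in "don't"
        return False
    if not words:
        return False
    return words[0] in SHELL_BUILTINS or shutil.which(words[0]) is not None


def toggle(
    buffer: str,
    previous_command: Optional[str],
    to_shell: Callable[[str], str],
    to_english: Callable[[str], str],
    is_shell: Callable[[str], bool] = looks_like_shell,
) -> str:
    """Cases 1 to 3: return the new buffer contents for the toggle key."""
    text = buffer.strip()
    if not text:  # case 3: empty buffer, describe the previous command
        return to_english(previous_command) if previous_command else ""
    if is_shell(text):  # case 1: shell command -> description
        return to_english(text)
    return to_shell(text)  # case 2: description -> shell command


def on_enter(
    buffer: str,
    to_shell: Callable[[str], str],
    is_destructive: Callable[[str], bool],
    is_shell: Callable[[str], bool] = looks_like_shell,
) -> Tuple[str, bool]:
    """Case 4: return (command_to_run, needs_confirmation) for the Enter key."""
    text = buffer.strip()
    if not text or is_shell(text):
        return text, False  # already shell: run exactly what was typed
    command = to_shell(text)
    return command, is_destructive(command)
```

The heuristic will misfire on English that starts with a real command name ("find all the large files"), so a real integration would need a better classifier or an explicit mode indicator.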

My algorithm to enable iterative translations is satisfyingly simple: using 3 variables, shift the contents down 1 position every translation. The syntax becomes a bit ugly in a shell implementation, but the idea is nonetheless the same.
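
One possible reading of the three-variable shift, sketched in Python: keep the three most recent buffer contents and move each one down a slot whenever a new translation arrives. The slot names (`newest`, `middle`, `oldest`) and the class itself are assumptions made for illustration, since the post does not name the variables or say exactly what each one holds.

```python
from dataclasses import dataclass


@dataclass
class TranslationHistory:
    """Three slots holding the most recent buffer contents (hypothetical names)."""

    newest: str = ""
    middle: str = ""
    oldest: str = ""

    def push(self, text: str) -> None:
        # Shift every slot down one position; the previous oldest entry is dropped.
        self.oldest, self.middle, self.newest = self.middle, self.newest, text


history = TranslationHistory()
for text in ["show disk usage", "df -h", "report free disk space"]:
    history.push(text)
print(history)
```

With the earlier contents still on hand, the next translation can be given the previous description and command as context instead of starting from scratch.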

This integration faces significant challenges, primarily around the precision and safety of command execution. The variability of natural language could lead to commands that are imprecise or potentially harmful if misinterpreted. Therefore, this system would need robust parsing algorithms capable of understanding context and user intent with high reliability. Furthermore, user education on the limitations and proper use of such a system would be crucial to prevent errors and ensure effective use.
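
To make the safety concern concrete, here is a minimal sketch of one conservative guard: run a translated command without asking only when it is a single command from a short read-only allowlist, and ask for confirmation in every other case. The `READ_ONLY` set and the `needs_confirmation` name are hypothetical, and the check is deliberately crude (it ignores dangerous flags such as `find -delete`, which is why `find` is not on the list).

```python
import shlex

# Hypothetical allowlist of commands treated as read-only.
READ_ONLY = {"ls", "pwd", "cat", "head", "tail", "wc", "grep", "df", "du", "whoami", "date"}

# Characters that can chain commands, redirect output, or run substitutions.
RISKY_CHARS = set(";&|<>`$(){}\n")


def needs_confirmation(command: str) -> bool:
    """Return True unless the command is one plain, allowlisted, read-only call."""
    if RISKY_CHARS & set(command):
        return True  # pipes, redirects, chaining, or substitution present
    try:
        words = shlex.split(command)
    except ValueError:
        return True  # could not parse it, so do not trust it
    return not words or words[0] not in READ_ONLY
```

Defaulting to "ask" means a misread request costs an extra key press rather than a lost file, at the price of prompting for some harmless commands, including any pipeline.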

Natural language expressed through shell commands represents a promising frontier in making computational tools more accessible and intuitive. By reducing the barrier to entry for using command-line interfaces and enhancing the precision of natural language commands, we can create a more versatile and user-friendly computing environment. This approach not only improves individual productivity but also democratizes access to complex computing operations, making them accessible to a broader audience without requiring extensive technical background.