blog.silas.nyc

Network Structure

March 6th, 2025

I’ve been thinking about how to initialize the network for a few days now. My first thought was to just fully randomize edge creation, picking from a pool of nodes, making sure at least 1 was already connected to the network. That seemed too prone to creating largely disjoint clusters though. From there, I thought about trying to do some sort of biasing/classification thing, where as I proceeded through joining the neurons up with synapses, I’d progressively bias them toward picking input, network, or output nodes, but my intuition also gave me the sense that would be too prone to broken or flawed networks.

Finally, I decided I’d take an approach of building a cloud of neurons off of each input and output, then interlinking those clouds. It feels like this has the best chance of creating something that might actually work. I thought about maybe having some free floating neurons that would only be linked up as part of the interlinking process, but I felt that would be more complicated than I wanted to tackle at the moment, as then I’d have to ensure that a connected free neuron gets linked to at least 1 incoming and 1 outgoing synapse, which would mean a whole extra control layer.

I started by writing a class to build the clusters. https://github.com/silasray/aiexplore/commit/64396d45f39a8cc121a4377ec48d593c7781e4e0
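
To make the cloud-then-interlink idea concrete, here is a minimal sketch. It is not the class from the commit above; the names build_cluster and build_network, the cluster_size and links_per_pair parameters, and the plain (source, target) tuples standing in for synapses are all assumptions made for illustration.

```python
import random


def build_cluster(root, size, rng):
    """Grow a cloud of `size` neurons off `root`.

    Each new neuron attaches to one already in the cloud, so the cloud
    is connected by construction. Edges point away from the root.
    """
    nodes = [root]
    edges = []
    for index in range(size):
        new_node = f"{root}.n{index}"
        edges.append((rng.choice(nodes), new_node))
        nodes.append(new_node)
    return nodes, edges


def build_network(inputs, outputs, cluster_size=4, links_per_pair=2, seed=0):
    """Return sorted (source, target) edges for a clustered network."""
    rng = random.Random(seed)
    edges = set()
    input_clouds = []
    for name in inputs:
        nodes, cloud_edges = build_cluster(name, cluster_size, rng)
        input_clouds.append(nodes)
        edges.update(cloud_edges)
    output_clouds = []
    for name in outputs:
        nodes, cloud_edges = build_cluster(name, cluster_size, rng)
        output_clouds.append(nodes)
        # Flip direction so synapses in an output cloud flow toward the output.
        edges.update((target, source) for source, target in cloud_edges)
    # Interlink: every input cloud gets synapses into every output cloud.
    for in_nodes in input_clouds:
        for out_nodes in output_clouds:
            for _ in range(links_per_pair):
                edges.add((rng.choice(in_nodes), rng.choice(out_nodes)))
    return sorted(edges)
```

Because every neuron in an input cloud is reachable from its input, and every neuron in an output cloud leads to its output, a single interlinking synapse per pair of clouds is enough to give each input a route to each output. The sketch still leaves dead-end neurons at the edges of the input clouds, which is the same kind of problem the free floating neurons would have raised.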
How 2 Train Your Network

March 5th, 2025

I think I have the reinforce() aka train part in, at least to a reasonable degree, now (https://github.com/silasray/aiexplore/commit/8bac937184645d4b90639b81091f65c06b7a4a40). I think it’s becoming increasingly clear how this is transferable to matrix transformations (duh, given we run this stuff on GPUs), but at least to my mind, writing it the way I am is a little clearer to think through and reason about, at least to a novice. The basic idea of the training step is we walk back over the open nodes for a given solution to the input nodes, accumulating weights based on the inverse of how much activation a given neuron had for the solution. In other words, the more a Neuron played a role in a solution, the higher the likelihood it will become part of a reinforcement path.

Next, we collect all the paths we can within a given number of iterations, sorted by input. Then, we take all the paths for each input, and adjust the weights based on the inverse of their weight proportional to the total cost of all paths that reached that input. I’m sure this approach has flaws, but it at least makes sense to me for now.
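
One possible reading of that adjustment step, as a sketch rather than what the commit actually does: each path gets a share of the reinforcement equal to its inverse cost divided by the sum of inverse costs for every path that reached the same input. The function names, the (path, cost) pairs, and the reward and rate parameters are all made up for illustration.

```python
def reinforcement_shares(paths_by_input):
    """Map each input to a list of (path, share), with shares summing to 1.

    `paths_by_input` maps an input name to a list of (path, cost) pairs,
    where cost > 0. A path's share is its inverse cost over the sum of
    inverse costs for that input, so cheaper paths get larger shares.
    """
    shares = {}
    for input_name, paths in paths_by_input.items():
        total_inverse = sum(1.0 / cost for _, cost in paths)
        shares[input_name] = [
            (path, (1.0 / cost) / total_inverse) for path, cost in paths
        ]
    return shares


def apply_reinforcement(weights, shares, reward, rate=0.1):
    """Nudge each synapse weight on a path by rate * reward * share."""
    for paths in shares.values():
        for path, share in paths:
            # Consecutive node pairs along the path are the synapses to adjust.
            for edge in zip(path, path[1:]):
                weights[edge] = weights.get(edge, 0.0) + rate * reward * share
    return weights
```

A positive reward strengthens the synapses along each path and a negative reward weakens them, with the cheapest path to each input moving the most.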

Tomorrow, I’m going to work on the code for initializing and exercising a network. Maybe I’ll even get to running and debugging, exciting.
How to Train Your Network

March 4th, 2025

Today, I spent some time working on the mechanism to facilitate training (https://github.com/silasray/aiexplore/commit/d32af871b95c0664097326aabc1e61e92272adb9). Spent some time thinking through how the current design will handle cycles. While it’s not great, it should be decent enough for now. I’m not super happy with the fact that max_steps means something different in resolve() than in reinforce(), but I didn’t want to get distracted trying to make the patterns match for now.

The idea is that after reaching a solution, it walks back from the solution to the input nodes to find the shortest paths, then reinforces those, either with a positive or negative value. I’m playing with some ideas in my head about how exactly to do the scoring and adjustments to the Synapse weights during the training/reinforce phase, but I think I’ll get more into that tomorrow.
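
The walk back can be sketched as a breadth-first search over reversed edges. This is an illustration under assumptions, not the code from the commit: the network is reduced to a dict named incoming that maps each node to the nodes feeding it, and max_steps here caps the number of synapses on a path, which may not match what it means in reinforce().

```python
from collections import deque


def shortest_paths_to_inputs(incoming, solution, inputs, max_steps=10):
    """Breadth-first walk backward from `solution`.

    `incoming` maps a node name to the names of the nodes feeding it.
    Returns {input_name: path}, each path listed from input to solution.
    """
    found = {}
    queue = deque([[solution]])
    visited = {solution}  # Never revisit a node, so cycles cannot loop forever.
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in inputs:
            found[node] = list(reversed(path))
            continue
        if len(path) > max_steps:
            continue  # Path already uses max_steps synapses; stop extending it.
        for upstream in incoming.get(node, []):
            if upstream not in visited:
                visited.add(upstream)
                queue.append(path + [upstream])
    return found
```

Breadth-first order means the first time an input is reached is along a path with the fewest synapses, and the visited set is one blunt way to deal with cycles.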
Let’s Learn AI

March 3rd, 2025

I’ve been interested in AI since school, coming from my video gaming hobby. Neural nets have always piqued my interest, so I decided that I will jump into this and learn through both doing and researching. I’m starting, intentionally without doing any research, by taking a shot at building a basic neural net. I think it could be interesting to document my growth, and see how things evolve as I learn more. The first repo I have to share my work is https://github.com/silasray/aiexplore . I intend to spend at least a few hours a day on this for the foreseeable future, so let’s see how it goes, and with some luck, it’ll be an interesting look into learning.

The initial code in the repo is just a start, not running. I’m trying to build a simple neural network, where there will be a solve phase and a reinforce phase. Solve will take inputs and produce an output, and reinforce will modify the network to “learn”. Right now, basically, the idea is that the training resides in the Synapses (aka edge weights) while the solving is localized in the Neurons. I’m sure this is incredibly novice, but that’s kind of the point. I’m excited to see where it goes.
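
As a rough picture of that split (a sketch, not the code in the repo), the classes below keep the learnable weight on the Synapse and the solving logic on the Neuron. The threshold rule, the attribute names, and the solve() method are assumptions made for illustration.

```python
class Neuron:
    """Solving lives here: sum the weighted input and fire past a threshold."""

    def __init__(self, name, threshold=0.5):
        self.name = name
        self.threshold = threshold
        self.incoming = []  # Synapses that feed this neuron.
        self.activation = 0.0

    def solve(self):
        total = sum(s.weight * s.source.activation for s in self.incoming)
        # Stay silent below the threshold; otherwise pass the total along.
        self.activation = total if total >= self.threshold else 0.0
        return self.activation


class Synapse:
    """Training lives here: a reinforce phase would adjust `weight`."""

    def __init__(self, source, target, weight=1.0):
        self.source = source
        self.target = target
        self.weight = weight
        target.incoming.append(self)
```

For example, an input neuron with activation 1.0 connected through a Synapse of weight 0.8 pushes a downstream neuron with threshold 0.5 to an activation of 0.8, while a weight of 0.3 leaves it at 0.0.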
The First Derivative of Agility

December 1st, 2023

Agile is in many ways a synonym for task switching, and I think it’s a fairly widely accepted principle that switching incurs costs. We all seem to at least implicitly agree there is some balance to be had between the costs of switching and the costs of rigidity, as we have as an industry accepted agile as gospel. We generally seem to have settled on 1-5 story points as the idealized unit of work and 1-2 weeks as the idealized sprint time, but over the last few days, I’ve been thinking that this seems like something that should be quantifiable and optimizable rather than just done by gut and feel.

If you think of a team as a machine, there’s a slider that selects a point between 2 extremes. On 1 side, everyone on a team does 1 narrowly defined task over and over; on the other, everyone is switching tasks instantaneously and continuously. What’s the function that defines the relationship between moving that slider and business outcomes? What variables are even in that function? How do we even start to tell if we’re optimizing that function given the methodologies we currently have?

This isn’t a typical blog post, I suppose. I’m not giving solutions here, just posing a set of questions that seem interesting to me. I’d really love to hear what others have to say, and if not too presumptuous, start a conversation.
Coding for Visitors vs Residents

November 21st, 2023

I’ve been thinking more based on the analogy from my previous post and had some more ideas about how the design of a city nicely illuminates the design of a software project. In a city, there are differences between how you optimize design for locals and for visitors. For locals, you want specialized shortcuts, quickly digestible localisms, and other efficiencies that leverage familiarity. When designing for visitors, you want ease of use. Broadly familiar terms on signs, clear and easy to navigate streets, etc. Further, within a city, different neighborhoods are going to have different balances between local and visitor audiences.

Similarly, areas within a codebase will have different needs given the primary audience that will experience them. In some places, you’ll want hyper specialized and nuanced functions and constructs, while in others, it may be worth some sacrifice in performance or DRYness for readability and ease of understanding (this is subtly but importantly different from clarity). It’s important to realize that this balance will not be consistent across the entire codebase either.

What’s possibly most important to understand, in my mind, about where a particular piece of code falls on the visitor-resident spectrum is how your organization works with respect to that piece of code, rather than anything specific about the actual functionality of the code. Is this code that a handful of people are interfacing with every week or 2 (residents), or is this something a large number of people are going to work with once or twice a quarter (visitors)?

Consider 2 sets of functions. On the one hand,
apply_sku_adjustments(cart, customer)
On the other,
apply_taxes(line_items, jurisdictions)
apply_quantity_discounts(line_items)

Neither of these is wrong in an absolute sense, but they are optimized for different audiences. The first requires much more specialized knowledge to make sense of (what’s a sku, why are we adjusting them, what kind of adjustments are they, how do they relate to a cart, what do they have to do with the customer). The second communicates at the abstraction boundary in terms and concepts most people can understand right away. The first may well suit residents better, while the second is better for visitors.
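
To see the two styles side by side, here is a hypothetical sketch in which the resident-style function is just the two visitor-style functions composed. Only the function names and their arguments come from the example above; the dict shapes, the discount threshold, and the tax arithmetic are invented for illustration.

```python
def apply_quantity_discounts(line_items, threshold=10, discount=0.1):
    """Visitor-friendly: discount any line with at least `threshold` units."""
    return [
        {**item, "unit_price": item["unit_price"] * (1 - discount)}
        if item["quantity"] >= threshold
        else dict(item)
        for item in line_items
    ]


def apply_taxes(line_items, jurisdictions):
    """Visitor-friendly: add every jurisdiction's rate to each line total."""
    rate = sum(j["rate"] for j in jurisdictions)
    return [
        {**item, "total": item["unit_price"] * item["quantity"] * (1 + rate)}
        for item in line_items
    ]


def apply_sku_adjustments(cart, customer):
    """Resident-friendly: one call, but the reader must already know what
    "adjustments" covers and why the customer matters."""
    discounted = apply_quantity_discounts(cart["line_items"])
    return apply_taxes(discounted, customer["jurisdictions"])
```

The behavior is the same either way. What changes is how much a reader has to already know before the call site makes sense.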

When writing code, like when designing a city, it’s important to ask yourself who will see this, and how you can help them understand it quickly and clearly. These are not independent concerns, but rather the latter is contingent on the former.
The Navigable City and Knowable Neighborhood of Good Software

November 5th, 2023

I’ve been thinking about this paradigm of navigability and knowability in software for a couple weeks, and thought it might be interesting or helpful to others, so here goes.

Consider 2 towns. The first, Tinyton, is a town of 150 residents, with 5 windy streets that cross back and forth over each other. It has a general store and a diner. The second, Bigville, is a town of 15,000. It has a design pattern of gridded streets, with a convention of numbering them, and a few dozen businesses scattered around the town.

Tinyton is knowable. It’s small enough that it is reasonably possible to know every place in the town, and how to get there from anywhere else, at least with a few well known landmarks. However, it is not very navigable. If you’re not already familiar with it, you will get lost a lot.

Bigville, on the other hand, is navigable, though less knowable. It is unlikely that anyone can learn every place of note in the town, but because the streets are predictable and meaningfully labeled, as long as you know where you are and where you mean to go, you can find your way pretty easily. Within Bigville, neighborhoods will be knowable, but not likely the whole town.

As with urban planning, design patterns and conventions in software should also serve navigability. They should make it easy to find your way around a codebase, and to make sense of whatever particular piece of code you may be looking at. Architecture should explain why things are where they are within a system, but not really what they are or what they do.

Going back to the town analogy, let’s drill down a little farther here and imagine that within Bigville there’s a store called Mike’s Deli. Mike was a plumber, so his deli sells plumbing supplies and work clothes, then has a small deli counter in the back that makes sandwiches. Within the context of his experience, this abstraction provides a useful and meaningful encapsulation, but to others, it’s highly specialized and poorly scoped, and the name is an obfuscation.

Mike also runs his store out of his garage, so it doesn’t conform to the patterns and conventions of the town. This makes it harder to use as well as being hard to understand, as it requires special knowledge and handling compared to everything else in the town.

Again translating to software, it’s important to remember that while undeniably valuable, encapsulation and abstraction are inherently forms of obfuscation that increase the cognitive load of working with a system. Additionally, they introduce opportunities to violate architecture. Put another way, when things get DRY, they get lighter, but also often harder and more brittle. Given this, try to make sure your encapsulations keep cost low relative to benefit.

Also, when considering abstractions and encapsulations, try to generalize them as much as is reasonable. The less local context a visitor to your neighborhood of the codebase needs to make sense of things, the better.

I hope the navigability/knowability paradigm is a useful framing for others when thinking about software they’re building, and that I’ve done something to help illuminate the cost calculus of encapsulation. At the least, they seem useful for framing code review conversations.
YAST and the Best Practices Cargo Cult

August 17th, 2023

Let us start with an allegorical tale. Imagine you are in NYC, and you need to get to Boston. You go online, read some blogs, attend some lectures, and learn that best practice means getting in your car for 4 hours. You go climb in your car, and sit for 4 hours. You open your door expecting to find yourself outside Faneuil Hall, but find yourself still just outside your house.

You think to yourself, “How am I still where I started? I followed all the best practices for how to get to Boston, but failed!” So you go online, and after a few more hours of research, you find that all the experienced travelers use AssAligner to make sure they are positioned correctly, so you sign up. You even pay for AssAligner Pro so each member of your family can have a seat license.

Your confidence buoyed, you again go out, get in your car, and wait. 4 hours later, you get out and again find yourself right where you started. “Unbelievable!” Back online you go, diving deep on CarOverflow, and even reaching out to AssAligner support for guidance. All your research leads you to a conclusion that you really must adopt a suite of monitoring tools. After a few days comparing vendors, you go with VehiPro. Your car is so full of sensors now you also need to hire someone just to make sure they all keep working, so you bring on Joe.

“Today, we’re getting to Boston,” you think. You get in your car. You call Joe to make sure all your telemetry looks good. And you sit for 4 hours.

All of the tools and services in this story are YAST, Yet Another Specialized Tool. All are adopted to Do Things Right and Follow Best Practices, but we get so focused on the details of The Right Way we forget to see if we’re actually addressing the problem we have, or just the proximate stand-in therefor. We follow the actions of those who report success without really stopping to understand if their situations and motivations match our own, all the while adopting more and more things that make actually accomplishing our goals more and more complicated.

Consider this story, then look at your CI pipeline, your testing strategy, or your infrastructure solutions. Are you making choices that meet your challenges and needs, or are you spending time trying to do the thing some FAANG does when they encounter your class of problem The Right Way?
Risk vs Quality

June 27th, 2023

I had a code interview the other day that I failed. I tried to adapt a solution from a similar problem that I had in my head, but about 25 minutes into the 40 minutes allocated to coding, I realized the problem was just that little bit too different from the solution I was trying to adapt, but I didn’t have time to pivot. In reflecting on this, I realized there are some parallels with the experience of working in the Agile paradigm as it is most frequently manifested.

Agile, much like an interview, is, at root, all about time boxing. Systems and their components are designed so their implementations can be neatly chunked up and stuffed into predictable boxes. This isn’t really surprising, when you think about it; Agile comes from industrial manufacturing, and this is exactly how you efficiently manufacture things. You get a stamping machine, you get a couple different dies, and you run that machine until it breaks. The incentives are for risk mitigation, not innovation; how do I turn as many profitable problems into things I can stamp in the time it takes to stamp a thing.

The thing is, nowhere in this incentive structure is quality or innovation rewarded. We end up with systems that are jumbles of stamped parts rather than optimized or even really designed solutions to problems. People are promoted to senior and lead for being faster at stamping parts or more familiar with the quirks of more dies. People are promoted to manager and staff+ for being good at keeping the stampers on schedule or for being good at jury-rigging stamped parts to address narrow business problems. All the while, no one is really thinking about the problem being solved, and whether the stamped parts are the best, or even a good, solution to it.

I would argue that much of our tech debt, and much of the velocity killing technical bureaucracy and inefficiency we deal with, comes from this parts stamping approach. To mix metaphors for a moment, we can’t see the forest for the trees, and we build ourselves into corners because of it. We’re so focused on efficiently lining up and cutting down trees, we don’t notice that we’ve driven 5 miles from the access road straight into a swamp, and the sawmill stopped accepting logs 2 days ago anyway.

Software development is, at least in part, a creative pursuit. Creativity requires experimentation to achieve anything worth achieving, and experimentation can only exist with risk. Sometimes, a good solution only comes out when you don’t know for sure where you’ll end up going in, and if you go in not knowing where you’ll come out, sometimes, you just don’t come out at all. As an industry, we’ve focused so hard on risk mitigation through process that we’ve almost entirely eliminated experimentation and creativity, at least for anyone below manager/staff levels. The thing is, having been eliminated below those levels, people making it to those levels have never really developed the skillset, so now we’ve largely eliminated it entirely. I think we need to change that.
Your Framework is (Probably) Hurting Your Onboarding

May 19th, 2023

What is the most important thing about your software application? Your feature sets? Your product verticals? Your customer customizations? Well, if you follow commonly held best practices of most web frameworks, the most important thing about your application, according to your codebase, is that your application is built from models, views, and utils.

By convention, models, views, and utils are the top level directories of 99% of modern codebases. This taxonomic breakdown by type implies a semantic value judgement: that the most important thing about your codebase is the same as your competitors’, but more importantly, that it is the same as the last place most of your new hires worked. We think that this is effective organization, as it signposts the familiar to others, but actually, all it does is give up a powerful opportunity to convey information about your domain to people. Another way to say this is that the structure of your code is a form of documentation, and all the current best practice does is document that you are, in fact, using one of the same MVC frameworks that everyone has been using for 15+ years. This may be going out on a limb, but that’s not particularly valuable or surprising information to convey.

When you (or the new hire you’re training) encounter a new codebase, you are looking to understand what, why, and how, in that order. What does it do, why does it do that, and how does it make it happen? You may think how is first, but encapsulation is literally all about making how last, and I don’t think it’s controversial to say encapsulation is good (well, that’s a different post at least…). The models, views, utils structure puts how first, what second, and doesn’t even touch on why, but why is really the thing that lets you be effective in building new features. That structure is putting datatypes and design patterns as the core organizing principle of your system, but those shouldn’t be the core organizing principle of any system except for exemplars in school or Stack Overflow.

In summary, your code structure is one of the first pieces of documentation about what your application is. Treat it that way, and resist the urge to set the taxonomy of your system by datatype and design pattern.