Dan Nemec's Blog

Programming Information Security Personal

Exploring PathLib - A path manipulation library for .Net

By Dan Nemec

21 Dec 2014

freebigpictures.com

PathLib is available on NuGet and its source can be found on Github

##Why a library for paths?

Paths are commonly used in programming, from opening files to storage directories. They’re integral to any program, yet unlike their siblings URLs and URIs very few programming languages (with strong typing) have a strongly typed solution for storing and manipulating paths.

Instead, programmers are forced to store these paths as strings and use a host of static methods to pinch and twist one path into another. In .Net, these are found in the System.IO.Path and System.IO.Directory namespaces. Common operations include combining paths (Path.Combine(path1, path2)) and extracting the filename (Path.GetFileName("C:\file.txt")). Since these are only valid for a particular subset of strings, I’m surprised that more languages do not have objects corresponding to a path so that methods and libraries can accept a “path object” and be confident that the input data conforms to at least a rudimentary set of validation criteria (even if the path itself doesn’t exist on disk).

Despite being available in Ruby, Python, C++, and Java, I’ve never seen real-life code that uses any of these libraries, nor have I read a single blog post extolling their virtues (and if my experience dogfooding PathLib is any indication, there are many). The best thing about having a class dedicated to paths, in my opinion, is setting expectations for methods that use paths. Which of these more clearly explains what value to pass into the method?

public Database OpenDatabase(string dbLocation) {}
    
public Database OpenDatabase(Path dbLocation) {}

I can’t speak for anyone else, but I’m certainly less tempted to pass "http://localhost:4000" to the latter than the former.

To give you a small taste of the power of PathLib, compare implementations of this (real) scenario: list all files within the user’s “myapp” directory and copy all alphanumeric characters into a new file with an extra “.clean” extension (so file.txt becomes file.clean.txt). I’ve avoided using var to show how many “stringly typed” objects are created in the non-PathLib version compared to using PathLib.

IPath appDir = new WindowsPath("~/myapp").ExpandUser();
foreach(IPath file in appDir.ListDir())
{
	if(!file.IsFile()) continue;
	string text = file.ReadAsText();
	text = Regex.Replace(text, @"\W+", "");
	
	IPath newFile = file.WithFilename(
		file.Basename + ".clean" + file.Extension);
	using(var output = new StreamWriter(newFile.Open(FileMode.Create)))
	{
		output.Write(text);
	}
}
string userDir = Environment.GetFolderPath(
	System.Environment.SpecialFolder.UserProfile);  // Only in .Net 4.0
string appDir = Path.Combine(userDir, "myapp");
foreach(string file in Directory.EnumerateFiles(appDir))
{
	string text = File.ReadAllText(file);
	text = Regex.Replace(text, @"\W+", "");

	string newFile = Path.Combine(appDir, 
		Path.GetFileNameWithoutExtension(file) + 
		".clean" + 
		Path.GetExtension(file));
	using(var output = new StreamWriter(File.Open(newFile, FileMode.Create)))
	{
		output.Write(text);
	}
}

What is PathLib?

The goal of PathLib is to extend the feature set of System.IO.Path and bundle it all into a strongly typed path object. It borrows some terminology from the similarly named Python library mentioned above.

There are four main classes and two main interfaces in the library:

Factories

Since application and library developers usually want to be as cross-platform-compatible as possible, it doesn’t make much sense to explicitly create instances of a “windows path” or “posix path”. To that end, the library provides a couple of path factories that automatically detect the user’s operating system and create the appropriate path on command.

###PurePathFactory

This factory builds a pure path for the current operating system. You may also provide a set of PurePathFactoryOptions to the builder:

For convenience, a static, global PurePathFactory instance can be accessed from the PurePath class (no generic arguments).

PathFactory

This factory builds a concrete path for the current operating system. You may also provide a set of PathFactoryOptions to the builder:

For convenience, a static, global PurePathFactory instance can be accessed from the PurePath class (no generic arguments).

Serialization

One of the key behaviors of a new data type is the ability to serialize and deserialize to/from the various data storage and transport formats out there (namely, XML and JSON). Due to the magic of TypeConverters, that support is built in! Here’s an example of deserializing a path in JSON.Net:

class Data
{
	public IPath Path { get; set; }
}

public static void Main()
{
    var json = @"{ ""path"": ""C:/users/me/file.txt"" }";
    var data = JsonConvert.DeserializeObject<Data>(json);
    Console.WriteLine(data.Path.Directory);
    // C:\users\me
}

Note that I used an IPath interface rather than WindowsPath or PosixPath. While each of those have their own TypeConverters, both IPath and IPurePath have special converters that use a PathFactory to choose the correct interface type. This makes it simple to support users on any platform.

Click to display comments. Why? Many hosted comment systems use their reach to track users as they browse, similar to how Facebook like buttons can see when you load a page that embeds the button. That information is used to feed you targeted ads (among other things). Disqus will not be contacted until you open the comments.