Posts (What is this?)

Git diffs of LaTeX, Markdown, text, ...

1 Jul 2022 #git, #tig, #lazygit

Since I often edit LaTeX and Markdown files, the default style of viewing git diffs that programmers use is not really ideal for me. My diffs usually have very long lines where only a couple of words change, somewhere in the middle. Normally, when a line is changed, diff displays the line twice, once before and once after the change (typically also distinguishing the two by colour). This, however, is not very useful when working with text rather than code: you see two very long lines of text and struggle to find what actually changed.

Luckily, git has you covered: you can use git diff with the --word-diff flag. Diffs then don't appear as the same line repeated twice; instead, the changes are displayed in place, within the line. An added segment is enclosed in {+ and +} and shown in green, and a deleted segment is enclosed in [- and -] and shown in red. There is one caveat: it is crucial that your pager (typically less) wraps long lines, otherwise you need to scroll right and back all the time to review the commits, which is really annoying.

I knew that and so I set the following in my ~/.gitconfig:

[core]
        pager = less -FRX
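
To make the --word-diff marker format concrete, here is a rough Haskell illustration that diffs two sentences word by word and renders the result with the same markers. It is a naive longest-common-subsequence sketch with made-up names (Edit, diffWords, render), not git's actual algorithm, and unlike git it marks every changed word separately.

```haskell
import Data.Array (Array, array, listArray, (!))

-- One step of a word-level diff (hypothetical type, for illustration only).
data Edit = Keep String | Del String | Add String
  deriving (Eq, Show)

-- Naive LCS-based diff over two lists of words.
diffWords :: [String] -> [String] -> [Edit]
diffWords old new = walk 0 0
  where
    n = length old
    m = length new
    o = listArray (0, n - 1) old :: Array Int String
    w = listArray (0, m - 1) new :: Array Int String

    -- lcs ! (i, j) is the LCS length of the suffixes starting at i and j.
    lcs :: Array (Int, Int) Int
    lcs = array ((0, 0), (n, m))
            [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m] ]

    cell i j
      | i == n || j == m = 0
      | o ! i == w ! j   = 1 + lcs ! (i + 1, j + 1)
      | otherwise        = max (lcs ! (i + 1, j)) (lcs ! (i, j + 1))

    walk i j
      | i == n         = map Add (drop j new)
      | j == m         = map Del (drop i old)
      | o ! i == w ! j = Keep (o ! i) : walk (i + 1) (j + 1)
      | lcs ! (i + 1, j) >= lcs ! (i, j + 1) = Del (o ! i) : walk (i + 1) j
      | otherwise      = Add (w ! j) : walk i (j + 1)

-- Render with the same markers that git diff --word-diff uses.
render :: [Edit] -> String
render = unwords . map one
  where
    one (Keep x) = x
    one (Del x)  = "[-" ++ x ++ "-]"
    one (Add x)  = "{+" ++ x ++ "+}"

main :: IO ()
main = putStrLn $ render $
    diffWords (words "the quick brown fox") (words "the slow brown fox")
```

This prints the [-quick-] {+slow+} brown fox, which is close to what git shows for the same change (git does not put a space between the two markers).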

To automate the process you might also want to add an alias, e.g.

alias giwd="git diff --word-diff"

to your shell config. Perfect, now you just write giwd <commit-hash> and it shows all changes made since that commit (including uncommitted ones) in the --word-diff form.

There is only one problem with this. You still need to open git log or tig, find the commit, and copy-paste the commit hash to giwd. Imagine you do this again and again every day...

Ideally we would just run tig --word-diff and browse the commit diffs in the --word-diff fashion. This flag is supported in tig but there is one huge problem. The lines do not wrap in tig when --word-diff is used, so you find yourself scrolling right and left all the time. The problems with text wrapping in tig go back to issue #2 on GitHub!

Luckily, tig is scriptable, so if we add

bind generic w !git show --word-diff %(commit)
bind generic W !git diff --word-diff %(commit)

in ~/.tigrc, then pressing w after selecting a commit in tig shows the --word-diff of the selected commit, and pressing W shows the combined --word-diff of all changes made since the selected commit. Wonderful!

If you use Lazygit instead of tig, then you also can't show diffs with --word-diff directly. I asked how to do this on Lazygit's Slack and Luka Markušić (mark2185) found out that this is impossible to achieve due to the limitations of the underlying Go library that Lazygit is built on.

Luckily, as with tig, Lazygit is also scriptable. It is enough to save the following into ~/.config/lazygit/config.yml:

customCommands:
  - key: '<c-w>'
    command: 'git show --word-diff {{.SelectedLocalCommit.Sha}}'
    context: 'commits'
    description: 'Show the commit diff with --word-diff'
    subprocess: true

Then pressing Control-w on a selected commit shows its --word-diff. The key setting that makes this work is subprocess: true.

One more thing: you might run into issues with the colours or wrapping. The safest fix is to replace the command setting above with the following:

    command: 'GIT_PAGER="less -R" git show --word-diff --color=always {{.SelectedLocalCommit.Sha}}'

This ensures that less -R is used as the pager and that git always outputs the diff in colour.

Simple Haskell scripting

29 Jul 2021 #Haskell, #CSV

It's been quite some time since Martín Escardó told me about the somehow forgotten Haskell function

interact :: (String -> String) -> IO ()

It takes a function String -> String, feeds the entire program input into it, and whatever the function returns becomes the program output. For example, the following Haskell program prints back the first 10 characters of its input.

main :: IO ()
main = interact (take 10)

This becomes really useful when chained with the lines :: String -> [String] and unlines :: [String] -> String functions. Writing Haskell scripts that process line-oriented text data then becomes simple. The usual Haskell script looks something like this.

main :: IO ()
main = interact pipe
    where
        pipe = unlines . map linepipe . lines

linepipe :: String -> String
linepipe = ...  -- a function that handles a single line of input
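
As a concrete instance of this skeleton, here is a minimal sketch in which linepipe is a made-up handler (chosen only for illustration) that prefixes each line with its length and upper-cases it. Any other String -> String function would slot in the same way.

```haskell
import Data.Char (toUpper)

main :: IO ()
main = interact pipe
    where
        pipe = unlines . map linepipe . lines

-- Hypothetical per-line handler, for illustration only:
-- prefix the line with its length, then upper-case it.
linepipe :: String -> String
linepipe line = show (length line) ++ "\t" ++ map toUpper line
```

Assuming the file is saved as Example.hs (an arbitrary name), printf 'hello\nworld!\n' | runhaskell Example.hs prints 5 and HELLO separated by a tab, followed by 6 and WORLD! on the next line.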

There are quite a few Haskell scripting libraries out there and they get quite a bit of attention. However, I haven't seen many articles praising the simplicity and power of the interact+lines+unlines pattern.

As a bonus, here is one real-world example. Let's say we want to convert the CSV data of this form

Alice;travelling, maths;https://alice.crypto
Bob;espionage;
Jonáš;;
...

into an html of this form

<ul>
<li><a href="https://alice.crypto">Alice</a> (travelling, maths)</li>
<li>Bob (espionage)</li>
<li>Jonáš</li>
...
</ul>

The following is a simple script I wrote to do just that with interact. I decided to use the Data.Text.Lazy version of interact because I needed to deal with Unicode characters properly. The added benefit of this choice is that the script can handle inputs that don't fit into memory.

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.Lazy    as T
import qualified Data.Text.Lazy.IO as T

main :: IO ()
main = do
        putStrLn "<ul>"
        T.interact pipe
        putStrLn "</ul>"
    where
        pipe :: T.Text -> T.Text
        pipe = T.unlines . map linepipe . T.lines

linepipe :: T.Text -> T.Text
linepipe line =
        "<li>" <> name <> hobbies <> "</li>"
    where
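        -- Pad with empty fields so that a line with fewer than three
        -- columns does not make the pattern match fail at runtime.
        (a:b:c:_) = T.split (== ';') line ++ repeat ""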

        name | c == ""   = a
             | otherwise = "<a href=\"" <> c <> "\">" <> a <> "</a>"

        hobbies | b /= "" = " (" <> b <> ")"
                | otherwise = ""

To run it, type cat data.csv | ./script.sh. This works provided that script.sh is executable and the text package is installed.
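
One possible hardening, not part of the script above: the fields are pasted into the HTML as they are, so a name or hobby containing < or & would produce broken markup. A minimal sketch of an escaping helper follows; escapeHtml is a hypothetical name, and it could be applied to a, b and c in linepipe before the tags are assembled.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.Lazy    as T
import qualified Data.Text.Lazy.IO as TIO

-- Hypothetical helper: replace the characters that are special in HTML
-- text and in double-quoted attribute values.
escapeHtml :: T.Text -> T.Text
escapeHtml = T.concatMap esc
    where
        esc '&' = "&amp;"
        esc '<' = "&lt;"
        esc '>' = "&gt;"
        esc '"' = "&quot;"
        esc c   = T.singleton c

main :: IO ()
main = TIO.putStrLn (escapeHtml "Tom & <Jerry>")
```

Running it prints Tom &amp; &lt;Jerry&gt;.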

Copyright unsplash.com/@matthewelancaster

Exporting Mastodon (ActivityPub) posts

30 Apr 2020 #Mastodon, #Haskell, #JSON

In an attempt to own the content I produce on the internet, I decided to move all my Mastodon posts here. To do that, I wrote a little Haskell script that takes the Mastodon export (it only reads outbox.json) and produces files in the folders notes/, replies/ and reposts/ with the correct frontmatter (according to the convention I adopted from IndieKit).

Here is the script, in case somebody finds it useful:

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
import Data.Time.Format.ISO8601
import Control.Monad
import Control.Applicative
import Data.List.Extra (split)
import Data.Time.Clock (UTCTime(..))
import qualified Data.ByteString.Lazy as BS
import qualified Data.Text as T
import qualified Data.Text.IO as T

data ActivityStreams = AS { orderedItems :: [ASItem] }
                     deriving (Show)

data ASItem = I
    { itemId    :: String
    , asObject  :: ASObject
    , published :: UTCTime
    , to        :: [String]
    } deriving (Show)

data ASObject =
    Note
    { url       :: T.Text
    , content   :: T.Text
    , inReplyTo :: Maybe T.Text
    }
    | Boost { boostUrl :: T.Text }
    deriving (Show)

instance FromJSON ActivityStreams where
     parseJSON (Object v) = AS <$> v .: "orderedItems"
     parseJSON _          = mzero

instance FromJSON ASItem where
     parseJSON (Object v) = I <$> v .: "id" <*> v .: "object" <*> v .: "published" <*> v .: "to"
     parseJSON _          = mzero

instance FromJSON ASObject where
     parseJSON (Object v) = Note <$> v .: "url" <*> v .: "content" <*> v .: "inReplyTo"
     parseJSON (String t) = return $ Boost t
     parseJSON _          = mzero


handleItem :: ASItem -> IO ()
handleItem item = do
    let isoDate    = iso8601Show $ published item
        packedDate = T.pack isoDate
        fileName   = concat
                   [ take 10 isoDate                      -- extracts the YYYY-MM-DD part
                   , "-mastodon:"
                   , (split (== '/') $ itemId item) !! 6  -- extracts the Mastodon post id
                   , ".html"
                   ]

    case asObject item of
        Note u c r -> do
            let (folder, replyTo) =
                  case r of
                    Just replyUrl -> ("replies/", [ "in-reply-to: " <> replyUrl ])
                    Nothing       -> ("notes/", [])
                fullFileName = folder ++ fileName

            putStrLn $ fullFileName

            T.writeFile fullFileName $ T.unlines $
                    [ "---"
                    , "title: ''"
                    , "date: " <> packedDate
                    , "mastodon-original: " <> u
                    ] ++ replyTo ++
                    [ "---"
                    , c
                    ]

        Boost u -> do
            putStrLn $ "reposts/" ++ fileName

            T.writeFile ("reposts/" ++ fileName) $ T.unlines
                    [ "---"
                    , "title: ''"
                    , "date: " <> packedDate
                    , "repost-of: " <> u
                    , "---"
                    ]


main :: IO ()
main = do
    contents <- BS.readFile "outbox.json"
    let maybeAS = eitherDecode contents

    case maybeAS of
        Right as -> do
            putStrLn "Parsed!"

            let public    = "https://www.w3.org/ns/activitystreams#Public"
                followers = "https://mastodon.social/users/jaklt/followers"

                filteredItems = filter (\it -> public `elem` to it || followers `elem` to it)
                              $ orderedItems as

            forM_ (filteredItems) handleItem

        Left err -> putStrLn err
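
The file name computation in handleItem can be tried in isolation. The sketch below assumes that an item id has the shape https://mastodon.social/users/jaklt/statuses/<id>/activity, which is what indexing at position 6 in the script suggests; the id 123 and the date are made up. splitOn, postId and fileNameFor are hypothetical helpers that need only base, and postId returns Nothing instead of crashing when the id is shorter than expected.

```haskell
-- Split a string on a separator character (a small stand-in for
-- Data.List.Extra.split, so that this sketch needs only base).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOn sep rest

-- The seventh '/'-separated component of the item id, if there is one.
postId :: String -> Maybe String
postId itemUrl = case drop 6 (splitOn '/' itemUrl) of
    (pid : _) -> Just pid
    []        -> Nothing

-- YYYY-MM-DD from the ISO 8601 date, followed by the Mastodon post id.
fileNameFor :: String -> String -> Maybe String
fileNameFor isoDate itemUrl = do
    pid <- postId itemUrl
    pure (take 10 isoDate ++ "-mastodon:" ++ pid ++ ".html")

main :: IO ()
main = print $ fileNameFor
    "2020-04-30T10:15:00Z"                                        -- made-up date
    "https://mastodon.social/users/jaklt/statuses/123/activity"   -- made-up id
```

This prints Just "2020-04-30-mastodon:123.html".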

One thing to note is that I also decided to publish posts that were originally available only to my followers. This is hardcoded in the URL assigned to followers. If you also want to make those previously private posts available, change followers to the corresponding URL of your profile. Or remove the second branch of || in filteredItems if you only want to export publicly available posts.
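
The two variants of the filter can be compared with a small stand-alone sketch. The two URLs are the ones from the script; addressedTo is a hypothetical helper that takes the list of allowed audiences as a parameter, so that switching to a public-only export means passing a shorter list.

```haskell
public, followers :: String
public    = "https://www.w3.org/ns/activitystreams#Public"
followers = "https://mastodon.social/users/jaklt/followers"

-- True when at least one of the allowed audiences is among the recipients
-- (the "to" field of an item).
addressedTo :: [String] -> [String] -> Bool
addressedTo allowed recipients = any (`elem` recipients) allowed

main :: IO ()
main = do
    let followersOnly = [followers]  -- recipients of a followers-only post
    print (addressedTo [public, followers] followersOnly)  -- True: exported
    print (addressedTo [public] followersOnly)             -- False: skipped
```

In the script, filter (addressedTo [public, followers] . to) would select the same items as the lambda in filteredItems.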

Assuming that we saved the Haskell file as export.hs, the cabal file export.cabal is as follows:

name:               export
version:            0.1.0.0
build-type:         Simple
cabal-version:      >= 1.10

executable export
  main-is:          export.hs
  build-depends:    base
                  , aeson
                  , time
                  , extra
                  , bytestring
                  , text
  ghc-options:      -threaded
  default-language: Haskell2010

To export everything, it's enough to just run stack build followed by ./export.

The source files are also published at gist.github.com.

Welcome

11 Apr 2020

On this website I write my remarks about internet technologies, functional programming, mathematics and theoretical computer science, and occasionally also politics.

I've built this website as an experiment, to gather my social activity on the internet. I believe that the internet should be inhabited by many small independent websites which communicate among themselves, as opposed to one or two omnipresent platforms like Facebook or Twitter that manage all our activity for us but also take away our freedoms. Chris Aldrich wrote a nice article about it here.

You can subscribe to my website by using any old or new RSS/Atom or JSONfeed reader; links are at the top of the microposts and posts pages. I use a fairly simple protocol and W3C standard called Webmention for communication with other websites in the IndieWeb community (but it works with some WordPress websites too!).

How does this website get updated? Typically I write new posts, likes or bookmarks in either Indigenous, Micropublish or Quill. Once I save, the content is sent to my GitHub repository via (an old version of) Indiekit. Afterwards, Netlify is notified, which then builds and serves the website. Works like a charm ;-)

I don't want to write a tutorial on how to do it, since there are plenty of good resources out there. If you also want to try integrating some IndieWeb ideas into your website, a good place to start is indiewebify. One tip: start slowly, build only the basic functionality first, see how you like it, and only after some time add more.

This website is intentionally lightweight (in fact, it's less than 512 kB and also less than 250 kB).

Useful references

PS: Now that Heroku closed their free service, I'm using a small command line script to help me quickly write and publish new posts, instead of using Indiekit. It's not bad but Indiekit is so much cooler. I hope to have it restored soon. :-)