RED PROGRAMMING LANGUAGE












JULY 29, 2022


NEW RED BINARIES


For many years, we have offered pre-built binaries for the Red toolchain as a
more convenient way to use Red. They are not strictly needed, as Red can be
run from its sources, the toolchain being run by a Rebol2 interpreter. As the
Red REPL and toolchain are not run by the same engine, the console (REPL) used
to be compiled on the first run of the `red` executable (when no arguments were
provided or a Red script was passed). This resulted in a significant delay on
the first use of the console (both for the GUI and CLI versions).


We have now decided to change that by providing separate pre-built binaries for
the consoles and toolchain. This is a temporary split until Red gets
self-hosted, at which point we can recombine everything into a single binary.


Another change is the temporary dropping of semantic versioning until
version 1.0 and related "stable" releases, as it seems to be too confusing to
some users (Red still being in the alpha stage). This will also curb a tendency
among some users to care more about version increments than about feature
availability and the work being done overall. We will now offer pre-built
binaries only for the latest commit, though older binaries will still be
available if that can be of any help to anyone.


So the pre-built binaries now are:
 * Red GUI : Red interpreter + View + GUI console
 * Red CLI : Red interpreter + CLI console
 * Red Toolchain : Encapper for Red + Red/System compiler



We are also considering ways to merge the GUI and CLI consoles into a single
binary which can work even if no GUI API is available, falling back on CLI mode.
We will also have the console(s) act as a front-end for the toolchain, even
downloading it for you in the background when needed. For that, though, we need
a proper asynchronous `call` function implementation. More news about this soon.


In the meantime, enjoy running Red consoles almost instantly from just a click
on the Download page!





Posted by Nenad Rakocevic at 6:18 PM
Labels: binaries, console, download



JULY 14, 2022


THE ROAD TO 1.0



You cannot have missed that in recent months (and even years), our
overall progress has slowed down drastically. One of the main reasons is that we
have spread our limited resources chasing different objectives while making
little progress on the core language. That is not satisfying at all and would
most likely bring us to a dead end as we exhaust our funding. We have spent the
last weeks discussing how to change that. This is our updated action plan.

From now on, our only focus will be to finish the core language and bring it to
the much-awaited version 1.0. We need to reach that point in order to kickstart
a broader adoption and provide us and our users a stable and robust foundation
upon which we can build commercial products and services necessary for
sustainability.

Given the complexities involved in completing the language and bringing an
implementation that can run on modern 64-bit platforms, we have devised a
two-stage plan.





UPGRADE THE CURRENT 32-BIT RED IMPLEMENTATION




👉 LANGUAGE SPECIFICATION

It is now time to write the language specification, in order to clean up some
semantic rules and address all possible edge cases, which will help fulfill our
goals of implementation robustness and stability. The process of writing down
the complete language specs will result in dropping some features that we
currently have that end up being problematic or inconsistent. On the other hand,
we might add some new features that will need to be implemented for 1.0.

👉 MODULES

We need a proper module system in order to be scalable. We also need to have a
proper package management system which will be tied to a central repo where we
can gather third-party libraries. That would also enable modular/incremental
compilation (or encapping) which will be most probably supported in the
self-hosted toolchain.

👉 CONCURRENCY

We need a proper model for concurrent execution in order to leverage multicore
architectures. We will define one and make a prototype implementation in the
32-bit version.

👉 TOOLCHAIN

Before starting to work on the new toolchain, we will make some changes to the
existing version in order to prepare for the transition. The biggest change is
the dropping of the Red compiler, which will only act as a (smart) encapper.
Routines and #system directives will still be supported, but probably with some
restrictions. The Red preprocessor might also see some changes. This change
means that Red will only have one execution model instead of the two it has
currently. The Red compiler has become more of a burden than a help. The speed
gains are not that significant in real code (even if they can be in some
micro-benchmarks), but the impossibility for the compiler to support the exact
same semantics as the interpreter is a bigger problem. This move not only will
bring more stability by eliminating some edge case issues but also will reduce
the toolchain by almost 25% in size, which will help reduce the number of
features to support in the new toolchain.

👉 RUNTIME LIBRARY

Some improvements are long overdue in the Red runtime library. Among them:



 * unified Red evaluation stack.
 * unified node! management.
 * improved processing of path calls with refinements.
 * improved object! semantics.



All those changes are meant to simplify and reduce the runtime library code and
to address some systemic issues (e.g. stack management issues and GC node leaks).

👉 DOCUMENTATION

We need proper, exhaustive, user-oriented documentation for the Red core
language. This is one of the mandatory tasks that needs to be completed and done
well for wider adoption.





SELF-HOSTED RED FOR 64-BIT VERSION




👉 TOOLCHAIN

In order to go 64-bit, we have to drop entirely our current toolchain code based
on Rebol2 and rewrite it with a newer architecture in Red itself. The current
toolchain code was disposable anyway; it was not meant to live this long, so
this was a move we had to make for 1.0 in any case.

So the new toolchain will feature:



 * a new compilation pipeline with a plugin model.
 * an IR layer.
 * one or more optimizing layers.
 * modular/incremental compilation support.
 * x64, AArch64 and WASM backends.
 * linker support for 64-bit executable file formats for the big 3 OS.
 * support for linking third-party static libraries.

32-bit backends will not be supported in 1.0, though they could be added back
in the future.



👉 RUNTIME LIBRARY



The current Red runtime library written in R/S will be kept and some adjustments
will be needed in order to be fully compatible with a 64-bit environment (like
updating all imported OS API to their 64-bit versions). 


The View engine will not be part of that upgrade for 1.0, but will be upgraded
in a 1.1 version; priority is given to Red/Core for 1.0.





ROADMAP

Here are the main milestones:



 * v0.7   : Full I/O with async support.
 * v1.0b : (beta) completed self-hosted Red with 64-bit support.
 * v1.0r  : (release) first official stable and complete Red/Core language
   release.
 * v1.1   : View 64-bit release.
 * v1.2   : Android backend and toolchain release.
 * v1.3   : Red/C3 release.
 * v1.4   : Web backend for View release.
 * v2.0   : Red JIT-compiler release.
 * v3.0   : Red/...



The 0.7 should be the last version for the 32-bit Red version and current
toolchain and we will be working on that first.

For reaching the 1.0-beta milestone, we target 12 months of intensive work, so
that will bring us to Q3 2023. That's an ambitious goal but necessary to reach
for the sake of Red's future.

The currently planned beta period for 1.0 is 2-3 months. We want a polished,
rock-solid, production-ready 1.0 release.

For the 1.1, we will probably make some (needed) improvements to View engine
architecture and backends.

For Red/C3, as the Ethereum network is transitioning to 2.0 and a new EVM, we
need the WASM backend in order to support it.

Version 1.4 will bring a proper web runtime environment to the WASM backend,
including GUI support.

Version 2.0 will be focused on bringing a proper JIT-compiler to the Red
runtime, which should radically improve code execution of critical parts without
having to drop to R/S.

Version 3.0 is already planned, but I will announce that once 1.0 is
released. ;-)

One major platform is missing from the above plan: iOS. Given how closed
that platform is, we will need to come up with a specific plan on how to support
it, as we won't be able to cross-compile for it (you would need a Mac computer),
nor probably generate iOS apps without relying on Xcode at some point (not even
mentioning dynamic code restrictions on the AppStore), which are layers of
complexity that Red is trying to fight against in the first place... So for now,
that platform is not among our priorities.




To finish, let me borrow some words from someone who succeeded more than anyone
else in our industry:







Expect me to say "no" even more so from now on, as we get laser-focused on our
primary goal.

Cheers and let's go!


Posted by Nenad Rakocevic at 6:41 PM
Labels: announce



DECEMBER 31, 2021


2021 WINDING DOWN









Another quarter, another blog post. Seems almost rushed after the previous
drought. 

To set the stage, I'll start with a bit of a rant about complexity. If you just
want the meat of what's happening in the Red world, feel free to skip the
introduction. 


COMPLEXITY CONSIDERATIONS: PART 1

I liked what the InfoWorld article, Complexity is Killing Software Developers,
said about difficult domains (voice and image recognition, etc.) being available
as APIs, something we all know. This lets us tackle things we otherwise couldn't
in some cases. Though I imagine @dockimbel or others also used Dragon Dictate's
libraries back in the 90s. What we have now is massive data to train systems
like that. Those work well, allowing us to add features we otherwise couldn't
with a small team.

The problem I see is that the trend has become for everything to be outsourced,
including simple features like logging, and those libraries have exploded. There
must be graphs available to show the change. Moderately complex domains, UIs for
example, have risen in number and lead to what @hiiamboris says about Brownian
Movement. It's a random collection of things, not designed to work together,
without a coherent vision. A quote from the above article says it this way:

"Complexity is less the issue than inconsistency in an environment."


It used to be that you could take a FORTRAN, COBOL, Lisp, VB, Pascal/Delphi,
Access/PowerBuilder, dBase/Clipper/Paradox, or even a Java developer, drop them
into a project, and they could work from a solid core, learning the team's
custom bits and any commercial tools as they went. With JS leading the way, but
not alone in this, a programmer can only rely on a much smaller core, relative
to how many libraries are used.

Because those libraries, and the choices to use a particular combination of them
were not designed to work together, there is no guarantee (or perhaps hope) of
consistency to leverage. It's worse if you came from a history of other tools
that were based on different principles or priorities, because you have to
unlearn, breaking the patterns in your mind. Or you convince people to use what
you did before, even if there is overlap with tools already in use.

Things are changing now, and will even more. New service-based companies are
coming, and a drive to APIs rather than libraries. So we not only have risks
like LeftPad, but also companies going out of business under you. The modern
trend means it's no longer dependent on an author or team committed to a project
long term, but to what investors want, and what changes are made to gain
adoption at all costs. As a service-based company you can't hold dearly to
design principles if the investors tell you to pivot. Because it's no longer
about your vision, but their return. If it is a solo FOSS author or small team,
what is their incentive to maintain a project for free, while others profit from
it? Success can be your worst enemy, and we need a more equitable solution than
what we have now. The software business model has changed dramatically, and will
likely continue to do so.

Here is what I personally see as the crux of the problem: the goal of scaling.
FOSS projects and companies are only considered successful if they have millions
(or, indirectly, billions) of users. Companies that want to be sustainable,
providing long term, moderate profits don't make headlines, but they make the
world go 'round. They are not the next big social media disruption where end
users are the product, to be bought and sold. It is a popular business model and
profit is the goal. It's nothing personal.

This has led us to the thinking that every project needs to be designed for
millions of users at the very least. Sub-second telemetry for all the data
collected, another explosion, giving rise to data analytics for everyone; not
just Business Intelligence (BI) for large companies. I won't argue against
having data. I love data and learning from it. But I do believe there is a point
of diminishing returns which is often ignored. Rather, in this case, there is a
cost of entry that small projects wouldn't otherwise need to pay.

What do you do, as an "architect" (see the previous blog post about my thoughts
on software architecture) or developer on a team? Your small team (we all know
small teams are best, plenty of research and history there) simply can't design
and build every piece to support these scaling demands, while the sword of
Damocles hangs over you in the form of potential pivots (dramatic changes in
goals).

As an industry, we are being inexorably forced to make these choices. Either
you're a leader and make your own Faustian bargain, or you're in the general
mass of developers being whipped and driven to the gates of Hell.

Only you, dear reader, can decide the turns this tragic story will take, and
what you forgive in this telling perchance I should exaggerate.


COMPLEXITY CONSIDERATIONS: PART 2

Complexity doesn't come only in the form McCabe is famous for, the decision
points in a piece of code, but in how many pieces there are and how often they
change either by choice or necessity. Temporal Complexity if you will. This
concept is unrelated to algorithmic time complexity. Rebol2, for any faults we
can point out, still works to this day (except in cases where the world changed
out from under it, e.g., in protocols). It was self-contained, and relied only
on what the OS (Operating System) provided. As long as OSs don't break a core
set of functionality that tools rely on, things keep working. R2 had a full GUI
system (non-native, which insulated it from changes there), and I can only smile
when I run code that is 20 years old and it works flawlessly. If that sounds
silly, remember that technology, in most cases, is not the goal. It is a means
to an end. A lot of very old code is still in production, keeping businesses
running.

We talk about needing to keep up with changes, but some things don't change very
much, if at all. Other things change rapidly, but for no good reason, and
without being an improvement. If a change is just a lateral move there is no
value in it, unless it is to align us on a different, and better, path in the
future. I started programming with QuickBASIC, but also used other tools as I
quickly learned my tool of choice came with a stigma attached, and I wanted to
be a serious, "real" programmer. What became clear was that QB was a great tool,
with a few companies providing terrific ASM libraries, and had a wonderful IDE
to boot. It was simpler, not only as a language, but because every 12-18 months
(the release cycle way back when) my new C compiler would break something in my
code. But QB, and later BASIC/PDS and then VB very rarely broke working code.
Temporal complexity.

Even then there were more complex options. The cool kids used Zortech C++ and
there were various cross-platform GUI toolkits. But those advanced tools were
often misapplied to simple projects. We still do that today. Much of that is
human nature, and the nature of programmers. If it's easy we are no longer
special. We may not mean to, but we make things harder than they need to be.
Some of us are even elitist about what we do, to our own detriment. If you don't
need to be cross platform, why do you have multiple machines or VMs each with a
different compiler setup? If you need a GUI, why are you using a language that
was not designed with them in mind? If you need easy deployment, which is
simpler: a single EXE with no dependencies, or a containerization approach with
all that entails? How many technologies do you need in your web stack? Are you
the victim of peer pressure, where you feel your site has to be shiny and
"responsive", or use the latest framework?

A big argument for using others' work is performance. They've taken time, and
may be experts, to optimize Thing X far beyond what you could ever do. That JIT
compiler, an incredible virtual DOM, such clever CSS tricks, the key-value DB
with no limits, and yet...and yet our software is slower and more bloated than
ever. How can that be? Is it possible we're overbuilding? Is software sprawl
just something we accept now?

Earlier I mentioned that a hodge-podge assembly of parts that have no standards,
norms, or even aesthetic sense applied does not make our lives easier. Lego
blocks, the originals anyway, are limited, but consistent in how they can be
used. We misapply that analogy, because the things we build are far from
consistent or designed to interact. Even in the realm of UX and A/B testing on
subsets of users that companies apply today. I love the idea of data-driven HCI
to guide us to a more evidence-oriented approach. This includes languages. But
when a site or service moves fast and changes their interface based on their own
A/B testing, they don't account for the others doing the same. Temporal
complexity.

As a user, every app or site I access may change out from under me in the flash
of refresh or automatic update I didn't ask for. Maybe it's better, an actual
improvement, if you only use that one site. But if all your tools constantly
change out from under you, it's like someone sneaking into your office and
rearranging it every night while you sleep. Maybe this is the developer's
revenge, for the pain we inflict on ourselves by constantly changing our own
tools. If we suffer, why shouldn't our users? For those who truly have empathy
for their users and don't want to drive them mad, or away, perhaps the lesson is
to have empathy for ourselves, for our own tribe. I don't want to see my friends
and colleagues burn out, when it was probably the enjoyment and passion that
solving problems with software can bring which led them here to begin with.

Every moving part in your system is a potential point of failure. Reduce the
moving parts and reliability increases. Whether it's the OS you run on (we now
have more of those than ever, between Linux distros and mobile platforms always
trying to outdo each other), extra packages or commercial tools, FOSS libraries,
environments, [?]aaS, or platform components like containers and cluster
management, every single piece is a point of failure. And if any of them break
your code, or your system, even in the name of improvements or bug fixes, you
may find yourself running just to stay in the same place. Many of those pieces
are touted as the solution to reliability problems, but a lot of them just push
problems around, or target problems you don't have. Don't solve problems you
don't have. That adds complexity, and now you really have a problem.


LESS PHILOSOPHY, MORE RED





INTERPRETER EVENTS



Having a debugger in Red has been a request of many users for a long time, even
since the Rebol era. We have tackled this feature from a larger perspective,
considering general instrumentation of the interpreter (note: not the compiler),
extending it with an event system and user-provided event handlers, similar to
how parse and lexer tracing operate today. This approach allows us to build more
than just a debugger, though it was a lot of work to design and we expect it
will be refined once people start using it in earnest. It's a brave new world,
with a lot of tooling possibilities.


It's important to note that this is not magic. Because it operates as the
interpreter evaluates values and expressions, including functions, it can't see
into the future. In order to get a complete trace, you have to evaluate
everything. That means we'll see tools which silently collect data, like a
profiler does, which can later be viewed and analyzed, perhaps up to the point
where an error occurred. This is an important aspect, and plays once again into
the power of Red as data. Your event handlers can easily collect data into any
structure or model you like. And because event handlers can filter events, you
can tailor them for specific needs. It should even be possible to build
interpreter level DTrace-like tools in the future. We also hope to build higher
level observability and monitoring tools, based on eventing systems, in the
future, but those are long term projects.


Event generation is not active by default; it is enabled using do/trace and by
providing an event handler function. For example, here's a simple logging
function:

  logger: function [
      event  [word!]                      ;-- Event name
      code   [any-block! none!]           ;-- Currently evaluated block
      offset [integer!]                   ;-- Offset in evaluated block
      value  [any-type!]                  ;-- Value currently processed
      ref    [any-type!]                  ;-- Reference of current call
      frame  [pair!]                      ;-- Stack frame start/top positions
  ][
      print [
          pad uppercase form event 8
          mold/part/flat either any-function? :value [:ref][:value] 20
      ]
  ]


Given this code:

  do/trace [print 1 + 2] :logger

It will output:

  INIT    none                    ;-- Initializing tracing mode
  ENTER   none                    ;-- Entering block to evaluate
  FETCH   print                   ;-- Fetching and evaluating `print` value
  OPEN    print                   ;-- Results in opening a new call stack frame
  FETCH   +                       ;-- Fetching and evaluating `+` infix operator
  OPEN    +                       ;-- Results in opening a new call stack frame
  FETCH   1                       ;-- Fetching left operand `1`
  PUSH    1                       ;-- Pushing integer! value `1` on stack
  FETCH   2                       ;-- Fetching and evaluating right operand
  PUSH    2                       ;-- Pushing integer! value `2`
  CALL    +                       ;-- Calling `+` operator
  RETURN  3                       ;-- Returning the resulting value
  CALL    print                   ;-- Calling `print`
  3                               ;-- Outputting 3
  RETURN  unset                   ;-- Returning the resulting value
  EXIT    none                    ;-- Exiting evaluated block
  END     none                    ;-- Ending tracing mode
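
As a complement to the logger, here is a minimal sketch of a handler that silently collects events instead of printing them, in the spirit of the data-collecting tools mentioned earlier. It reuses the handler signature shown above and returns unset, as the logger does. The names `events` and `collector` are made up for this illustration and are not part of the Red runtime.

```red
Red [Title: "Event collector (illustrative sketch)"]

events: copy []                           ;-- Hypothetical storage for event names

collector: function [
    event  [word!]                        ;-- Event name
    code   [any-block! none!]             ;-- Currently evaluated block
    offset [integer!]                     ;-- Offset in evaluated block
    value  [any-type!]                    ;-- Value currently processed
    ref    [any-type!]                    ;-- Reference of current call
    frame  [pair!]                        ;-- Stack frame start/top positions
][
    append events event                   ;-- Record the event name only
    exit                                  ;-- Return unset, as `logger` does
]

do/trace [1 + 2] :collector

probe events                              ;-- Event names, in firing order
```

Because the handler only appends to a block, the collected data can be inspected, filtered, or summarized after evaluation has finished, rather than being interleaved with the program's own output.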


Several tools are now provided in the Red runtime library, built on top of this
event system:
 * An interactive debugger console, with many capabilities (step by step
   evaluation, a flexible breakpoint system, and call stack visualisation).
 * A simple profiler that we will improve over time (especially on the accuracy
   aspects).
 * A simple tracer. The current evaluation steps are quite low-level, but
   @hiiamboris has already built an extended version, operating at the
   expression level that will soon be integrated into the master branch.

Full docs are here.





FORMAT



Boy, I really thought this was going to be easy, or at least not too hard. I
couldn't have been more wrong. When I did my format experiments, I imagined at
least some of the code would be useful, requiring polish and more work of
course, providing a foundation to work from. It turns out that I missed a key
aspect, and my approach was just one of many possible. @hiiamboris and @giesse
both weighed in, and we chatted about specific parts. Then it sat idle for a
while, and I asked Boris to take it over to get it into production. He
identified the key missing piece, which would have limited its usefulness until
we eventually had to address it. Better now than later. He also made a strong
case for a different approach to the core masked-number and I told him to run
with it. That led to a lot of design chat about one aspect, which is as yet
undecided. It's not a fight to the death, but there has definitely been some
sparring. :^)


The missing piece I've alluded to is Localization (L10N). As an American who has
never had to develop software requiring Internationalization (I18N), I've been
blissfully ignorant of all the aspects that come into play when Globalization
(G11N) becomes part of the process. We have talked about how to implement L10N
in Red, and have system/locale for months, weekdays, and currency codes. The
first two we inherited from Rebol's design, the latter was added when @9214
designed the currency! datatype. Thinking of locale data in a system catalog of
some kind is easy enough, but how to actually apply it (and not apply it when
necessary) is a different story entirely. And I mean entirely. Format forced us
to start down this path, and is a guinea pig feature that will guide plans for
all future L10N work. But keep my complexity rant in mind. While we
want to make it as easy as possible for Reducers to write globally aware apps,
if you don't need it, don't do it. We don't yet know if we can make it so
magical that you can write your app ignoring that for the most part, and then
flip a switch, or simply include local data, and have it work. Don't get your
hopes up. There's a lot that can go wrong with that approach.


We agreed to start with masked numbers but, in order to do that, L10N R&D had to
be done. This led to broad and deep dives into unicode.org and other resources.
While they cover far more than we need, and are overly complex in many cases (or
just don't match our aesthetic sense for Red), the data they have there is
enormously valuable, and we deeply appreciate it being available. We just draw
the line around a smaller scope than they do, and no committees are involved
where people fight to get their own bits included. Well, we do that too, to some
extent. What Boris managed to do was identify the key elements needed for our
work, and then wrote tools (using Red of course) to extract and reformat the
data for use in Red. I can't stress enough how much work this was. Truly a heroic and
mostly thankless effort most people will never know about.


In order to test masked number formatting, and give others an easy way to play,
Boris created a Playground App and I can't tell you how important that was. You
see, a particular piece of behavior came up while I was playing with it and got
unexpected results. Unexpected to me, but Boris confirmed it was by design. I
will just say here that it's about a significant digits mode, and let you play
with the app from there. Named formats will be available, but everything will
likely boil down to wrappers around masks, which should cover almost any need.


Next up is date formatting. This time I knew locales would play a role because
some IETF RFCs specify that date elements be in English. So you may have
localized dates for some things, but if you use RFC2822 dates or HTTP cookie
dates, they must not be affected by any locale settings. Dates will use masks at
the core, like numbers, because masks are an easy to understand WYSIWYG format.
Well, easy if the masks make sense. If you look at printf and some other mask
syntax, it can be quite obscure. By trying to cram things into a limited syntax,
people end up using whatever low ASCII letters might be left over for some
elements. We hope to avoid that. 


Our main choices start with what Boris termed the stuttering format, e.g.
MMDDYYYY/HHMMSS. Think in terms of "progressing in a hesitant or irregular way"
rather than stuttering in terms of human speech. I prefer to call this a
symbolic format, where the letters map to date elements. This, of course, isn't
perfect: is MM month or minute? Context is required. We don't want to be
case sensitive, or use other letters randomly to avoid that conflict. So there's
an alternate approach: a literal mask, e.g. 1-Jan-2022. We're not the first to
consider it, and it is in use elsewhere, but it's not a perfect solution either.
Do masks have to be written in English terms, or can they use any locale? How do
you disambiguate numbers (does 01-01 mean MM-DD or DD-MM, and how do you write
that without the separator to get MMDD)? Does it make code more or less
readable, given that Red already has a literal date form, and it would add what
look like literal dates as strings in code?
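
To make the number ambiguity concrete, the following sketch builds two separator-free strings from the same date using only existing date accessors. It does not use the planned format function or any mask syntax; the helper `two` and the variable names are made up for this illustration.

```red
Red [Title: "MMDD vs DDMM ambiguity (illustrative sketch)"]

d: 1-Feb-2022                             ;-- Red literal date

;-- Hypothetical helper: zero-pad an integer to two digits
two: func [n [integer!]][pad/left/with form n 2 #"0"]

mmdd: rejoin [two d/month two d/day]      ;-- Month first: "0201"
ddmm: rejoin [two d/day two d/month]      ;-- Day first:   "0102"

print [mmdd ddmm]
```

Taken on its own, nothing in "0102" says whether it means January 2 or February 1, which is why a mask (or some other explicit convention) has to carry that information.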


Play with the app, give us feedback, and stay tuned. We think this will be a
crucial feature for a lot of users, and we want to make it the best it can be.




SPLIT



Like format, split seems a relatively simple subject at a glance. And if you
limit it to basic functionality, it is. That's what other languages do, though
some add a few extra features. See this table for examples. Wolfram appears
quite broad in scope, because there are multiple variants for each named
function. Something else common to all other languages is that they split only
strings and sometimes byte arrays. In Red we have blocks, and while `parse` is
great for string parsing, where it really shines is when applied to blocks to
build dialects. We knew split should be block aware for more leverage. I (Gregg)
helped design the version in R3, and used DiaGrammar to design a new dialected
interface that aimed to extend the functionality. Wanting to do more evidence
based language design, I also prototyped a small practice/playground app,
thinking we'd put it out and see what kind of feedback we could get. 
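
As a rough illustration of what block awareness means here, the sketch below contrasts the existing string `split` with a hand-written block splitter. `split-block` is a hypothetical helper written for this example only; it is not the proposed design and says nothing about the dialected or refinement-based interfaces under discussion.

```red
Red [Title: "String split vs block split (illustrative sketch)"]

;-- Existing behavior: splitting a string on a delimiter
probe split "a,b,c" ","

;-- Hypothetical helper: split a block at each occurrence of a delimiter word
split-block: function [blk [block!] delim [word!]][
    out:  copy []
    part: copy []
    foreach item blk [
        either item = delim [
            append/only out part          ;-- Close the current part
            part: copy []
        ][
            append part item
        ]
    ]
    append/only out part                  ;-- Keep the trailing part
    out
]

probe split-block [a b | c d | e] '|
```

The block version works on values rather than characters, so the delimiter could just as well be a word, a number, or any other value, which is where the extra leverage for dialects comes from.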


Toomas stepped up and suggested an alternative, refinement-based, interface. He
did a number of versions of that, and then we had to decide what to do next.
There was a great deal of design discussion, still going on, about behavior
details. Once you start adding options, it's easy for things to become confusing
for the user. We need to strike a balance between ease of use and flexibility.
Split is meant to handle the most common cases, and those with the most
leverage, not every case. And while a refinement-based interface seems natural
for Reducers, we also know how readable parse, draw, and VID dialects are. There
are pros and cons to each, but we don't want dual interfaces, which will be
confusing. If a function is dialected, any refinements should work in support of
that dialect. So the test app was reimagined by @GalenIvanov to compare the two
approaches.


Here's a screenshot of the test app, which we'll release to the community in
January.




We learned by doing this that it's hard to compare them side by side, without
having the user write full calls directly. That defeats some of the purpose, and
the DRY principle, so we'll put this one out, then revise it based on feedback.




MARKUP CODEC



Who knew that parsing HTML and XML would be the easy part? Well, many Reducers
would. What they, and we on the team, might not have guessed, is just how hard
it is to decide on a data format for the output. Red gives us many options, and
XML gives us many headaches. The two formats, while closely related, also have
some critical differences. Fortunately, once @rebolek set things up so we could
play, and made the emitter modular, we could look at real examples and dive even
deeper. What we discovered is that there is no perfect solution. No elegant
model to fit all uses and cases. Key to many insights was @dander's input, as he
works with XML a lot. Turns out, an infinitely extensible format is infinitely
challenging to nail down.


Should we emphasize path access? Being data driven, people probably shouldn't
hardcode their field names, but working with known data makes it a clear access
model. Should attributes come before or after the text/content for a tag? As we
learned, attributes aren't always small, so the locality argument isn't won
either way there. Is it better to provide an interface to the structure and tell
people to always use that, or to create a bland and obvious data structure that
is possible to access in many ways? Will these things all complicate HOF access,
which we know we want to leverage? How much do we need to care about efficiency?
We don't want to be wasteful without purpose, but if we're too miserly, users
may pay the price because it's harder to use. If we make more things implicit,
do we paint ourselves into a corner somehow?


What we settled on was a modular approach, so there will be more than one
standard emitter. What is yet undecided is how other emitters might be
supported. They will likely be quite custom, as the standard versions will cover
most needs. But is it worth making the system extensible? Once you have a
result, it's easy to post-process into your preferred format. For now that's our
recommended approach.




CLI MODULE



If you don't follow our channels on Gitter, you may not know about Boris' CLI
module. It's very slick, very Reddish, and will become a standard part of Red in
the near future. You won't believe how easy it is to create rich command line
interfaces for your Red apps with this feature. Huge thanks to @hiiamboris for
all his innovation and work on it.




IPV6 DATATYPE



It hasn't been merged to the mainline yet, but it's fully operational. You can
see the code here, and some lexer tests here. You may be impressed that it's
only a couple hundred lines of code, not counting the lexer changes, and think
it was easy. It wasn't. As usual, there was a lot of design chat and compromise
involved. For example, the name is not 100% finalized because, technically, the
datatype itself is more generally applicable, being simply a vector of numbers
internally. You can think of it like a tuple! on steroids. Fewer slots (8 vs 12),
but each slot can hold a larger value (tuple! slots are limited to byte values).
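
As a point of comparison, this small sketch shows how the existing tuple! type behaves; it uses only tuple! features that are in Red today. The IPv6! literal form is not shown because, as noted, it is not merged and may still change. The variable names are illustrative.

```red
Red []

addr:  192.168.0.1         ; an IPv4 address is a tuple! literal
color: 255.128.0           ; so is an RGB color

probe type? addr           ; tuple!
probe addr/1               ; 192
probe length? addr         ; 4
probe color/2              ; 128
```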


Just as tuple! is a general name, used both for IPv4 addresses and colors, but
also useful for other things, IPv6! could be used for things like GUIDs or
extended time values. But the lexical form for GUID/UUID values is quite
different, even ignoring the shortcut forms in the IPv6 specification. As you
probably know, lexical space is tight in Red, and the colon is an important
character in other places, and URL lexical forms were impacted, so this is a
deep change and commitment, in that regard. Why do it then?


Because IPv6 networking support was already in place in Red, and IPv6 is the
future. How often people will write literal URLs like
http://[FEDC:BA98:7654:3210:FEDC:BA98:7654:3210]:80/index.html we can't say. But
we do know that addresses often end up in config files as data and that modern,
dynamic systems generate addresses dynamically. They will appear in log files,
messages, and more. As with the value of other lexical forms in Red, it's an
important one that is part of our modern networking vocabulary.




GETTING NEAR



@dockimbel created a new branch here, which will interest almost every Reducer.
It's not ready yet, but expect it to be available in January. For those who used
R2, you may recall that errors gave you a Near field, to hint at where the error
occurred. Red will get this feature when the new branch is merged. For example,
in Red today you get this:


    >> 1 / 0
    *** Math Error: attempt to divide by zero
    *** Where: /
    *** Stack:  


Whereas in R2 you got this:


    >> 1 / 0
    ** Math Error: Attempt to divide by zero
    ** Near: 1 / 0


A little extra information goes a long way. We're anxious to see all the virtual
smiles this feature brings.




THE DAILY GRIND



We closed roughly 120 tickets in 2021, which is about 10 per month. We also merged
almost 50 PRs. These numbers don't sound large, but when you consider how much
time and effort may go into the deep ones, along with all the other work done,
it's steady progress. We'd love for both tickets and pending PRs to be at zero,
but that's not practical for a project like Red. The deep core team must have
uninterrupted time for design and bigger, more complex tasks.




ROADMAP





Q4 2021 (RETROSPECTIVE)



 * We hoped to have `format` and `split` deployed, but they have been pushed back
   to January 2022.
 * `CLI` module approved, needs to be merged, then refined as necessary.
 * `Markup Codec` took longer than expected due to extensive design chat on
   formats.
 * Interpreter instrumentation, with PoC debugger and profiler, took longer than
   expected, but is out now.
 * Async I/O, out but some extra bits didn't make it in. One unplanned addition
   was `IPv6!` as a datatype. It's experimental, and subject to change.
 * @galenivanov did some great work on his animation dialect, but @toomasv's
   `diagram` dialect took a back seat and will move to Q1 2022.
 * Audio has 3 working back ends and a basic port implementation. Next up is
   higher level design, device and format enumeration, and device control. A
   `port!` may not be the way to go for all this, but it was step one.
 * Animation has more great examples all the time. Like this and this.
   @GalenIvanov is doing great work, and we are planning to make his dialect a
   standard addition to Red.






2022



I'm not going to list items in any particular order, because our plans often
change. This way you have things to look forward to, but still with an element
of surprise.



 * `Table` module, `node!` datatype and other REP reviews
 * Full HTTP/S protocol and basic web server framework
 * New DiaGrammar release
 * Animation dialect
 * New release process
 * New web sites updated and live
 * Red/C3 (Including ETH 2.0 client protocol)
 * Red Language Specification (Principles, Core Language, Evaluation Rules,
   Datatype Specs (including literal forms), Action/Native specs, Modules spec).
 * 64-bit support (LLVM was a possibility, but we learned from Zig that LLVM
   breaking changes can be quite painful for small teams to keep up with. We may
   be better off continuing to roll our own, though it's a big task.)
 * Android update
 * Red Spaces cross-platform GUI
 * Module and package system design
 * RAPIDE (Rapid API Development Environment)





RAPIDE, FROM REDLAKE TECHNOLOGIES



If you've used Postman or Insomnia, you know what the most popular tools in the
API IDE space look like today. If you haven't used them, but use APIs, they're
worth a look. For all that those tools do, and there are a few other players in
the space, there is a lot they don't do. We think we can add a lot of value in
the API arena, thanks to Red's superpowers and how important data-centric
thinking is. For example, testing a group or series of APIs together seems like
it could be greatly improved. Also, how APIs are found, and collaboration
possibilities.


While we haven't set a release date, the plan is to start work on RAPIDE in Q2
2022, after we wrap up some infrastructure pieces it will rely on. 






IN CONCLUSION



Happy New Year to all, and may 2022 see us all healthy, happy, and writing more
Red. :^)




Posted by Gregg Irwin at 9:23 PM 11 comments:




AUGUST 4, 2021


LONG TIME NO BLOG



It's been almost a year since our last blog post. Sorry about that. It's one of
those things that falls off our radar without a person dedicated to it, and we
run lean so don't have anyone filling that role right now. We know it's
important, even if we have many other channels where people can get information.
So here we are.

Last year was a tough year all around, even for us. We were already a
remote-only team, but the effect the pandemic had on the world, particularly
travel, hit us too. We had some team changes, and also split our focus into
product development alongside core Red Language development. This is necessary
for sustainability, because people don't pay for programming languages, and they
don't pay for Open Source software. There's no need to comment on the exceptions
to these cases, because they are exceptions. The commercial goal, starting out,
is to focus on our core strengths and knowledge, building developer-centric
tools. Our first product, DiaGrammar for Windows, was released in December 2020,
and we've issued a number of updates to it since then. Our thanks to Toomasv for
his ingenuity and dedication in creating DiaGrammar. We are a team, but he
really accepted ownership of the project and took it from an idea to a great
product. Truly, there is nothing else like it on the market. 

We learned a lot from the process of creating a product, and will apply that
experience moving forward. An important lesson is that the product itself is
only half the work. As technologists, we're used to writing the code and maybe
writing some docs to go with it. We don't think about outreach, marketing,
payments, support, upgrade processes for users, web site issues, announcements,
and more. The first time you do something is the hardest, and we're excited to
improve and learn more as we update DiaGrammar and work on our next product.
We'll probably announce what it will be in Q4. One thing we can say right now is
that the work on DiaGrammar led to a huge amount of work on a more general
diagramming subsystem for Red. It's really exciting, and we'll talk more about
that in a future blog post.


SO WHAT HAVE WE BEEN DOING?

Since our last blog post we've logged over 400 fixes and 100 features into Red
itself. Some of these are small, but important, others are headline-worthy; some
are deep voodoo and some visible to every Reducer (what we call Red users). For
example, most people use the console (the REPL), so the fixes and improvements
there are easy to see. A prime example: the GUI console, unlike the CLI
console, didn't show output if the UI couldn't process events. This could
happen if you printed output in a tight loop. The results would only show up at
the termination of the loop, when the system could breathe again. That's been
addressed, but wasn't easy and still isn't perfect. Red is still single
threaded, so there's no separate UI thread (pros and cons there). We make these
tradeoffs every day, and need feedback from users and real world scenarios to
help find the right balance. Less obvious are things like improvements to parse,
which not everyone uses. Or how fmod works across platforms, and edge cases for
lexical forms (e.g. is -1.#NaN valid?). The latter is particularly important,
because Red is a data language first.

JSON is widely used, but people may not notice that the JSON decoder is 20x
faster now, unless they're dealing with extremely large JSON datasets. JSON is
so widely used that we felt the time spent, and the tradeoffs made, were worth
it. It also nicely shows one of Red's strengths. Profiling showed that the codec
spent a lot of time in its unescape function. @hiiamboris rewrote that as a
Red/System routine, tweaked it, and got a massive speedup. No external compiler
needed, no need to use C, and the code is inlined so it's all in context. Should
your JSON be malformed, you'll also get nicer error information now. As always,
Red gives you options. Use high level Red as much as possible, for the most
concise and flexible code, but drop into Red/System when it makes sense.

Some features cross the boundary of what's visible. A huge amount of work went
into D2D support on Windows. D2D is Direct2D, the hardware-accelerated backend
for vector graphics on Windows. For users, nothing should change as all the
details are hidden. But the rendering behavior is not exactly the same. We try
to work around that, but sometimes users have to make adjustments as well; we
know because DiaGrammar is written in Red and uses the draw dialect heavily.
It's an important step forward, but comes at a cost. GDI+ is now a legacy
graphics back end, and won't see regular updates. Time marches on and we need to
look forward. As if @qtxie wasn't busy enough with that, he and @dockimbel also
pushed Full I/O forward in a big way. It hasn't been merged into the main branch
yet, but we expect that to happen soon. @rebolek has been banging on it, and has
a full working HTTP protocol ready to go, which is great. TLS/SSL support gets
an A+ rating, which is also a testament to the design and implementation. It's
important to note that the new I/O system is a low level interface. The higher
level API is still being designed. At the highest level, these details will all
be hidden from users. You'll continue to use read, write, save, load exactly as
you do today, unless you need async I/O. 

Another big "feature" came from @vazub: NetBSD support. The core team has to
focus on what stands to help the project overall, with regard to users and
visibility. Community support for lesser known platforms is key. If you're on
one of those platforms, be (or find) a champion. We'll help all we can, but
that's what Open Source is for. Thanks for this contribution @vazub!

We also have some new Python primers up, thanks to @GalenIvanov. Start
at Coming-to-Red-from-Python. Information like this is enormously important. Red
is quite different from other languages, and learning any new language can be
hard. We're used to a set of functionality and behaviors, which sometimes makes
the syntax the easiest part to learn. Just knowing what things are called is a
learning curve. Red doesn't use the same names, because we (and Carl when he
designed Rebol) took a more holistic view. That's a hard sell though. We feel
the pain. A user who found Red posted a video as they tried to do some basic
things. We learned a lot from watching it. Where other languages required you to
import a networking library, it's already built into Red. When they were looking
for request or http.get, and expecting strings to be used for URLs, they
couldn't find answers. In Red you just read http://.... It's obvious to us, but
not to the rest of the world. So these new primers are very exciting. We have
reference docs, and Red by Example, but still haven't written a User's Guide for
Red. We'll get there though. 
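
To make the networking point concrete, here is a minimal sketch. The URL is just an example, the variable names are illustrative, and the request needs network access, so it is wrapped in `attempt` to keep the script from stopping on failure.

```red
Red []

; A url! is a literal datatype, so no quotes and no import are needed.
site: http://www.red-lang.org
probe type? site            ; url!

; `read` fetches the resource and returns its content as a string.
; `attempt` returns none instead of raising an error if the request fails.
page: attempt [read site]
either page [
    print ["Fetched" length? page "characters"]
][
    print "Request failed (no network?)"
]
```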


WHY DO THINGS TAKE SO LONG?

Even with that many fixes and features logged, and huge amounts of R&D, it can
still feel like progress is slow. The world moves fast, and software projects
are often judged by their velocity. We even judge ourselves that way, and have
to be reminded to stay the course, our course, rather than imitating others.
Red's flexibility also comes into play. Where other languages may limit how you
can express solutions, we don't. It's so flexible that people can do crazy
things or perform advanced tricks which end up being logged as bugs and wishes.
Sometimes we say No (a lot of times in fact), but we also try to keep an open
mind. We have to ask "Should that be allowed?", "Why would you want to do that
(even though I never have)?", and "What are the long term consequences?" We have
to acknowledge that Red is a data format first, and we never want to break that.
It has to evolve, but not breaking the format is fundamental. And while code is
expected to change, once people depend on a function or library it causes them
pain if we break compatibility. We don't want to do that, though sometimes we
will for the greater good and the long view. There are technical bandages we can
patch over things, but it's a big issue that doesn't have a single solution. Not
just for us, but for all software development. We'll talk more about this in the
future as well.

I'll note some internal projects related to our "slow and steady" process:



 * Composite is a simple function that does for strings what compose does for
   blocks. It's a basic interpolator. But the design has taken many turns. Not
   just in the possible notations, but whether it should be a mezzanine
   function, a macro, or both. Each has pros and cons (Side note: we don't often
   think about "cons" being an abbreviation for "consequences"). This simple
   design and discussion is stalled again, because another option would be a new
   literal form for interpolated strings. That's what other languages do, but is
   it a good fit for Red? We belabor the point of how tight lexical space is
   already, so have to weigh that against the value of a concise notation.
 * Non-native GUI. Red's native GUI system was chosen in response to Rebol's
   choice to go non-native. Unfortunately it's another case of needing both.
   Being cross platform is great for Red users, but Hell for us. Throw in mobile
   and it's even worse. Don't even talk about running in the browser. But every
   platform has native widget limitations. Once you move beyond static text,
   editable fields, buttons, and simple lists, you're in the realm of "never the
   twain shall meet". How do you define and interact with grids and tables or
   collapsible trees? Red already has its own rich-text widget, so you don't
   have to embed (even if you could) an entire web browser and then write in
   HTML and CSS. To address all this, with much research and extensive use case
   outlines, @hiiamboris has spent a lot of time and effort on Red Spaces. Show
   me native widgets that can do editable spiral text, put any layout inside a
   rotator, or define recursive UIs. I didn't think so. Oh, and the wiggling you
   see in the GIFs there are not mistakes or artifacts, they are tests to show
   that any piece of the UI can be animated.
 * Other projects include format, split, HOFs, and modules, each with a great
   deal of design work and thought put into them. As an example, look at Boris'
   HOF analysis. They are large and important pieces, based on historical and
   contemporary research, but not something we will just drop into Red, though
   we could. A simple map function is a no-brainer, and could have been there
   day one. But that's not how we work. It's not a contest to see how many
   features we can add, or how fast; but how we can move software forward, make
   things easier, and push the state of the art. Not just in technical features
   (the engineering part), but in the design of a language and its ecosystem.
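
To make the Composite item in the list above concrete, this sketch shows what `compose` does for blocks today, next to one current way of building the equivalent string with `rejoin`. The notation for `composite` itself is not shown, since that design is still open; the variable names are illustrative.

```red
Red []

; `compose` evaluates only the parenthesized parts of a block.
blk: compose [sum: (1 + 2) rest]
probe blk                         ; [sum: 3 rest]

; For strings there is no literal interpolation form yet; `rejoin`
; evaluates a block and joins the results into one string.
str: rejoin ["sum: " 1 + 2]
probe str                         ; "sum: 3"
```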


NOT EVERYONE HAS THESE PROBLEMS

An important aspect of Red is being self-contained. We talk about this a lot.
Yes, we're considering LLVM as a target, but that has a big cost, not just
benefits. Using our own compiler for everything also has costs, like slowing the
move to 64-bit which is an issue for Mac users now. Workarounds like VMs and
Docker containers are just that. We want things to be easy for you, but that
doesn't mean they're easy for us. Here's an example.


Boris found a bug related to printing time values in Red/System. @dockimbel
finally tracked it down, and posted this investigative report:


> @hiiamboris It was a (R/S) compiler issue afterall. ;-) size? a was the guilty
> part. The compiler was wrongly generating code for loading a even
> though size? is statically evaluated by the compiler and replaced by a static
> integer value. Given that a was a float type, its value was pushed onto the
> x87 FPU stack, but never popped. That stack has a 7 slots limit. Running the
> loop 5 times was enough to leave only 2 slots free. When the big float
> expression is encountered in dtoa library, it requires 3 free slots on the FPU
> stack, which fails and results in producing a NaN value, which wreaks havoc in
> the rest of the code.

The fix in the compiler was trivial (fetch-expression/final vs fetch-expression)
but getting to that point was not. Understanding machine architectures at the
lowest levels isn't for everyone, but even though our compiler code will be
rewritten in the future, it's small and maintainable today. If we rely on GCC,
Clang, or other compilers, hitting a bug may mean hitting a wall. So while there
are costs to using our own compiler, there are also costs to depending on
others. Robert Heinlein popularized TANSTAAFL, but the concept is not science
fiction. As a side note, just as we moved from GDI+ to D2D, x87 for float
support was an early choice meant to support older platforms and we are planning
to switch to SSE.

If compilers are your thing, or you like system level programming, join our
community and get to know Red/System. See how our toolchain works, and consider
joining us.



THE BIG PICTURE

I just read 101 things I learned in architecture school (which I heard about via
Kevlin Henney, though it may not have been that specific talk), and what struck
me the most about it is how we've commandeered the word "architecture" for
software but completely removed the human aspect. An architect does so much more
than we do. Software architects are really structural engineers. If a single
developer builds a complete app, they have to do the UI. They engineer a
building and then slap on whatever sheathing is at hand, cutting doors and
windows without concern for their location. And the app is viewed in isolation,
as if it's the only thing a user has on their system, without consideration for
its site, context, or relationships. What makes real architecture hard (and why
the author notes that many architects hit their stride when they are older), is
that you have to know so much. So many considerations, disciplines, and
constraints are involved, and you have to unify them. It's both creative and
scientific. What makes great architecture great is that it makes your experience
better. Maybe even wonderful. If we only think about the mechanical aspects, our
software will never be beautiful.


We haven't articulated this view for what we do, I think because we didn't
realize it. At least I didn't. We talk about the whole being greater than the
sum of its parts, and not just making everything libraries so it feels more
natural and less mechanical. How a REPL and single exe make it easier to get
started, and not having to use many tools is better. But we haven't explicitly
said "Here's how it's laid out, and why. Here's how it's put together; these are
the critical elements. Here's what it looks like from a distance, and when you
enter its space." Implicitly we do that every day, through the work, but we
don't talk about it. Or only once a year.




Posted by Gregg Irwin at 2:21 AM 9 comments:




AUGUST 20, 2020


RED/SYSTEM: NEW FEATURES



In the past months, many new features were added to Red/System, the low-level
dialect embedded in Red. Here is a summary in case you missed them.


SUBROUTINES

During the work on the low-level parts of the new Red lexer, the need arose for
intra-function factorization abilities to keep the lexer code as DRY as
possible. Subroutines were introduced to solve that. They act like the GOSUB
statement in BASIC. They are defined as a separate block of code
inside a function's body and are called like regular functions (but without any
arguments). So they are much lighter and faster than real function calls and
require just one slot of stack space to store the return address. 

The declaration syntax is straightforward:

    <name>: [<body>]

    <name> : subroutine's name (local variable).
    <body> : subroutine's code (regular R/S code).


To define a subroutine, you need to declare a local variable with the
subroutine! datatype, then set that variable to a block of code. You can then
invoke the subroutine by calling its name from anywhere in the function body
(but after the subroutine's own definition).

Here is a first example, a hypothetical function processing I/O events:

    process: func [buf [byte-ptr!] event [integer!] return: [integer!]
        /local log do-error [subroutine!] e [c-string!]
    ][
        log: [print-line [">>" tab e "<<"]]
        do-error: [print-line ["** Error:" e] return 1]
    
        switch event [
            EVT_OPEN  [e: "OPEN"  log unless connect buf [do-error]]
            EVT_READ  [e: "READ"  log unless receive buf [do-error]]
            EVT_WRITE [e: "WRITE" log unless send buf    [do-error]]
            EVT_CLOSE [e: "CLOSE" log unless close buf   [do-error]]
            default   [e: "<unknown>" do-error]
        ]
        0
    ]


This second example is more complete. It shows how subroutines can be combined
and how values can be returned from a subroutine:

    #enum modes! [
        CONV_UPPER
        CONV_LOWER
        CONV_INVERT
    ]

    convert: func [mode [modes!] text [c-string!] return: [c-string!]
        /local
            lower? upper? alpha? do-conv [subroutine!]
            delta [integer!]
            s     [c-string!]
            c     [byte!]
    ][
        lower?:  [all [#"a" <= c c <= #"z"]]
        upper?:  [all [#"A" <= c c <= #"Z"]]
        alpha?:  [any [lower? upper?]]
        do-conv: [s/1: s/1 + delta]
        delta:   0
        s:       text

        while [s/1 <> null-byte][
            c: s/1
            if alpha? [
                switch mode [
                    CONV_UPPER  [if lower? [delta: -32 do-conv]]
                    CONV_LOWER  [if upper? [delta: 32 do-conv]]
                    CONV_INVERT [delta: either upper? [32][-32] do-conv]
                    default     [assert false]
                ]
            ]
            s: s + 1
        ]
        text
    ]
    
    probe convert CONV_UPPER "Hello World!"
    probe convert CONV_LOWER "There ARE 123 Dogs."
    probe convert CONV_INVERT "This SHOULD be INVERTED!"


will output:

    HELLO WORLD!
    there are 123 dogs.
    tHIS should BE inverted!


Support for getting a subroutine address and dispatching dynamically on it is
planned for the future (something akin to a computed GOTO). More examples
of subroutines can be found in the new lexer code, like in the load-date
function.


NEW SYSTEM INTRINSICS



Several new extensions to the system path have been added.




LOCK-FREE ATOMIC INTRINSICS



A simple low-level OS threads wrapper API has been added internally to the Red
runtime as preliminary work on supporting parts of IO concurrency and parallel
processing in the future. To complement it, a set of atomic intrinsics
was added to enable the implementation of lock-free and wait-free algorithms in
a multithreaded execution context.


The new atomic intrinsics are all documented here. Here is a quick overview:
 * system/atomic/fence: generates a read/write data memory barrier.
 * system/atomic/load: thread-safe atomic read from a given memory location.
 * system/atomic/store: thread-safe atomic write to a given memory location.
 * system/atomic/cas: thread-safe atomic compare&swap to a given memory
   location.
 * system/atomic/<math-op>: thread-safe atomic math or bitwise operation to a
   given memory location (add, sub, or, xor, and).



Other new intrinsics
 * system/stack/allocate/zero: allocates storage space on the stack and
   zero-fills it.
 * system/stack/push-all: saves all registers to stack.
 * system/stack/pop-all: restores all registers from stack.
 * system/fpu/status: retrieves the FPU exception bits status as a 32-bit
   integer.





IMPROVED LITERAL ARRAYS

The main change is the removal of the hidden size inside the /0 index slot. The
size of a literal array can now only be retrieved using the size? keyword, which
is resolved at compile time (rather than run-time for /0 index access).

A notable addition is the support for binary arrays. Those arrays can be used to
store byte-oriented tables or embed arbitrary binary data into the source code.
For example:

    table: #{0042FA0100CAFE00AA}
    probe size? table                      ;-- outputs 9
    probe table/2                          ;-- outputs "B"
    probe as integer! table/2              ;-- outputs 66

The new Red lexer code uses them extensively.




VARIABLES AND ARGUMENTS GROUPING

It is now possible to group the type declaration for local variables and
function arguments. For example:

    foo: func [
        src dst    [byte-ptr!]
        mode delta [integer!]
        return:    [integer!]
        /local
            p q buf  [byte-ptr!]
            s1 s2 s3 [c-string!]
    ]
 

Note that the compiler supports those features through code expansion at compile
time, so error reports may show each argument or variable with its own type
declaration.


INTEGER DIVISION HANDLING

Low-level integer division has notorious shortcomings: each hardware platform
handles the edge cases differently. The Intel IA-32 architecture handles them in
a slightly safer way, while the ARM architecture silently produces erroneous
results, typically in the following two cases:



 * division by zero
 * division overflow (-2147483648 / -1)



IA-32 CPUs will generate an exception, while ARM ones will return invalid results
(respectively 0 and -2147483648). This makes it difficult to produce code that
behaves the same on both architectures when integer divisions are used. To
reduce this gap, the R/S compiler now generates extra code to detect those cases
on ARM targets and raise a runtime exception. These extra checks are produced
only in debug compilation mode. In release mode, priority is given to
performance, so no runtime exception will occur in such cases on ARM (the
overhead is significant). So, be sure to test your code thoroughly on ARM in
debug mode before releasing it. This is not a perfect solution, but at least it
makes it possible to detect those cases through testing in debug mode.


OTHERS

Here is a list of other changes and fixes in no particular order:

 * Cross-referenced aliased fields in structs defined in same context are now
   allowed. Example:
   
       a!: alias struct! [next [b!] prev [b!]]
       b!: alias struct! [a [a!] c [integer!]]
   

 * -0.0 special float literal is now supported.
 * +1.#INF is also now supported as a valid literal, in addition to 1.#INF, for
   positive infinity.
 * Context-aware get-words resolution.
 * New #inline directive to inline assembled binary code.
 * Dropped support for % and // operators on float types: they relied on each
   FPU's own support, so the results were not reliable across platforms. Use the
   fmod function instead from now on.
 * Added --show-func-map compilation option: when used, it will output a map of
   R/S function addresses/names, to ease low-level debugging.
 * FIX: issue #4102: ASSERT false doesn't work.
 * FIX: issue #4038: cast integer to float32 in math expression gives wrong
   result.
 * FIX: byte! to integer! conversion not happening in some cases. Example: i:
   as-integer (p/1 - #"0")
 * FIX: compiler state not fully cleaned up after premature termination. This
   affects multiple compilation jobs done in the same Rebol2 session, resulting
   in weird compilation errors.
 * FIX: issue #4414: round-trip pointer casting returns an incorrect result in
   some cases.
 * FIX: literal arrays containing true/false words could corrupt the array.
   Example: a: ["hello" true "world" false]
 * FIX: improved error report on bad declare argument.
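
Regarding the fmod item in the list above, here is a short Python sketch of
what an fmod-style remainder computes. It assumes that Red/System's fmod
behaves like the C library function of the same name, which the post does not
state explicitly. Python's `math.fmod` follows the C definition, so it is used
here as a stand-in.

```python
import math

# fmod keeps the sign of the dividend (C semantics)
fmod_neg_dividend = math.fmod(-7.5, 2.0)   # -1.5
fmod_neg_divisor = math.fmod(7.5, -2.0)    # 1.5

# For contrast, Python's own % operator on floats keeps the sign of the divisor
mod_neg_dividend = -7.5 % 2.0              # 0.5
mod_neg_divisor = 7.5 % -2.0               # -0.5
```

The values are chosen to be exactly representable in binary floating point, so
the results shown in the comments are exact.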










Posted by Nenad Rakocevic at 12:54 PM
Labels: arrays, bugfixes, compilation, exceptions, features, floating point,
FPU, IA-32, literal arrays, math, pointers, red/system, runtime errors, VFP



AUGUST 3, 2020


A NEW FAST AND FLEXIBLE LEXER


A programming language lexer is the part in charge of converting the textual
representation of code into a structured in-memory representation. In Red, this
is done by the load function, which calls the lower-level transcode native.
Until now, Red relied on a lexer written entirely in the Parse dialect.
However, its parsing rules were written to be easy to maintain, not for
performance. Rewriting those rules to speed them up would have been possible,
but rewriting the lexer entirely in Red/System gives the best achievable
performance. This might not matter for most user scripts, but given that Red is
also a data format, we need a solution for fast (near-instant) loading of huge
quantities of Red values stored in files or transferred over the network.


The main features of the new lexer are:
 * High performance, typically 50 to 200 times faster than the older one.
 * New scanning features: identify values and their datatypes without loading
   them.
 * Instrumentation: customize the lexer's behavior at will using an
   event-oriented API.

The reference documentation is available there. This new lexer has been
available in Red's auto-builds since June.


PERFORMANCE



Vastly increased performance is the main driver for this new lexer. Here is a
small benchmark showing how far it goes.


The benchmarking tasks are:
 * 100 x compiler.r: loads the compiler.r source file from memory 100 times
   (~126KB each, so ~12MB in total).
 * 1M short integers: loads a string of 1 million `1` values separated by
   spaces.
 * 1M long integers: loads a string of 1 million `123456789` values separated
   by spaces.
 * 1M dates: loads a string of 1 million `26/12/2019/10:18:25` values separated
   by spaces.
 * 1M characters: loads a string of 1 million `#"A"` values separated by
   spaces.
 * 1M escaped characters: loads a string of 1 million `#"^(1234)"` values
   separated by spaces.
 * 1M words: loads a string of 1 million words, each produced by
   `random "abcdefghijk"`, separated by spaces.
 * 100K words: loads a string of 100 thousand words, each produced by
   `random "abcdefghijk"`, separated by spaces.



And the results are (on a Core i7-4790K):

    Loading Task             v0.6.4 (sec)    Current (sec)    Gain factor
    ---------------------------------------------------------------------
    100 x compiler.r               41.871            0.463             90
    1M short integers              14.295            0.071            201
    1M long integers               18.105            0.159            114
    1M dates                       29.319            0.389             75
    1M characters                  14.865            0.092            162
    1M escaped characters          14.909            0.120            124
    1M words                          n/a            1.216            n/a
    100K words                     23.183            0.070            331


Notes: 


- Only transcode is used in the loading tasks (system/lexer/transcode in 0.6.4).


- The "1M words" task fails on 0.6.4 because the symbol table expansion time is
exponential due to some hashtable bugs. That also explains the big gap for the
"100K words" task. Those issues are fixed in the current version and the symbol
table has been further optimized for speed. However, the execution time does
not grow linearly between the 100K and 1M words tests in the new lexer, which
may be explained by a high number of collisions in the internal hashtable due
to limited input variability.


- The 0.6.4 lexer can only process strings as input, while the new lexer
internally processes only UTF-8 binary input. The input strings were converted
to each lexer's native format in order to compare their speed more accurately.
Providing a string instead of a binary series as input to the new lexer incurs
a speed penalty of ~10% on average.
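
As a quick cross-check of the figures above, the following Python sketch
recomputes the gain factors from the two timing columns, plus the slowdown
between the 100K and 1M words tasks in the new lexer. It uses only numbers
taken from the table. The variable names are arbitrary.

```python
# (v0.6.4 seconds, current seconds), copied from the results table
timings = {
    "100 x compiler.r": (41.871, 0.463),
    "1M short integers": (14.295, 0.071),
    "1M long integers": (18.105, 0.159),
    "1M dates": (29.319, 0.389),
    "1M characters": (14.865, 0.092),
    "1M escaped characters": (14.909, 0.120),
    "100K words": (23.183, 0.070),
}

# Gain factor = old time / new time, rounded to the nearest integer
gains = {task: round(old / new) for task, (old, new) in timings.items()}

# New lexer only: time ratio for 10 times more words (1M vs 100K)
words_scaling = 1.216 / 0.070
```

The recomputed gains match the table (90, 201, 114, 75, 162, 124 and 331), and
`words_scaling` comes out at roughly 17.4 for a 10-fold increase in input,
which is the non-linear growth mentioned in the notes.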




SCANNING



It is now possible to only scan tokens instead of loading them. Basically, that
means identifying a token's length and type without loading it (so without
requiring extra memory and processing time). This is achieved by using the new
scan native.

    >> scan "123"
    == integer!
    >> scan "w:"
    == set-word!
    >> scan "user@domain.com"
    == email!
    >> scan "123a"
    == error!



It is possible to achieve even higher scanning speed by giving up a bit of
accuracy. That is the purpose of the scan/fast refinement. It trades type
recognition accuracy for maximum performance. You can find the list of
"guessed" types in the table there.



    >> scan/fast "123"
    == integer!
    >> scan/fast "a:"
    == word!
    >> scan/fast "a/b"
    == path!




Scanning applies to the first token in the input series. To scan all the tokens
of a given input iteratively, use the /next refinement. It returns the datatype
together with the input series positioned past the current token, which makes
it possible to get the precise token size in the input string. It can be
combined with /fast if required. For example:

    src: "hello 123 world 456.789"
    until [
        probe first src: scan/next src
        empty? src: src/2
    ]

Outputs:

    word!
    integer!
    word!
    float!





MATCHING BY DATATYPE IN PARSE



The new lexer also enables matching by datatype directly from the Parse
dialect. However, this feature is limited to binary input.

    >> parse to-binary "Hello 2020 World!" [word! integer! word!]
    == true
    >> parse to-binary "My IP is 192.168.0.1" [3 word! copy ip tuple!]
    == true
    >> ip
    == #{203139322E3136382E302E31}
    >> load ip
    == 192.168.0.1


Note that whitespace in front of tokens is skipped automatically in this
matching mode.
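
To see what the automatic whitespace skipping means for the captured value,
the binary shown for `ip` in the session above can be decoded by hand. The
following Python lines do only that. They do not involve Red's Parse at all.

```python
# Hex digits copied from the `ip` value printed in the console session
raw = bytes.fromhex("203139322E3136382E302E31")

# The lexer works on UTF-8 input, so decode the bytes as UTF-8
text = raw.decode("utf-8")
```

Here `text` is `" 192.168.0.1"`, with a leading space (byte 20 in hex): the
whitespace skipped in front of the tuple! token is part of the copied region,
and `load` then ignores it when producing the tuple value.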




INSTRUMENTATION



Lexers in the Red and Rebol world used to be black boxes. This is no longer the
case with Red's new lexer and its tracing capabilities. It is now possible to
provide a callback function that is called on lexer events triggered while
parsing tokens. This gives users deeper control, allowing them, for example, to:
 * Trace the behavior of the lexer for debugging or statistical purposes.
 * Catch errors and resume loading by skipping invalid data.
 * Transform the input on the fly (to remove/alter some non-loadable parts).
 * Extend the lexer with new lexical forms.
 * Process serialized Red data without having to fully load the input.
 * Extract line comments that would otherwise be lost.



The lexer's tracing mode is activated by using the /trace refinement on
transcode. The syntax is:

    transcode/trace <input> <callback>

    <input>    : series to load (binary! string!).
    <callback> : a callback function to process lexer events (function!).


That function is called on specific events generated by the lexer: prescan,
scan, load, open, close, error. The callback function and events specification
can be found there. 


A default tracing callback is provided in system/lexer/tracer:

    >> transcode/trace "hello 123" :system/lexer/tracer
    
    prescan word 1x6 1 " 123"
    scan word 1x6 1 " 123"
    load word hello 1 " 123"
    prescan integer 7x10 1 ""
    scan integer 7x10 1 ""
    load integer 123 1 ""
    == [hello 123]


That tracing function will simply print the lexer event info. If a syntax error
occurs, it will cancel it and resume on the next character after the error
position.
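
The general idea of an event-driven lexer that can recover from errors can be
sketched in a few lines of Python. This is a toy model only: the function
`toy_transcode`, its two events and the meaning of the callback's return value
are assumptions made for this illustration, and they do not reproduce the
callback specification of Red's transcode/trace.

```python
def toy_transcode(text, callback):
    """Load space-separated integers and words, reporting events to callback."""
    values = []
    for token in text.split():
        if token.isascii() and token.isdigit():
            kind, value = "integer", int(token)
        elif token.isascii() and token.isalpha():
            kind, value = "word", token
        else:
            # Toy convention: a true result means "skip the token and resume"
            if callback("error", token, None):
                continue
            raise ValueError("invalid token: " + token)
        callback("load", token, kind)
        values.append(value)
    return values


events = []

def tracer(event, token, kind):
    """Record every event and always ask the lexer to resume after errors."""
    events.append((event, token, kind))
    return True

loaded = toy_transcode("hello 123a 42", tracer)
```

With this tracer, the invalid token `123a` is reported as an error event and
skipped, so `loaded` ends up as `['hello', 42]`.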


Several more sophisticated examples can be found in our red/code repository.








IMPLEMENTATION NOTES



This new lexer has been specifically prototyped and designed for performance. It
relies on a token-oriented pipelined approach consisting of 3 stages:
prescanning, scanning and loading.


Prescanning is achieved using only a tight loop and a state machine (FSM). The
loop reads UTF-8 encoded input characters one byte at a time. Each byte is
identified as part of a lexical class. The lexical class is then used to
transition from one state to another in the FSM, using a big transition table.
Once a terminal state (state names with a `T_` prefix) or the end of the input
is reached, the loop exits, leading to the next stage. The prescanning stage
locates a token's begin/end positions and gives a fairly accurate guess of the
token's datatype. It can also detect some syntax errors, when the FSM cannot
reach a proper datatype terminal state. This approach provides the fastest
possible token detection, but it cannot be fully accurate, nor can it deeply
validate the token content for some complex types (e.g. dates).
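
The prescanning loop described above can be illustrated with a toy version in
Python. The lexical classes, states and transitions below are invented for the
example and are far smaller than those documented for the real lexer. Only the
overall shape (byte to class, class plus state to next state, exit on a `T_`
state or at the end of the input) follows the description.

```python
# Toy lexical classes
C_BLANK, C_DIGIT, C_ALPHA, C_OTHER = range(4)
N_CLASSES = 4

# Toy states; names starting with T_ are terminal
S_START, S_INTEGER, S_WORD, T_INTEGER, T_WORD, T_ERROR = range(6)
FIRST_TERMINAL = T_INTEGER

# 256-entry table mapping each input byte to its lexical class
LEX_CLASSES = bytearray([C_OTHER] * 256)
for b in b" \t\r\n":
    LEX_CLASSES[b] = C_BLANK
for b in b"0123456789":
    LEX_CLASSES[b] = C_DIGIT
for b in range(ord("a"), ord("z") + 1):
    LEX_CLASSES[b] = C_ALPHA        # lowercase letter
    LEX_CLASSES[b - 32] = C_ALPHA   # matching uppercase letter

# Flat transition table, indexed by state * N_CLASSES + class
TRANSITIONS = bytes([
    # BLANK     DIGIT      ALPHA    OTHER
    S_START,   S_INTEGER, S_WORD,  T_ERROR,   # from S_START
    T_INTEGER, S_INTEGER, T_ERROR, T_ERROR,   # from S_INTEGER
    T_WORD,    S_WORD,    S_WORD,  T_ERROR,   # from S_WORD
])


def prescan(data, pos=0):
    """Return (state, start, end) for the first token found at or after pos."""
    state, start = S_START, pos
    while pos < len(data):
        state = TRANSITIONS[state * N_CLASSES + LEX_CLASSES[data[pos]]]
        if state >= FIRST_TERMINAL:
            break
        if state == S_START:
            start = pos + 1  # still skipping leading blanks
        pos += 1
    # End of input reached in the middle of a token: close it
    if state == S_INTEGER:
        state = T_INTEGER
    elif state == S_WORD:
        state = T_WORD
    return state, start, pos
```

For instance, `prescan(b"  123 abc")` returns `(T_INTEGER, 2, 5)`, and calling
it again from position 5 returns `(T_WORD, 6, 9)`. An input such as `b"12a"`
ends in `T_ERROR`, in the same spirit as `scan "123a"` returning error! earlier
in this post.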


Adding more states would provide greater accuracy and cover more syntactic
forms, but at the cost of growing the transition table a lot, due to the need
to duplicate many states. Currently the table weighs 2440 bytes, which is
already quite big to be kept entirely in the CPU data cache (usually 8, 16 or
32KB per core). In addition, the lexical table uses 1024 bytes and there are
two other minor tables used in the tight loop. The data cache also needs to
hold the parsed input data and part of the native stack, so the available space
is limited.


The tight loop code is also optimized to keep branch mispredictions as low as
possible. It currently relies on only two branches. The loop code could be
reduced further by, for example, pre-multiplying the state values to avoid the
multiplication when calculating the table entry offset. However, we need to
wait for a fully optimizing code generation backend before trying to extract
more performance from that loop, or we might go in the wrong direction.
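
The pre-multiplication idea mentioned above can be shown on a toy table in
Python. The table contents and the function names are made up for the example.
The point is only that storing `next_state * N_CLASSES` in the table removes
the multiplication from the per-byte lookup.

```python
N_CLASSES = 4

# Plain table: entries are state numbers (state 3 is an absorbing error state)
PLAIN = [
    0, 1, 2, 3,   # from state 0
    3, 1, 3, 3,   # from state 1
    3, 2, 2, 3,   # from state 2
    3, 3, 3, 3,   # from state 3
]

# Pre-multiplied table: each entry already holds the row offset of the state
PREMULT = [state * N_CLASSES for state in PLAIN]


def run_plain(classes):
    """Follow transitions, multiplying on every step."""
    state = 0
    for cls in classes:
        state = PLAIN[state * N_CLASSES + cls]
    return state


def run_premultiplied(classes):
    """Follow transitions with an addition only; divide once at the end."""
    offset = 0
    for cls in classes:
        offset = PREMULT[offset + cls]
    return offset // N_CLASSES
```

Both functions return the same final state for any sequence of classes. One
trade-off visible in the sketch is that the stored values become larger than
the state numbers, so the table entries may need a wider integer type.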


The scanning stage happens once a token has been identified. It consists of
calling, where needed, a scanner function to deep-check the token for errors
and determine the datatype more accurately. The loading stage then follows
(unless only scanning was requested by the user). It calls, where needed, a
loader function that constructs the Red value out of the token. For any-block
series, the scanners actually do the series construction on reaching the ending
delimiter (which requires special handling for paths), so no loader is needed
there. Conversely, loaders can be invoked in validating mode only (not
constructing the value), in order to avoid code duplication when complex code
is required for decoding/validating the token (e.g. date!, time!, strings with
UTF-8 decoding, ...).


For the record, there was an attempt at creating specific FSMs for parsing
date! and time! literal forms, to reduce the number of rules that need to be
handled by pure code. The results were not conclusive, as the amount of code
required for special-case handling was still significant and the performance of
the FSM parsing loop was below that of the current pure code version. This
approach can be reexamined once we get the fully optimizing backend.


The FSM states, lexical classes and transitions are documented in the
lexer-states.txt file. A simple syntax is used to describe the transitions and
possible branching from one state to others. The FSM has three possible entry
points: S_START, S_PATH and S_M_STRING. Parsing path items requires specific
states, even for common types. For curly-braced strings, it is necessary to
exit the FSM on each occurrence of an opening/closing curly brace in order to
count the nested ones and accurately determine where the string ends. In both
the path and string cases, the FSM needs to be re-entered in a state other
than S_START.
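
The brace counting needed for curly-braced strings can be sketched on its own
in Python. This is not the lexer's code: the function name is hypothetical, the
FSM exit/re-entry is replaced by a plain loop, and escapes are handled with the
simplifying assumption that a caret (`^`) always protects the next character.

```python
def find_string_end(data, pos):
    """Return the index just past the brace closing the string opened at pos.

    Returns -1 if the string is not terminated.
    """
    if data[pos:pos + 1] != b"{":
        raise ValueError("no opening curly brace at pos")
    depth = 0
    i = pos
    while i < len(data):
        ch = data[i]
        if ch == ord("^"):
            i += 2              # skip the caret and the escaped character
            continue
        if ch == ord("{"):
            depth += 1          # one more nested level
        elif ch == ord("}"):
            depth -= 1
            if depth == 0:
                return i + 1    # matching close of the outermost brace
        i += 1
    return -1
```

For example, `find_string_end(b"{a {b} c} tail", 0)` returns 9, the position
right after the outer closing brace, while an unterminated input such as
`b"{a {b}"` yields -1.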


In order to build the FSM transition table, there is a workflow that goes from
that lexer-states.txt file to the final transition table data in binary. It
basically looks like this:

    FSM graph -> Excel file -> CSV file -> binary table

The more detailed steps are:
 1. Manually edit the lexer-states.txt file.
 2. Port those changes into the lexer.xlsx file by properly setting the
    transition values.
 3. Save that Excel table in CSV format as lexer.csv.
 4. Run the generate-lexer-table.red script from the Red repo root folder. The
    lexer-transitions.reds file is regenerated.
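
The last step of the workflow above (CSV file to binary table) can be pictured
with a small Python sketch. The CSV layout used here is an assumption made for
the example (a header row of lexical classes, one row per state, cells naming
the next state). The actual lexer.csv layout and the generate-lexer-table.red
script may differ, and `build_table` is a hypothetical name.

```python
import csv
import io


def build_table(csv_text):
    """Convert a state/class CSV grid into a flat bytes transition table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # States with a row are numbered first, in row order
    names = [row[0] for row in body]
    # States that only appear as targets (e.g. terminal ones) come after
    for row in body:
        for cell in row[1:]:
            if cell not in names:
                names.append(cell)
    state_ids = {name: i for i, name in enumerate(names)}
    table = bytearray()
    for row in body:
        for cell in row[1:]:
            table.append(state_ids[cell])
    return bytes(table), len(header) - 1


SAMPLE = """state,blank,digit,alpha
S_START,S_START,S_INTEGER,S_WORD
S_INTEGER,T_INTEGER,S_INTEGER,T_ERROR
S_WORD,T_WORD,S_WORD,S_WORD
"""

table, n_classes = build_table(SAMPLE)
```

With the sample grid, `table` holds 9 bytes (3 states by 3 classes) and
`n_classes` is 3. An entry is then found at `state * n_classes + class`.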

The lexer code relies on several other tables for handling specific types, such
as path ending detection, floating point number syntax validation, and decoding
of binary series and escaped characters. Those tables are either manually
written (and not planned to ever change) or generated using this script.


Various other points worth mentioning:


- The lexer works natively with UTF-8 encoded binary buffers as input. If a
string! is provided as input, there is an overhead for internally converting
that string to binary before passing it to the lexer. A single internal buffer
is used for those conversions, with support for recursive calls.


- The lexer uses a single accumulative cells buffer for storing loaded values,
with an inlined any-block stack.


- The lexer and lexer callbacks are fully recursive and GC-compliant. Currently,
callbacks can be function! values only; this can be extended in the future to
also support routines for much faster processing.



Posted by Nenad Rakocevic at 2:06 PM
Labels: callback, implementation, lexer, load

Copyright 2011-2020 Nenad Rakocevic & Red Foundation. Powered by Blogger.


