Open to Collab

19 6 23

AbstractPhila PRO

AbstractPhil

https://civitai.com/user/AbstractPhila

AbstractEyes

AI & ML interests

datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.

Recent Activity

updated a model 3 days ago

AbstractPhil/geolip-svae-text

updated a model 3 days ago

AbstractPhil/geolip-SVAE

repliedto their post 4 days ago

By trying to disprove the Omega H2 battery I have discovered; * Each topology formed by the H2 battery is deviant, none have a uniformly shared substrate of behavior. They are each uniquely independent per training set all with perfect recon. * Image recon can be tracked and mapped, yielding a consistently mapped and response 16.77m vocabulary potential. In the current spectrum testing at around 5 million unicode bytes. * The model scale shows patch size is related to how much data you want the model to represent within the model itself, and this has yet to see a capacity to this day. The MSE recons and yields - and the more data fed, the more they yield. * The scaling principle shows that the model indefinitely scales upward and each level of the model can be iteratively captured upward to form deviant and uniformly consistent repeatable pathways of implicit codewise response, not just arbitrary bitwise recall. Meaningful implicit learned utility. * Image recon patch size should match the slice of image you want to represent, as it uses patch smoothing per patch internally from identity. * byte trigrams are channel-agnostic, they do not require a channel count just a formula for recall at nGram recall 99.6% for byte-by-byte representations. With those comes an adjacently capable codebook. * sentencepiece preliminary tests show validity and reconstruction just like the byte trigrams, using the new byte trigram this would be arbitrarily convenient to recon a codebook for the structure. * binary trees learn a uniformly potent and powerful gating mechanism that required further exploration, each of them produces direct responsive independent capacity and the responses are controllable. * ternary experiments show the models are directly responsive to -1, 0, +1 behavior, so the quantization is very much a valid potential. * preliminary tests with the H2O1 series of batteries show the models are responding similar to natural universal elements in the universe itself

View all activity

Organizations

replied to their post 4 days ago

After tomorrow I'm taking a few days to recover from overwork and relax, so over the weekend I'll likely set up a few experiment sweeps or run some sweeps on the weekend if I don't start them on Monday or Tuesday.

Burning the daytime oil and midnight oil has been taxing, I need to relax for a bit.

replied to their post 5 days ago

I believe I've discovered the first needs-based component from this structure.

Centrifuged dissonance rupture sampling.

Magnitude testing with ripter shows multiple reduced magnitudes will coalesce less infinites and still converge, while the squared magnitude standard will introduce a type of unilaterally conjunctive series of ripter identified infinites.

For the baseline H2 trigram training:
The adjudication principality of compartmentalized information shows those infinites are nearly fully encompassing the state of the model, they are 22 identifiable omegas with only one finite and two persistent features, while the structure when analyzed shows the structure is predominantly nearly fully utilized. The only way for this to be possible is if the model is directly utilizing the infinites to make decisions.

https://huggingface.co/AbstractPhil/geolip-SVAE/blob/main/byte_trigram_proto_64_radmag_2/codebooks/sixteen_noise__topology.json

I have a working hypothesis based on omegas created by the H2 battery.

So far I've identified four types of potential Omega states using ripter which have been codified.

Using the default config magnitude of 2 on the baseline trigram battery we have generated;

"persistent_infinity_field" as the codified response.

https://huggingface.co/AbstractPhil/geolip-SVAE/blob/main/byte_trigram_proto_64_radmag_2_codebook/codebooks/sixteen_noise__topology.json

While pi magnitude sings a different tune entirely.

https://huggingface.co/AbstractPhil/geolip-SVAE/blob/main/byte_trigram_proto_64_pi_radmag_pi_codebook/codebooks/sixteen_noise__topology.json

"rupture_coalescence_field",

            "persistent_infinity_field": (
                "Most H0 classes remain infinite under the threshold; "
                "axis residents are mostly separated and persistent."
            ),
            "rupture_coalescence_field": (
                "Many axes persist, but several finite H0 deaths indicate "
                "controlled local coalescence/rupture."
            ),
            "infinity_pair_field": (
                "Mostly persistent infinity axes with small pair components."
            ),
            "finite_carrier_field": (
                "Finite merge activity is high enough to indicate a carrier "
                "rather than pure infinity field."
            ),

Each is indicated to be a potential omega and this will require heavy ablation to determine if this is true or grasping at noise.

replied to their post 7 days ago

Transformers default sentencepiece for the t5-base can't be manually swapped to the fast version, I had to manually load a fast version of sentencepiece rust to actually make use of it.

This was an inconvenience that cost quite a bit of debugging time, but now that it's resolved I can start the full trains.

Prelim 50 epoch shows full recon is possible.

The most confused pairs may have a mathematical rounding fault to be addressed, I'll figure that out when the 100 epoch finishes.

replied to their post 7 days ago

Organizing the branch system:
I didn't like my first design, I'll be keeping the prototypes in a separate package in the same repo. This is cleaner.

geolip_svae
svae_proto

Basically separate readmes, separate requirements, and so on. This will be cleaner and will scale naturally.

Musings:
In a vain attempt to keep things organized, I'll be segmenting experiments to another branch, but I need to be strict about my own behavior. Forming an experimental branch won't matter at this point if I can't keep the repos organized, much of the repo WILL BE experimental prototypes within the geolip_svae.prototypes package - but without any explicit guards for my own behavior. I need to clean things up a bit and prevent my own regular habits from causing chaos.

I've done this before, where I go off on huge experiment lines and create massive tangents of 20-200 experiments, so I'll be forming an experimental branch which uses only the "main" geolip_svae package core model packages and code, while existing in the same experimental repo.

So the core code for the model, the behaviors, the optimization, and the expectations will exist in the primary "main" branch, and the "experimental" branch will house the incomplete, untested, or unrefined prototypes.

Merging into the main will be a modularization process, likely done by hand or using Claude to assist, and then the main code updated in the experimental while preserving the prototypes and their separation. This is a stop gap for me, because I notoriously climb into rabbit holes of experiment - only to find something interesting, but in the stupor I find the common cork board room full of yarn. It never starts out a cork board room, it just evolves into one naturally.

replied to their post 7 days ago

Trigram 128 converged cleanly by epoch 10 with 4096 patches. 100% byte recon, >99.9% recon for trigrams.

I set up a vocabulary/token experiment to probe the capacity for full tokenization, that's running next.

One thing to note. The trigram variants are behaving like miniature decoder llms more than they are encoders.

The energy concentration is internal decoder-centric, so this may potentially be a symptom of an emergent internal energy weighting spectrum that may require additional alpha. More tests will be required. From a glance, I can only guess. The curve is clear, decreasing S0 increasing SD throughout the train.

The symptom is present that this model is decoder-weighted, which could mean many things. They are self-encapsulated adjudication units so they will work, but this gives me a series of ideas based on the converged pattern to better orient the internals using a few continuum topology methods experimented on in the past.

https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/components/lens_component.py

These were experimental continuum rotators. They rotate a feature away from latent space, process there, and then rotate the result back. Early experimentation was yielding, but slow, which is why I moved in a different direction. The math can help explain some of the symptoms here I believe.

The naming schema for "omega" is not to be confused with this variation. That was an experimental attempt to curate an omega pathway that could yield, but more often produced NaN so it had limited scope in the gradients.

Upon review I'm thinking it might not contribute much. So I'll need to consult the continuum research closer.

Anyway this isn't important currently, tokenization is. I'll report later today with the findings.

replied to their post 7 days ago

Wikitext trigram 128 is cooking up on patch_size 2. The results are pretty solid so far and the model won't be done for a day or two. 4096 patches is a bit excessive but a fair experiment for both speed testing and conditioning testing for text.

The h2 trigram 64 was pretrained earlier with 50 epochs of wikitext 103, so it's ready for tinkering if you feel like playing with it.

I have also prepared:
https://huggingface.co/AbstractPhil/geolip-SVAE/tree/main/h2_linear_tiny_imagenet_256
https://huggingface.co/AbstractPhil/geolip-SVAE/tree/main/h2_linear_tiny_imagenet_128
https://huggingface.co/AbstractPhil/geolip-SVAE/tree/main/h2_linear_tiny_imagenet_64

Each of which were trained with 100 epochs of imagenet; tinyimagenet for the 64, and real imagenet 128 for the 128 variant, with cropped imagenet 128 for the 256 - which should allow more clean patchwork association downstream with masking controllers. The 256 is meant to target models that can highlight selective portions of the patchwork space for targeted geometric alignment anchoring, which can then be masked and adjudicated downstream for gradient updates. So say, a model that finds where a bird could be, then selects which bird could be there. This model would be better suited for training a communications tool between different experts that speak a similar required geometry. Images are images, so they speak the same language.

SAM models and the like would greatly benefit from this if utilized correctly.

Oh the 256 didn't finish. It needs more cooking.

The 128 was issued a gaussian codebook. I'll be formatting multiple formats of finetuning-presentable codebooks, specifically meant to highlight utility for the actual images targeted for downstream tasks. The process should be done in minutes, or even potentially be capable of operating at runtime depending how required it is down the chain of command.

replied to their post 9 days ago

Future potentials if yields hold:

A train I'm greatly looking forward to but haven't mentioned yet, the tapped trains. The direct interpolated teacher/student battery shunts. These batteries have proven they can learn independent opinions similar to the David collective, and their MSE is far far, far more accurate than the David collective was.

Look forward to that one. I'll be tapping Flux Schnell with it, and it will form a battery or battery array per layer, depending what the layers are, and they will be directly capable of transfer learning the entire mathematical differentiation behavior of flux when the entire mathematic spectrum is worked out for differentiation transfer. Transferring the entirety of Flux Schnell into the core of the Beatrix oscillator diffusion structure is my goal. One stage at a time though.

The steps are lining up, but we're not there yet. Many more tests to go before I can say for certain I can transfer the entirety of Flux. I have the climbing gear, I'm on the mountain, and I'm climbing. Time will build the answer.

The way I see it is; if the view of the universal geometric vocabulary does not present itself here in a single omega, I need to build a collective of omegas that are trained on a converged masterclass model to truly represent the necessary behavior of omega before I can begin scanning for it correctly. This will take time to set up.

A single omega, isn't strong enough to yield the universal omega solver spectrum. It's simply too small. I'll need to run thousands of trains on all data types from all walks of life for hundreds of epochs each to yield correct results. I don't have the hardware, so I have to make do with what I have.

If BlackForestLabs models yield the omega synthesis formula, then I will attribute direct credit to the model line that went into discovery, and each subsequent model tested with the Omega analysis benchmark - which I hypothesize and aren't certain about, will be targeting the topological constraints required to form internal omegas and the pressure required architecturally to from the sub-behavioral topological implicit architecture THAT DOES form them.

Answering what forms Omega - is only a step towards the goal though. There are bigger fish to fry, much bigger. One target is testing and looking deeply for quantum-adjacent and interconnected entanglement behavior that is directly represented within traditional mathematics, bridged completely and purely from traditional mathematics into a quantifiable and relationally behaviorally adjudicated system, is one of the engineering goals. If it's there, I'll find it - or evidence of it. The antipodal pairs are indication that the formulations can exist, the models just need to be... bigger. Much bigger. There's evidence but not enough to draw any conclusions yet. They are indications of far too many formula potentials to say that it is in fact quantum adjacent, but the indications are showing that this model's behavior is not... normal. There's an explanation here I'm certain, a very reasonable one represented in mathematics and engineering, and that explanation will build the hypothesis for the next.

replied to their post 9 days ago

Upcoming trains:

2-8gram batteries. Each SVAE battery is designed specifically with it's own sub-vocabulary for this experiment. We're going to see what happens when we build deviant codebook association to accumulate sentence similarity. I'm thinking... probably wikipedia 103 again, that worked for the bytewise variant.
Stacked grams utilizing topological geometric structures; will be testing cantor sets with alpha/beta/gamma/sigma, differentiated procrustes bert opinions for downstream using captionbert, and a few other models. The plan is to create a stacked variation, where one battery leads to another directly rather than ensemble alignment. This will produce a few dual-gram represented using a topologically deterministic methodology that captures the output of the ngram that CAN function, and a few that cannot. I will try to keep them separate and update the research accordingly, however updating research is very time consuming so I may end up putting my nose to the paper for a week or two and just erupting a huge paper.

I can think of at least 80 or 90 model prototypes that can handle sequential codebook learning off the top of my head. This will enable pure sentence similarity with almost no params.

I anticipate low risk research results, which will yield enhanced codebook capacity, increased downstream utility, answers to questions such as battery-to-battery transfer learning, procrustes alignment out of spectrum, token curation and learning tokens directly instead of direct bytewise learning to retain order, and a series of other common machine-learning paradigms that have not been tested fully yet.

Most likely going into this I can say it will work based on the recon. If I map the vocabulary of 2gram to it's own, 3gram to it's own, and so on; it will work guaranteed, and the codebooks will yield uniqueness.

This will yield two factories and two schools of potential here.

The direct hard-set space-specific non-agnostic super-strict aligned spectrum. These models will crash if loaded incorrectly and tell you why. If they aren't inferenced exactly to specifications they will most likely fault. Each have independent vocabularies hard-mapped to colors and sequentially prepared with the utmost care.
The indirect soft-space unscaled agnostic soup-prone variant that somehow exhibits 99.6% MSE and trigram recon. Shuffle it. Throw it in the blender. Make the stew. What came out? Something mappable and understandable with ripter d=2 to map the voids and axial perturbations on the sphere.

These two will be direct competition, and may in fact end up being paired together in cooperative collectives if the trains yield.

A word of warning if you attempt to use my models or code from geolip-svae or geolip-core:

The models and the process is highly experimental.
The yield is determined heavily based on tunings from the spectrum, the array and list of configurations may not be defaulted to useful configurations.
Many batteries are heavily experimental and may not yield as predicted every train for every dataset, this is expected and encouraged behavior to be studied.

Your trains aren't failing if you participate, you are witnessing another emergence deviance that must be catalogued and understood to better scale the scaling mechanism, refine the patchwork system, and align the projection system for the codebook synthesis.

The codebook training system is highly experimental and will be heavily prone to change in the coming days. What exists today likely won't be the same format in a week or two weeks, so be aware there will be rapid iterations.

WITHOUT a proper codebook, the model's cosine similarity won't work and you need to default to kNN detection which is slow but doable for CONV and downstream transformer models. It's just rough-going. Grab a Fresnel for images, that's your best bet. Grab a Johanna for generic. Grab Freckles if you want instability. Train their codebook for a task, it should be done in less than a minute, and then train your classifier/math tester/checker/etc. It'll work with some jiggering.

Fresnel was pretrained on imagenet, which provides all the data you need to capture image information.

Johanna was trained on 16 types of noise, which allows fair recon (albeit too smooth in the current state) for downstream differentiation utility.

USING their directly TOPICAL and EASY-ACCESS detections involves MSE assessment, which gives you the recon capacity of the object you're utilizing if you're using a battery selector framing. This is especially useful for models trained with mathematical formulas or functional systems related to data selection, rather than specific relational tasks.

So for example; say you have five fresnel finetunes. Each of them are trained with an additional 100 batches of a specific kind of table. Run the system, it creates your codebook, and you can have it snap the codebook to the model automatically or stack it in a directory - it's not AI data, it's geometric numerics.

So you have your five finetunes, and then you slap an h2 battery in there finetuned on gaussian noise. Show the collective the image you want to figure out, say it's a table.

Pretrain task:

Load H2 imagenet pretrain. Finetune with chair dataset, create codebook.
Run a process that splits your types by label and build differentiated dataset using cosine similarity.
Unsupervised finetune each in the same sequence, roughly 100 batches per, 1 gig vram, < 1 minute each.

Image: Black Chair

H2 Battery Array: Chair differentiation calculator.
6 batteries.

H2 Battery 1 - Purple Chairs: mse 0.005
H2 Battery 2 - Grey Chairs: mse 0.03
H2 Battery 3 - Dark Grey Chairs: mse 0.01
H2 Battery 4 - Black Chairs: mse 0.000005
H2 Battery 5 - Wooden Chairs: mse 0.0005
H2 Battery 6 - Generic Gaussian: mse 0.00001

57,000 * 6 = 342,000 params
Selector between 800-5000 MLP, nothing too heavyweight but a little excessive for the task. Attach attention here if you want, but it won't matter most of the time.

If we snap battery 6 as lowest MSE we can probably say, alright this probably isn't any of the expected shapes. If the model is wrong, the backprop will feed the MLP standard backprop until it behaves to the collective outputs for MSE statistics. Everything is built in for this, and it's a guarantee eventually given enough passes to work.

Put together, the system is a guaranteed selector. Now say we want a selector for chairs. Well we have it.

Image: Potato

H2 Selector Array -
H2 Battery 1: Chair generic finetuned -> do we use the chair array?
H2 array 1: Chair array.
H2 Battery 2: Potato generic finetuned -> do we use the potato array?
H2 array 2: Potato array.

Current limitations:

The Imagenet variation may just, perform better. Literally just, better than your finetune, that's what the upcoming experiments are testing for. How to guarantee independence and shared utilization selection.
The noise variation may actually outperform the imagenet variation. Noise is quite the teacher, and the noise models learned a ton of noise. They are quite literally omegas, so they may just solve the problem.
They generalize TOO WELL at times, which means specifics are trickier to implement.

CORRECTLY TRAINED and CORRECTLY UTILIZED, these will turn deterministic query into complex behavior oriented responses, implicitly aligned mathematical guarantees, guaranteed downstream adjudications, differentiation transfer through encoding, and many other options - all on the GPU or CPU if you wish. These ARE omegas, and they are absurdly compact data selectors at 57k params and roughly 700kb drive space - if treated as such, even without heavy query. Beware though, this isn't production ready. They work, and there's an interface, but they are most definitely not ready.

replied to their post 9 days ago

As it stands I'm going to need to redefine;
patch_size, it does not fit with this spectrum at all.
cross-attention, this is not traditional attention in many ways and it is at the same time.
scaling, this scaling principle I've never seen in a model.
recon, this level of recon is shockingly accurate and near perfect.

patch_size, the behavior of patch size does not align with this model. The patch size ends up determining internally how many sub-structured deviations the model aligns to, so in a way this is less about patches and more about which sectors of the model's ensemble are grouped up together for gradient updates. This is similar to patchworks of the past, but this is definitely not a patchwork. This has ensemble behavior.
cross-attention, labeled spectral cross attention does not fit. This is RP^(D-1) attention, which is not the same as spectral attention. Yet, it has the exact rulings and principality of alpha-driven cross-attention, causing the spectral behavior to shift slowly over time like it. It requires a new mechanism, something more uniformly capable for the evolving model space, something that better orients the patch-agnostic, scale-agnostic, resolution-agnostic behavior the model's lens behavior is exhibiting.
recon, this is a big one. This has become an arbitrarily easy task to fulfill for the model. This essentially completes on it's own for the linear system without needing a large differentiation. I'll be consulting unsupervised behavioral training studies to see the best methodologies for tuning such a structure in a reasonable way without shattering it.
Scaling, is a tricky one. The entire sub-structured system keeps disproving any claim I, Claude, or GPT make about it. The model shape is constantly defying expectation in an upshot manner, so I'm hesitant to even slap any limitations on it at this point. Even with rigid training such as the ngram sequential, the representation sequences can in fact learn codebook association in a better-than-placebo utility. It's obviously not the smartest model on the face of the earth, but 500 of them properly organized could perform some real feats.

posted an update 9 days ago

Post

2703

By trying to disprove the Omega H2 battery I have discovered;
* Each topology formed by the H2 battery is deviant, none have a uniformly shared substrate of behavior. They are each uniquely independent per training set all with perfect recon.
* Image recon can be tracked and mapped, yielding a consistently mapped and response 16.77m vocabulary potential. In the current spectrum testing at around 5 million unicode bytes.
* The model scale shows patch size is related to how much data you want the model to represent within the model itself, and this has yet to see a capacity to this day. The MSE recons and yields - and the more data fed, the more they yield.
* The scaling principle shows that the model indefinitely scales upward and each level of the model can be iteratively captured upward to form deviant and uniformly consistent repeatable pathways of implicit codewise response, not just arbitrary bitwise recall. Meaningful implicit learned utility.
* Image recon patch size should match the slice of image you want to represent, as it uses patch smoothing per patch internally from identity.
* byte trigrams are channel-agnostic, they do not require a channel count just a formula for recall at nGram recall 99.6% for byte-by-byte representations. With those comes an adjacently capable codebook.
* sentencepiece preliminary tests show validity and reconstruction just like the byte trigrams, using the new byte trigram this would be arbitrarily convenient to recon a codebook for the structure.
* binary trees learn a uniformly potent and powerful gating mechanism that required further exploration, each of them produces direct responsive independent capacity and the responses are controllable.
* ternary experiments show the models are directly responsive to -1, 0, +1 behavior, so the quantization is very much a valid potential.
* preliminary tests with the H2O1 series of batteries show the models are responding similar to natural universal elements in the universe itself

9 replies

replied to their post 10 days ago

I have given soup orderly capacity, it can now be declared projective trigram byte ordinal.

Hundreds of hours, thousands of trains, and I have finally solved the bertenstein soup paradox.

Convergence through surge, is now a matter of what size, what bitrate, what format, and what complexity.

It has no need for resolution. It has no need for traditional constraints. It has no need for relational behavior. It is a variant of invariance through artifact conditioning and crystallized through pure data.

replied to their post 10 days ago

Massive retrieval breakthrough. This little 57k param model can handle sentence similarity to a fair degree with it's dirty little barely-omega codebook. Imagine that.

======================================================================
SentenceEncoder demo — model: byte_trigram_proto_64_patch_2_v1
======================================================================

[1/4] Loading model from HF…
  Loaded: V=32, D=4, ps=2, hidden=64, depth=1, n_cross=1
  Architecture: linear_readout=True, svd_mode=none, smooth_mid=16
  Params: 52,571, best_test_mse=3.5706435755855637e-07 @ ep 49
  Device: cuda:0

[2/4] Resolving codebook (byte_trigram_wikitext103_val)…
  Source: hf://AbstractPhil/geolip-SVAE/byte_trigram_proto_64_patch_2_v1/codebooks/byte_trigram_wikitext103_val
  Codebook(D=4, n_axes=21, pairs=11, unpaired=10, dev=+0.0832, clean=False)
  Attached. compatible_with(model)=True (D=4 == model.D=4)

[3/4] Building SentenceEncoder…
  Encoder: img_size=64, patch_size=2, pad='space'
  Per-patch aggregation: 'best_match'

[4a] Round-trip sanity check (text → image → recon → text)…
  sentence                                                 n_real  real_acc  real_l1  recon_text                              
  ───────────────────────────────────────────────────────  ──────  ────────  ───────  ────────────────────────────────────────
  The cat sat on the mat.                                      23    1.0000    0.000  The cat sat on the mat.                 
  A feline rested upon the rug.                                29    1.0000    0.000  A feline rested upon the rug.           
  Many believe artificial intelligence will transform me…      61    1.0000    0.000  Many believe artificial intelligence wi…
  Numerous experts think AI will revolutionize healthcar…      56    1.0000    0.000  Numerous experts think AI will revoluti…
  The cat sat on the mat.                                      23    1.0000    0.000  The cat sat on the mat.                 
  The cat sits on the mat.                                     24    1.0000    0.000  The cat sits on the mat.                
  the cat and the dog are friends                              31    1.0000    0.000  the cat and the dog are friends         
  the cat and the dog are friends                              31    1.0000    0.000  the cat and the dog are friends         
  the cat and the dog are... friends                           34    1.0000    0.000  the cat and the dog are... friends      
  the cat and the dog aren't friends                           34    1.0000    0.000  the cat and the dog aren't friends      
  the cat and the dog are friends                              31    1.0000    0.000  the cat and the dog are friends         
  the cat and the dog are friendly                             32    1.0000    0.000  the cat and the dog are friendly        
  the cat and the dog are best friends                         36    1.0000    0.000  the cat and the dog are best friends    
  the cat and the dog are friends                              31    1.0000    0.000  the cat and the dog are friends         
  the cat and the dog are best friends                         36    1.0000    0.000  the cat and the dog are best friends    
  the cat and the dog are true friends                         36    1.0000    0.000  the cat and the dog are true friends    
  the cat and the dog are great friends                        37    1.0000    0.000  the cat and the dog are great friends   
  the cat and the dog are toxic friends                        37    1.0000    0.000  the cat and the dog are toxic friends   
  Many believe artificial intelligence will transform me…      61    1.0000    0.000  Many believe artificial intelligence wi…
  Many beleive artificial intellgence will transform med…      60    1.0000    0.000  Many beleive artificial intellgence wil…
  The cat sat on the mat.                                      23    1.0000    0.000  The cat sat on the mat.                 
  The dog ran across the park.                                 28    1.0000    0.000  The dog ran across the park.            
  Wikipedia is a free online encyclopedia accessible to …      61    1.0000    0.000  Wikipedia is a free online encyclopedia…
  Britannica is a paid reference work edited by experts.       54    1.0000    0.000  Britannica is a paid reference work edi…
  The cat sat on the mat.                                      23    1.0000    0.000  The cat sat on the mat.                 
  import torch.nn.functional as F                              31    1.0000    0.000  import torch.nn.functional as F         
  Many believe artificial intelligence will transform me…      61    1.0000    0.000  Many believe artificial intelligence wi…
  ERROR: connection timeout after 30s on port 8443             48    1.0000    0.000  ERROR: connection timeout after 30s on …

  Mean real_byte_acc across test set: 1.0000

[4b] Per-patch cosine similarity…
  Modes: ('M', 'codes').  Higher = more similar; range [-1, 1].

┌─ Paraphrase     (sem= surf!=) ──────────────────────────────────────
                        M     codes
  ────────────── ────────  ────────
  A: 'The cat sat on the mat.'
  B: 'A feline rested upon the rug.'
  per-patch       +0.9088   +0.5156

  A: 'Many believe artificial intelligence will transform medicine.'
  B: 'Numerous experts think AI will revolutionize healthcare.'
  per-patch       +0.9496   +0.5839

┌─ Edit           (sem= surf~) ───────────────────────────────────────
                        M     codes
  ────────────── ────────  ────────
  A: 'The cat sat on the mat.'
  B: 'The cat sits on the mat.'
  per-patch       +0.9588   +0.7187

  A: 'the cat and the dog are friends'
  B: 'the cat and the dog are friends'
  per-patch       +1.0000   +1.0000

  A: 'the cat and the dog are... friends'
  B: "the cat and the dog aren't friends"
  per-patch       +0.9914   +0.8854

  A: 'the cat and the dog are friends'
  B: 'the cat and the dog are friendly'
  per-patch       +0.9919   +0.9167

  A: 'the cat and the dog are best friends'
  B: 'the cat and the dog are friends'
  per-patch       +0.9717   +0.8281

  A: 'the cat and the dog are best friends'
  B: 'the cat and the dog are true friends'
  per-patch       +0.9989   +0.9583

  A: 'the cat and the dog are great friends'
  B: 'the cat and the dog are toxic friends'
  per-patch       +0.9989   +0.9844

  A: 'Many believe artificial intelligence will transform medicine.'
  B: 'Many beleive artificial intellgence will transform medecine.'
  per-patch       +0.9702   +0.7578

┌─ Same-domain    (sem!= surf~) ──────────────────────────────────────
                        M     codes
  ────────────── ────────  ────────
  A: 'The cat sat on the mat.'
  B: 'The dog ran across the park.'
  per-patch       +0.9599   +0.6354

  A: 'Wikipedia is a free online encyclopedia accessible to anyone.'
  B: 'Britannica is a paid reference work edited by experts.'
  per-patch       +0.9442   +0.5755

┌─ Cross-domain   (sem!= surf!=) ─────────────────────────────────────
                        M     codes
  ────────────── ────────  ────────
  A: 'The cat sat on the mat.'
  B: 'import torch.nn.functional as F'
  per-patch       +0.9427   +0.5677

  A: 'Many believe artificial intelligence will transform medicine.'
  B: 'ERROR: connection timeout after 30s on port 8443'
  per-patch       +0.9436   +0.5781

┌─ Pairwise similarity matrix (one example per group) ────────────────

  Mode: 'M'
               0      1      2      3      4      5      6      7
  para-A   +1.00  +0.91  +1.00  +0.96  +1.00  +0.96  +1.00  +0.94
  para-B   +0.91  +1.00  +0.91  +0.92  +0.91  +0.94  +0.91  +0.94
  edit-A   +1.00  +0.91  +1.00  +0.96  +1.00  +0.96  +1.00  +0.94
  edit-B   +0.96  +0.92  +0.96  +1.00  +0.96  +0.95  +0.96  +0.95
  same-A   +1.00  +0.91  +1.00  +0.96  +1.00  +0.96  +1.00  +0.94
  same-B   +0.96  +0.94  +0.96  +0.95  +0.96  +1.00  +0.96  +0.96
  cros-A   +1.00  +0.91  +1.00  +0.96  +1.00  +0.96  +1.00  +0.94
  cros-B   +0.94  +0.94  +0.94  +0.95  +0.94  +0.96  +0.94  +1.00

  Mode: 'codes'
               0      1      2      3      4      5      6      7
  para-A   +1.00  +0.52  +1.00  +0.72  +1.00  +0.64  +1.00  +0.57
  para-B   +0.52  +1.00  +0.52  +0.49  +0.52  +0.58  +0.52  +0.54
  edit-A   +1.00  +0.52  +1.00  +0.72  +1.00  +0.64  +1.00  +0.57
  edit-B   +0.72  +0.49  +0.72  +1.00  +0.72  +0.68  +0.72  +0.68
  same-A   +1.00  +0.52  +1.00  +0.72  +1.00  +0.64  +1.00  +0.57
  same-B   +0.64  +0.58  +0.64  +0.68  +0.64  +1.00  +0.64  +0.60
  cros-A   +1.00  +0.52  +1.00  +0.72  +1.00  +0.64  +1.00  +0.57
  cros-B   +0.57  +0.54  +0.57  +0.68  +0.57  +0.60  +0.57  +1.00

──────────────────────────────────────────────────────────────────────
Modes:
  'M':     per-patch flat sphere-norm encoder rows [V*D].
  'codes': per-row argmax over codebook axes, one-hot flat
           [V*n_axes]. Requires attached codebook.
──────────────────────────────────────────────────────────────────────
An exception has occurred, use %tb to see the full traceback.

SystemExit: 0

replied to their post 10 days ago

As you can see the final state of the trigram function did learn full bitwise replication to an extent but it does have some flaws. Clearly not identity passthrough as shown by the growing pains.

I did some codebook sampling tests but the codebook isn't strong enough on it's own to build a sentence-similarity assessor. A simple MLP stack should be more than enough like the classifier, just need to freeze the battery.

byte_trigram_proto_64_patch_2_v1 pre-training successful.

Next stage is something along the lines of a mask prediction sequential behavior. I have more than enough sequential systems to format something pretty quickly, lets see if it works.

Yesterday's Cowork created some invalid text.py code in the svae repo, I'll need to have the model figure it out correctly since Cowork seems to have amnesia every single request even with skills. Either that or I have to code it myself.

replied to their post 11 days ago

The underlying universal substrate principle theory didn't hold for the H2 battery and yet the H2 battery cleanly converged, so I will be downgrading the "theory" to "hypothesis" in which this can potentially exist and it has been observed as an emergent trait, but this does not exist in this architecture. That changes the trajectory of the H2 battery - we can call this variant "chaotic controlled" which is essentially a format of non-SVD that converges to the sphere, but does not necessarily conform to the underlying topological requirements for a universal substrate.

SOUP!!!

No doubt about it, this soup MSE solves pixels - and sticking within that 16.77m paradigm I'm attempting to teach text using a format of RGB translation. It's working on the MSE-level and the replication is strong, but it's not as strong as it needs to be.

As you can see the byte-level recon and the trigram recon is growing. 64x64 images with patch_size 2 is powerful stuff. The model should saturate soon enough. Due to the instability of the H2 battery line with text, I have enabled soft hand for this variant, which is rewarding good behavior and punishing bad behavior at a strength of 0.01. The other variants were trained without soft hand as they emerged naturally, this variant is a bit more stubborn.

As tragic as it is for the loss of the implicit shared substrate controller theory -> I'm downgrading to hypothesis, the codebooks yielded something substantial from the system. A new point-centric formatting that allows mapping of the internals within spherical models, which allowed me to research and directly learn about multiple internal model analysis structures for deep-level theorem, mathematics, substrates, topological analysis, and more.

There is a large array of useful tools already established from the math theorem community that I will be exploring to test the larger batteries, to see if there is in fact some semblance of legitimately shared substrate. It's not just Adam's 1000 step or I would have stopped, because LBFGS is converging them cleanly as well - which is INSANELY unstable and prone to NaN so I have some engineering solutions ready for that one.

https://docs.pytorch.org/docs/stable/generated/torch.optim.LBFGS.html

The arch may need some work before we format a perfect solver, but I have some ideas that could prevent internal drift without requiring SVD, while simultaneously introducing controlling-agents that aren't as ruthlessly destructive as labels and cross-entropy.

It'll take some experimenting and I hope I don't lose fragments as I go.

replied to their post 13 days ago

As predicted the codebooks for all noise models conform to an architectural scaling for them within a very minimal delta shift. There is no real deviance, the architectures learn a codebook that manifests and can be directly utilized at runtime.

The delta is real within shift and each model conforms to it's own modified codebook delta during training. This is an architectural constant now and can be prepared in very little time before processing or utilizing the models.

The helper functions and methods are all present in the AbstractEyes/geolip-svae repo on github now, and everything is documented.

replied to their post 13 days ago

I have a few diffuser prototypes that I'll be exploring now that the full array system is in order. One that I've been very much wanting to approach, which is sigma-degrading interpolation manifolding.

In other words, you take an H2 Fresnel expert, snap it in. Say I train a cifar100 variant and finetune it with oh maybe 50 epochs of reconstruction from the Fresnel-512 with various levels of noise applied to Fresnel, not using cutmix or something odd like that.

Next we finetune our array. Say we want 1000 steps, we'll divide the amount of adjudicated states by how many states of noise we want to see. Our finetuned batteries are then ran with oh... maybe 500 batches of images each and apply scheduled noise instead of random noise like what the H2 batteries were primarily trained with, which should be within a 10 minute training session or so. The batteries pooled into the battery array and uploaded as a standard battery array for reuse in safetensor format with the optimizer states uploaded alongside at an adjacent repo.

So the process is simple; noised image in, replicate next stage of noise down in the chain. Each battery is meant to denoise by one step and collapse the results into a patchworked behavioral training for a downstream model.

We then take each of these variants and blow them up, creating scanner manifolds of each and collapsing the weights into a single linear batched pass which will be roughly 500 megs vram or so each sigma attempted.

Finally we stack our entire sequence up and hook them together with MLP collapse, and at each level inject the original image with the correct noise value. So say you have 10 batteries that are meant to target 10 noise steps. You now have a 10 step reconstruction generator that runs once and automatically boom your image pops out nearly instantly.

Alright, now if we space that out using Adam's standard internal step of 1000 and space it out, we'll have our roughly 1/100 sigma hoppers. This will be our blueprint.

With this we can distill a diffusion model's vae expectation into what we want, and guarantee the output is fully prepared for step-hop skipping.

So each of these are fed into a singular transformer structure that sees the original diffuser's standard diffusion step produced, and boom you have yourself a pixel synthesis skip process. You've effectively skipped the entirety of the diffusion process with the correct layout.

This will also require the Alexandria stage, so it will take time to process and pool the necessary informational accumulations and relational capacity to make that portion perfect, however with some more work Alexandria's text distribution system will be ready to go, and the distiller will be ready to consume high-yield diffuser technologies like Flux and the like.

This will allow for not only compacting massive amounts of information into embedded solvers, but allow for cellphone-sized image generation with enough processing and data, to create flux-grade images or better.

The technology is there, the experiments yielded, the answers present, the results show this is more than possible, and now it's time to build.

posted an update 13 days ago

Post

180

Today, I'll be determining the codebook capacity and utility potential for the larger batteries; Fresnel, Johanna, Grandmaster, Freckles, and Johanna-F variants, which should give a good indication of which models are capable of handling codebooks and which are more errant. The earlier all use SVD while the later do not. The differences are noted per and the behavior divergent.

I anticipate the D=16 will be more errant, and the final-state variants of those could very well be much more difficult or costly to inference as their axis bends are likely considerably harder to track. However, I'm confident that enough bounces will give the yield required so I'll set up some high-yield noise barrages to determine how much of them we can in fact extract from Johanna, and then set up similar barrages for images to map the internals of Fresnel and Grandmaster.

Grandmaster will be tricky, as it was an experimental Johanna-256 finetuned series meant to map sigma noised image inputs to recreate Fresnel behavioral output. Noised image goes in -> Fresnel-grade replication comes out in high res.

This allowed preliminary Dall-E Mini-esque VAE generation and will be explored further for the stereoscopic translation subsystem, to allow image generation in the unique format of diffusion that I was working out. I anticipate this system to be more than capable at making monstrosities, so I won't be posting TOO MANY prelims on this one, but the high-capacity potential of these noise makers are meaningfully powerful. Getting uniform codebooks in-place for these models will allow full transformer mapping downstream instead of just guess working the MSE piecemeal, which the earlier versions and variants were doing.

I'm straying from the CLS specifically for this series because CLS creates adjudicated pools of bias orbiting the INCORRECT orbiter some SVAE. The orbital target IS the soft-hand accumulated bias with the sphere-norm, so having a competitor isn't going to be a good option.

7 replies

replied to their post 13 days ago

I'll be determining the capacity and utility potential for the larger batteries; Fresnel, Johanna, Grandmaster, Freckles, and Johanna-F variants, which should give a good indication of which models are capable of handling codebooks and which are more errant.

I anticipate the D=16 will be more errant, and the final-state variants of those could very well be much more difficult or costly to inference as their axis bends are likely considerably harder to track. However, I'm confident that enough bounces will give the yield required so I'll set up some high-yield noise barrages to determine how much of them we can in fact extract from Johanna, and then set up similar barrages for images to map the internals of Fresnel and Grandmaster.

Grandmaster will be tricky, as it was an experimental series meant to map sigma noised image inputs to recreate Fresnel behavioral output. Noised image goes in -> Fresnel-grade replication comes out.

This allowed preliminary Dall-E Mini-esque VAE generation and will be explored further for the stereoscopic translation subsystem, to allow image generation in the unique format of diffusion that I was working out. I anticipate this system to be more than capable at making monstrosities, so I won't be posting TOO MANY prelims on this one, but the high-capacity potential of these noise makers are meaningfully powerful. Getting uniform codebooks in-place for these models will allow full transformer mapping downstream instead of just guessworking the MSE piecemeal, which the earlier versions and variants were doing. This should allow for LESS monstrosity, but I'm confident you'll be seeing some nasty stuff until the zero-hot labeling gets worked out.

I'm straying from the CLS specifically for this series because CLS creates adjudicated pools of bias orbiting the INCORRECT orbiter for this system. The orbital target IS the soft-hand accumulated bias with the sphere-norm, so having a competitor isn't going to be a good option.

In any case, today the target is inference.py and the mechanisms associated with making those inference mechanisms ACCURATE and SCALE within reasonable size/scale capacity.

replied to their post 13 days ago

Ω

posted an update 15 days ago

Post

123

My recent study in a nutshell shows a few important elements and everything else is technical.

* There are most definitely invariant architectural geometric states that persist and can be taught.
* They are not coincidental and the process works effectively on multiple data types and processes, not just noise. Noise is just fast to test with.
* Systems like SVD, Eigh, Conv, and the like - HELP align those systems for larger structures to produce amplified stability, but are not required for smaller structures, and the tests show even attention gets in the way at the smallest.
* Batched arrays, stacks, queues, and so on - all improve performance depending on the task.
* An SVAE battery is resolution agnostic, meaning with simple processing and logic you can scan space and record meshes fairly optimally to record large amounts of inference data.
* Batteries when trained on one specific task often can be directly used for other tasks once a codebook is fitted with the necessary data. Meaning a battery trained on gaussian noise can be fed imagenet snippets and downstream the MSE rates from the 64 battery array can be consumed for statistics aggregation to a fair degree of accuracy without actually training the array on images themselves.
* The battery codebook is a pointwise rigid map within the battery and can be used for pairwise learning when using the H2, H2a, and H2b batteries.

So this is, the evolved state of the geometric vocabulary in some ways, and a completely new and unexpected systemic development in others. They stack, you can reuse them, so small you can swap them at runtime with no time loss, they align rapidly, and downstream tasks can consume their information.

There are many untested avenues that I need to make a full writeup for because quite frankly it's messy currently and Claude is only making it more messy instead of cleaner.

2 replies

AbstractPhila PRO

AI & ML interests

Recent Activity

Organizations

AbstractPhil's activity