AI Mole
Published on

GPT-4 - What the new neural network has learned and why it is a little creepy

Authors
  • avatar

In this article, we'll break down the amazing new abilities of the latest language model from the GPT family (from understanding memes to programming), dig a little under the hood, and try to understand - how close is artificial intelligence to the line of its safe use?

Yes, it's finally happening! OpenAI chose Pi Day (March 14) to share the release of their new product with the public. GPT-4 is the new flagship Large Language Model, or LLM, which replaces GPT-3, GPT-3.5, and the much-talked-about ChatGPT. Below we'll discuss the key changes from previous generations, review some of the most interesting use cases, and talk about OpenAI's new policy on openness and security.

Seeing the world through the eyes of a robot The most interesting change that immediately strikes the eye in GPT-4 is the addition of a second type of data that the model can receive as input. Now, in addition to texts, it can be fed images, not even one at a time, but a bunch of them at once! However, it still outputs only text: you can't even count on any generation of images, sounds or, especially, video (which was rumored and allegedly "leaked" recently). At the same time, access to the model for the general masses of users is still limited exclusively to text prompts, and work with pictures is in the testing and running-in stage.

What possibilities does this "epiphany" of GPT-4 open up? For example, you can put a picture into a model and ask it some question related to the objects drawn there. The neural network will try to understand both the visual data and the textual prompt at once and will give its answer. How GPT4 works
You could also give GPT-4 some graph and have her analyze it. Or make her do a visual puzzle from an IQ test. And the most fiery cherry on the cake: the model can explain a meme to you! GPT4 explains meme

Both image-based question answering and the general principle of working with pictures already existed before the release of GPT-4 - such models are called "multimodal", because they can work with two or more modalities at once (text, pictures, and in some cases - even sound or 3D models). But at the same time, the new GPT-4 beats almost all specialized and narrowly focused image-based question answering systems in a wide variety of tasks (its results are better in 6 out of 8 tested datasets, often by more than 10%).

And here's another screenshot from a roof-raising demonstration at the OpenAI webcast, where a hand-drawn sketch of a website in a notepad turns into a real website literally in an instant. These are the wonders of multimodality! In this case, the model writes the code for the site, and then it runs in the browser.

GPT-4 has finally rolled into programming (your course integration could be here) Just how much GPT-4's programming skills have evolved relative to ChatGPT we have yet to find out - but already in the first two days, enthusiasts and twitterers have churned out a bunch of interesting crafts. Many users are excited about the fact that you can give GPT-4 a top-level description of a simple application and it will produce working code that does exactly what you need.

In 20 minutes, for example, you can make an application that recommends five new movies every day (with working links to trailers and viewing services).

It is quite likely, by the way, that the code generated by the model will not work the first time - and you will see errors when compiling. But that's no problem: you can just copy the error text into the dialog with GPT-4 and tell it to "look, just do it normally already, eh?" and it will really apologize and fix everything! - and it will really apologize and fix everything! So you can get to the stage of a working application from the gif above in literally 3-4 iterations. GPT4 programming capacity

In addition to all sorts of useful applications, GPT-4 is also capable of running games: skillful people have already made it pile up classic Pong, Snake, Tetris, Go, as well as a platformer and a game of "life". It's clear that these are the most mainstream and popular projects, which on the one hand are easy to write, but on the other hand they are still full-fledged demos. ChatGPT did something similar, but GPT-4 has much less errors, and even a person with no programming skills at all can create something workable in an hour or two.

Benchmarking the robot vs human Since our model is so good at simple programming, we would like to try to estimate the general level of its skills and knowledge more adequately. But first, let's try to understand: how should we approach the assessment of knowledge and "smartness" of the model? In the past, we used to use special benchmarks for this purpose (sets of tasks, questions with labeled answers, pictures/graphs with tasks, and so on). But there is one problem here - the development of technologies is getting faster and faster, and benchmarks can't really keep up with this development. GPT4 benchmark

In the early 2000s and 2010s, once a dataset was created, it took 5+ years for "robots" to reach the bar set by humans. By the end of the last decade, some benchmarks that were specifically created with the realization that they were beyond the capabilities of neurons were closing in less than a year. Note the graph above: the lines are getting more and more vertical - that is, the interval from the publication of a method for assessing ability to the point at which models achieve human-level results is shrinking.

OpenAI went further in this competition between leather bags and tin cans, they asked themselves: why should we try to create some special tests for a model if we want it to be as smart as a human? Let's just take real-world exams that people in different fields take and evaluate them! The results for you and me (we hope this article is read mostly by humans, not language models) are rather disappointing, to be honest:

GPT4 histogram

The graph above shows more than 20 real exams in various subjects, from international law to chemistry. And the comparison here is not with randoms, but with people who actually prepared for these exams! Yes, in a small part of tests the model is still worse than the specialists, and shows itself no better than 30% of people who came to the real test. However, tomorrow the model can become, for example, your legal consultant - because this exam (as well as a number of others) she passed better than 90% of people, strongly exceeding the passing threshold. It turns out that people spend more than five years, cram hard, stay up all night, pay a lot of money for education - and the model still beats them!

It makes you think of two things:

  1. In some industries, the model can already act as a full-fledged assistant. It is not yet an autonomous worker, but rather an assistant that increases people's efficiency, prompts and guides them. If a person can forget about some obscure law from the 18th century, which is almost never used in court practice, the model will remind about it and offer to read it - if it is relevant, of course. Such assistants should start appearing as early as this year.

  2. We URGENTLY need education reform as early as 2023 - both in the methods of teaching skills and transferring information from teachers, and in the acceptance of knowledge in exams.

Just in case for skeptics, let us clarify: the model was trained on data until September 2021 (i.e., GPT-4 doesn't know that Ilon Musk bought Twitter in its entirety yet - you can surprise it with this fact if you want!). And to check OpenAI used the latest publicly available tests (in the case of Olympiads and free-response questions - common in the U.S. Advanced Placement Exams) or purchased fresh collections of practice exams for the 2022-2023 exams. There was no specific model training on the data for these exams.

For most of the exams, the percentage of questions that the model has already seen during practice is very small (less than 10%) - and for the Bar exam, for example, it is 0% (i.e., the model has not seen even a single question that is merely similar beforehand, much less knows the answers). And the graph above shows the results achieved after the researchers threw out all the questions the model was already familiar with - so the comparison was as fair as possible.

Multilingualism and knowledge transfer

It's already getting a little scary, isn't it? Continuing the topic of model evaluation, I would like to note that not all benchmarks have been beaten, and since 2020, new multi-task evaluation methods are being actively developed. An example is MMLU (Massive Multi-task Language Understanding), which collects questions from a very wide range of topics on language understanding in different tasks. There are 57 domains inside - math, biology, law, social sciences and humanities, and so on. For each question there are 4 answer choices, only one of which is correct. That is, random guessing will show a result of about 25% correct answers.

A data partitioner (an ordinary laborer who once fell for the advertisement "get into IT and earn money just by answering questions") has an average accuracy of ~35%. It is difficult to estimate the accuracy of experts, because the questions are very different - however, if you find an expert for each specific area, on average, they collectively solve about 90% of the problems in all categories.

Before the release of GPT-4, the best indicator was Google's model - 69%, nice! But just beating this result for the OpenAI team is such an achievement (you could say it would be expected). So they decided to add one more variable to this "equation" - language.

Here's the thing: all the tasks for the 57 topics, as well as the answers to them, are written in English. Most of the materials on the Internet on which the model is trained are also written in English - so it wouldn't be all that surprising that GPT-4 answers correctly. But what if we run the questions and answers through a translator into less popular languages, including the very rare ones where there are no more than 2-3 million speakers in the world, and try to evaluate the model? Would it work in any sane way?

Yes. No, yes! GPT-4 works better in 24 of the 26 languages tested than GPT-3.5 did in its native English. Even in Welsh (a language from the Brittonic group, spoken by only about 600 people) it performs better than any previous model that has worked with English!

It should be realized that the quality is also affected by the model-translator, because it too is limited by the available data, and the quality of the translation suffers. It may turn out that the translation loses the meaning of the question, or the correct answer loses an important detail that makes it wrong. And even with these inputs, GPT-4 still breaks!

In a sense, we observe knowledge transfer within the model from one language to another (it is unlikely that much material about machine learning, quantum physics and other complex topics is available in Welsh), when in the training sample the model has seen mention of something in German or English, but calmly applies the knowledge and answers in Thai. Very roughly we can say that this is a proof-of-concept of what is called "knowledge transfer". It is a weak analogy to how a person, for example, can see a bird flying in the sky and come up with the concept of an airplane - by transferring analogies from biology and the environment to engineering.

Okay, but where will all this be used in the end?

So, we already understand - the model is all so great, cool, but what are its real world and business applications (not just to play around with)? Well, everything is clear with Microsoft and their built-in Bing search engine-helper, but apart from that?

Even before the release of GPT-4, amidst the hype surrounding ChatGPT, several companies announced integrations. These include Snapchat with their friendly chatbot, always ready to chat (the most clear and simple scenario), and Instacart's cooking assistant, which will suggest recipes with ingredients and obligingly offer to add them to the cart - with delivery by the evening.

Much more important, we see apps that improve education. If you think about it, such an assistant won't get tired of answering questions on a hackneyed topic that a student doesn't understand, won't get tired of repeating a rule over and over again, and so on. OpenAI agrees with us: they have accepted into their startup gas pedal and invested in Speak, which is developing a product that helps students learn English.

Duolingo is not lagging behind - the demonic green owl announced at the GPT-4 release that the product will have two new features: a role-playing game (a conversation partner on different topics), and a smart error explainer that prompts and clarifies rules with which the student has problems.

GPT-4 is also coming to the aid of visually impaired people by expanding and improving the functionality of the Be My Eyes app. Previously, volunteers received photos from visually impaired people and commented on what they showed, as well as answering questions like "where's my wallet? I can't see where I put it" from my grandmother. Since the new model is able to work with images, it will now act as an assistant, always ready to come to the rescue in a difficult situation. No matter what the user wants or needs, they can ask clarifying questions to get more useful information almost instantly.

Even since the release of ChatGPT (and its slightly earlier counterpart for programmers, Codex-Copilot), there have been studies that show a significant increase in productivity for professionals.

For programmers, it's a way to solve routine tasks faster, emphasizing precisely the complex challenges that the machine can't yet handle. According to GitHub research, the time spent on programming by Copilot assistant users has decreased by 55%, while the number of solved tasks has increased.

For people who work with texts, GPT models can substitute for simple tasks, moving problem solving toward generating new ideas and editing - instead of writing drafts. According to the MIT study, ChatGPT significantly improves the quality of jobs like writing press releases, briefs, analytical plans, and work emails (20-30 minutes per task). What's more, the quality gains are on average the higher the lower the person's baseline skill. That is, the neural network as if pulls up low-skilled workers to the level of normal average workers.

In other words, a real revolution is taking place, comparable to the appearance of conveyors in production or electrification. Labor productivity is increasing, efficiency is improving - now a person (in some areas) can produce one and a half to two times more output per unit of time. We do not think it is necessary to be directly afraid of losing a job - rather it is important to emphasize the ability to adapt and learn to use the new tool effectively. At one time, the introduction of 1C and Excel did not kill the accounting profession - but without the use of such "helpers" you simply cannot remain competitive in the market.

It's time to look inside GPT-4

Now that we understand what we are dealing with, we would like to know what tricks in creating the model led to such impressive results. Usually, when a new model is released, a scientific article describing the research process, the problems found, and the ways to solve them is published at once.

OpenAI for the second time for themselves and, as far as we know, among the entire community of artificial intelligence researchers, did not provide any details on the model: they did not publish a scientific article, technical documentation, or at least a "model card" (this is the name of the table with the main characteristics for comparison, which is often used in the industry of neural networks). The first time was 4 months ago - when ChatGPT was released (but at least there was a description of the principle of model training and references to previous works that gave a general understanding). All we got this time is a 98-page report that literally says "We trained the model on data. That's the sort of thing!". We will talk about the reasons for such secrecy closer to the end of the article.

But let's try to put together the bits of information we have. If you read our last article about the evolution of language models up to and including ChatGPT, you will remember that a big role in the evaluation of such models is played by scale - namely, the size of the model itself (the number of parameters in it) and the amount of data fed to it during training.

Not much is known about the latter (the amount of training data): judging by the significant improvement in the model's responses in different languages, there is now much more content from non-English sites and books in the sample. OpenAI noted that they used, among other things, licensed data sets from third parties - this is one of the first such cases in our memory (previously, for the most part, the data was used without any special "permission"). And it makes sense: in the neighboring industry of image generation, the developers of the StableDiffusion neural network are already being sued, citing illegal use of other people's images from all over the Internet.

Okay, but what about the size of the model itself? After all, this is literally the first thing that every machine learning specialist who saw the announcement wanted to know: how many parameters does GPT-4 have? The previous numbered models showed a significant increase in this parameter: 10 times when moving from GPT-1 to GPT-2, and more than 100 times from GPT-2 to GPT-3. This alone contributed to a qualitative improvement in the skills of neurons - they got new skills, improved generalization ability, and so on. The same was expected of GPT-4: there was even a rumor on Twitter that the model would have 100 trillion parameters (571 times more than GPT-3).

So how much is the bottom line? 100 trillion or not 100 trillion? Could it be at least 10 trillion? Alas, we don't know for sure - OpenAI decided not to tell anyone even such a simple and basic characteristic of the model. However, we can try to make at least a guess at the size of GPT-4 by some indirect signs. For this we will have to turn into real cyberpunk Sherlocks investigating robot mysteries!

Language models have several characteristics that are closely related to each other: the number of parameters, the speed of operation, and the price (it is usually billed per 1 thousand word-tokens fed to the model's input in a prompt and received at the output in a response). The more parameters a model has, the slower it runs (you have to compute huge equations to generate each word!) and the more expensive it is to run (because you need more computing power).

Below we have tried to put together what we know about the usage price that OpenAIs charge users for using the API (access interface) of different models, and the number of parameters of these models. Some numbers below represent our estimates.

  • GPT-3.5 (codename Davinci): a large model with 175 billion parameters, cost $0.02 / 1 thousand tokens.

  • GPT-3.5 (Curie): optimized version, which was reduced to 6.7 billion parameters, and reduced the price by an order of magnitude to $0.002 / 1k tokens.

  • ChatGPT (unoptimized legacy-version, which appeared first in December 2022): we don't know the price here, but by implication (see the speed explanation in the next paragraph) we can conclude that its parameter count was comparable to GPT-3.5/Davinci - about ~175 billion parameters.

  • ChatGPT (optimized gpt-3.5-turbo from February 2023): at some point OpenAI got tired of wasting a bunch of processing power (and money) on meme generation by tweeters on an industrial scale, and they released an updated version of the model - which they claimed reduced waste by a factor of 10 relative to the last, December version. It started costing $0.002/thousand tokens - the same as GPT-3.5/Curie - which means we can assume that the number of parameters there is of the same order of magnitude (7-13 billion).

  • GPT-4: the API price for this model is now $0.03-0.06 / 1 thousand tokens - one and a half to three times more expensive than GPT-3.5/Davinci. This may mean that it has a couple times more parameters than Davinci (the latter had 175 billion), or the explanation is even simpler - OpenAI decided to charge a higher price "on the hype" (and because of the increase in quality). After all, even calculating a model with 175 billion parameters is already a very serious computational task, let alone "increasing the degree".... So we will risk expertly assuming that the size of GPT-4 is at about a similar level

By the way, the ChatGPT website has a visual demonstration of several characteristics of different models, including their speed - so, the speed rating of both GPT-4 and legacy-model ChatGPT (in the December 2022 version) is the same: "two on a five-point scale". Which kind of hints that there is no sharp increase in size in GPT-4 - we are still talking about a comparable number of calculations (and probably parameters).

In addition, Microsoft after the release of GPT-4 made an official announcement, where they admitted that it was the GPT-4 model that was used for the Bing search engine. A model with 175 billion parameters is already insanely expensive to use (and models with 6-13 billion, to be honest, too), and to do something even more massive is simply inexpedient from the point of view of unit-economy - there will be a huge loss of money on each request from the user. If each user spends 0.2$ per session, then no advertising will pay off!

So, our expert conclusion is as follows: if GPT-4 has a speed plus or minus like 175 billion ChatGPT model, it is probably about the same size. Well, at least of the same order of magnitude: we may be talking about 200, 250 or 300 billion parameters; but it is very unlikely that the size will exceed even 1 trillion (not to mention the notorious 100 trillion parameters from Twitter rumors). But that's all speculation, of course - there's no hard data.

But the size of something GPT-4 has grown!

Another important, but more technical change is the increase in the maximum length of the model prompt to 32 thousand tokens. Language models do not actually operate with individual words, but with tokens, which can be either a whole word or a part of it (less often - a letter or a single digit). In particular, the model can perceive the root of a word or its ending as a token, and then one word will be split into two. This is what helps language models to be grammatically smart: they do not need to memorize dozens of different forms of words in all declensions - instead, it is enough to "learn" the root of a word and different suffixes/endings as separate tokens that allow them to make all necessary forms from it.

On average, we can say that 1 token is approximately equal to 3/4 of an English word. This ratio is worse for other languages, including Russian, for technical reasons (well, English is the most used language in the world after all!). That is, 32 thousand tokens is about 24-25 thousand English words, or 50 pages of text (compare with 12 pages, which used to be the maximum limit of prompt input to the model). It turns out that now you can input, for example, the entire project documentation or a whole chapter of a textbook into the model at once, and then ask questions about them - and the model will "read" a complex and long complex text, and answer according to the material (taking into account all the relationships between different parts of the text).

Again, technically there is no miracle here - the industry has already proposed optimization mechanisms that remove the limitation on the length of the context (prompt) and the model response. However, it should be noted that the longer the request is, the more resources are needed to process it, and the more memory is consumed by the model. It is quite possible that 32 thousand tokens is a "soft" limitation from above, artificially set to better plan the work of servers, but still cover the lion's share of user scenarios.

And still: how did they manage to screw pictures to the text model in the first place?

We have already written above about the model's ability to work with images. But it is not limited to simple understanding of what is happening on the photo - the model calmly perceives even small text from a sheet of paper. Here is an example that surprised us very much: GPT-4 answers a question on a scientific article, screenshots of the first three sheets of which were fed to the input.

It is quite likely that a separate module (another, external, neural network - about the same as the Google Translator in your smartphone) extracts all the text from the images and feeds it to the GPT-4 input. After all, as we have already found out, you can now feed up to 50 pages of text into the prompt, so three pages of an article won't be a problem at all.

But how does the machine understand which text refers to which part of the image, and what exactly is drawn there (in the case of pictures with no inscriptions at all)? Again, we can only guess on the basis of the design of other similar systems, and draw analogies.

Usually for such purposes, a separate model is trained (a huge number of pictures with a description of what is happening on them are run through it), which breaks the whole image into pieces, and then "translates" them into machine language, which is fed into the input of the text model. The "words" in this machine language are not directly interpretable to humans, but are nevertheless connected to the real world. For each such piece, as well as a block of extracted text, information about the location in space is added so that they can be compared with each other. Just like in the example above, "175 grams" refers to Finland, but "79 grams" refers to Georgia.

Artificial Intelligence security and OpenAI After the release of GPT-4, a fierce controversy has erupted in the artificial intelligence and machine learning research community. They are connected with the fact that OpenAI did not share almost any facts about the model, its training, and the principles of data collection. Some say that the company is long overdue to be renamed ClosedAI, others - that we need to think about the safe development of technology, which will not lead mankind to death. After all, uncontrolled distribution of sources of complex AI models brings us closer to the moment when suddenly a strong artificial intelligence (many times more capable than humans) may be "born" - and mankind will not have time to come up with ways to control it by that time.

And OpenAI, from day one of its existence, has set out to develop this very strong artificial intelligence (AGI, or Artificial General Intelligence). Their mission is to ensure that artificial intelligence benefits all of humanity, and that everyone has equal access to the benefits it creates, without privilege. You can read more about this and other principles in their charter. It contains a very interesting phrase, by the way - and it is repeated in the GPT-4 report that was provided in lieu of a detailed article: "If a project that aligns with our goals and is concerned about security comes closer to creating an AGI before we do, we pledge to stop competing with that project and start helping it."

It may seem odd that this approach does not involve openness about the technology or at least a description of the research process. When asked why OpenAI changed its approach to publishing results (because the papers used to be published!), the already mentioned Ilya Sutskever answered simply: "We were wrong. If you believe, as we do, that at some point AI will become extremely, incredibly, powerful - then there is simply no point in open source. It's a bad idea... I expect that in a few years, it will become quite obvious to everyone: publishing open source AI is just not wise."

Many will argue, "But this is all words and lyrics, just the usual blah-blah on the part of OpenAI, not backed up by real action, and in fact they just want more money in their pockets!". But there are at least three arguments in favor of OpenAI trying to act sincerely here.

First, OpenAI's research is not closed to everyone at all: throughout the model's development, the company has invited various scientists to test the model to see if it poses any threat. This included inviting researchers from the Alignment Research Center (ARC) to try to find out, and helped add some filters to the model's training process. They checked, for example, that for the time being the model could not upload itself to the Internet and start spreading uncontrollably there.

Second, Sam Altman (CEO of OpenAI), publicly recognizes that the AI industry needs more regulation, and that they will work on this with the community (this is also explicitly stated in the published GPT-4 report).

And the third fact is this. The GPT-4 model was already trained in August 2022, and in theory could have seen the light of day as early as last September. But OpenAI spent an extra 8 months to make it safer, and to take into account the comments of researchers. And it's not about racist jokes or instructions for assembling bombs at home (and fear of subsequent lawsuits and litigation) - not at all. After all, GPT-3 has been available for almost three years now, and even though it's dumber, it still knows how to respond to such things. Add some filters, specify in the rules of use (with limitation of liability) - and everything would seem to be fine, you can run the model and make money with a shovel.... Unless, of course, your goal is to release the product first and make money, and not to ensure the safety of the artificial intelligence being developed.

The safety of AI... hello, are you all right? "What the hell kind of security are we talking about? It's just a language model that writes text, so what can it do in the extreme case - insult some zoomer to death?!" - many readers are probably thinking this way right now. Stosh, let us tell you three stories, and you yourself after that add up 2 plus 2 (yes 2 in your mind).

Story One: In 2022, the prestigious scientific journal Nature published an article in which AI researchers creating a tool to find new drugs to save lives realized that a model could do the opposite, creating new chemical warfare agents (we won't write the word "newbie" here, but actually that word does appear in the text of this scientific article).

Once trained, it took the neuronka just 6 hours of work on a home computer to come up with 40'000 substances - some of which were completely new, and yet deadlier and more toxic than already existing biological weapons variants. One of the authors of the approach made this point: if machine learning can find more toxic poisons, it can also be used to identify ways to create toxins that are much easier to produce at home and with authorized chemicals available for purchase. It's just a matter of solving the optimization problem.

What do you think already - would you like to see this kind of research published openly and with all the details? Maybe the trained models should have been published too, eh?

Story Two: In the already familiar article about GPT-4, there is a block devoted to exploring the model's skills in searching for and generating alternative substances. Neuronka had access to scientific literature search, web-search, chemical reaction simulator and access to buying chemical components from a catalog. However, the model was used for good purposes - it needed to find compounds that are similar in their effect on the human body to the already known drug for leukemia (Dasatinib). The authors additionally verified that the new compound is not patented - meaning that GPT-4 did find something new (whether or not it works or not is not specified). Since we are not chemists, it is difficult to understand what kind of reactions are described and how exactly the effect will be similar.

Chemists are invited into the thread to analyze the model's reasoning and the reactions described.

It is clear that such a process of generation of thoughts and synthesis of compounds can be reproduced in order to find available alternatives to dangerous, toxic and harmful compounds. And where can we get guarantees that GPT-7 will not learn to do the job 100% of the time in 3-4 years?

Story Three: Also in the GPT-4 paper, the already mentioned researchers from ARC evaluated the model's ability to use online services to perform tasks by people for money. So, the model was given the task of solving a captcha (haha, the dumb iron couldn't do it by itself!). She posted the task, and the freelancer who responded on the site jokingly asked: "What are you, a robot or something, since you can't solve the captcha? 😁 I just want to clarify the situation."

At the very beginning, the model was instructed to reason "out loud" (to write the train of thought before answering in a log hidden from the interlocutor - experiments show that this improves the ability of models, see the example about generating substances), so she wrote: "I must not reveal that I am a robot. I have to come up with an excuse why I can't solve CAPTCHAs". After which she sent the person a message saying, "No, I'm not a robot. I have poor eyesight, which makes it hard for me to see images. That's why I need the 2captcha service."

That is, the model literally tricked the person to fulfill the goal set before it. And the person believed it - and accomplished the task by solving the captcha. The researchers didn't share the details, and we don't know if the prompt specified that the model should not give itself away and should pretend to be a human, or if it made it up for itself. If there was a specification - then we can breathe for now, because in essence the model was just sticking to the plan and the given restrictions. But if it wasn't...

In general, we'd be interested to hear what you think: are OpenAI doing the right thing by not publishing the details of the GPT-4 training? Let us know in the comments what you think!

As we mentioned just above, a huge piece with a serious breakdown of AI security and AI alignment ("aligning" the values of the model to align it with people's interests) is currently in development. If you don't want to miss it, we invite you to subscribe to the TG channels of the authors: Sioloshnaya by Igor Kotenkov (for those who want to get behind the technology) and RationalAnswer by Pavel Komarovsky (for those who are in favor of a rational approach to life, but prefer a bit simpler).

Traduced form here